How to identify varying unique fields values from a text file in UNIX?


 
# 1  
Old 02-27-2017

Hi,

I have a huge unsorted text file. I want to identify the unique field values in each line and use those fields as the primary key for a table in an upstream system.

Basically, the process or script should fetch the values from each line that are unique compared to the rest of the lines in the file.

Each line of the file is 150 bytes, and the file contains around 100,000 lines. How can I find which bytes of the 150-byte line can form a primary key?

I know the file has to be sorted on the entire 150 bytes, but after that I am not sure how to identify the uniqueness between lines.

Please help.

Thanks,
Mani A
# 2  
Old 02-27-2017
Can you supply some examples to help us understand?
Not 150-byte lines and not 100k lines, but something to get an idea of your goal?

There are commands such as
sort -u
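A minimal sketch of what sort -u does (the sample lines and the file name data.txt are placeholders, not your real data):
Code:
```shell
# Build a tiny sample file containing one duplicated line.
printf '%s\n' 'ETL01 SUCCESS' 'ETL02 SUCCESS' 'ETL01 SUCCESS' > data.txt

# sort -u sorts the file and keeps one copy of each distinct line.
sort -u data.txt
# -> ETL01 SUCCESS
# -> ETL02 SUCCESS
```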
# 3  
Old 02-27-2017
For Ex.

My file contains the following lines:
Code:
ETL01InventoryBalances			SUCCESS
ETL02EvavivsStagingSalesOrders		SUCCESS
ETL03StagevsODSSalesOrder		        SUCCESS
ETL04EvavivsSalesOrderHeader History	SUCCESS
ETL05EvavivsSalesOrderLine History	SUCCESS
ETL07EvavivsStageRAs			        SUCCESS
ETL08StagevsODSRAs			        SUCCESS
ETL09StagetoODSIdentifierAttachments	SUCCESS
ETL10EvavitoStageWTs			        SUCCESS
ETL11StagevsODSShippingOrder		SUCCESS
ETL12StagevsODSShippingOrder Line	SUCCESS
ETL13StagevsODSShipments		        SUCCESS
ETL14StagevsODSShipmentLines		SUCCESS
ETL15StagevsODSPurchaseOrder		SUCCESS
ETL16StagevsODSPurchaseOrder Lines	SUCCESS
ETL17StagevsODSInventoryTransactions	SUCCESS
ETL18StagevsODSOrders			SUCCESS
ETL19StagevsODSOrderLines		        SUCCESS
ETL20StagevsODSShippingOrder		SUCCESS
ETL21StagevsODSShippingOrder Lines	SUCCESS
ETL22ODS Duplicate Shipments		SUCCESS
ETL23Evavi vs Stage Sales Order Lines	SUCCESS
ETL24Evavi vs ODS Sales Order Lines	SUCCESS
ETL33Source to ODS Identifier AttachmentSUCCESS
SND01Serialized ODS Shipments vs SND	SUCCESS
SND02SND vs Serialized ODS Shipments	SUCCESS
SND03WMS DMR- ERR records in ViawareSUCCESS
SND04Evavi DMR - ERR records		SUCCESS
VIA01Viaware Cost Status		        SUCCESS

Here, I need to sort them, remove duplicate records (if any), and extract them to a separate file. That file will then be loaded into a table with just two columns, and the file's values will populate the primary key column of that table. Before loading, the extract file must contain only unique records.
Moderator's Comments:
Please use CODE tags as required by forum rules when displaying sample input, sample output, and code segments.


---------- Post updated at 03:22 PM ---------- Previous update was at 03:06 PM ----------

Let's assume that I need to compare the fields that are unique within the first 150 bytes of each line in the file.

sort -u will give the unique records for the entire file, but I want the uniqueness check restricted to the first 150 bytes of each line.
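One way to restrict the check to the first 150 bytes might be the following sketch (file.txt and unique.txt are placeholder names; adjust the byte range to your layout):
Code:
```shell
# List any 150-byte prefixes that occur on more than one line;
# an empty result means the first 150 bytes are unique everywhere
# and could serve as a key.
cut -c1-150 file.txt | sort | uniq -d

# Keep only the first line seen for each distinct 150-byte prefix:
awk '!seen[substr($0, 1, 150)]++' file.txt > unique.txt
```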

Last edited by Don Cragun; 02-27-2017 at 04:28 PM.. Reason: Add CODE tags.
# 4  
Old 02-27-2017
Given that the first five characters of every line in your sample input are unique, you aren't going to find any duplicates. And the file is already in sorted order. Are you trying to compare a substring of the lines in your file instead of whole lines?
# 5  
Old 02-27-2017
In simple terms,

How would you identify a column that can act as a primary key (i.e., is unique) in a file?

---------- Post updated at 03:28 PM ---------- Previous update was at 03:26 PM ----------

Thanks, Don. But the data that I showed is less than 15 of the entire content of the file. My file is a fixed-length file with 150 characters per line. I need to identify a unique pattern in each record and use that pattern as the primary key.
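One way to test whether a candidate byte range could act as a primary key is to count its distinct values and compare against the line count. A sketch, assuming bytes 1-5 as the candidate (based on the ETLnn prefix in the sample) and file.txt as a placeholder name:
Code:
```shell
# For each line, take the candidate key (bytes 1-5 here; widen the
# range to match your layout) and count distinct values.  The range
# can serve as a primary key only if every line's value is distinct.
awk '{
    k = substr($0, 1, 5)
    if (!(k in seen)) { seen[k]; distinct++ }
}
END {
    if (distinct == NR) print "bytes 1-5: unique on all " NR " lines"
    else                print "bytes 1-5: only " distinct " distinct values in " NR " lines"
}' file.txt
```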
# 6  
Old 02-27-2017
It looks like the first COLUMN is the key?
The exception is the munged record with no field separator:
Code:
ETL33Source to ODS Identifier AttachmentSUCCESS

Can this be somehow pre-processed/fixed?
Is it safe to assume that the SECOND field is always 'SUCCESS'?
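A quick sanity check for that assumption might look like this (assuming the status is the trailing word on each line; file.txt is a placeholder name):
Code:
```shell
# Count lines that do NOT end in SUCCESS (allowing trailing blanks);
# a count of 0 confirms the assumption.
grep -cv 'SUCCESS[[:space:]]*$' file.txt
```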
# 7  
Old 02-27-2017
Thanks for the response, but not really. There is no field separator as such; I edited the file just for readability.