How to identify varying unique fields values from a text file in UNIX?


 
# 8  
Old 02-27-2017
Quote:
Originally Posted by manikandan23
Thanks for the response, but not really. There is no field separator as such. I have edited the file just for readability.
Could we see the original version (for completeness' sake)?
# 9  
Old 02-27-2017
I have included only a few records from the file, as shown below.

Please assume that the first 150 characters of each line make up the primary key.

Code:
ETL01InventoryBalances            SUCCESS
ETL02EvavivsStagingSalesOrders        SUCCESS
ETL03StagevsODSSalesOrder        SUCCESS
ETL04EvavivsSalesOrderHeader History    SUCCESS
ETL05EvavivsSalesOrderLine History    SUCCESS
ETL07EvavivsStageRAs            SUCCESS
ETL08StagevsODSRAs            SUCCESS
ETL09StagetoODSIdentifierAttachments    SUCCESS
ETL10EvavitoStageWTs            SUCCESS
ETL11StagevsODSShippingOrder        SUCCESS
ETL12StagevsODSShippingOrder Line    SUCCESS
ETL13StagevsODSShipments        SUCCESS
ETL14StagevsODSShipmentLines        SUCCESS
ETL15StagevsODSPurchaseOrder        SUCCESS
ETL16StagevsODSPurchaseOrder Lines    SUCCESS
ETL17StagevsODSInventoryTransactions    SUCCESS
ETL18StagevsODSOrders            SUCCESS
ETL19StagevsODSOrderLines        SUCCESS
ETL20StagevsODSShippingOrder        SUCCESS
ETL21StagevsODSShippingOrder Lines    SUCCESS
ETL22ODS Duplicate Shipments        SUCCESS
ETL23Evavi vs Stage Sales Order Lines    SUCCESS
ETL24Evavi vs ODS Sales Order Lines    SUCCESS
ETL33Source to ODS Identifier AttachmentSUCCESS
SND01Serialized ODS Shipments vs SND    SUCCESS
SND02SND vs Serialized ODS Shipments    SUCCESS
SND03WMS DMR- ERR records in Viaware    SUCCESS
SND04Evavi DMR - ERR records        SUCCESS
VIA01Viaware Cost Status        SUCCESS
ETL01InventoryBalances            SUCCESS
ETL02EvavivsStagingSalesOrdersplan      SUCCESS
ETL03StagevsODSSalesOrder        UNKNOWN
ETL04EvavivsSalesOrderHeader History    UNKNOWN
ETL05EvavivsSalesOrderLine History    UNKNOWN
ETL07EvavivsStageRAs            UNKNOWN
ETL08StagevsODSRAs            UNKNOWN
ETL09StagetoODSIdentifierAttachments    UNKNOWN
ETL10EvavitoStageWTs12            UNKNOWN
ETL21StagevsODSShippingOrder        FAILURE
ETL212StagevsODSShippingOrder Line    FAILURE
ETL23StagevsODSShipments        FAILURE
ETL24StagevsODSShipmentLines        FAILURE
ETL25StagevsODSPurchaseOrder        FAILURE
ETL76StagevsODSPurchaseOrder Lines    FAILURE
ETL77StagevsODSInventoryTransactions    FAILURE
ETL78StagevsODSOrders            FAILURE
ETL59StagevsODSOrderLines        FAILURE
ETL60StagevsODSShippingOrder        FAILURE
ETL71StagevsODSShippingOrder Lines    CHECKIN
ETL82ODS Duplicate Shipments        CHECKIN
ETL93Evavi vs Stage Sales Order Lines    CHECKIN
ETL04Evavi vs ODS Sales Order Lines    CHECKIN
ETL33Source to ODS Identifier AttachmentCHECKIN
SN005Serialized ODS Shipments vs SND    CHECKIN
SN5D2SND vs Serialized ODS Shipments    CHECKIN
SND43WMS DMR- ERR records in Viaware    CHECKIN
SND44Evavi DMR - ERR records        UNKNOWN
EVIA01Viaware Cost Status        UNKNOWN


# 10  
Old 02-27-2017
Hi manikandan23...

In post #1 you state that your line length is 150 bytes, but in post #5 it has changed to 150 characters.
Are all of these characters pure ASCII, from whitespace through '~' (tilde), perhaps including tabs, or could some of them be Unicode?
If Unicode, then a line of 150 characters will be longer than 150 bytes, because non-ASCII characters occupy more than one byte each.
We are assuming that your file(s) contain pure ASCII, but a snapshot of one of your files would help, posted inside CODE tags as this preserves plain-text viewing exactly.
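If it helps, here is a quick way to test which case applies (a sketch; myFile stands for one of your files):
Code:
# character count vs. byte count: the two differ if the file
# contains multi-byte (non-ASCII) characters
wc -m myFile
wc -c myFile

# list any lines holding bytes that are not printable ASCII or tab
LC_ALL=C grep -n '[^[:print:][:blank:]]' myFile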
# 11  
Old 02-27-2017
You have told us that the whole 150 character fixed-length line is a key. You have said you need to identify a unique pattern to act as a primary key. You have said that you need to identify the column which can act as a unique key in a file. ... ... ...

I am very confused.

None of the lines you showed us are fixed-length records. None of the lines you have shown us are 150 characters long. None of the lines you have shown us are 150 print columns wide. Two of the lines you have shown us are identical if you ignore the first five characters on each line. (And the command sort -u -k1.6 file will easily get rid of that duplicated line while sorting the lines you have shown us, ignoring the first five characters on each line.) Do you not know the format of the data you are processing?
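For example, on the sample you posted (a sketch; file stands for your input file):
Code:
# sort on a key that starts at character 6 of each line, i.e. skip the
# five-character prefix; -u keeps only one line per distinct key
sort -u -k1.6 file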
# 12  
Old 02-27-2017
Making some assumptions here...
Will something like this be helpful? Run it as awk -f mani.awk myFile, where mani.awk is:
Code:
BEGIN {
  tab = sprintf("\t")    # a literal tab, kept in a variable for portability
}

# trim(): remove leading and trailing spaces/tabs from str
function trim(str)
{
    sub("^[ " tab "]+", "", str);
    sub("[ " tab "]+$", "", str);
    return str;
}
{
  # RSTART marks where the trailing run of capital letters (the status
  # word such as SUCCESS or FAILURE) begins; skip lines without one
  if (match($0, "[A-Z][A-Z]+$"))
      print trim(substr($0, 1, RSTART - 1))
}

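The resulting key list could then be deduplicated before loading, e.g. (a sketch; file names assumed):
Code:
awk -f mani.awk myFile | sort -u > primary_keys.txt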
# 13  
Old 02-27-2017
Thank you so much everyone. I am really sorry for the confusion.

The file contains only ASCII, and the first 150 characters (please assume this number for the sake of understanding) are meant to be the primary key for an upstream table.

When I parse this file, let's say the input has around 500,000 lines; the first 150 characters of those lines could repeat or could be entirely unique.

The primary key file I output will be inserted into the table directly, so the process must run without raising a unique constraint violation.
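In other words, a check like this on the finished key file must print nothing (a sketch; primary_keys.txt stands for my output file):
Code:
# print any 150-character key that occurs more than once; no output
# means the load cannot hit a unique constraint violation
cut -c1-150 primary_keys.txt | sort | uniq -d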

I hope it is clear now.
Again, I am very sorry for all the miscommunication.

Thanks,
Mani A
# 14  
Old 02-27-2017
Quote:
Originally Posted by manikandan23
The file contains only ASCII, and the first 150 characters are meant to be the primary key for an upstream table. ... The primary key file I output will be inserted into the table directly, so the process must run without raising a unique constraint violation.
OK. So, sort -u (as suggested in post #2) should do exactly what you want. You said in post #3 that sort -u would not work, but your reasoning was not clear.

So, is there some reason why sort -u will not solve your problem?
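If only that fixed-width key has to be unique, a minimal sketch along these lines should be enough (assuming the key really is the first 150 characters and file is your input file):
Code:
# emit one copy of each distinct 150-character key
cut -c1-150 file | sort -u > primary_keys.txt

# or keep the first full line seen for each key, dropping later duplicates
awk '!seen[substr($0, 1, 150)]++' file > unique_lines.txt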