How to identify varying unique fields values from a text file in UNIX?


 
# 8  
Old 02-27-2017
Quote:
Originally Posted by manikandan23
Thanks for the response, but not really. There is no field separator as such. I have edited the file just for readability.
Could we see the original version (for completeness' sake)?
# 9  
Old 02-27-2017
I have included only a few records from the file, as shown below.

Please assume that the first 150 characters of each line make up the primary key.

Code:
ETL01InventoryBalances            SUCCESS
ETL02EvavivsStagingSalesOrders        SUCCESS
ETL03StagevsODSSalesOrder        SUCCESS
ETL04EvavivsSalesOrderHeader History    SUCCESS
ETL05EvavivsSalesOrderLine History    SUCCESS
ETL07EvavivsStageRAs            SUCCESS
ETL08StagevsODSRAs            SUCCESS
ETL09StagetoODSIdentifierAttachments    SUCCESS
ETL10EvavitoStageWTs            SUCCESS
ETL11StagevsODSShippingOrder        SUCCESS
ETL12StagevsODSShippingOrder Line    SUCCESS
ETL13StagevsODSShipments        SUCCESS
ETL14StagevsODSShipmentLines        SUCCESS
ETL15StagevsODSPurchaseOrder        SUCCESS
ETL16StagevsODSPurchaseOrder Lines    SUCCESS
ETL17StagevsODSInventoryTransactions    SUCCESS
ETL18StagevsODSOrders            SUCCESS
ETL19StagevsODSOrderLines        SUCCESS
ETL20StagevsODSShippingOrder        SUCCESS
ETL21StagevsODSShippingOrder Lines    SUCCESS
ETL22ODS Duplicate Shipments        SUCCESS
ETL23Evavi vs Stage Sales Order Lines    SUCCESS
ETL24Evavi vs ODS Sales Order Lines    SUCCESS
ETL33Source to ODS Identifier AttachmentSUCCESS
SND01Serialized ODS Shipments vs SND    SUCCESS
SND02SND vs Serialized ODS Shipments    SUCCESS
SND03WMS DMR- ERR records in Viaware    SUCCESS
SND04Evavi DMR - ERR records        SUCCESS
VIA01Viaware Cost Status        SUCCESS
ETL01InventoryBalances            SUCCESS
ETL02EvavivsStagingSalesOrdersplan      SUCCESS
ETL03StagevsODSSalesOrder        UNKNOWN
ETL04EvavivsSalesOrderHeader History    UNKNOWN
ETL05EvavivsSalesOrderLine History    UNKNOWN
ETL07EvavivsStageRAs            UNKNOWN
ETL08StagevsODSRAs            UNKNOWN
ETL09StagetoODSIdentifierAttachments    UNKNOWN
ETL10EvavitoStageWTs12            UNKNOWN
ETL21StagevsODSShippingOrder        FAILURE
ETL212StagevsODSShippingOrder Line    FAILURE
ETL23StagevsODSShipments        FAILURE
ETL24StagevsODSShipmentLines        FAILURE
ETL25StagevsODSPurchaseOrder        FAILURE
ETL76StagevsODSPurchaseOrder Lines    FAILURE
ETL77StagevsODSInventoryTransactions    FAILURE
ETL78StagevsODSOrders            FAILURE
ETL59StagevsODSOrderLines        FAILURE
ETL60StagevsODSShippingOrder        FAILURE
ETL71StagevsODSShippingOrder Lines    CHECKIN
ETL82ODS Duplicate Shipments        CHECKIN
ETL93Evavi vs Stage Sales Order Lines    CHECKIN
ETL04Evavi vs ODS Sales Order Lines    CHECKIN
ETL33Source to ODS Identifier AttachmentCHECKIN
SN005Serialized ODS Shipments vs SND    CHECKIN
SN5D2SND vs Serialized ODS Shipments    CHECKIN
SND43WMS DMR- ERR records in Viaware    CHECKIN
SND44Evavi DMR - ERR records        UNKNOWN
EVIA01Viaware Cost Status        UNKNOWN


# 10  
Old 02-27-2017
Hi manikandan23...

In post #1 you state that your line length is 150 bytes, but in post #5 it has changed to 150 characters.
Are all of these characters pure ASCII, from whitespace through '~' (tilde), perhaps including tabs, or could some of them be Unicode?
If Unicode, then a line of 150 characters will be longer than 150 bytes, because non-ASCII characters occupy more than one byte each.
We are assuming that your file(s) contain pure ASCII, but a snapshot of one of your files would help, posted inside CODE tags as this preserves plain-text viewing exactly.
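If it helps, here is a quick way to test which case applies (a sketch; myFile stands for one of your files):
Code:
# character count vs. byte count: the two differ if the file
# contains multi-byte (non-ASCII) characters
wc -m myFile
wc -c myFile

# list any lines holding bytes that are not printable ASCII or tab
LC_ALL=C grep -n '[^[:print:][:blank:]]' myFile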
# 11  
Old 02-27-2017
You have told us that the whole 150 character fixed-length line is a key. You have said you need to identify a unique pattern to act as a primary key. You have said that you need to identify the column which can act as a unique key in a file. ... ... ...

I am very confused.

None of the lines you showed us are fixed-length records. None of the lines you have shown us are 150 characters long. None of the lines you have shown us are 150 print columns wide. Two of the lines you have shown us are identical if you ignore the first five characters on each line. (And the command sort -u -k1.6 file will easily get rid of that duplicated line while sorting the lines you have shown us, ignoring the first five characters on each line.) Do you not know the format of the data you are processing?
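For example, on the sample you posted (a sketch; file stands for your input file):
Code:
# sort on a key that starts at character 6 of each line, i.e. skip the
# five-character prefix; -u keeps only one line per distinct key
sort -u -k1.6 file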
# 12  
Old 02-27-2017
Making some assumptions here...
Will something like this be helpful? Run it as awk -f mani.awk myFile, where mani.awk is:
Code:
BEGIN {
  tab = sprintf("\t")    # a literal tab, kept in a variable for portability
}

# trim(): remove leading and trailing spaces/tabs from str
function trim(str)
{
    sub("^[ " tab "]+", "", str);
    sub("[ " tab "]+$", "", str);
    return str;
}
{
  # RSTART marks where the trailing run of capital letters (the status
  # word such as SUCCESS or FAILURE) begins; skip lines without one
  if (match($0, "[A-Z][A-Z]+$"))
      print trim(substr($0, 1, RSTART - 1))
}

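The resulting key list could then be deduplicated before loading, e.g. (a sketch; file names assumed):
Code:
awk -f mani.awk myFile | sort -u > primary_keys.txt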
# 13  
Old 02-27-2017
Thank you so much everyone. I am really sorry for the confusion.

The file contains only ASCII, and the first 150 characters (please assume this number for the sake of understanding) are meant to be the primary key for an upstream table.

When I parse this file, let's say the input has around 500,000 lines; the first 150 characters of those lines could repeat or could be entirely unique.

The primary key file I output will be inserted into the table directly, so the process must run without raising a unique constraint violation.
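In other words, a check like this on the finished key file must print nothing (a sketch; primary_keys.txt stands for my output file):
Code:
# print any 150-character key that occurs more than once; no output
# means the load cannot hit a unique constraint violation
cut -c1-150 primary_keys.txt | sort | uniq -d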

I hope it is clear now.
Again, I am very sorry for all the miscommunication.

Thanks,
Mani A
# 14  
Old 02-27-2017
Quote:
Originally Posted by manikandan23
The file contains only ASCII, and the first 150 characters are meant to be the primary key for an upstream table. ... The primary key file I output will be inserted into the table directly, so the process must run without raising a unique constraint violation.
OK. So, sort -u (as suggested in post #2) should do exactly what you want. You said in post #3 that sort -u would not work, but your reasoning was not clear.

So, is there some reason why sort -u will not solve your problem?
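If only that fixed-width key has to be unique, a minimal sketch along these lines should be enough (assuming the key really is the first 150 characters and file is your input file):
Code:
# emit one copy of each distinct 150-character key
cut -c1-150 file | sort -u > primary_keys.txt

# or keep the first full line seen for each key, dropping later duplicates
awk '!seen[substr($0, 1, 150)]++' file > unique_lines.txt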