Problem working with Pipe Delimited Text file

01-02-2009

Hello all:
I have the following data in a text file named inst1.txt:

HDR|ABCD|10-13-2008 to 10-19-2008.txt|10-19-2008|XYZ
DTL|H|5464-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|D|5464-1|1|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|2|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|3|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|4|02-02-2008|02-03-2008|1||JJJ
DTL|H|7032-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|D|7032-1|1|02-02-2008|02-03-2008|1|M|yyy
DTL|D|7032-1|2|02-02-2008|02-03-2008|1|M|yyy
DTL|D|7032-1|3|02-02-2008|02-03-2008|1|N|yyy
DTL|D|7032-1|4|02-02-2008|02-03-2008|1|N|yyy
DTL|H|9999-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|D|9999-1|1|02-02-2008|02-03-2008|1|N|zzz
DTL|D|9999-1|2|02-02-2008|02-03-2008|1|N|zzz
TRL|ABCD|10-13-2008 to 10-19-2008.Txt|10-19-2008|170|XYZ

The output needed in a new file is:

HDR|ABCD|10-13-2008 to 10-19-2008.txt|10-19-2008|XYZ
DTL|H|5464-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|D|5464-1|1|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|2|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|3|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|4|02-02-2008|02-03-2008|1||JJJ
TRL|ABCD|10-13-2008 to 10-19-2008.Txt|10-19-2008|170|XYZ

Criterion: check whether the 8th column is NULL (empty).

If the 8th column of a detail record in the original file is NULL, write that record to the new file together with the File Header, the File Tail, and its Record Header. For the sample above these are:

File Header: HDR|ABCD|10-13-2008 to 10-19-2008.txt|10-19-2008|XYZ
File Tail: TRL|ABCD|10-13-2008 to 10-19-2008.Txt|10-19-2008|170|XYZ

Record Header:
DTL|H|5464-1|0|02-02-2008|02-03-2008||||F|||||||||

Record Detail:
DTL|D|5464-1|1|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|2|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|3|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|4|02-02-2008|02-03-2008|1||JJJ

Record Header and Record Detail are distinguished by the 2nd column: H = Header, D = Detail.

Part of the solution:
nawk -F'|' '$8 == "" ' inst1.txt >null.txt

The above command checks the 8th column and writes every record where it is empty to a new file, null.txt, which looks like this:

HDR|ABCD|10-13-2008 to 10-19-2008.txt|10-19-2008|XYZ
DTL|H|5464-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|D|5464-1|1|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|2|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|3|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|4|02-02-2008|02-03-2008|1||JJJ
DTL|H|7032-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|H|9999-1|0|02-02-2008|02-03-2008||||F|||||||||
TRL|ABCD|10-13-2008 to 10-19-2008.Txt|10-19-2008|170|XYZ

The Record Headers for 7032-1 and 9999-1 (shown in red in the original post) belong to other records and should not appear, but they do because their 8th column is NULL too.

Any help, suggestion, or advice would be greatly appreciated.

thanks,
Ravi

ravi0435

01-03-2009

Code:

nawk -F'|' 'NR==1; $8=="" && $2=="D" && NR==n+1 {print s} $2=="H" {s=$0; n=NR}
                   $8=="" && $2=="D"; {end=$0} END {print end}'  inst1.txt > null.txt

Last edited by rubin; 01-03-2009 at 09:29 PM..

rubin

01-03-2009

Hello Rubin:

Thanks, it worked partially. With the code you gave, the output is the following:

HDR|ABCD|10-13-2008 to 10-19-2008.txt|10-19-2008|XYZ
DTL|H|5464-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|D|5464-1|1|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|2|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|3|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|4|02-02-2008|02-03-2008|1||JJJ
DTL|H|9999-1|0|02-02-2008|02-03-2008||||F|||||||||

Basically it is missing the File Tail, and instead it ends with a Record Header, the one belonging to the last detail group.

thanks,
ravi

ravi0435

01-03-2009

My apologies, my bad.

Code:

nawk -F'|' 'NR==1; $8=="" && $2=="D" && NR==n+1 {print s} $2=="H" {s=$0; n=NR}
                   $8=="" && $2=="D"; {end=$0} END {print end}'  inst1.txt > null.txt

Output from your sample:

Code:

HDR|ABCD|10-13-2008 to 10-19-2008.txt|10-19-2008|XYZ
DTL|H|5464-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|D|5464-1|1|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|2|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|3|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|4|02-02-2008|02-03-2008|1||JJJ
TRL|ABCD|10-13-2008 to 10-19-2008.Txt|10-19-2008|170|XYZ

Previous code also edited.

rubin
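
A note on the command above: it prints a saved Record Header only when the very next line is a detail with an empty 8th column (NR==n+1). The sketch below is one possible variant, not something posted in the thread, that uses a flag instead, so the header is printed the first time any detail in its group matches. It assumes the File Header and File Tail can be recognised by HDR and TRL in column 1, as in the sample. The names hdr, shown and workdir are made up for illustration, and plain awk is used where the thread uses nawk.

```shell
# Sample input copied from the first post; the temp directory is only for this demo.
workdir=$(mktemp -d)
cat > "$workdir/inst1.txt" <<'EOF'
HDR|ABCD|10-13-2008 to 10-19-2008.txt|10-19-2008|XYZ
DTL|H|5464-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|D|5464-1|1|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|2|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|3|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|4|02-02-2008|02-03-2008|1||JJJ
DTL|H|7032-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|D|7032-1|1|02-02-2008|02-03-2008|1|M|yyy
DTL|D|7032-1|2|02-02-2008|02-03-2008|1|M|yyy
DTL|D|7032-1|3|02-02-2008|02-03-2008|1|N|yyy
DTL|D|7032-1|4|02-02-2008|02-03-2008|1|N|yyy
DTL|H|9999-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|D|9999-1|1|02-02-2008|02-03-2008|1|N|zzz
DTL|D|9999-1|2|02-02-2008|02-03-2008|1|N|zzz
TRL|ABCD|10-13-2008 to 10-19-2008.Txt|10-19-2008|170|XYZ
EOF

awk -F'|' '
    # File header and file tail always pass through (assumed markers HDR and TRL).
    $1 == "HDR" || $1 == "TRL" { print; next }
    # Remember the record header and note that it has not been printed yet.
    $1 == "DTL" && $2 == "H" { hdr = $0; shown = 0; next }
    # A detail with an empty 8th field: print its header once, then the detail.
    $1 == "DTL" && $2 == "D" && $8 == "" {
        if (!shown) { print hdr; shown = 1 }
        print
    }
' "$workdir/inst1.txt" > "$workdir/null.txt"

cat "$workdir/null.txt"
```

On the sample from the first post this gives the same seven lines as the required output.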

01-04-2009

Thanks Rubin, that works... another problem

Thanks rubin, I really appreciate that. The command you sent worked for one set of files. I just noticed that there is another text file with similar data and one small difference. I tried adapting your code but I am stuck: earlier, D was constant for all Detail records, but now the detail records are numbered 1, 2, 3, 4, 5. I tried $2=="[1-20]+" but it doesn't work. My apologies for not noticing that there were two different kinds of files. In the new file:

Instead of 'H' it is '0', and instead of 'D' it is 1, 2, 3, 4, 5, ...
(as many numbers as the parent record '0' has dependents)

Same criterion: check whether the 8th column is NULL.

HDR|ABCD|10-13-2008 to 10-19-2008.txt|10-19-2008|XYZ
DTL|0|5464-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|1|5464-1|1|02-02-2008|02-03-2008|1||JJJ
DTL|2|5464-1|2|02-02-2008|02-03-2008|1||JJJ
DTL|3|5464-1|3|02-02-2008|02-03-2008|1||JJJ
DTL|4|5464-1|4|02-02-2008|02-03-2008|1||JJJ
DTL|0|7032-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|1|7032-1|1|02-02-2008|02-03-2008|1|M|yyy
DTL|2|7032-1|2|02-02-2008|02-03-2008|1|M|yyy
DTL|3|7032-1|3|02-02-2008|02-03-2008|1|N|yyy
DTL|4|7032-1|4|02-02-2008|02-03-2008|1|N|yyy
DTL|0|9999-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|1|9999-1|1|02-02-2008|02-03-2008|1||zzz
DTL|2|9999-1|2|02-02-2008|02-03-2008|1||zzz
TRL|ABCD|10-13-2008 to 10-19-2008.Txt|10-19-2008|170|XYZ

Req - can you throw in a couple of lines of explanation? I worked with it a lot but couldn't understand what the bolded parts of the code below are intended to do.

nawk -F'|' 'NR==1; $32=="" && $2=="D" && NR==n+1 {print s} $2=="H" {s=$0; n=NR} $32=="" && $2=="D"; {end=$0} END {print end}' inst1.txt > null.txt

thanks,
Ravi

ravi0435
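
On the attempt with $2=="[1-20]+": in awk, == against a quoted string compares text literally, so it only matches a field that contains exactly those characters. A pattern has to be applied with the ~ operator. Also, inside brackets, 1-20 means "a character from 1 to 2, or 0", not the numbers 1 through 20. The following is a minimal sketch on four made-up rows (the rows and the variable names rows, literal and numeric are illustrative only), using plain awk:

```shell
# Four made-up rows: a parent (0), two numbered details, and a file header.
rows=$(printf '%s\n' 'DTL|0|x' 'DTL|1|x' 'DTL|12|x' 'HDR|ABCD|x')

# Literal comparison: counts rows whose 2nd field is exactly the text [1-20]+
literal=$(printf '%s\n' "$rows" | awk -F'|' '$2 == "[1-20]+" { c++ } END { print c + 0 }')

# Regex match: 2nd field is all digits and is numerically greater than 0.
numeric=$(printf '%s\n' "$rows" | awk -F'|' '$2 ~ /^[0-9]+$/ && $2 + 0 > 0 { print $2 }')

echo "literal matches: $literal"
echo "numbered detail keys:"
echo "$numeric"
```

The literal comparison matches no row, while the regex test picks out 1 and 12 and skips both the parent 0 and the header row.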

01-04-2009

You could do something like this,

Code:

nawk -F'|' 'NR==1; $8=="" && $2~/^[1-9]+/ && NR==n+1 {print s} $2==0 {s=$0; n=NR}
                   $8=="" && $2~/^[1-9]+/; {end=$0} END {print end}' input > output

Quote:

Originally Posted by ravi0435

...
Req - can you throw in a couple of lines of explanation? I worked with it a lot but couldn't understand what the bolded parts of the code below are intended to do.
....
thanks,
Ravi

$2=="H" {s=$0; n=NR} -> when a record header is seen, store the line (s=$0) and its record number (n=NR).

NR==n+1 -> if the record immediately after a header (NR==n+1) also satisfies the other two conditions ( $8=="" and $2=="D" ), print the header s saved earlier.
The current record and the other matching details are printed by the later rule ( $8=="" && $2=="D" ).

{end=$0} -> the variable end stores the current record, overwriting the previous one, so that at the end ( END {...} ) the last record of the file is printed.
With gawk you could simply do -> END{ print $0 }.

rubin
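
The explanation above can also be read off a multi-line layout of the same command. This is only a sketch: the rules are the ones from the thread, spread over separate lines with comments, run with plain awk instead of nawk against the sample from the first post. The workdir variable and the temp directory are made up for the demo.

```shell
# Sample input copied from the first post; the temp directory is only for this demo.
workdir=$(mktemp -d)
cat > "$workdir/inst1.txt" <<'EOF'
HDR|ABCD|10-13-2008 to 10-19-2008.txt|10-19-2008|XYZ
DTL|H|5464-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|D|5464-1|1|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|2|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|3|02-02-2008|02-03-2008|1||JJJ
DTL|D|5464-1|4|02-02-2008|02-03-2008|1||JJJ
DTL|H|7032-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|D|7032-1|1|02-02-2008|02-03-2008|1|M|yyy
DTL|D|7032-1|2|02-02-2008|02-03-2008|1|M|yyy
DTL|D|7032-1|3|02-02-2008|02-03-2008|1|N|yyy
DTL|D|7032-1|4|02-02-2008|02-03-2008|1|N|yyy
DTL|H|9999-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|D|9999-1|1|02-02-2008|02-03-2008|1|N|zzz
DTL|D|9999-1|2|02-02-2008|02-03-2008|1|N|zzz
TRL|ABCD|10-13-2008 to 10-19-2008.Txt|10-19-2008|170|XYZ
EOF

awk -F'|' '
    NR == 1                                            # first line is the file header: print it
    $8 == "" && $2 == "D" && NR == n + 1 { print s }   # matching detail right after a header: print saved header
    $2 == "H" { s = $0; n = NR }                       # save each record header and its line number
    $8 == "" && $2 == "D"                              # print every detail whose 8th field is empty
    { end = $0 }                                       # keep the most recent line
    END { print end }                                  # after the last line, end holds the file tail
' "$workdir/inst1.txt" > "$workdir/null.txt"

cat "$workdir/null.txt"
```

A rule with a pattern and no action prints the line, which is why the first and fourth rules need no braces.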

01-05-2009

What's wrong in my code?

Thanks Rubin, I really appreciate that.

I used this code because I don't know how far the numbers go (instead of ^[1-9]+ I used != "0"):

Code:

nawk -F'|' 'NR==1; $8=="" && $2!="0" && NR==n+1 {print s} $2=="0" {s=$0; n=NR} 
                   $8=="" && $2!="0"; {end=$0} END {print end}'  input> output

I get the output below. I have not pasted anything twice; this is exactly what my code produced, with a blank line after the 1st line (I don't know why) and the last line repeated:

HDR|ABCD|10-13-2008 to 10-19-2008.txt|10-19-2008|XYZ

HDR|ABCD|10-13-2008 to 10-19-2008.txt|10-19-2008|XYZ
DTL|0|5464-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|1|5464-1|1|02-02-2008|02-03-2008|1||JJJ
DTL|2|5464-1|2|02-02-2008|02-03-2008|1||JJJ
DTL|3|5464-1|3|02-02-2008|02-03-2008|1||JJJ
DTL|4|5464-1|4|02-02-2008|02-03-2008|1||JJJ
DTL|0|9999-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|1|9999-1|1|02-02-2008|02-03-2008|1||zzz
DTL|2|9999-1|2|02-02-2008|02-03-2008|1||zzz
TRL|ABCD|10-13-2008 to 10-19-2008.Txt|10-19-2008|170|XYZ
TRL|ABCD|10-13-2008 to 10-19-2008.Txt|10-19-2008|170|XYZ

The output needed is:

HDR|ABCD|10-13-2008 to 10-19-2008.txt|10-19-2008|XYZ
DTL|0|5464-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|1|5464-1|1|02-02-2008|02-03-2008|1||JJJ
DTL|2|5464-1|2|02-02-2008|02-03-2008|1||JJJ
DTL|3|5464-1|3|02-02-2008|02-03-2008|1||JJJ
DTL|4|5464-1|4|02-02-2008|02-03-2008|1||JJJ
DTL|0|9999-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|1|9999-1|1|02-02-2008|02-03-2008|1||zzz
DTL|2|9999-1|2|02-02-2008|02-03-2008|1||zzz
TRL|ABCD|10-13-2008 to 10-19-2008.Txt|10-19-2008|170|XYZ

Thanks also for the explanation. Using it, I experimented by changing NR to 0, 1, 2 and moving parts of the code around, but nothing worked. The output I pasted above is the closest I got to the required output. What is wrong in my code? Could you help correct it? Thanks for all your time.

thanks ,
Ravi

ravi0435
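
A likely cause, going by the output pasted above: with $2!="0" the detail test is no longer limited to DTL lines. The HDR and TRL lines have fewer than eight fields, so their $8 is empty, and their $2 is ABCD, which is not "0". On line 1, n is still unset, so NR==n+1 is true and print s prints an empty s (the blank line), and then the detail rule prints HDR a second time. At the end the detail rule prints TRL and END prints it again. The sketch below is one possible fix, not a reply from the thread: it adds $1=="DTL" to the three record rules. The data is the second sample posted earlier, plain awk stands in for nawk, and workdir is a made-up name.

```shell
# Second sample from the thread (0 = parent record, 1..n = details).
workdir=$(mktemp -d)
cat > "$workdir/input" <<'EOF'
HDR|ABCD|10-13-2008 to 10-19-2008.txt|10-19-2008|XYZ
DTL|0|5464-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|1|5464-1|1|02-02-2008|02-03-2008|1||JJJ
DTL|2|5464-1|2|02-02-2008|02-03-2008|1||JJJ
DTL|3|5464-1|3|02-02-2008|02-03-2008|1||JJJ
DTL|4|5464-1|4|02-02-2008|02-03-2008|1||JJJ
DTL|0|7032-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|1|7032-1|1|02-02-2008|02-03-2008|1|M|yyy
DTL|2|7032-1|2|02-02-2008|02-03-2008|1|M|yyy
DTL|3|7032-1|3|02-02-2008|02-03-2008|1|N|yyy
DTL|4|7032-1|4|02-02-2008|02-03-2008|1|N|yyy
DTL|0|9999-1|0|02-02-2008|02-03-2008||||F|||||||||
DTL|1|9999-1|1|02-02-2008|02-03-2008|1||zzz
DTL|2|9999-1|2|02-02-2008|02-03-2008|1||zzz
TRL|ABCD|10-13-2008 to 10-19-2008.Txt|10-19-2008|170|XYZ
EOF

awk -F'|' '
    NR == 1                                                           # file header, printed once
    $1 == "DTL" && $8 == "" && $2 != "0" && NR == n + 1 { print s }   # first matching detail: print saved parent
    $1 == "DTL" && $2 == "0" { s = $0; n = NR }                       # save each parent record
    $1 == "DTL" && $8 == "" && $2 != "0"                              # print matching details only
    { end = $0 }                                                      # keep the most recent line
    END { print end }                                                 # file tail, printed once
' "$workdir/input" > "$workdir/output"

cat "$workdir/output"
```

With the $1=="DTL" guard in place, the HDR and TRL lines are each printed once and the blank line goes away, which gives the ten lines listed as the needed output.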
