The UNIX and Linux Forums  

  #1 (permalink)  
Old 05-05-2008
Registered User
 

Join Date: Oct 2007
Posts: 35
awk and file parsing

Hi, I have an input file like this:

TH2TH2867Y NOW33332106Yo You Baby
TH2TH3867Y NOW33332106No Way Out
TH2TH9867Y NOW33332106Can't find it
TJ2TJ2872N WOW33332017sure thing alas
TJ2TJ3872N WOW33332017the sky rocks
TJ2TJ4872N WOW33332017nothing else matters
TJ2TJ5872N WOW33332017you know about it
TJ2TJ6872N WOW33331999nothing else matters
TJ2TJ7872N WOW33332017nothing else matters
TJ2TJ8872N WOW33332017No Way Out
TJ2TAW872N WOW33331999No Way Out
TJAPXC050Y NOW33331999No Way Out.
TJAT1N999Y NOW33331999still loving you.
TJBJOG575Y NOW33331999Jacka nd jill.
TJBJXG575Y NOW33331999Julie and friend

I am trying to get output something like this:

Yo You Baby|TH2
No Way Out|TJ2
still loving you.|TJA
You got it|TJB

Here TH2, TJ2, TJA and TJB are the distinct first 3 characters from the input.
In the input, let's say fr=substr($0,1,3) and nx=substr($0,4,3).
Basically, I want to check each line: if the first 3 characters (fr) equal the next 3 characters (nx), then print substr($0,23,20) and substr($0,1,3).

If they don't match, then print the first occurrence of fr with its substr($0,23,20).

Help!

Regards,
Big Gun
  #2 (permalink)  
Old 05-05-2008
 

Join Date: Nov 2007
Location: 45.48-73.63
Posts: 549
You are on the right track. I could give you the solution, but you should find the way by yourself.

Using awk: if A=B > print C|A

Success
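
For readers following along, the hint above (if A=B, print C|A) can be written as a single awk call that also trims the trailing blanks, so no second nawk pass is needed. This is only a minimal sketch: the file name sample_match.txt is made up for illustration, and the column positions are the ones given in the first post.

```shell
# Hypothetical sample file holding a few lines from the first post.
cat > sample_match.txt <<'EOF'
TH2TH2867Y NOW33332106Yo You Baby
TH2TH3867Y NOW33332106No Way Out
TJ2TJ2872N WOW33332017sure thing alas
TJAPXC050Y NOW33331999No Way Out.
EOF

# A = columns 1-3, B = columns 4-6, C = up to 20 characters from column 23.
matched=$(awk '{
    a = substr($0, 1, 3)
    b = substr($0, 4, 3)
    if (a == b) {
        c = substr($0, 23, 20)
        sub(/[ \t]+$/, "", c)   # drop trailing blanks from the text field
        print c "|" a
    }
}' sample_match.txt)

printf '%s\n' "$matched"
```

With these four lines it prints "Yo You Baby|TH2" and "sure thing alas|TJ2". Lines where A and B differ are skipped.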
  #3 (permalink)  
Old 05-05-2008
Registered User
 

Join Date: Oct 2007
Posts: 35
I did try this:

awk 'BEGIN{OFS="|"}{fr=substr($0,1,3);nx=substr($0,4,3); if (fr == nx) print substr($0,23,20),fr}' inputfile| nawk 'BEGIN{FS="|";OFS="|"}{ sub(/[ \t]*$/, "",$1);print $1,$2}'

But this misses the lines where they don't match.

In my example above:
TJAPXC050Y NOW33331999still loving you.
TJAT1N999Y NOW33331999still loving you.



I should be getting:
NOW33331999still loving you.|TJA
  #4 (permalink)  
Old 05-06-2008
Registered User
 

Join Date: Oct 2007
Posts: 35
How can I also print the output below
NOW33331999still loving you.|TJA
from the input
TJAPXC050Y NOW33331999still loving you.
TJAT1N999Y NOW33331999still loving you.

In this case fr is not equal to nx, so I would like to print the first occurrence of the line.
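
The "first occurrence" part can be sketched on its own with an awk array that records which prefixes have already been seen. This is an illustration only: the file name sample_first.txt and the array name seen are made up, and the sample lines are the TJA/TJB lines from the input in the first post.

```shell
# Hypothetical sample file with the lines where fr never equals nx.
cat > sample_first.txt <<'EOF'
TJAPXC050Y NOW33331999No Way Out.
TJAT1N999Y NOW33331999still loving you.
TJBJOG575Y NOW33331999Jacka nd jill.
TJBJXG575Y NOW33331999Julie and friend
EOF

first_seen=$(awk '{
    fr = substr($0, 1, 3)
    if (!(fr in seen)) {        # true only for the first line with this prefix
        seen[fr] = 1
        txt = substr($0, 23, 20)
        sub(/[ \t]+$/, "", txt) # drop trailing blanks
        print txt "|" fr
    }
}' sample_first.txt)

printf '%s\n' "$first_seen"
```

On this sample it prints "No Way Out.|TJA" and "Jacka nd jill.|TJB". Later lines with the same prefix are ignored.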
  #5 (permalink)  
Old 05-06-2008
Registered User
 

Join Date: Oct 2007
Posts: 35
Posting with a rephrased problem statement.

Hi, I have an input file like this:

TH2TH2867Y NOW33332106Yo You Baby
TH2TH3867Y NOW33332106No Way Out
TH2TH9867Y NOW33332106Can't find it
TJ2TJ2872N WOW33332017sure thing alas
TJ2TJ3872N WOW33332017the sky rocks
TJ2TJ4872N WOW33332017nothing else matters
TJ2TJ5872N WOW33332017you know about it
TJ2TJ6872N WOW33331999nothing else matters
TJ2TJ7872N WOW33332017nothing else matters
TJ2TJ8872N WOW33332017No Way Out
TJ2TAW872N WOW33331999No Way Out
TJAPXC050Y NOW33331999No Way Out.
TJAT1N999Y NOW33331999still loving you.
TJBJOG575Y NOW33331999Jacka nd jill.
TJBJXG575Y NOW33331999Julie and friend

I am trying to get output something like this:

Yo You Baby|TH2
sure thing alas|TJ2
No Way Out.|TJA
Jacka nd jill|TJB

Here TH2, TJ2, TJA and TJB are the distinct first 3 characters from the input.
In the input, let's say fr=substr($0,1,3) and nx=substr($0,4,3).
Basically, I want to check each line: if the first 3 characters (fr) equal the next 3 characters (nx),
then print substr($0,23,20) and substr($0,1,3).

If they don't match, then print the first occurrence of fr with its associated substr($0,23,20).


I started with something like this:
awk 'BEGIN{OFS="|"}{fr=substr($0,1,3);nx=substr($0,4,3); if (fr == nx) print substr($0,23,20),fr}' inputfile |
nawk 'BEGIN{FS="|";OFS="|"}{ sub(/[ \t]*$/, "",$1);print $1,$2}'

But this misses the lines where fr and nx don't match.

In my example above, fr doesn't match nx on these lines:
TJAPXC050Y NOW33331999No Way Out.
TJAT1N999Y NOW33331999still loving you.
TJBJOG575Y NOW33331999Jacka nd jill.
TJBJXG575Y NOW33331999Julie and friend

But I would like to get the result below too (the first occurrence of each fr and its substr):
No Way Out.|TJA
Jacka nd jill|TJB

Help!

Regards,
Big Gun
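
One possible way to combine both rules in a single awk pass is sketched below, under the assumption that the rule is: for each distinct fr, use the first line where fr equals nx if there is one, otherwise the first line seen for that fr. The file name sample_input.txt and the array names first, same and order are made up for illustration. The input is the file from the post above.

```shell
# Hypothetical copy of the input file shown in the post.
cat > sample_input.txt <<'EOF'
TH2TH2867Y NOW33332106Yo You Baby
TH2TH3867Y NOW33332106No Way Out
TH2TH9867Y NOW33332106Can't find it
TJ2TJ2872N WOW33332017sure thing alas
TJ2TJ3872N WOW33332017the sky rocks
TJ2TJ4872N WOW33332017nothing else matters
TJ2TJ5872N WOW33332017you know about it
TJ2TJ6872N WOW33331999nothing else matters
TJ2TJ7872N WOW33332017nothing else matters
TJ2TJ8872N WOW33332017No Way Out
TJ2TAW872N WOW33331999No Way Out
TJAPXC050Y NOW33331999No Way Out.
TJAT1N999Y NOW33331999still loving you.
TJBJOG575Y NOW33331999Jacka nd jill.
TJBJXG575Y NOW33331999Julie and friend
EOF

result=$(awk '{
    fr  = substr($0, 1, 3)
    nx  = substr($0, 4, 3)
    txt = substr($0, 23, 20)
    sub(/[ \t]+$/, "", txt)     # drop trailing blanks

    if (!(fr in first)) {       # remember the first line seen for each prefix
        first[fr] = txt
        order[++n] = fr         # keep prefixes in input order
    }
    if (fr == nx && !(fr in same))
        same[fr] = txt          # first line where columns 1-3 equal columns 4-6
}
END {
    for (i = 1; i <= n; i++) {
        fr = order[i]
        out = (fr in same) ? same[fr] : first[fr]
        print out "|" fr
    }
}' sample_input.txt)

printf '%s\n' "$result"
```

On the input above this prints "Yo You Baby|TH2", "sure thing alas|TJ2", "No Way Out.|TJA" and "Jacka nd jill.|TJB". The trailing period on the last line is kept because it is part of the input text. Results are printed in the END block because a matching line could come after a non-matching one for the same prefix, even though in this sample the matching line is always first.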