The UNIX and Linux Forums  

  #1 (permalink)  
Old 09-24-2007
Registered User
 

Join Date: Sep 2007
Posts: 3
File Parsing

Hi All,
I have a couple of ASCII files with the following data:

File 1
#lport1:dc1:lport2:dc2 - All records were delimited by :
6300:ADEF12:6305:ATNE59
3411:EGFE31:3499:GDEF21
. . . .
. . . .
total of 55,000 Records

File 2
#seqno:lport1:id:dlc1:vid:lport2:nni:dc2:ci – All records delimited by :
60568:3411:98:EGFE31:965:3499:3799:GDEF21:432
. . . . . . . . .
. . . . . . . . .
total of 58,000 Records

I need to compare the lport1, dc1, lport2 and dc2 values of file1 with the lport1, dc1, lport2 and dc2 values of file2 and, if there is a match, write the entire line from file2 to another file. I tried writing a Perl script under solaris 2.5.8, which took almost 6 hours to finish.
Could any of you help me get this task to run much faster, i.e. in less than 15 minutes, using an awk/shell script?
Thanks in advance.
  #2 (permalink)  
Old 09-24-2007
vgersh99
Moderator
 

Join Date: Feb 2005
Location: Boston, MA
Posts: 3,002
Assuming:
Code:
File 2
#seqno:lport1:id:dlc1:vid:lport2:nni:dc2:ci – All records delimited by :
actually means:
Code:
File 2
#seqno:lport1:id:dc1:vid:lport2:nni:dc2:ci – All records delimited by :
you can run:
Code:
nawk -f jsusheel.awk file1 file2

jsusheel.awk:
Code:
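BEGIN {
   FS=OFS=":"   # fields are colon-delimited on input and output
}
# First file (file1): remember each lport1,dc1,lport2,dc2 combination as a key.
NR==FNR { f1[$1, $2, $3, $4]; next }
# Second file (file2): print the record if its fields 2,4,6,8 form a stored key.
($2 SUBSEP $4 SUBSEP $6 SUBSEP $8) in f1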
  #3 (permalink)  
Old 09-24-2007
Registered User
 

Join Date: Sep 2007
Posts: 3
File Parsing

Hi vgersh99,
Thanks for the reply. Yes, your assumption is correct: it should be dc1 instead of dlc1. Sorry for the typo.
When I ran the awk script, it produced no matching output. The block starting with NR==FNR works fine and reads every record from file1; I verified that by adding print $0.
However, I have no idea what the line ($2 SUBSEP $4 SUBSEP $6 SUBSEP $8 ) in f1 does. Could you please help me decipher it, as I am not very comfortable with awk?
Also, please note that records in file1 and file2 do not line up one to one: the first record in file1 may match the 100th record in file2, and the second record in file1 may match the 40123rd record in file2.
Thanks again for your time.
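
The thread never establishes why the run above printed nothing. One possibility, and it is only an assumption, is stray trailing characters such as carriage returns from DOS-style line endings: the last field of file1 is part of the key, so "ATNE59" followed by a carriage return would never equal field 8 of file2. A minimal sketch with made-up CRLF copies of the sample records (use nawk instead of awk on Solaris):

```shell
tmp=$(mktemp -d)
# Hypothetical inputs: the records quoted in this thread, written with CRLF endings.
printf '%s\r\n' '6300:ADEF12:6305:ATNE59' '3411:EGFE31:3499:GDEF21' > "$tmp/file1"
printf '%s\r\n' '60568:3411:98:EGFE31:965:3499:3799:GDEF21:432' > "$tmp/file2"

# Same logic as the script in this thread: the carriage return stays glued to $4 of file1.
raw=$(awk -F: '
    NR == FNR { f1[$1, $2, $3, $4]; next }
    ($2 SUBSEP $4 SUBSEP $6 SUBSEP $8) in f1
' "$tmp/file1" "$tmp/file2")

# Variant that cleans each record before building the key.
cleaned=$(awk -F: '
    { sub(/[ \t\r]+$/, "") }     # strip trailing blanks and carriage returns, fields are re-split
    /^#/ { next }                # assumption: lines starting with # are header comments
    NR == FNR { f1[$1, $2, $3, $4]; next }
    ($2 SUBSEP $4 SUBSEP $6 SUBSEP $8) in f1
' "$tmp/file1" "$tmp/file2")

echo "without cleanup: [$raw]"
echo "with cleanup:    [$cleaned]"
rm -rf "$tmp"
```

If the real files contain no such characters, this changes nothing and the cause lies elsewhere.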
  #4 (permalink)  
Old 09-24-2007
Registered User
 

Join Date: Jun 2007
Posts: 377
An easy but not efficient one

Hi,
I have an idea for your requirements, but it may be very slow when the files contain many records.
Just for your reference.

Input:
Code:
first.txt:
1:a:2:b
3:c:4:d
5:e:6:f
7:g:8:h

second.txt:
60568:1:98:a:965:2:3799:b:432
60568:1:98:f:965:2:3799:b:432
60568:3:98:c:965:4:3799:d:432
60568:3:98:c:965:4:3799:w:432
60568:5:98:e:965:6:3799:f:432
Output:
Code:
60568:1:98:a:965:2:3799:b:432
60568:3:98:c:965:4:3799:d:432
60568:5:98:e:965:6:3799:f:432
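Code:
awk 'BEGIN{FS=":"}
{
    if (NF<=4)
        pre[NR]=$0        # first.txt: keep each 4-field line
    else
    {
        # second.txt: rebuild the 4-field key from fields 2, 4, 6 and 8
        a=sprintf("%s:%s:%s:%s",$2,$4,$6,$8)
        # compare against every stored line (slow for large files)
        for (i in pre)
            if (pre[i]==a)
                print $0
    }
}' first.txt second.txt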
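
The inner loop above compares every second.txt record against every stored line, which is what makes it slow on large files. As a sketch only, the same idea can use the stored line itself as the array index, so each second.txt record costs one lookup. It assumes the first.txt lines are exactly the four colon-separated fields with no extra whitespace (use nawk instead of awk on Solaris):

```shell
tmp=$(mktemp -d)
# Sample data copied from the post above.
printf '%s\n' '1:a:2:b' '3:c:4:d' '5:e:6:f' '7:g:8:h' > "$tmp/first.txt"
printf '%s\n' \
    '60568:1:98:a:965:2:3799:b:432' \
    '60568:1:98:f:965:2:3799:b:432' \
    '60568:3:98:c:965:4:3799:d:432' \
    '60568:3:98:c:965:4:3799:w:432' \
    '60568:5:98:e:965:6:3799:f:432' > "$tmp/second.txt"

result=$(awk -F: '
    NF <= 4 { pre[$0]; next }            # first.txt: index each line by its own text
    ($2 ":" $4 ":" $6 ":" $8) in pre     # second.txt: one array lookup per record
' "$tmp/first.txt" "$tmp/second.txt")

echo "$result"
rm -rf "$tmp"
```

On the sample data this prints the same three lines as the loop version.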
  #5 (permalink)  
Old 09-25-2007
vgersh99
Moderator
 

Join Date: Feb 2005
Location: Boston, MA
Posts: 3,002
f1:
Code:
6300:ADEF12:6305:ATNE59
3411:EGFE31:3499:GDEF21
f2:
Code:
60568:3411:98:EGFE31:965:3499:3799:GDEF21:432
60568:3422:98:EGFE31:965:3499:3799:GDEF21:432
produces:
Code:
60568:3411:98:EGFE31:965:3499:3799:GDEF21:432
Looks good to me given your original description of the fields and the matching criteria.

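The '($2 SUBSEP $4 SUBSEP $6 SUBSEP $8 )' expression is the matching key for file2: fields 2, 4, 6 and 8 of each file2 record, joined with SUBSEP, form the key that is looked up in the associative array 'f1'. It is the same string awk builds internally for f1[$1, $2, $3, $4] when file1 is read, and a pattern with no action prints the matching line.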
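
To see the key mechanics in isolation, here is a small sketch with the values from f1 and f2 hard-coded (the variable names key_hit and key_miss are made up for illustration; use nawk instead of awk on Solaris):

```shell
out=$(awk 'BEGIN {
    # Store one key with the comma form, as the NR==FNR block does for file1.
    f1["3411", "EGFE31", "3499", "GDEF21"]

    # Build lookup keys by hand with SUBSEP, as the second pattern does for file2.
    key_hit  = "3411" SUBSEP "EGFE31" SUBSEP "3499" SUBSEP "GDEF21"
    key_miss = "3422" SUBSEP "EGFE31" SUBSEP "3499" SUBSEP "GDEF21"

    if (key_hit in f1)
        print "3411 record: match"
    else
        print "3411 record: no match"

    if (key_miss in f1)
        print "3422 record: match"
    else
        print "3422 record: no match"
}')
echo "$out"
```

The comma form and the explicit SUBSEP concatenation produce the same index string, which is why the two halves of the script can find each other's keys.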
  #6 (permalink)  
Old 09-25-2007
Registered User
 

Join Date: Sep 2007
Posts: 3
File Parsing

Hi,
Many thanks to Summer_cherry and vgersh99 for the responses.
These scripts still use a lot of CPU and take a long time to complete, so I have decided to run them at midnight.
Thanks a lot.