The UNIX and Linux Forums  

#1
12-20-2007
RacerX
Registered User
Join Date: Oct 2007
Posts: 34
Awk Compare Files w/Multiline Records

I'm trying to compare the first-column values in two different files that use a numerical value as the key, and output the more meaningful value found in the second column of file1 in front of the matching line(s) in file2. My problem is that file2 has multiple records per key. For example, given:
FILE1
1:A A CABBS:B:G:1988:9:3:3:1:7,060
2:A A CARSON:B:M:1990:21:1:0:3:9,500
3:A A DUPONT:B:M:1978:21:3:2:4:13,500
4:AAHGH TINIAN:B:G:2001:31:5:5:4:10,100
5:AAHON YVES:B:G:1994:41:5:4:3:18,795

FILE2
1:98-02-20:LAX:40:SL
1:98-02-27:LAX:40:GD
1:98-03-8:LAX:36:SL
1:98-03-13:LAX:31:GD
1:98-03-27:LAX:60:FT
1:98-04-3:LAX:45:FT
2:98-05-29:LLG:71:FT
2:98-06-6:LLG:57:FT
2:98-06-12:LLG:71:FT
3:98-05-23:LLG:62:FT
3:98-06-6:LLG:55:FT
4:98-01-6:BOS:58:GD
5:98-01-5:CHI:58:FT
5:98-01-12:CHI:39:FT
5:98-01-19:CHI:30:GD
5:98-01-28:CHI:39:FT

Desired OUTPUT
A A CABBS:1:98-02-20:LAX:40:SL
A A CABBS:1:98-02-27:LAX:40:GD
A A CABBS:1:98-03-8:LAX:36:SL
A A CABBS:1:98-03-13:LAX:31:GD
A A CABBS:1:98-03-27:LAX:60:FT
A A CABBS:1:98-04-3:LAX:45:FT
A A CARSON:2:98-05-29:LLG:71:FT
A A CARSON:2:98-06-6:LLG:57:FT
A A CARSON:2:98-06-12:LLG:71:FT
A A DUPONT:3:98-05-23:LLG:62:FT
A A DUPONT:3:98-06-6:LLG:55:FT
AAHGH TINIAN:4:98-01-6:BOS:58:GD
AAHON YVES:5:98-01-5:CHI:58:FT
AAHON YVES:5:98-01-12:CHI:39:FT
AAHON YVES:5:98-01-19:CHI:30:GD
AAHON YVES:5:98-01-28:CHI:39:FT

I have come up with the following awk program:
Code:
BEGIN {
   FS = OFS = ":";
   while (getline < ARGV[1]) {
      field1 = $1;
      field2 = $2;
      while (getline < ARGV[2]) {
         if ($1==field1) {
            print field2, $0;
         }
      }
   }
}
# awk -f ~/Desktop/alt.awk ~/Desktop/file1.txt ~/Desktop/file2.txt > ~/Desktop/Output.txt
However, it only returns what I want for the first record and then stops. I know I'm missing something but don't know what: an array, a loop, or both? Any suggestions or help would be appreciated, as my real files have 39,000 records and I've been going nowhere with this database project for over a week.
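
A note on why the program above stops after the first record: the inner `getline < ARGV[2]` loop reads FILE2 to its end while handling the first FILE1 line, and because the file is never closed, every later pass gets end-of-file immediately. The sketch below is one possible repair under that reading, keeping the nested-loop design: it calls `close()` after each inner pass so the next `getline` reopens the file from the top. The function name `nested_getline`, the variable names, and the temporary sample files are assumptions made for illustration.

```shell
# Hypothetical sample data (a subset of the FILE1/FILE2 rows above).
tmp=$(mktemp -d)
printf '%s\n' '1:A A CABBS:B:G' '2:A A CARSON:B:M' > "$tmp/file1.txt"
printf '%s\n' '1:98-02-20:LAX:40:SL' '2:98-05-29:LLG:71:FT' \
              '2:98-06-6:LLG:57:FT' > "$tmp/file2.txt"

nested_getline() {
  awk 'BEGIN {
    FS = OFS = ":"
    file1 = ARGV[1]; file2 = ARGV[2]
    # "> 0" ends the loop on end of file (0) and on a read error (-1).
    while ((getline < file1) > 0) {
      key = $1; name = $2
      while ((getline < file2) > 0)
        if ($1 == key)
          print name, $0
      close(file2)   # rewind: the next getline reopens file2 from the top
    }
  }' "$1" "$2"
}

nested_getline "$tmp/file1.txt" "$tmp/file2.txt"
```

This rereads FILE2 once for every FILE1 line, so with 39,000 records it is likely to be slow; loading FILE1 into an array and reading each file once avoids that.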
#2
12-20-2007
Registered User
Join Date: Oct 2007
Location: USA
Posts: 570
This looks like a job for join, provided both FILE1 and FILE2 are sorted on the key field...

Code:
join -t":" -1 1 -2 1 -o 1.2,2.1,2.2,2.3,2.4,2.5 FILE1 FILE2
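
One caveat worth spelling out: join expects its inputs ordered the way a plain `sort` on the join field orders them, which is lexical, not numeric. The sample keys 1 to 5 happen to satisfy both, but with 39,000 numeric keys a numerically sorted file would not (10 sorts before 2 lexically). A minimal sketch of sorting first, assuming a `sort` that supports `-s` (a stable sort, available in GNU and BSD sort) so rows keep their original order within a key; the function name `join_names` and the temporary files are illustrative only.

```shell
join_names() {
  # $1 = FILE1 (key:name:...), $2 = FILE2 (key:date:...)
  s1=$(mktemp); s2=$(mktemp)
  # Same locale for sort and join so both agree on the ordering.
  LC_ALL=C sort -s -t: -k1,1 "$1" > "$s1"
  LC_ALL=C sort -s -t: -k1,1 "$2" > "$s2"
  # Output: name from FILE1, then all five fields of the FILE2 line.
  LC_ALL=C join -t: -o 1.2,2.1,2.2,2.3,2.4,2.5 "$s1" "$s2"
  rm -f "$s1" "$s2"
}

# Usage: join_names FILE1 FILE2 > Output.txt
```

The output comes back in lexical key order, so it may need a final `sort -t: -k2,2n` if numeric order matters.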
#3
12-20-2007
vgersh99
Moderator
Join Date: Feb 2005
Location: Boston, MA
Posts: 3,029
nawk -f racer.awk FILE1 FILE2
racer.awk:
Code:
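BEGIN {
   FS=OFS=":"
}
# First file (FNR==NR holds only while reading FILE1): map key -> name.
FNR==NR { arr[$1]=$2; next }
# Second file: print the name in front of each line whose key is known.
$1 in arr { print arr[$1], $0 }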
#4
12-20-2007
RacerX
Registered User
Join Date: Oct 2007
Posts: 34
Thanks for the replies. I decided to give vgersh99's version a try because I am more comfortable with awk code, and it worked to perfection on my files.

You gurus are great, but you always make me feel like such a buffoon: it probably took you less than five minutes to solve it, while I was banging my head on the wall for over a week.

Oh well, I guess we all have to learn at our own pace. Thanks again for the help!
#5
12-21-2007
Registered User
Join Date: Jun 2007
Location: Beijing China
Posts: 495
awk

Hi,

Just for your reference, this one should also work for you.

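Code:
nawk 'BEGIN{
FS=":"
OFS=":"
}
{
if (NR==FNR)
	a[$1]=$2          # first file: remember the name for each key
else
{
	# Put the name in front of the key; assigning only a[$1]
	# would drop the key that the desired output keeps.
	$1=a[$1] OFS $1
	print $0
}
}' file1 file2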
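
The two awk answers in this thread behave differently when a FILE2 key has no entry in FILE1: the version guarded by `$1 in arr` silently skips such lines, while the unconditional version above prints them with an empty name. Neither is wrong, but on 39,000 records it may help to see which keys failed to match. A minimal sketch that reports them on standard error, assuming that is the desired handling; the function name `lookup_with_unmatched` and the message text are made up for illustration.

```shell
lookup_with_unmatched() {
  # $1 = FILE1 (key:name:...), $2 = FILE2 (key:date:...)
  awk -F: -v OFS=: '
    NR == FNR  { name[$1] = $2; next }           # first file: key -> name
    $1 in name { print name[$1], $0; next }      # second file, known key
    { print "no name for key " $1 | "cat 1>&2" } # unknown key: report on stderr
  ' "$1" "$2"
}

# Usage: lookup_with_unmatched file1 file2 > Output.txt 2> unmatched.txt
```

Matched lines go to standard output in the same form as the desired output, so the report can be redirected separately or discarded.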
The UNIX and Linux Forums Content Copyright ©1993-2008. All Rights Reserved.