The UNIX and Linux Forums  
#1  03-13-2009  gillesc_mac
Compare multiple fields in file1 to file2 and print line and next line

Hello,

I have two files to compare. I need to print every line from file2 whose first 6 fields match the first 6 fields of a line in file1. The complications are:

1. file1 has a few thousand lines at most, while file2 has more than 2 million.
2. The first 6 fields (in order) of each line in file1 must match the first 6 fields (in order) of a line in file2; the matched line from file2 should be printed along with the line that follows it in file2.

Example files

file1:

...
0.54 3.2 0.45 32.9 4 0.02 9.0 4.0 (line 364)
0.6 4.0 3.99 2.0 0.85 7.0 3.84 0.05 (line 365)
...

file2:

93 28 04 73 95 11 0.4 7.9 2.30 4.05 (100(f18.3)) (line 30046)
70.1 99.4 0.35 9.943 6.1 0.27 0.654 (line 30047)
0.54 3.2 0.45 32.9 4 0.02 9.0 4.0 (54(f18.3) (line 628450)
44.8 33.2 90.3 45.2 66.3 (line 628451)

The needed result: line 364 of file1 matches line 628450 of file2, so lines 628450 and 628451 of file2 are printed. The search then moves on to line 365 of file1 and looks in file2 for a match, again printing the matching line and the line after it.

Example partial output from matching file1 against file2:

0.54 3.2 0.45 32.9 4 0.02 9.0 4.0 (54(f18.3)
44.8 33.2 90.3 45.2 66.3

I don't really care what I use (awk, sed, perl, etc.); I just need it to work.

Hopefully this makes sense.

Thanks

Chris
#2  03-13-2009  cfajohnson (Forum Advisor)
You really need GNU grep for this.

Put the fields you want to search for from file1 in another file, and use the -f and -A options to grep:

Code:
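# Extract the first six space-separated fields of each file1 line as patterns
cut -d ' ' -f1-6 file1 > file3
# -F: treat patterns as fixed strings; -A1 (GNU grep): also print the line after each match
grep -F -f file3 -A1 file2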
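
A note on the grep approach: patterns read with -f are not anchored, so a six-field sequence could also match in the middle of a file2 line. The sketch below is one possible way to anchor them, assuming the fields are separated by single spaces and that grep supports -A (GNU grep does). The file names f1.txt, f2.txt and pat.txt are made up for the illustration, and the sample lines are taken from the question.

```shell
# Hypothetical sample data, modelled on the example lines in the question.
tmp=$(mktemp -d)
printf '%s\n' '0.54 3.2 0.45 32.9 4 0.02 9.0 4.0' > "$tmp/f1.txt"
printf '%s\n' \
  '93 28 04 73 95 11 0.4 7.9 2.30 4.05' \
  '70.1 99.4 0.35 9.943 6.1 0.27 0.654' \
  '0.54 3.2 0.45 32.9 4 0.02 9.0 4.0 (54(f18.3)' \
  '44.8 33.2 90.3 45.2 66.3' > "$tmp/f2.txt"

# Build one anchored regex per file1 line: escape the dots, put "^" in front,
# and append a space so that field 6 cannot match the start of a longer number.
cut -d ' ' -f1-6 "$tmp/f1.txt" |
  sed -e 's/\./\\./g' -e 's/^/^/' -e 's/$/ /' > "$tmp/pat.txt"

# -A1 prints each matching line plus the line that follows it.
out=$(grep -A1 -f "$tmp/pat.txt" "$tmp/f2.txt")
printf '%s\n' "$out"
rm -rf "$tmp"
```

Because of the trailing space in each pattern, a file2 line with exactly six fields would not match; that is acceptable for the sample data, where the matching lines carry more fields.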
#3  03-13-2009  vgersh99 (Moderator)
Something along these lines:

nawk -f gil.awk file1 file2

gil.awk:
Code:
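# Build a lookup key from the first six fields of the current line (string comparison).
function buildIDX(   i, idx) {
    for(i=1; i<=6;i++) idx=(i==1) ? $i : idx SUBSEP $i
    return idx
}
# First file (file1): remember each key, print nothing.
FNR==NR {
    f1[buildIDX()]
    next
}
# Second file (file2): print the line that follows a match.
found && found--
# Print a file2 line whose key was seen in file1, and flag the next line.
{
   if (buildIDX() in f1) {
      print
      found=1
   }
}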

Last edited by vgersh99; 03-13-2009 at 04:13 PM..
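
For anyone who wants to try the idea on the sample lines from the first post, here is a self-contained sketch. It is a variant of the script above rather than the same code: it uses plain awk instead of nawk, and the names key, want, hit and pending are invented here. It prints each file2 line at most once, even if the line after a match is itself a match. Fields are compared as strings, so 4.0 and 4 would not be considered equal.

```shell
# Sample data copied from the question (the "(line N)" annotations are left out).
tmp=$(mktemp -d)
printf '%s\n' \
  '0.54 3.2 0.45 32.9 4 0.02 9.0 4.0' \
  '0.6 4.0 3.99 2.0 0.85 7.0 3.84 0.05' > "$tmp/file1"
printf '%s\n' \
  '93 28 04 73 95 11 0.4 7.9 2.30 4.05 (100(f18.3))' \
  '70.1 99.4 0.35 9.943 6.1 0.27 0.654' \
  '0.54 3.2 0.45 32.9 4 0.02 9.0 4.0 (54(f18.3)' \
  '44.8 33.2 90.3 45.2 66.3' > "$tmp/file2"

out=$(awk '
# Join the first six fields of the current line into one lookup key.
function key(   i, k) {
    k = $1
    for (i = 2; i <= 6; i++) k = k SUBSEP $i
    return k
}
# While reading file1: store each key.
FNR == NR { want[key()]; next }
# While reading file2: print a matching line and the line right after it.
{
    hit = (key() in want)
    if (hit || pending) print
    pending = hit
}' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$out"
rm -rf "$tmp"
```

Only the few thousand file1 keys are held in memory; file2 is streamed once, which suits the 2-million-line case described in the question.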
#4  03-13-2009  gillesc_mac
Thank you, that was helpful.

Now I have another, somewhat similar scenario.

I need to match field 8 of file1 against field 1 of file2 and print the matching file2 line along with the next line in file2. I was thinking of generating a file containing the matched file2 lines and then using the grep recommendation above to get both lines from file2.

I am unsure how to compare different fields in different files. Note that these are floating-point numbers that have the same numerical value but not necessarily the same string representation, e.g. 8.54 in file1 and 8.54000 in file2.

Thanks again
#5  03-13-2009  vgersh99 (Moderator)
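Assuming two decimal places of precision are enough (not tested). Both fields are rounded the same way so that 8.54 and 8.54000 compare equal:
Code:
nawk 'FNR==NR { f1[sprintf("%.2f", $8)]; next } (sprintf("%.2f", $1) in f1)' file1 file2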
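
If the number of decimal places is not known in advance, one alternative is to normalise both fields with the same %g-style format, which drops trailing zeros. A rough sketch, assuming six significant digits are enough (the "%.6g" format, the helper name norm and the sample data are assumptions, not part of the thread); it also prints the line after each match:

```shell
# Made-up sample data: field 8 of file1 should match field 1 of file2.
tmp=$(mktemp -d)
printf '%s\n' 'a b c d e f g 8.54' > "$tmp/file1"
printf '%s\n' \
  '8.54000 first' \
  'line after the match' \
  '9.1 second' \
  'not printed' > "$tmp/file2"

out=$(awk '
# Render a field as a number with up to six significant digits.
function norm(x) { return sprintf("%.6g", x + 0) }
# While reading file1: store the normalised field 8.
FNR == NR { want[norm($8)]; next }
# While reading file2: print a matching line and the line right after it.
{
    hit = (norm($1) in want)
    if (hit || pending) print
    pending = hit
}' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$out"
rm -rf "$tmp"
```

Non-numeric fields evaluate to 0 under this scheme, so the approach is only sensible when the compared columns really contain numbers.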
#6  03-13-2009  gillesc_mac
Thank you again, but I neglected to mention another restriction: I need to match multiple fields. For example:

File1 File2
$9 = $1
$1 = $3
$2 = $4
$3 = $5
$4 = $6
$5 = $7
$6 = $8
$7 = $9

But again, the fields do not necessarily have the same precision in both files. I tried extending your script, but I am just beginning to learn.

Thank you
#7  03-13-2009  vgersh99 (Moderator)
How many requirements DO you have?
Not tested.
Code:
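BEGIN {
   # Field numbers to compare: file1 field 9 pairs with file2 field 1, and so on.
   fld1="9 1 2 3 4 5 6 7"
   fld2="1 3 4 5 6 7 8 9"

   split(fld1, fld1A)
   split(fld2, fld2A)
}
# Build a key from the fields listed in fldA, each rounded to 2 decimal places.
function buildIDX(fldA,   i, idx) {
    for(i=1; i in fldA ;i++) idx=(i==1) ? sprintf("%.2f",$(fldA[i])) : idx SUBSEP sprintf("%.2f",$(fldA[i]))
    return idx
}
# file1: remember each key.
FNR==NR {
    f1[buildIDX(fld1A)]
    next
}
# file2: print the line that follows a match.
found && found--

# file2: print a matching line and flag the next one.
{
   if (buildIDX(fld2A) in f1) {
      print
      found=1
   }
}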

Last edited by vgersh99; 03-13-2009 at 06:22 PM..
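
To see the field mapping from the previous post in action, here is a small self-contained run. Everything in it is illustrative: the sample lines are invented, the mapping is passed in through the hypothetical variables map1 and map2, plain awk is used instead of nawk, and rounding to two decimals is the same assumption as in the script above.

```shell
# Invented sample data: file1 fields 9,1..7 should equal file2 fields 1,3..9.
tmp=$(mktemp -d)
printf '%s\n' '1.5 2 3.25 4 5 6 7 X 8.54' > "$tmp/file1"
printf '%s\n' \
  '8.54000 Y 1.50 2.0 3.250 4 5 6 7.00' \
  'next-line payload' \
  '8.54000 Y 9.9 2.0 3.250 4 5 6 7.00' \
  'tail' > "$tmp/file2"

out=$(awk -v map1='9 1 2 3 4 5 6 7' -v map2='1 3 4 5 6 7 8 9' '
BEGIN { n = split(map1, m1); split(map2, m2) }
# Build a key from the fields listed in m, each rounded to two decimals.
function key(m,   i, k) {
    for (i = 1; i <= n; i++) k = k SUBSEP sprintf("%.2f", $(m[i]))
    return k
}
# While reading file1: store each key.
FNR == NR { want[key(m1)]; next }
# While reading file2: print a matching line and the line right after it.
{
    hit = (key(m2) in want)
    if (hit || pending) print
    pending = hit
}' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$out"
rm -rf "$tmp"
```

The third file2 line differs only in field 3 (9.9 instead of 1.50), so neither it nor the line after it is printed.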