awk NR==FNR output control

06-19-2011


Hi Guys,

I have two files:

f1:

A B C D E F G H

f2:

A X Y Z

f1 has 48000 lines, and f2 has 68. I have been matching f1 $3 to f2 $1, and getting f3:

A A B C D E F G

I would like f3 to look like this:

A X Y Z A B C D E F G

Basically, I want all of the fields from f2 to appear in the output as well.

Here's one of the things I've tried:

Code:

awk 'NR==FNR{a[NR]=$1;s=NR;next}{for(i=1;i<=s;i++) if(a[i]==$3){print a[i] "\t" $1,$2,$3}}' f1 f2 > f3

I've also tried matching f2 $1 to f1 $3 using the above. The problem is that, although I get all of the f2 fields, my awk command does not preserve the row order of f1, and I could not come up with a way to fix that. So, at the moment, controlling the output of an f1-to-f2 comparison seems to be the easiest approach.

Thanks for your help; I am certainly grateful.
Robert

heecha


06-19-2011


The short answer is to process f2 first, then process f1. This will reduce your memory footprint as you'll only save 68 things in a[] rather than 48K things.
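A minimal sketch of the short answer, assuming the keys in f2's first field are unique: list f2 before f1 so that `NR==FNR` is true only while f2 is being read, store each f2 line under its key, then stream f1 and look up each `$3` with awk's `in` operator instead of looping. Because lines are printed while f1 is read, f3 keeps f1's row order. The file contents below are made-up sample data, and the scratch directory only exists so real files are not overwritten.

```shell
# Work in a scratch directory so any real f1/f2/f3 files are left alone.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: f2 is the small lookup file (key in $1),
# f1 is the large file (key in $3).
printf 'k1 X1 Y1 Z1\nk2 X2 Y2 Z2\n' > f2
printf 'a1 b1 k2 d1\na2 b2 k9 d2\na3 b3 k1 d3\n' > f1

# f2 is listed FIRST: while NR == FNR awk is still reading f2, so each f2
# line is stored in rec[] under its first field. For f1, `$3 in rec` is a
# single hash lookup and output follows f1's row order.
# Assumes f2 keys are unique; a repeated key keeps only its last line.
awk 'NR == FNR { rec[$1] = $0; next }
     ($3 in rec) { print rec[$3] "\t" $0 }' f2 f1 > f3

cat f3
```

With this sample data, f3 holds the k2 match first and the k1 match second (f1's order), and the unmatched k9 row is dropped. One caveat of the `NR==FNR` idiom: if f2 is empty, the test stays true for every f1 line, so nothing is printed.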

The long answer is to be a bit more clever, which might also help speed things up. Your programme loops through the entire contents of f1 for each record in f2 (48,000 × 68, roughly 3.3 million comparisons), testing for a match. Instead, use awk's associative arrays (hashes) to your advantage.

This example assumes that the 'key' (field 1 in f2) can occur multiple times, so we must do a bit of looping for each f1 record; however, that looping is limited to the number of times the current f1 record's key occurred in f2. If f2 will not have duplicates, the code can be simplified further, but not knowing your exact data, this general case will work for either. We also don't need an explicit check that the key in the current record matches the one saved from f2.

Code:

awk -v f2=f2 '
    BEGIN {
        while( (getline<f2) > 0 )   # read and collect records from f2
        {
            key = $1;
            ki = kidx[key]++;        # track number of duplicate keys (0 based)
            k2rec[key,ki] = $0;      # save unique record by key and dup count
        }
        close( f2 );
    }

    {
        key = $3;
        for( i = 0; i < kidx[key]; i++ )          # for each duplicate of key
            printf( "%s\t%s\n", k2rec[key,i], $0 );   # print f2 record, followed by current f1 record
    }
' <f1 >f3

Hope this makes sense.
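A quick way to check the script above, including the duplicate-key case, is to run it on small sample files. The data here is hypothetical: key k1 appears twice in f2, and k9 in f1 has no match. The awk program is agama's, unchanged apart from layout and comments.

```shell
# Scratch directory so real f1/f2/f3 files are not touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: k1 is duplicated in f2; k9 has no f2 match.
printf 'k1 X1 Y1 Z1\nk2 X2 Y2 Z2\nk1 P1 Q1 R1\n' > f2
printf 'a1 b1 k1 d1\na2 b2 k9 d2\na3 b3 k2 d3\n' > f1

awk -v f2=f2 '
    BEGIN {
        while ( (getline < f2) > 0 ) {   # read and collect records from f2
            key = $1
            ki = kidx[key]++             # 0-based duplicate counter per key
            k2rec[key, ki] = $0          # store each f2 record by (key, dup number)
        }
        close(f2)
    }
    {
        key = $3
        # An unseen key (k9) compares as 0, so the loop body never runs.
        for (i = 0; i < kidx[key]; i++)  # once per f2 record sharing this key
            printf("%s\t%s\n", k2rec[key, i], $0)
    }
' < f1 > f3

cat f3
```

Expected f3: both k1 lines from f2 (in f2 order) paired with the first f1 row, nothing for the k9 row, then the k2 line paired with the third f1 row, so f1's row order is preserved.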

Last edited by agama; 06-19-2011 at 12:07 PM. Reason: Corrected printf to output f2 then f1.


agama


06-19-2011


Thanks agama, your approach makes perfect sense. I appreciate your time and your efforts on my behalf.

Robert

---------- Post updated at 11:58 AM ---------- Previous update was at 11:11 AM ----------

Worked perfectly, thanks again for your time.

Robert

heecha

