awk NR==FNR output control Post: 302532027

Forum: UNIX for Dummies Questions & Answers. Post 302532027 by agama on Sunday 19th of June 2011 11:06:04 AM

The short answer is to process f2 first, then process f1. This will reduce your memory footprint, as you'll only save 68 records in a[] rather than 48,000.
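As a rough sketch of that short answer (the sample data below is made up, and I'm assuming the key is field 1 in f2 and field 3 in f1, the same as in the longer example in this post), putting f2 first on the command line means only its records are held in a[]:

```shell
# Hypothetical sample data: f2 is the small file (key in field 1),
# f1 is the large file (key in field 3).
cd "$(mktemp -d)" || exit 1
printf 'k1 alpha\nk2 beta\n' > f2
printf 'x 1 k1\ny 2 k9\nz 3 k2\n' > f1

awk '
    NR == FNR { a[$1] = $0; next }     # first file (f2): save each record by its key
    $3 in a   { print a[$3] "\t" $0 }  # second file (f1): print f2 record, then f1 record
' f2 f1 > f3

cat f3
```

With this data f3 ends up with two lines, one for k1 and one for k2; the k9 record in f1 has no match and is dropped. Two things to be aware of: a repeated key in f2 overwrites the earlier record in a[], and if f2 is empty then NR==FNR is also true for every record of f1, so nothing is printed.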

The long answer is to be a bit more clever, which might also help speed things up. Your programme will loop through the entire contents of file f1 for each record in f2 (48,000 * 68 comparisons), testing to see if there's a match. Instead, use the hash capabilities of awk to your advantage and look the key up directly in an array.

This example assumes that the 'key' (field 1 in f2, matched against field 3 in f1) can occur multiple times in f2, so we must do a bit of looping for each f1 record. The only looping needed while reading f1 is limited to the number of duplicate 'keys' that existed in f2 for the current f1 record. If f2 will not have duplicates, then the code can be simplified further, but not knowing your exact data, this general case will work for either. We also don't need to make an explicit check to see if the key in the current record matches one saved from f2: if the key was never seen in f2, kidx[key] is empty and the loop body simply never runs.

Code:

awk -v f2=f2 '
    BEGIN {
        while( (getline<f2) > 0 )   # read and collect records from f2
        {
            key = $1;
            ki = kidx[key]++;        # track number of duplicate keys (0 based)
            k2rec[key,ki] = $0;      # save unique record by key and dup count
        }
        close( f2 );
    }

    {
        key = $3;
        for( i = 0; i < kidx[key]; i++ )          # for each duplicate of key
            printf( "%s\t%s\n", k2rec[key,i], $0 );   # print f2 record, followed by current f1 record
    }
' <f1 >f3
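To see what the program above does with duplicate keys, here is a small run on made-up data (the file contents are purely hypothetical; the awk logic is the same as above, just condensed):

```shell
# Hypothetical sample data: key k1 appears twice in f2.
cd "$(mktemp -d)" || exit 1
printf 'k1 first\nk2 only\nk1 second\n' > f2
printf 'x 1 k1\ny 2 k9\nz 3 k2\n' > f1

awk -v f2=f2 '
    BEGIN {
        while ((getline < f2) > 0) {    # read and collect records from f2
            ki = kidx[$1]++             # 0-based count of this key so far
            k2rec[$1, ki] = $0          # save record under (key, count)
        }
        close(f2)
    }
    {
        for (i = 0; i < kidx[$3]; i++)  # once per f2 record with this key
            printf("%s\t%s\n", k2rec[$3, i], $0)
    }
' < f1 > f3

cat f3
```

The f1 record with key k1 is printed twice, once for each matching f2 record and in the order they appeared in f2. The k2 record is printed once, and the k9 record is not printed at all, so f3 has three lines.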

Hope this makes sense.

Last edited by agama; 06-19-2011 at 12:07 PM. Reason: Corrected printf to output f2 then f1
