Understanding / Modifying AWK command

06-20-2011


Hey all,

So I have an AWK command here:

Code:

awk '{if(FNR==NR) {arr[$0]++;next} if($0 in arr) { arr[$0]--; if (arr[$0] == 0) delete arr[$0];next}{print $0 >"list2output.csv"}} END {for(i in arr){print i >"list1output.csv"}}' list1 list2

(refer to image for a more readable format)

This code was submitted to me by a fellow forum member, and at the time I just used it and it worked.

Now though, I would really like to try and understand it so that I can modify it.

Ideally what I wanted it to do was:

Grab the second field on the first line in list1.
Open list2 and search all lines for a match in the second field.
If a match is found, write the first fields from both lists to an output file as field 1 and field 2 respectively, delimited by a space, and place the second field from list1 (the one we used to search) as the third field in the output file. #this would show the file size change
If a match is not found, output field2 "not found" to the output file. #this would show that the file was deleted
Repeat for each line in list1.
When list1 is finished, go into list2 and repeat.
When a match is found, do nothing. #already wrote matches to the output file
When a match is not found, output field2 "new file" to the output file. #this would show new files
Repeat for each line in list2.

As far as I can tell (from the results alone, without reading the code), the awk command compares list1 to list2 and writes any changes it finds (size changes or new files) to outputfile1. Then list2 is compared to list1, and any changes found are written to outputfile2.

Looking at the code, however, confused me, as $0 is the entire line read, and from there my brain just melts.

As I did not write this code, I'm not asking for more code to be written for me (unless, like me, you love programming and, unlike me, you're experienced and actually good at it!). I would prefer to understand the current code above so that I can perhaps write my own, or modify it to produce the results I want for ease of comparison.

Ideally I will learn how to put the results into a CSV file and open it in Excel, which is another hurdle I'll get to eventually.

I've always hated arrays, never been good at them, but they are an important part of programming so if you can shed any light that would be appreciated (I'm also very new to AWK).

Cheers,
Michael.

[Attached image: readable_format.png]

Aussiemick


06-30-2011


A week without a reply? Did you resolve this?

Code:

        if (FNR == NR)
        {
                arr[$0]++;
                next
        }

This goes through the first file, adding each line to an array and keeping a count of how many times each line occurs. NR is the "record number," which is the line number being processed, since the default record separator is a newline. FNR is the record number within the current file. So for the first file, the two match. For the next file, NR keeps counting upwards but FNR resets to 1. (One catch: if the first file is empty, FNR == NR is true while reading the second file instead.) The 'next' skips straight to the next record, bypassing all the code below, which is meant only for our second file.
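If you want to watch NR and FNR move, here is a small sketch you can paste into a shell. The two sample files f1 and f2 are made up for the demonstration and live in a temporary directory.

```shell
# Two throwaway sample files (hypothetical data)
dir=$(mktemp -d)
printf '%s\n' a b   > "$dir/f1"
printf '%s\n' c d e > "$dir/f2"

# Print each line with NR, FNR and the outcome of the FNR == NR test
out=$(awk '{ print $0, NR, FNR, (FNR == NR ? "first file" : "later file") }' "$dir/f1" "$dir/f2")
echo "$out"
```

The two lines of f1 show NR and FNR equal (1 1, then 2 2). From the first line of f2 onwards NR carries on at 3 while FNR starts again at 1, so the test reports "later file".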

Code:

        if ($0 in arr)
        {
                arr[$0]--;
                if (arr[$0] == 0)
                        delete arr[$0];
                next
        }

Now we're looking at our second file. We check whether the line matches a key in the array (which holds all lines of the first file). If it does, we decrease its count. If the count reaches 0, we remove the "key" from the associative array so that it's not looped over in the END block, and 'next' moves on to the next line.
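Since you mentioned arrays being a sore spot, here is a tiny stand-alone sketch of the three array operations that block relies on: decrementing a count, testing a key with 'in', and 'delete'. The keys "x" and "y" and the variable tmp are invented for the demonstration; nothing is read from a file.

```shell
out=$(awk 'BEGIN {
    arr["x"] = 2                 # as if line "x" appeared twice in the first file
    arr["x"]--                   # first match in the second file
    printf "count=%d present=%d\n", arr["x"], ("x" in arr)
    arr["x"]--                   # second match
    if (arr["x"] == 0)
        delete arr["x"]          # drop the key so the END loop never sees it
    printf "present=%d\n", ("x" in arr)
    tmp = arr["y"]               # merely reading arr["y"] creates the key
    printf "y present=%d\n", ("y" in arr)
}')
echo "$out"
```

The last two lines of the sketch show why the original tests with ($0 in arr) rather than looking at arr[$0] directly: referring to an element that does not exist creates it, and that would add every line of the second file to the array.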

Code:

        {
                print $0 > "list2output.csv"
        }

Since that last block had a 'next' inside the if, this is only executed if the line was not in the first file, or if the second file has more copies of the line than the first. list2output.csv therefore holds what is exclusive to list2.
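To see the counting at work, here is a sketch of the blocks covered so far, run against made-up sample data. It differs from the original in two ways: it prints to the screen instead of list2output.csv, and it leaves out the END block.

```shell
dir=$(mktemp -d)
# Hypothetical sample data: "a" appears twice in list1 and three times in list2
printf '%s\n' a a b c   > "$dir/list1"
printf '%s\n' a a a c d > "$dir/list2"

out=$(awk '
{
    if (FNR == NR) { arr[$0]++; next }   # first file: count each line
    if ($0 in arr) {                     # second file: line still has a count left
        arr[$0]--
        if (arr[$0] == 0)
            delete arr[$0]
        next
    }
    print $0                             # only reached when no count is left
}' "$dir/list1" "$dir/list2")
echo "$out"
```

This prints a and then d. The first two a lines in list2 use up the two counted in list1, so the third a falls through to the print, as does d, which list1 never had.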

Code:

END {
        for (i in arr)
        {
                print i > "list1output.csv"
        }
}

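What we have left in the array is written to list1output.csv. What's left should be lines that appear in list1 more often than they appear in list2. Each such line is printed once, however large its remaining count, and for (i in arr) visits the keys in no particular order.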
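As for what you originally wanted, here is a rough sketch rather than a finished solution. It assumes each line of list1 and list2 is "size name" separated by whitespace, that names contain no spaces, and that a name appears at most once per list. The sample data, the array names size1 and size2, and the file report.txt are all made up for illustration.

```shell
dir=$(mktemp -d)
# Hypothetical sample data, one "size name" pair per line
printf '%s\n' '100 a.txt' '200 b.txt' '300 c.txt' > "$dir/list1"
printf '%s\n' '100 a.txt' '250 b.txt' '400 d.txt' > "$dir/list2"

awk '
FNR == NR { size1[$2] = $1; next }    # list1: remember each size under its name
          { size2[$2] = $1 }          # list2: same, in a second array
END {
    for (name in size1) {
        if (name in size2) {
            print size1[name], size2[name], name   # in both lists: old size, new size, name
        } else {
            print name, "not found"                # only in list1: file was deleted
        }
    }
    for (name in size2) {
        if (!(name in size1)) {
            print name, "new file"                 # only in list2: file is new
        }
    }
}' "$dir/list1" "$dir/list2" | sort > "$dir/report.txt"
cat "$dir/report.txt"
```

The output goes through sort because the for (name in ...) loops visit keys in no particular order. For a CSV you can open in Excel, adding -v OFS=, to the awk call should make print separate the fields with commas instead of spaces.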

neutronscott
