Understanding / Modifying AWK command


 
# 1  
Old 06-20-2011

Hey all,

So I have an AWK command here:

Code:
awk '{if(FNR==NR) {arr[$0]++;next} if($0 in arr) { arr[$0]--; if (arr[$0] == 0) delete arr[$0];next}{print $0 >"list2output.csv"}} END {for(i in arr){print i >"list1output.csv"}}' list1 list2

(refer to image for a more readable format)

This code was submitted to me by a fellow forum member, and at the time I just used it and it worked.

Now though, I would really like to try and understand it so that I can modify it.



Ideally what I wanted it to do was:
  1. Grab the second field on the first line in list1,
  2. Open list2 and search all lines for a match in the second field,
  3. If a match is found, write the first field from each list to an output file as fields 1 and 2, separated by a space, and put the second field from list1 (the one we searched for) in as field 3, #this would show the file size change
  4. If a match is not found, output 'field2"not found"' to the output file, #this would show that the file was deleted
  5. Repeat for each line in list1
  6. When list1 is finished, go into list2 and repeat
  7. When a match is found, do nothing #already wrote matches to the output file
  8. When a match is not found output field2"new file" to the output file, #this would show new files
  9. Repeat for each line in list2
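The steps above could be sketched roughly as follows. This is only an illustration under assumptions that are not stated in the post: fields are separated by whitespace, field 1 is the file size, field 2 is the file name, and each name appears at most once per list. The sample data and the array names (size1, size2, order1, order2) are made up for the example.

```shell
#!/bin/sh
# Hypothetical sample data: field 1 = size, field 2 = file name.
tmp=$(mktemp -d)
printf '%s\n' '100 a.txt' '200 b.txt' '300 c.txt' > "$tmp/list1"
printf '%s\n' '100 a.txt' '250 b.txt' '400 d.txt' > "$tmp/list2"

report=$(awk '
    # list1: remember each size by name, plus the order the names came in.
    FNR == NR { size1[$2] = $1; order1[++n1] = $2; next }
    # list2: same bookkeeping, in separate arrays.
    { size2[$2] = $1; order2[++n2] = $2 }
    END {
        # Steps 1-5: walk list1 in order and look each name up in list2.
        for (i = 1; i <= n1; i++) {
            name = order1[i]
            if (name in size2)
                print size1[name], size2[name], name
            else
                print name, "not found"
        }
        # Steps 6-9: walk list2 and report only names missing from list1.
        for (i = 1; i <= n2; i++) {
            name = order2[i]
            if (!(name in size1))
                print name, "new file"
        }
    }
' "$tmp/list1" "$tmp/list2")
rm -rf "$tmp"
printf '%s\n' "$report"
```

With this sample data the report has four lines: "100 100 a.txt", "200 250 b.txt", "c.txt not found" and "d.txt new file". Both lists are read completely before anything is printed, so the lookups in END can go in either direction.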
Now, to my knowledge (going only by the results, not the code), the awk command compares list1 to list2 and writes any changes found (size changes or new files) to outputfile1. Then list2 is compared to list1, and any changes found are written to outputfile2.

Looking at the code, however, confused me, since $0 is the entire line that was read, and from there my brain just melts.


As I did not write this code, I'm not asking for more code to be written for me (unless, like me, you love programming and, unlike me, you're experienced and actually good at it!). I would prefer to understand the code above so that I can perhaps write my own, or modify it to produce the results I want for ease of comparison.

Ideally I will learn how to put the results into a CSV file and open it up in Excel, which is another hurdle I'll get to eventually.

I've always hated arrays, never been good at them, but they are an important part of programming so if you can shed any light that would be appreciated (I'm also very new to AWK).

Cheers,
Michael.
Attached image: readable_format.png
# 2  
Old 06-30-2011
A week without a reply? Did you resolve this?

Code:
        if (FNR == NR)
        {
                arr[$0]++;
                next
        }

This goes through the first file, adding each line to an array and keeping a count of how many times each line occurs. NR is the "record number": the number of records read so far across all input files, which is the same as a running line number because the default record separator is a newline. FNR is the record number within the current file. So for the first file, the two match. For the next file, NR keeps counting upwards but FNR is reset to 1. The 'next' skips all the code below and moves on to the next record, because the code below is only meant for our second file.
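To see the two counters side by side, here is a small sketch that prints them for every line of two throwaway files. The file contents are made up for the example; only the file names list1 and list2 come from the original command.

```shell
#!/bin/sh
# Two small sample files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'a' 'b' > "$tmp/list1"
printf '%s\n' 'c' 'd' 'e' > "$tmp/list2"

# For each line print: file name, NR, FNR, and whether FNR == NR holds.
counters=$(cd "$tmp" && awk '{
    print FILENAME, NR, FNR, (FNR == NR ? "first file" : "later file")
}' list1 list2)
rm -rf "$tmp"
printf '%s\n' "$counters"
```

This prints "list1 1 1 first file" and "list1 2 2 first file" for the first file, then "list2 3 1 later file", "list2 4 2 later file" and "list2 5 3 later file": NR keeps climbing while FNR starts over at 1.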

Code:
        if ($0 in arr)
        {
                arr[$0]--;
                if (arr[$0] == 0)
                        delete arr[$0];
                next
        }

Now we're looking at our second file. We check whether the line matches one in the array (which holds all the lines of the first file). If it does, we decrease its count. If the count reaches 0, we remove the "key" from the associative array so that it's not looped over in the END block. Either way, 'next' then moves on to the next record.
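As a rough illustration of the counting, the sketch below runs the same two blocks on made-up data and then prints whatever keys are still in the array, together with their counts. The sample lines and the loop variable k are assumptions for the example, and the output is piped through sort because the order of a for (k in arr) loop is not defined.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' 'x' 'x' 'y' 'z' > "$tmp/list1"
printf '%s\n' 'x' 'y' 'w' > "$tmp/list2"

leftover=$(awk '
    FNR == NR { arr[$0]++; next }           # list1 gives x=2, y=1, z=1
    $0 in arr {                             # list2 line that list1 also had
        arr[$0]--                           # cancel out one occurrence
        if (arr[$0] == 0) delete arr[$0]    # none left, so drop the key
        next
    }
    END { for (k in arr) print k, arr[k] }  # surviving keys and their counts
' "$tmp/list1" "$tmp/list2" | sort)
rm -rf "$tmp"
printf '%s\n' "$leftover"
```

This prints "x 1" and "z 1". The key y is gone because its single occurrence was cancelled, and w was never in the array at all. Deleting the key matters: once it is gone, a further copy of that line in list2 no longer passes the `$0 in arr` test and is treated as a line that only list2 has.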
Code:
        {
                print $0 > "list2output.csv"
        }

Since that last block had a 'next' inside the if, this is only executed if the line was not in the first file (or has already turned up in list2 as many times as it appeared in list1). So list2output.csv holds the lines that are exclusive to list2.

Code:
END {
        for (i in arr)
        {
                print i > "list1output.csv"
        }
}

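What we have left in the array is written to list1output.csv, in no particular order. What's left should be the lines that appear in list1 more often than they appear in list2. Each such line is written once, however many extra copies list1 had.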
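Putting it all together, here is a sketch that runs the command from the first post, unchanged, on some made-up data in a scratch directory and then reads back the two output files. The sample lines are an assumption for the example, and the results are sorted only to make the output order predictable.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' 'x' 'x' 'y' 'z' > "$tmp/list1"
printf '%s\n' 'x' 'y' 'y' 'w' > "$tmp/list2"

# Run inside the scratch directory, because the command writes
# list1output.csv and list2output.csv to the current directory.
(
    cd "$tmp" &&
    awk '{if(FNR==NR) {arr[$0]++;next} if($0 in arr) { arr[$0]--; if (arr[$0] == 0) delete arr[$0];next}{print $0 >"list2output.csv"}} END {for(i in arr){print i >"list1output.csv"}}' list1 list2
)

only1=$(sort "$tmp/list1output.csv")   # lines list1 has more of
only2=$(sort "$tmp/list2output.csv")   # lines list2 has more of
rm -rf "$tmp"
printf 'list1output.csv:\n%s\nlist2output.csv:\n%s\n' "$only1" "$only2"
```

Here list1output.csv ends up with x and z (list1 has one x more than list2, and z only exists in list1), while list2output.csv gets w and the second y. Note that the whole line is compared, so a line whose size field changed counts as a different line and shows up in both output files.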
 