AWK - Difference in multiple files


 
# 1  
Old 02-09-2010

Hello again,

I've run into another problem that I've been unable to solve. With everyone's help last time, the script worked perfectly! This problem takes a little more finesse, and the bash script I thought up didn't work, so I've canned it. I'd like to try awk if possible. Here's my problem:

I have a multitude of sequential files like:

Code:
a_r01.dat
a_r02.dat
a_r03.dat
a_r04.dat

The sequence continues up to some number (in this case there are 47 of these .dat files, so the last one is a_r47.dat). Each file contains four columns:

Code:
.705  0.00  1.00  0
1.02  0.00  1.00  10
2.05  0.00  1.00  100
3.06  0.00  1.00  5000

Here's the tricky part. The first column is the same in every .dat file, and I don't really care about the second or third column. What I would like is a script that looks at a_r02.dat and a_r01.dat, computes the difference in the fourth column between the two files, and prints it (along with the value of the first column) into a separate file. It should then continue by computing the difference in the fourth column between a_r03.dat and a_r02.dat and print that out as well. I'm not sure I've explained this well, so here is an example. Suppose the two files are:

a_r01.dat
Code:
.705  0.00  1.00 10
1.02  0.00  1.00 10
2.05  0.00  1.00 15
3.06  0.00  1.00 35

a_r02.dat
Code:
.705  0.00  1.00 10
1.02  0.00  1.00 20
2.05  0.00  1.00 25
3.06  0.00  1.00 60

The script should compute the difference between the fourth column of each row and print an output.dat file that looks like:

Code:
.705  0
1.02  10
2.05  10
3.06  25

After it is done, it should continue by computing the same thing for a_r03 and a_r02, all the way down the line (until it terminates after running out of files), and each time, should put the difference in a new column in the output.dat file. So after a time, the output.dat should look like (using only column headers divided by a | symbol):

Code:
Column1 | r02-r01 | r03-r02 | r04-r03 | r05-r04 |

If my math is right, with 10 .dat files the output.dat should have the first column followed by 9 columns of fourth-column differences (one for each pair of consecutive input .dat files).

I hope I've explained this appropriately, and please let me know if anyone has any questions. I'm hoping that awk can do this, but if it is easier using perl or bash (or any other program), please let me know and I can easily get access to it. Thank you so much for your help!
# 2  
Old 02-09-2010
If you want the output sorted you can pipe the output to sort:
Code:
awk 'NR==FNR{ a[$1]=$4; s[$1]=$1; next } {
  s[$1] = s[$1] " | " $4 - a[$1]; a[$1]=$4
}
END{for(i in s) {print s[i]}}' a_r*.dat | sort

With gawk you can avoid the sort command if you set the undocumented WHINY_USERS variable:
Code:
WHINY_USERS=1 gawk 'NR==FNR{ a[$1]=$4; s[$1]=$1; next } {
  s[$1] = s[$1] " | " $4 - a[$1]; a[$1]=$4
}
END{for(i in s) {print s[i]}}' a_r*.dat
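
A side note on ordering: `for (i in s)` visits keys in no guaranteed order, plain `sort` compares text rather than numbers, and as far as I know gawk 4.0 and later dropped WHINY_USERS in favour of `PROCINFO["sorted_in"]`. Below is a minimal sketch, not the script above, that sidesteps all three by remembering the row order of the first file. The names `order`, `n`, `prev`, `line` and `diff_consecutive` are my own, and the sample files are just the example data from this thread written to a temporary directory.

```shell
#!/bin/sh
# Sketch: consecutive column-4 differences, rows kept in first-file order.
# Sample data copied from the thread; file names follow the a_rNN.dat pattern.
tmp=$(mktemp -d)
printf '%s\n' '.705 0.00 1.00 10' '1.02 0.00 1.00 10' \
              '2.05 0.00 1.00 15' '3.06 0.00 1.00 35' > "$tmp/a_r01.dat"
printf '%s\n' '.705 0.00 1.00 10' '1.02 0.00 1.00 20' \
              '2.05 0.00 1.00 25' '3.06 0.00 1.00 60' > "$tmp/a_r02.dat"
printf '%s\n' '.705 0.00 1.00 30' '1.02 0.00 1.00 40' \
              '2.05 0.00 1.00 45' '3.06 0.00 1.00 80' > "$tmp/a_r03.dat"

# Hypothetical helper: pass the files in the order they should be compared.
diff_consecutive() {
  awk '
    NR == FNR {                 # first file only
      order[++n] = $1           # remember the row order
      prev[$1] = $4             # column 4 of the previous file
      line[$1] = $1             # output line starts with column 1
      next
    }
    {
      line[$1] = line[$1] " " ($4 - prev[$1])   # append current - previous
      prev[$1] = $4                             # current becomes previous
    }
    END { for (i = 1; i <= n; i++) print line[order[i]] }
  ' "$@"
}

out=$(diff_consecutive "$tmp"/a_r*.dat)
printf '%s\n' "$out"
rm -rf "$tmp"
```

The shell expands `a_r*.dat` in lexical order, so this relies on the zero-padded numbering (a_r01 ... a_r47) to process the files in sequence.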

# 3  
Old 02-09-2010
Thank you so much for the reply Franklin52!

I've tested the script, and it seems I didn't explain the problem well enough. Sorry about that; let me try again.

Firstly, I think I confused people with the last "code" bit in my initial post. I don't want the values separated by " | "; plain spaces will do. I got carried away in my explanation: the bars were only meant as dividers to show that I wanted the values in separate columns. That said, the first column of the output.dat file should be exactly like the first column of all the input files.

Ultimately, what I would like to do is load the output.dat file in gnuplot and tell it to "plot 'output.dat' u 1:2 w l" and then "replot 'output.dat' u 1:3 w l", and so on (just to give you an idea of what I want to do with the data).

So I would like the first column of output.dat to be an exact copy of the first column of any of my input files (the first column is always the same). The second column of output.dat is the difference in the 4th column between a_r02.dat and a_r01.dat (a_r02 - a_r01), the third column is a_r03 - a_r02, the fourth is a_r04 - a_r03, and so on until I run out of .dat files.

I hope I'm not coming off as too whiny, that's not my intent at all. I really do appreciate everyone's help around here, most of those that frequent these boards have coding skills I could only dream of!
# 4  
Old 02-09-2010
Sorry I don't get it. I've changed the field separator and this is my output with 3 files:
Code:
$ cat a1.txt
.705  0.00  1.00 10
1.02  0.00  1.00 10
2.05  0.00  1.00 15
3.06  0.00  1.00 35
$ cat a2.txt
.705  0.00  1.00 10
1.02  0.00  1.00 20
2.05  0.00  1.00 25
3.06  0.00  1.00 60
$ cat a3.txt
.705  0.00  1.00 30
1.02  0.00  1.00 40
2.05  0.00  1.00 45
3.06  0.00  1.00 80
$ WHINY_USERS=1 awk 'NR==FNR{ a[$1]=$4; s[$1]=$1; next } {
  s[$1] = s[$1] " " $4 - a[$1]; a[$1]=$4
}
END{for(i in s) {print s[i]}}' a*.txt
.705 0 20
1.02 10 20
2.05 10 20
3.06 25 20

If that's not what you desire, post the desired output from the given 3 sample files.
# 5  
Old 02-09-2010
That's it! It works perfectly. I was seeing some kind of funky output for the first few lines, and I think it has to do with a bug in the code. It became significantly easier to read once the "|" was gone and I could see the cause of the bug. Thank you very much for your help!

Edit: One more quick question: would the script change significantly if I just had it compute the difference between a1.txt and all the others? Like a2 - a1, a3 - a1, a4 - a1, etc.? How would that look? Thanks again!

Last edited by Eblue562; 02-09-2010 at 04:42 PM..
# 6  
Old 02-09-2010
Not really. Just remove the second a[$1]=$4 assignment (the one in the main block, not the one in the NR==FNR block) from the code:

Code:
WHINY_USERS=1 gawk 'NR==FNR{ a[$1]=$4; s[$1]=$1; next } {
  s[$1] = s[$1] " " $4 - a[$1]
}
END{for(i in s) {print s[i]}}' a*.txt

Make sure a1.txt is the first file on the command line, since it is the reference that every other file is compared against.
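
To make the "reference file first" requirement explicit, here is a rough sketch that takes the reference as its first argument and keeps the rows in that file's order. It is only an illustration under my own assumptions: `diff_from_reference`, `order`, `n`, `base` and `line` are names I made up, and the data is the three-file sample from post #4 written to a temporary directory.

```shell
#!/bin/sh
# Sketch: subtract column 4 of a reference file from every later file.
tmp=$(mktemp -d)
printf '%s\n' '.705 0.00 1.00 10' '1.02 0.00 1.00 10' \
              '2.05 0.00 1.00 15' '3.06 0.00 1.00 35' > "$tmp/a1.txt"
printf '%s\n' '.705 0.00 1.00 10' '1.02 0.00 1.00 20' \
              '2.05 0.00 1.00 25' '3.06 0.00 1.00 60' > "$tmp/a2.txt"
printf '%s\n' '.705 0.00 1.00 30' '1.02 0.00 1.00 40' \
              '2.05 0.00 1.00 45' '3.06 0.00 1.00 80' > "$tmp/a3.txt"

# Hypothetical helper: first argument is the reference, the rest are compared to it.
diff_from_reference() {
  awk '
    NR == FNR {                 # reference file only
      order[++n] = $1           # remember the row order
      base[$1] = $4             # column 4 of the reference, never updated
      line[$1] = $1
      next
    }
    { line[$1] = line[$1] " " ($4 - base[$1]) }   # append current - reference
    END { for (i = 1; i <= n; i++) print line[order[i]] }
  ' "$@"
}

baseline_out=$(diff_from_reference "$tmp/a1.txt" "$tmp/a2.txt" "$tmp/a3.txt")
printf '%s\n' "$baseline_out"
rm -rf "$tmp"
```

With the sample data the columns are a2 - a1 and a3 - a1, so the last row (35, 60, 80 in the three files) comes out as `3.06 25 45`.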
# 7  
Old 06-03-2010
This is really wonderful code.
I'm a bit curious: can you explain how this code executes?

Thanks
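
For anyone with the same question, here is an annotated copy of the script from post #4, run on that post's sample files. The comments are my reading of the code rather than the original author's explanation; the temporary directory, the `annotated_out` variable and the `LC_ALL=C sort` at the end (used instead of WHINY_USERS to get a stable order) are my additions.

```shell
#!/bin/sh
# Annotated copy of the post #4 script, run on that post's sample data.
tmp=$(mktemp -d)
printf '%s\n' '.705 0.00 1.00 10' '1.02 0.00 1.00 10' \
              '2.05 0.00 1.00 15' '3.06 0.00 1.00 35' > "$tmp/a1.txt"
printf '%s\n' '.705 0.00 1.00 10' '1.02 0.00 1.00 20' \
              '2.05 0.00 1.00 25' '3.06 0.00 1.00 60' > "$tmp/a2.txt"
printf '%s\n' '.705 0.00 1.00 30' '1.02 0.00 1.00 40' \
              '2.05 0.00 1.00 45' '3.06 0.00 1.00 80' > "$tmp/a3.txt"

annotated_out=$(awk '
  # NR counts records across all files; FNR restarts at 1 for each file.
  # NR == FNR is therefore true only while the first file is being read.
  NR == FNR {
    a[$1] = $4        # a[key]: column 4 seen in the previous file
    s[$1] = $1        # s[key]: output line being built, starts as column 1
    next              # skip the block below for first-file lines
  }
  # Every line of the second and later files ends up here.
  {
    s[$1] = s[$1] " " $4 - a[$1]   # append (current - previous) to the line
    a[$1] = $4                     # current value becomes the previous one
  }
  # After the last file, print one built-up line per first-column value.
  # for (i in s) has no guaranteed order, hence the sort afterwards.
  END { for (i in s) print s[i] }
' "$tmp"/a*.txt | LC_ALL=C sort)

printf '%s\n' "$annotated_out"
rm -rf "$tmp"
```

Subtraction binds more tightly than string concatenation in awk, which is why `" " $4 - a[$1]` appends the difference rather than subtracting from a string.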