AWK - Difference in multiple files

02-09-2010

Hello again,

I've run into another problem that I've been unable to solve. With everyone's help last time, the script worked perfectly! This problem takes a little more finesse, and the bash script I thought up didn't work, so I've canned it. I'd like to try awk if possible. Here's my problem:

I have a multitude of sequential files like:

Code:

a_r01.dat
a_r02.dat
a_r03.dat
a_r04.dat

The sequence continues up to a certain number (in this case there are 47 of these .dat files, so the last one is a_r47.dat). Each file has four columns:

Code:

.705  0.00  1.00  0
1.02  0.00  1.00  10
2.05  0.00  1.00  100
3.06  0.00  1.00  5000

Here's the tricky part. The first column is the same in every .dat file, and I don't really care about the second or third columns. I'd like a script that reads a_r01.dat and a_r02.dat, computes the difference in the fourth column between the two files (a_r02 minus a_r01), and prints that difference, along with the value of the first column, into a separate file. It should then do the same for a_r03.dat and a_r02.dat, and so on. I'm not sure I've explained this well, so here is an example. Suppose the two files are:

a_r01.dat

Code:

.705  0.00  1.00 10
1.02  0.00  1.00 10
2.05  0.00  1.00 15
3.06  0.00  1.00 35

a_r02.dat

Code:

.705  0.00  1.00 10
1.02  0.00  1.00 20
2.05  0.00  1.00 25
3.06  0.00  1.00 60

The script should compute the difference in the fourth column for each row and write an output.dat file that looks like:

Code:

.705 0
1.02 10
2.05 10
3.06 25

After that, it should compute the same thing for a_r03 and a_r02, and so on down the line until it runs out of files, putting each new set of differences in a new column of output.dat. Eventually, output.dat should look like this (showing only column headers, separated by a | symbol):

Code:

Column1 | r02-r01 | r03-r02 | r04-r03 | r05-r04 |

If my math is right, with 10 .dat files, output.dat should have the first column plus 9 more columns of 4th-column differences (between consecutive input .dat files).

I hope I've explained this appropriately, and please let me know if anyone has any questions. I'm hoping that awk can do this, but if it is easier using perl or bash (or any other program), please let me know and I can easily get access to it. Thank you so much for your help!

Eblue562

02-09-2010

If you want the output sorted, pipe it to sort:

Code:

awk 'NR==FNR{ a[$1]=$4; s[$1]=$1; next } {
  s[$1] = s[$1] " | " $4 - a[$1]; a[$1]=$4
}
END{for(i in s) {print s[i]}}' a_r*.dat | sort
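The program above may be easier to follow with comments. Below is the same awk logic wrapped in a shell function, as a sketch: the function name diff_cols is hypothetical, the separator is a space (as requested later in the thread), and the final step is LC_ALL=C sort -n instead of plain sort. That last change is an assumption: plain sort compares whole lines as text, and in some locales the leading dot of .705 is ignored during collation, so the .705 row may not come out first.

```shell
# diff_cols: hypothetical wrapper around the awk program above.
# Usage: diff_cols file1 file2 ...   (files must be given in order)
diff_cols() {
    awk '
    # NR==FNR is only true while reading the first file
    # (NR counts all lines read so far, FNR restarts at 1 for each file).
    NR == FNR {
        a[$1] = $4      # remember column 4, keyed by column 1
        s[$1] = $1      # start the output line with column 1
        next            # skip the rule below for the first file
    }
    # Every later file: append (this $4 - previous $4), then make
    # this file the "previous" one for the next file.
    {
        s[$1] = s[$1] " " ($4 - a[$1])
        a[$1] = $4
    }
    # for (i in s) visits keys in an unspecified order, hence the sort.
    END { for (i in s) print s[i] }
    ' "$@" | LC_ALL=C sort -n
}
```

Usage, with the zero-padded file names from the question: diff_cols a_r*.dat > output.dat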

With gawk, you can avoid the sort command by setting the undocumented WHINY_USERS environment variable, which makes gawk traverse arrays in sorted index order:

Code:

WHINY_USERS=1 gawk 'NR==FNR{ a[$1]=$4; s[$1]=$1; next } {
  s[$1] = s[$1] " | " $4 - a[$1]; a[$1]=$4
}
END{for(i in s) {print s[i]}}' a_r*.dat
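WHINY_USERS is undocumented, and newer gawk releases may not honor it; gawk 4.0 and later offer PROCINFO["sorted_in"] for controlling the order of for (i in ...) loops instead. A portable alternative, sketched here and not taken from the thread, sidesteps array order altogether: remember the row order of the first file and print the rows in that order in END. The function name diff_cols_ordered is hypothetical, and the sketch assumes every file has the same first-column values, as stated in the question.

```shell
# diff_cols_ordered: hypothetical name; plain POSIX awk, no gawk extensions.
# Assumes every input file has the same set of first-column values.
diff_cols_ordered() {
    awk '
    NR == FNR {
        order[++n] = $1    # row order of the first file
        a[$1] = $4         # column 4 of the previous file, keyed by column 1
        s[$1] = $1         # output line starts with column 1
        next
    }
    {
        s[$1] = s[$1] " " ($4 - a[$1])
        a[$1] = $4
    }
    # Print in first-file order instead of relying on for (i in s).
    END { for (i = 1; i <= n; i++) print s[order[i]] }
    ' "$@"
}
```

Design note: rows whose key appears only in later files are not printed by this version, whereas the original program would print them without their first-column value.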

Franklin52

02-09-2010

Thank you so much for the reply Franklin52!

I've tested the script, and it seems I didn't explain the problem quite right. Sorry about that; let me try again.

First, I think I confused people with the last "code" bit in my initial post. I don't want the values separated by " | "; spaces will do. I got carried away in my explanation: the bars were just meant as dividers so people knew I wanted the values separated. That said, the first column of output.dat should be exactly like the first column of all the input files.

Ultimately, I'd like to load output.dat in gnuplot and run "plot 'output.dat' u 1:2 w l", then "replot 'output.dat' u 1:3 w l", and so on (just to give you an idea of what I want to do with the data).
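Rather than typing one replot per column, gnuplot 4.4 and later can iterate inside a single plot command with plot for [i=a:b]. The sketch below, using a hypothetical helper name plot_cmd, counts the fields on the first line of the data file with awk and prints such a command; it assumes a whitespace-separated file whose first column is the x value.

```shell
# plot_cmd: hypothetical helper. Prints a gnuplot command that draws
# columns 2..NF of a data file against column 1.
plot_cmd() {
    # Fields on the first line = 1 x column + the difference columns.
    ncol=$(awk 'NR == 1 { print NF; exit }' "$1")
    printf "plot for [i=2:%d] '%s' using 1:i with lines\n" "$ncol" "$1"
}
```

For example, plot_cmd output.dat | gnuplot -persist would draw every difference column against column 1, assuming gnuplot is installed.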

So the first column of output.dat should be an exact copy of the first column of any of my input files (the first column is always the same). The second column of output.dat is the difference in the 4th column between a_r02.dat and a_r01.dat (a_r02 - a_r01), the third column is a_r03 - a_r02, the fourth is a_r04 - a_r03, and so on until I run out of .dat files.

I hope I'm not coming off as too whiny, that's not my intent at all. I really do appreciate everyone's help around here, most of those that frequent these boards have coding skills I could only dream of!

Eblue562

02-09-2010

Sorry, I don't get it. I've changed the separator to a space, and this is my output with 3 files:

Code:

$ cat a1.txt
.705  0.00  1.00 10
1.02  0.00  1.00 10
2.05  0.00  1.00 15
3.06  0.00  1.00 35
$ cat a2.txt
.705  0.00  1.00 10
1.02  0.00  1.00 20
2.05  0.00  1.00 25
3.06  0.00  1.00 60
$ cat a3.txt
.705  0.00  1.00 30
1.02  0.00  1.00 40
2.05  0.00  1.00 45
3.06  0.00  1.00 80
$ WHINY_USERS=1 awk 'NR==FNR{ a[$1]=$4; s[$1]=$1; next } {
  s[$1] = s[$1] " " $4 - a[$1]; a[$1]=$4
}
END{for(i in s) {print s[i]}}' a*.txt
.705 0 20
1.02 10 20
2.05 10 20
3.06 25 20

If that's not what you desire, post the desired output from the given 3 sample files.

Franklin52

02-09-2010

That's it! It works perfectly. I was seeing some kind of funky input for the first few lines, and I think it has to do with a bug in the code. It became significantly easier to read once the "|" was gone and I could see the cause of the bug. Thank you very much for your help!

Edit: One more quick question: would the script change significantly if it just computed the difference between a1.txt and each of the others, like a2 - a1, a3 - a1, a4 - a1, etc.? How would that look? Thanks again!

Last edited by Eblue562; 02-09-2010 at 04:42 PM.

Eblue562

02-09-2010

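In reply to Eblue562's follow-up question:

Not really. Remove the second a[$1]=$4 assignment (the one that updates the stored values after each file), so that every file is compared with the values stored from the first file: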

Code:

WHINY_USERS=1 gawk 'NR==FNR{ a[$1]=$4; s[$1]=$1; next } {
  s[$1] = s[$1] " " $4 - a[$1]
}
END{for(i in s) {print s[i]}}' a*.txt

Make sure that a1.txt is the first file awk reads.
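One caveat with relying on the glob for file order: a*.txt expands in sorted name order, so once there are ten or more files without zero padding, a10.txt comes before a2.txt and the columns end up in the wrong order. The a_r01.dat to a_r47.dat names from the original question are zero-padded, so they are not affected. If padding is not an option, a small helper (the name ordered_files is hypothetical) can list the files numerically:

```shell
# ordered_files: hypothetical helper. Prints PREFIX1SUFFIX, PREFIX2SUFFIX, ...
# in numeric order, stopping at the first number with no matching file.
ordered_files() {
    prefix=$1 suffix=$2
    i=1
    while [ -e "$prefix$i$suffix" ]; do
        printf '%s\n' "$prefix$i$suffix"
        i=$((i + 1))
    done
}
```

For example, $(ordered_files a .txt) can replace a*.txt in the command above, as long as the file names contain no whitespace.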

Franklin52

06-03-2010

This is really wonderful code.
I'm a bit curious, though: can you explain how it works?

Thanks

mr_harish80
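One way to see how the program executes is to trace it. While the first file is read, NR equals FNR, so the first rule only stores column 4 in a and column 1 in s. For every later file, the second rule appends the difference to s and then overwrites a with the current file's column 4, which is what makes each new column a difference between consecutive files. The sketch below (not from the thread; the name trace_diff is hypothetical) uses the same rules, plus a counter f that increases at the first line of each file and a printf that reports each subtraction as it happens.

```shell
# trace_diff: hypothetical name. Same rules as the thread's program, plus
# a printf that shows each subtraction, and "row:" labels on the final lines.
trace_diff() {
    awk '
    FNR == 1 { f++ }      # f = number of the file currently being read
    NR == FNR { a[$1] = $4; s[$1] = $1; next }
    {
        printf "file %d, key %s: %s - %s = %s\n", f, $1, $4, a[$1], $4 - a[$1]
        s[$1] = s[$1] " " ($4 - a[$1])
        a[$1] = $4
    }
    END { for (i in s) print "row:", s[i] }
    ' "$@"
}
```

For the three sample files earlier in the thread, the trace starts with "file 2, key .705: 10 - 10 = 0" and "file 2, key 1.02: 20 - 10 = 10".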
