Extract difference of two columns from different rows


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Extract difference of two columns from different rows
# 1  
Old 02-24-2010
Bug Extract difference of two columns from different rows

Hello guys,

Please help me to solve this problem. I have tried some awk commands but couldn't succeed.
I have a tab delimited file where each record is separated by ------ and 4th column of each record is same.

<INPUT FILE>
Code:
------
peon    53931587        53931821        Health=max01g42730.1
------
SDS     53931589        53931821        Health=max01g42730.1
peon    53932031        53932091        Health=max01g42730.1
SDS     53932031        53932091        Health=max01g42730.1
peon    53932827        53932904        Health=max01g42730.1
SDS     53932827        53932904        Health=max01g42730.1
peon    53933029        53933144        Health=max01g42730.1
SDS     53933029        53933144        Health=max01g42730.1
peon    53933271        53933610        Health=max01g42730.1
------
peon    53931587        53931821        Health=max01g52340.2
------
SDS     39623053        39623169        Health=max05g35620.1
peon    39623246        39623350        Health=max05g35620.1
SDS     39623246        39623350        Health=max05g35620.1
peon    39623990        39624103        Health=max05g35620.1

I want an output file where rows are added corresponding to each record. An examlpe output file is -
Code:
------
zxy    0    Health=max01g42730.1
------
zxy    210    Health=max01g42730.1
zxy    736    Health=max01g42730.1
zxy    125    Health=max01g42730.1
zxy    127    Health=max01g42730.1
------
zxy    0    Health=max01g52340.2
------
zxy    77    Health=max05g35620.1
zxy    640    Health=max05g35620.1

Here, 3rd column is same as 4th column of input file
2nd column is calculated by subtracting 3rd column of row starting with SDS
from 2nd column of row starting with peon i.e. (53932031-53931821 = 210) in case of second record. This has to be calculated for all the
SDS-peon pairs in each record. Value zero is assigned if there is only 1 row in a record i.e. SDS-peon pair is missing.

Thanks in advance.

Last edited by sam_2921; 02-24-2010 at 06:39 AM..
# 2  
Old 02-24-2010
In your given output,your definition for the output is wrong.First,clarify me in this.You're calculating the second column for the output file by subtracting the third column value of SDS row from second column of the peon row.And then,you're also subtracting 1 from the subtracted result.Am I correct?
# 3  
Old 02-24-2010
Yes, you are right Vivek but I have corrected the data now. Thanks.
# 4  
Old 02-24-2010
Can you please the output first one
Code:
zxy    0    Health=max01g42730.1

This is 0 as there is no SDS-peon pair.
and

Code:
zxy    210    Health=max01g42730.1

because there is a pair of SDS and next line of peon
so the calculation is subtract 3rd column of row starting with SDS with 2nd column of row starting with peon.

and output 3 column is 4th column of peon.


Check this one

Code:
awk '$1 ~/SDS/ { SDS=$3; last=1} $1 ~/peon/ {p=$3; if(last==1) { printf("xyz\t%d\t%s\n",$2-SDS,$4) } else { printf ("xyz\t0\t%s\n",$4) } } /\-\-/ {print "-------";last=0;}  ' file

and output is

Code:
-------
xyz     0       Health=max01g42730.1
-------
xyz     210     Health=max01g42730.1
xyz     736     Health=max01g42730.1
xyz     125     Health=max01g42730.1
xyz     127     Health=max01g42730.1
-------
xyz     -1557   Health=max01g52340.2
-------
xyz     77      Health=max05g35620.1


Last edited by chakrapani; 02-24-2010 at 07:19 AM..
# 5  
Old 02-24-2010
This should do the trick:
Code:
awk '
/---/{print;next}
/SDS/{s=$3}
/peon/{printf("zxy\t%s\t%s\n", s?$2-s:"0", $4);s=""}
' file

# 6  
Old 02-24-2010
Code:
zxy    77    Health=max05g35620.1
zxy    640    Health=max05g35620.1

Last two ... only one has pair ... so you should not get 640 ...
# 7  
Old 02-24-2010
Quote:
Originally Posted by chakrapani
Code:
zxy    77    Health=max05g35620.1
zxy    640    Health=max05g35620.1

Last two ... only one has pair ... so you should not get 640 ...
This is what I get with my code:

Code:
$ cat file
------
peon    53931587        53931821        Health=max01g42730.1
------
SDS     53931589        53931821        Health=max01g42730.1
peon    53932031        53932091        Health=max01g42730.1
SDS     53932031        53932091        Health=max01g42730.1
peon    53932827        53932904        Health=max01g42730.1
SDS     53932827        53932904        Health=max01g42730.1
peon    53933029        53933144        Health=max01g42730.1
SDS     53933029        53933144        Health=max01g42730.1
peon    53933271        53933610        Health=max01g42730.1
------
peon    53931587        53931821        Health=max01g52340.2
------
SDS     39623053        39623169        Health=max05g35620.1
peon    39623246        39623350        Health=max05g35620.1
SDS     39623246        39623350        Health=max05g35620.1
peon    39623990        39624103        Health=max05g35620.1
$
$ awk '
/---/{print;next}
/SDS/{s=$3}
/peon/{printf("zxy\t%s\t%s\n", s?$2-s:"0", $4);s=""}
' file
------
zxy     0       Health=max01g42730.1
------
zxy     210     Health=max01g42730.1
zxy     736     Health=max01g42730.1
zxy     125     Health=max01g42730.1
zxy     127     Health=max01g42730.1
------
zxy     0       Health=max01g52340.2
------
zxy     77      Health=max05g35620.1
zxy     640     Health=max05g35620.1
$

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Calculating Time difference Between two Rows in Linux

16:45:51 10051 77845 16:45:51 10051 77845 16:46:52 10051 77846 16:46:53 10051 77846 Match the last PID then subtract second line time with first line. Please help me with any command or script. working in media company on a project OS: RHEl7 tried command: awk 'function... (2 Replies)
Discussion started by: vivekn
2 Replies

2. Shell Programming and Scripting

Extract rows with different values at 2 columns

Hallo, I would need to extract only rows which has different value in the second and third column. Thank you very much for any advices Input: A 0 0 B 0 1 C 1 1 D 1 3 Output B 0 1 D 1 3 (4 Replies)
Discussion started by: kamcamonty
4 Replies

3. Shell Programming and Scripting

Compare 2 csv files by columns, then extract certain columns of matcing rows

Hi all, I'm pretty much a newbie to UNIX. I would appreciate any help with UNIX coding on comparing two large csv files (greater than 10 GB in size), and output a file with matching columns. I want to compare file1 and file2 by 'id' and 'chain' columns, then extract exact matching rows'... (5 Replies)
Discussion started by: bkane3
5 Replies

4. Shell Programming and Scripting

Evaluate 2 columns, add sum IF two columns match on two rows

Hi all, I know this sounds suspiciously like a homework course; but, it is not. My goal is to take a file, and match my "ID" column to the "Date" column, if those conditions are true, add the total number of minutes worked and place it in this file, while not printing the original rows that I... (6 Replies)
Discussion started by: mtucker6784
6 Replies

5. UNIX for Dummies Questions & Answers

Finding difference in 1st field for rows of data

I have a file that has multiple lines, of grouped data, that typically all have the same values in the 1st field, however, I would like to search the 1st field for any differences and set a flag to use in an "if" statement to run some other routine. An example of the typical file is below,... (2 Replies)
Discussion started by: co21ss
2 Replies

6. Shell Programming and Scripting

Deleting all the fields(columns) from a .csv file if all rows in that columns are blanks

Hi Friends, I have come across some files where some of the columns don not have data. Key, Data1,Data2,Data3,Data4,Data5 A,5,6,,10,, A,3,4,,3,, B,1,,4,5,, B,2,,3,4,, If we see the above data on Data5 column do not have any row got filled. So remove only that column(Here Data5) and... (4 Replies)
Discussion started by: ks_reddy
4 Replies

7. Shell Programming and Scripting

Extract several columns with few rows

Hello, I want to extract several columns and rows from a huge tab delimited file for example: I want to print from from column 3 to 68 till row number 30. I have tried using cut command but it was extracting whole 3rd and 68th column. Please suggest a solution. Ryan (8 Replies)
Discussion started by: ryan9011
8 Replies

8. Shell Programming and Scripting

Extract values from a matrix given the rows and columns

Hi All, I have a huge (and its really huge!) matrix about 400GB in size (2 million rows by 1.5 million columns) . I am trying to optimize its space by creating a sparse representation of it. Miniature version of the matrix looks like this (matrix.mtx): 3.4543 65.7876 54.564 2.12344... (4 Replies)
Discussion started by: shoaibjameel123
4 Replies

9. Shell Programming and Scripting

Difference between corresponding elements of successive rows

Hi, I have a file in the following format a1 b1 c1 d1 a2 b2 c2 d2 a3 b3 c3 d3 a4 b4 c4 d4 I need a script to find the difference between corresponding values of successive rows. So the output would have one less row than the input file and should look like: a2-a1 b2-b1 c2-c1 d2-d1... (4 Replies)
Discussion started by: sajal.bhatia
4 Replies

10. Shell Programming and Scripting

Difference between two rows

Dears, I have a list as follows, 2 4 8 If I want to find the difference between two consecutive rows. Then I have to store the specific rows in two variables and then find the difference. Could someone tell how this can be done. Regards, (7 Replies)
Discussion started by: JimJim
7 Replies
Login or Register to Ask a Question