Linux - Calculations between multiple rows of data


 
# 1  
Old 01-25-2016
Linux - Calculations between multiple rows of data

Morning All,

I need assistance with a calculation process that performs calculations across a group of records.

Here is a breakdown of my requirement:

Code:
Col1 = Always same value.
Col2 = Grouping Column, and will have the same value for 5/6/7 records for example.
Col3 = Date
Col4 = Amount 1
Col5 = Amount 2
Col6 = Amount 3 (to be derived as part of this code)
Col7 = Amount 4 (to be derived as part of this code)

Here is an example piece of input data:

Quote:
Col1|Col2|Col3|Col4|Col5
ABC|00000001|15-Dec-15|13,400|110,000|
ABC|00000001|31-Jan-16|13,490|80,000|
ABC|00000001|29-Feb-16|13,500|77,000|
ABC|00000001|31-May-16|40,200|37,000|
ABC|00000001|31-Aug-16|42,000|0|
ABC|00000002|15-Dec-15|13,400|110,000|
ABC|00000002|31-Jan-16|13,490|80,000|
ABC|00000002|29-Feb-16|13,500|77,000|
ABC|00000002|31-May-16|40,200|37,000|
ABC|00000002|31-Aug-16|42,000|0|
I need to perform inter-row calculations for each group of data (grouped by Col2, e.g. all the 00000001 records together):
1. Take the first group of records (all 00000001 values in Col2):
- If the record is the first record in the group, Amount 3 (Col6) needs to be set to 100.
- For all other records, it needs to be: "(Col5 current record / Col5 first record)*100"
Code:
 E.g. Second record -- "(80,000/110,000)*100"

Code:
 Third Record -- "(77,000/110,000)*100"

And so on....

Output Result:

Quote:
Col1|Col2|Col3|Col4|Col5|Col6
ABC|00000001|15-Dec-15|13,400|110,000|100 ---> First record in grouping, therefore defaulted to 100.
ABC|00000001|31-Jan-16|13,490|80,000|72.72727273 ---> This is calculated by (80,000/110,000)*100
ABC|00000001|29-Feb-16|13,500|77,000|70 ---> This is calculated by (77,000/110,000)*100
ABC|00000001|31-May-16|40,200|37,000|33.63636364 ---> This is calculated by (37,000/110,000)*100
ABC|00000001|31-Aug-16|42,000|0|0 ---> This is calculated by (0/110,000)*100
ABC|00000002|15-Dec-15|13,400|110,000|100 ---> First record in grouping, therefore defaulted to 100.
ABC|00000002|31-Jan-16|13,490|80,000|72.72727273 ---> This is calculated by (80,000/110,000)*100
ABC|00000002|29-Feb-16|13,500|77,000|70 ---> This is calculated by (77,000/110,000)*100
ABC|00000002|31-May-16|40,200|37,000|33.63636364 ---> This is calculated by (37,000/110,000)*100
ABC|00000002|31-Aug-16|42,000|0|0 ---> This is calculated by (0/110,000)*100
2. Using the data derived above, I now need to find the midpoint values:
- I need to take the current row's Col6 value, add it to the Col6 value of the row below, and divide by 2 to get the midpoint: "(Col6 current record + Col6 below record)/2". If the row is the last record of its group there is no row below within the group, so the value is simply 0.
E.g. First Record --
Code:
(100+72.72727273)/2

Second Record --
Code:
(72.72727273+70)/2

And so on....
Output Result:


Quote:
Col1|Col2|Col3|Col4|Col5|Col6|Col7
ABC|00000001|15-Dec-15|13,400|110,000|100|86.36363636
ABC|00000001|31-Jan-16|13,490|80,000|72.72727273|71.36363636
ABC|00000001|29-Feb-16|13,500|77,000|70|51.81818182
ABC|00000001|31-May-16|40,200|37,000|33.63636364|16.81818182
ABC|00000001|31-Aug-16|42,000|0|0|0
ABC|00000002|15-Dec-15|13,400|110,000|100|86.36363636
ABC|00000002|31-Jan-16|13,490|80,000|72.72727273|71.36363636
ABC|00000002|29-Feb-16|13,500|77,000|70|51.81818182
ABC|00000002|31-May-16|40,200|37,000|33.63636364|16.81818182
ABC|00000002|31-Aug-16|42,000|0|0|0
Any help would be much appreciated. I am sure I could do the above with a normal shell loop, but this will be running over large volumes of data, so I expect something like awk will be far more efficient.

Last edited by RichZR; 01-25-2016 at 07:06 AM. Reason: Amending quoting
# 2  
Old 01-25-2016
Hello RichZR,

Thank you for using code tags as per the forum rules. You could also use code tags for the sample input you have shown in your post.
For your 1st requirement the following may help you:
Code:
awk -F"|" 'NR==1{$(NF+1)="col6";print;next} $2!=GRP{GRP=$2;VAL=$(NF-1);$(NF)=100;print;next} {$(NF)=$(NF-1)*100/VAL;print}' OFS="|" Input_file > Output_file

Output will be as follows.
Code:
Col1|Col2|Col3|Col4|Col5|col6
ABC|00000001|15-Dec-15|13,400|110,000|100
ABC|00000001|31-Jan-16|13,490|80,000|72.7273
ABC|00000001|29-Feb-16|13,500|77,000|70
ABC|00000001|31-May-16|40,200|37,000|33.6364
ABC|00000001|31-Aug-16|42,000|0|0
ABC|00000002|15-Dec-15|13,400|110,000|100
ABC|00000002|31-Jan-16|13,490|80,000|72.7273
ABC|00000002|29-Feb-16|13,500|77,000|70
ABC|00000002|31-May-16|40,200|37,000|33.6364
ABC|00000002|31-Aug-16|42,000|0|0

Now for your 2nd requirement you could try the following. Let's say the output of the 1st requirement has been saved into a file named Output_file.
Code:
awk -F"|" 'NR==1{$(NF+1)="col7";print;next} {B[++i]=$0;A[i]=$NF;G[i]=$2} END{for(j=1;j<=i;j++){VAL=(G[j]==G[j+1])?(A[j]+A[j+1])/2:0;print B[j] FS VAL}}' OFS="|" Output_file

Output will be as follows.
Code:
Col1|Col2|Col3|Col4|Col5|col6|col7
ABC|00000001|15-Dec-15|13,400|110,000|100|86.3637
ABC|00000001|31-Jan-16|13,490|80,000|72.7273|71.3637
ABC|00000001|29-Feb-16|13,500|77,000|70|51.8182
ABC|00000001|31-May-16|40,200|37,000|33.6364|16.8182
ABC|00000001|31-Aug-16|42,000|0|0|0
ABC|00000002|15-Dec-15|13,400|110,000|100|86.3637
ABC|00000002|31-Jan-16|13,490|80,000|72.7273|71.3637
ABC|00000002|29-Feb-16|13,500|77,000|70|51.8182
ABC|00000002|31-May-16|40,200|37,000|33.6364|16.8182
ABC|00000002|31-Aug-16|42,000|0|0|0
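If the intermediate Output_file is not needed, the two steps can also be chained with a pipe; here is a sketch using the same two one-liners (assuming the input is in Input_file):
Code:
awk -F"|" 'NR==1{$(NF+1)="col6";print;next} $2!=GRP{GRP=$2;VAL=$(NF-1);$(NF)=100;print;next} {$(NF)=$(NF-1)*100/VAL;print}' OFS="|" Input_file |
awk -F"|" 'NR==1{$(NF+1)="col7";print;next} {B[++i]=$0;A[i]=$NF;G[i]=$2} END{for(j=1;j<=i;j++){VAL=(G[j]==G[j+1])?(A[j]+A[j+1])/2:0;print B[j] FS VAL}}' OFS="|"

Note that the 2nd awk keeps every record in memory (arrays A, B and G) until END, which is worth keeping in mind for very large files.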

Thanks,
R. Singh
# 3  
Old 01-25-2016
In one go:
Code:
awk -F\| '
NR == 1         {print $0 "|Col6|Col7"                 # header: append the two new column names
                 next
                }
$2 != LK        {LK = $2                               # new group: remember its key ...
                 LV = $5                               # ... and its first Col5 value
                 IX = 1                                # flag "first record of a group"
                }
                {PC = $5 / LV * 100                    # Col6 of the current record
                 if (NR > 2) printf "|%s\n", IX?0:(LP + PC) / 2   # complete the PREVIOUS line: 0 if it ended a group, else the midpoint
                 LP = PC
                 IX = 0
                }
                {printf "%s%s", $0, LP                 # print the current line plus its Col6; its Col7 is appended on the next record (or in END)
                }
END             {printf "|%s\n", (LP + PC) / 2         # complete the very last line
                }
' file
Col1|Col2|Col3|Col4|Col5|Col6|Col7
ABC|00000001|15-Dec-15|13,400|110,000|100|86.3636
ABC|00000001|31-Jan-16|13,490|80,000|72.7273|71.3636
ABC|00000001|29-Feb-16|13,500|77,000|70|51.8182
ABC|00000001|31-May-16|40,200|37,000|33.6364|16.8182
ABC|00000001|31-Aug-16|42,000|0|0|0
ABC|00000002|15-Dec-15|13,400|110,000|100|86.3636
ABC|00000002|31-Jan-16|13,490|80,000|72.7273|71.3636
ABC|00000002|29-Feb-16|13,500|77,000|70|51.8182
ABC|00000002|31-May-16|40,200|37,000|33.6364|16.8182
ABC|00000002|31-Aug-16|42,000|0|0|0
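A side note on the data format: awk's numeric conversion of a value like "110,000" stops at the comma and yields 110. The ratios above still come out right because every Col5 value in the sample is a whole number of thousands; if that assumption ever breaks, strip the separators before dividing. A minimal sketch of the Col6 step only, assuming the same pipe-delimited file:
Code:
awk -F\| '
NR == 1         {print $0 "|Col6"                       # header: add the new column name
                 next
                }
                {amt = $5                               # copy Col5 ...
                 gsub(/,/, "", amt)                     # ... and strip thousands separators: "110,000" -> 110000
                }
$2 != LK        {LK = $2                                # new group: remember its key ...
                 LV = amt                               # ... and its first (stripped) Col5 value
                }
                {printf "%s%s\n", $0, amt / LV * 100    # append Col6 to the record
                }
' file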

# 4  
Old 01-25-2016

Thanks Rudi - this worked a treat!