Sum value from selected lines script (awk,perl)


 
# 1  
Old 10-10-2009

Hello.
I am facing this two-part problem.

I have lines with this structure:

...........
12345678 4
12345989 13
12346356 205
12346644 74
12346819 22
.........


The first field (a timestamp) is non-decreasing.

1) Sum the second fields of the lines whose first_field/500 (integer part) is equal.
2) Sum the second fields of the lines whose first fields differ by less than 500
(sliding window).

In the example above:

1) Because 12345678/500 and 12345989/500 both give 24691 (integer part), sum=4+13.
The 3rd line cannot be grouped with any other, so sum=205.
The 4th and 5th lines are grouped, so sum=74+22.

2) We group the 1st and 2nd lines because 12345989 - 12345678 < 500.
By analogy we group the 2nd and 3rd, the 3rd and 4th,
and the 3rd, 4th and 5th because 12346819 (of the 5th line) - 12346356 (of the 3rd line) < 500.


Is there a way to do it (Perl, awk, etc.)?

Thanks

Paolo
# 2  
Old 10-10-2009
In Perl it is very simple to do.

But before I do the work for you, I would like to know what you have tried so far.

You should try it yourself and ask for clarification or advice if you run into difficulties; that is always a good way to learn.
# 3  
Old 10-10-2009
I know a little awk and some elements of Perl.

Code:
awk '{if ($1/500 > last_time_frame) { sum = $2 } else { sum+=$2;print sum };last_time_frame=$1/500;print sum}' AAAA.txt

It doesn't work.
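
A possible reason the attempt above misbehaves: awk's / operator is floating-point division, so $1/500 grows with every new timestamp and the first branch resets sum on almost every row. The sketch below only prints the frame index the question seems to intend, truncated with int(). The function name frame_index and the width variable are illustrative and do not come from the thread.

```shell
# frame_index: hypothetical helper, reads "timestamp value" rows on stdin
# and prints each timestamp next to its fixed-frame index.
frame_index() {
  awk -v width="${1:-500}" '
    {
      # int() truncates the floating-point quotient to the frame number
      print $1, int($1 / width)
    }
  '
}

# Sample rows from the first post
printf '%s\n' '12345678 4' '12345989 13' '12346356 205' | frame_index 500
```

Rows that share the same index belong to the same static time frame, which is the grouping rule of case 1.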
# 4  
Old 10-10-2009
Sorry, but the problem is not clear enough.

Quote:
Originally Posted by paolfili
...
2)Sum the second fields if the difference between first fields is less than 500.
(sliding window)
What's the length of the sliding window ?

- Is it just 2 (1st & 2nd, 2nd & 3rd, 3rd & 4th, ...) ?
- Or is it 3 (1st, 2nd & 3rd; 2nd, 3rd & 4th; ...) ?

Hopefully, it's not a cartesian product, i.e.

1st vs. (2nd, 3rd, 4th, ... , last_row)
2nd vs. (1st, 3rd, 4th, ... , last_row)
3rd vs. (1st, 2nd, 4th, ... , last_row)
...
last_row vs. (1st, 2nd, 3rd, ..., last-1_row)

Quote:
...
1) Because 12345678/500 and 12345989/500 both result in 24691, sum=4+13
We cannot group the 3rd line so sum=205
And we group the 4th and 5th line so sum=74+22
- Ok, and what do you want to do with the sum ?
- Do you want to display it ? Or do nothing with it (highly unlikely) ?
- If you want to display it, then how ? The total against each row ? Or the total against the first row only ? Or against the second row only ?

Quote:
...
and the 3rd, 4th and 5th because 12346819 (of the 5th line) - 12346356 (of the 3rd line) < 500
This begs the first counter-question. Why compare the 3rd, 4th and 5th (considering that you have been comparing two-at-a-time all this while) ?
So again, what's the length of the sliding window ?

I guess a very simple example of your input file should help here. So, let's say your input file is as follows:

Code:
$
$ cat f1
100 1
200 2
300 3
400 4
500 5
600 6
700 7
$

What do you want your output to look like ?

tyler_durden
# 5  
Old 10-10-2009
???

Quote:
Originally Posted by durden_tyler
What's the length of the sliding window ?

- Is it just 2 (1st & 2nd, 2nd & 3rd, 3rd & 4th, ...) ?
- Or is it 3 (1st, 2nd & 3rd; 2nd, 3rd & 4th; ...) ?
The length is whatever the data requires: 1, 1000 or 1000000 data samples.

Quote:
Hopefully, it's not a cartesian product, i.e.

1st vs. (2nd, 3rd, 4th, ... , last_row)
2nd vs. (1st, 3rd, 4th, ... , last_row)
3rd vs. (1st, 2nd, 4th, ... , last_row)
...
last_row vs. (1st, 2nd, 3rd, ..., last-1_row)
No cartesian product.

Quote:
- Ok, and what do you want to do with the sum ?
- Do you want to display it ? Or do nothing with it (highly unlikely) ?
- If you want to display it, then how ? The total against each row ? Or the total against the first row only ? Or against the second row only ?
Only print the value.
For the rest of the work I'm on my own. ;-)

Quote:
This begs the first counter-question. Why compare the 3rd, 4th and 5th (considering that you have been comparing two-at-a-time all this while) ?
So again, what's the length of the sliding window ?
It is the time frame concept: events in a time frame (e.g. 500 microseconds).
I need to sum events in:
1) a STATIC time frame environment;
2) a DYNAMIC time frame environment (what the Digital Signal Processing area calls a sliding window).

Quote:
I guess a very simple example of your input file should help here. So, let's say your input file is as follows:

Code:
$
$ cat f1
100 1
200 2
300 3
400 4
500 5
600 6
700 7
$

What do you want your output to look like ?
Case 1)
sum=1+2+3+4
sum=5+6+7

Case 2)
sum=1+2+3+4
sum=2+3+4+5+6
sum=3+4+5+6+7

Paolo
# 6  
Old 10-11-2009
Thanks for the clarification.

Code:
$ 
$ cat f1
100 1   
200 2   
300 3   
400 4   
500 5   
600 6   
700 7   
$       
$ # Case 1
$ ##
$ perl -lane 'if (defined $prev && int($F[0]/500) != $prev) {print "Sum=$s"; $s = $F[1]}
>             # no flush on the first row, where $prev is still undefined
>             else {$s += $F[1]}
>             $prev = int($F[0]/500);
>             END {print "Sum=$s"}' f1
Sum=10                                                                
Sum=18                                                                
$                                                                     
$                                                                     
$ # Case 2                                                            
$ ##
$ perl -lne 'chomp; push @x,$_;
>            END {
>              for($i=0; $i<=$#x; $i++){
>                ($x1,$x2) = split/ /,$x[$i];
>                $s = $x2;
>                for ($j=$i+1; $j<=$#x; $j++) {
>                  ($y1,$y2) = split/ /,$x[$j];
>                  if ($y1 - $x1 < 500) {$s += $y2}
>                  else {last}
>                }
>                print "Sum=$s";
>              }
>            }' f1
Sum=15
Sum=20
Sum=25
Sum=22
Sum=18
Sum=13
Sum=7
$
$

# 7  
Old 10-11-2009
Using awk:

Case1:
Code:
awk '{ sum1[int($1/500)]+=$2 } END { for (i in sum1) print "Sum1 "sum1[i] } ' infile
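
One caveat about the one-liner above: awk does not define the order in which for (i in sum1) visits the keys, so the sums may not come out in timestamp order on every awk implementation. A minimal sketch that remembers the order in which frames first appear is given below. The function name ordered_frame_sums and the order/n/width names are illustrative additions, not part of the original answer.

```shell
# ordered_frame_sums: hypothetical helper, reads "timestamp value" rows on
# stdin and prints one "Sum1 <total>" line per fixed frame, in first-seen order.
ordered_frame_sums() {
  awk -v width="${1:-500}" '
    {
      b = int($1 / width)                # frame index of this row
      if (!(b in sum1)) order[++n] = b   # record each new frame once
      sum1[b] += $2
    }
    END {
      # walk the frames by position instead of using "for (i in sum1)"
      for (k = 1; k <= n; k++) print "Sum1 " sum1[order[k]]
    }
  '
}

# The f1 sample from earlier in the thread
printf '%s\n' '100 1' '200 2' '300 3' '400 4' '500 5' '600 6' '700 7' |
  ordered_frame_sums 500
```

For the f1 sample this prints Sum1 10 followed by Sum1 18, the same values as the Perl answer for case 1.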

Case2:
Code:
awk 'BEGIN{
       min=1
     }
     { time[NR]=$1
       val[NR]=sum2[NR]=$2
       i=min
       while (time[NR]-time[i]>=500)
         i++
       min=i
       for (i=min;i<NR;i++)
         sum2[NR]+=val[i]
     }
     END {
       for (i in sum2)
         print "Sum2: "sum2[i]
     }' infile
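
The script above sums, for every row, the values of all rows whose timestamp lies less than 500 before it (a trailing window). As a sketch under the same assumption of non-decreasing timestamps, the inner for loop can be replaced by a running total that drops rows as they leave the window, and each sum can be printed as soon as its row is read. The names trailing_window_sums, lo, total and width are illustrative.

```shell
# trailing_window_sums: hypothetical helper, reads "timestamp value" rows on
# stdin and prints, per row, the sum over rows less than <width> older.
trailing_window_sums() {
  awk -v width="${1:-500}" '
    BEGIN { lo = 1 }                     # index of the oldest row still in the window
    {
      t[NR] = $1; v[NR] = $2
      total += $2                        # the current row enters the window
      while (t[NR] - t[lo] >= width) {   # expire rows that are too old
        total -= v[lo]
        lo++
      }
      print "Sum2: " total
    }
  '
}

# Original test set from the first post
printf '%s\n' '12345678 4' '12345989 13' '12346356 205' '12346644 74' '12346819 22' |
  trailing_window_sums 500
```

Printing inside the main rule also avoids relying on the traversal order of for (i in sum2), which awk leaves unspecified.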

Case1+2 combined:
Code:
awk 'BEGIN{
       min=1
     }
     { sum1[int($1/500)]+=$2
       time[NR]=$1
       val[NR]=sum2[NR]=$2
       i=min
       while (time[NR]-time[i]>=500)
         i++
       min=i
       for (i=min;i<NR;i++)
         sum2[NR]+=val[i]
     }
     END {
       for (i in sum1)
         print "Sum1 "sum1[i]
       print ""
       for (i in sum2)
         print "Sum2: "sum2[i]
     }' infile

Original testset:
Code:
Sum1 17
Sum1 205
Sum1 96

Sum2: 4
Sum2: 17
Sum2: 218
Sum2: 279
Sum2: 301

Additional testset:
Code:
Sum1 10
Sum1 18

Sum2: 1
Sum2: 3
Sum2: 6
Sum2: 10
Sum2: 15
Sum2: 20
Sum2: 25
