Summing columns over group of lines


 
# 1  
Old 04-24-2013

I have an input file that looks like:
Code:
ID1 V1 ID2 V2 P1 P2 P3 P4 ..... n no. of columns
1 1 1 1 1.0000 1.0000 1.0000 1.0000
1 1 1 2 0.9999 0.8888 0.7777 0.6666
1 2 1 1 0.8888 0.7777 0.6666 0.5555
1 2 1 2 0.7777 0.6666 0.5555 0.4444
2 1 1 1 0.6666 0.5555 0.4444 0.3333
2 1 1 2 0.5555 0.4444 0.3333 0.2222
2 2 1 1 0.4444 0.3333 0.2222 0.1111
2 2 1 2 0.3333 0.2222 0.1111 0.1234

I would like to sum the values in column 5 (P1), and likewise in each of the following columns, over each group of four lines. Fields 2 and 4 should be dropped. The output needs to look like:
Code:
ID1 ID2 P1 P2 P3 P4 .....n columns
1 1 3.6664 3.3331 ...... so on
2 1 1.9998 1.5554 ....... so on

Is there a way to do this with an awk script?

Last edited by vbe; 04-24-2013 at 08:40 AM.. Reason: please use code for your code and data Thanks
# 2  
Old 04-24-2013
The short answer is yes.
But I'm not sure I understand your requirements. If you want help creating an awk script to perform this task, please answer the following questions:
  1. Do you want input fields 2 and 4 to be removed from every input line?
  2. If the value in input field 1 is not a constant in each set of four input lines, what happens?
    1. Is that set skipped? If so, should an error be printed?
    2. Should the value from the first line in the set be printed?
    3. Should the value from the last line in the set be printed?
  3. If the value in input field 3 is not a constant in each set of four input lines, what happens?
    1. Is that set skipped? If so, should an error be printed?
    2. Should the value from the first line in the set be printed?
    3. Should the value from the last line in the set be printed?
  4. Is the number of fields a constant for a given input file?
# 3  
Old 04-24-2013
Hi Don, thanks for taking the time. Below, are the answers to your questions:

1. Do you want input fields 2 and 4 to be removed from every input line?
Yes - they need to be removed

2. If the value in input field 1 is not a constant in each set of four input lines, what happens?
-Is that set skipped? If so, should an error be printed?
-Should the value from the first line in the set be printed?
-Should the value from the last line in the set be printed?
Basically, values in fields 1 and 3 will remain constant for every set of 4 lines. Hence for every group of four lines, I need the values in the first line for these fields.


3. If the value in input field 3 is not a constant in each set of four input lines, what happens?
-Is that set skipped? If so, should an error be printed?
-Should the value from the first line in the set be printed?
-Should the value from the last line in the set be printed?
As stated above... the values in input fields 1 and 3 will remain constant for every set of four lines

4. Is the number of fields a constant for a given input file?
Yes the number of fields is constant for a given input file.


The following is the code that I am working with right now, though it isn't working and does not have all the features I need:
Code:
awk '{for(j=5;j<=NF;j++) {!(NR%4){sum+=$j}{printf("%04d ", sum/2)}} {print "\n"}}'


Thanks again!!
# 4  
Old 04-24-2013
The following awk script is a little more complex than you requested. It allows processing of multiple input files, prints an end of file separator if more than one input file is given, and prints a warning if there are lines left at the end of a file that don't make up a complete 4 line set.

Since you said all values in a 4 line set are constant in fields 1 and 3, I used the values in the last line of the set instead of in the 1st line of the set (it saved me from needing to create two more variables). If you really need the 1st line's values instead of the last line's values, it won't be hard to change this script to do that.

As always, if you're using a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of awk.
Code:
awk '
FNR == 1 {
        # Check for incomplete set at end of previous file.
        if(l) {
                printf("%d line(s) skipped at end of %s.\n", l, file)
                for(i = 5; i <= n; i++) s[i] = 0
        }
        # Print file trailer if more than 1 file has been seen.
        if(nf++) printf("================== End of data from file %s\n", file)
        # Process headers: Print output headers, determine field count.
        printf("%s %s ", $1, $3)
        for(i = 5; i <= NF; i++) printf("%s%s", $i, i == NF ? "\n" : " ")
        n = NF          # set number of fields for this file
        l = 0           # set number of lines in current set
        file = FILENAME # save filename for diagnostics
        next
}
{       for(i = 5; i <= n; i++) s[i] += $i}
++l == 4 {
        l = 0
        printf("%d %d ", $1, $3)
        for(i = 5; i <= NF; i++) {
                printf("%6.4f%s", s[i], i == NF ? "\n" : " ")
                s[i] = 0
        }
}
END {   if(l) printf("%d line(s) skipped at end of %s.\n", l, file)
        if(nf > 1) printf("================== End of data from file %s\n", file)
}' input
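
For the case mentioned above where the 1st line's values in fields 1 and 3 are wanted rather than the last line's, here is a minimal sketch of that change. It is not the script above: it assumes a single input file with one header line, leaves out the multi-file trailer and the incomplete-set warning, and the function name sum_groups and the variables id1 and id2 are made up for illustration.

```shell
# Hypothetical helper: sums columns 5..NF over each set of 4 data lines and
# prints fields 1 and 3 taken from the FIRST line of each set.
sum_groups() {
    awk '
    NR == 1 {                       # header line: print ID1 ID2 P1 ... Pn
        printf("%s %s", $1, $3)
        for (i = 5; i <= NF; i++) printf(" %s", $i)
        print ""
        next
    }
    l == 0 { id1 = $1; id2 = $3 }   # first line of a set: remember its IDs
    { for (i = 5; i <= NF; i++) s[i] += $i }
    ++l == 4 {                      # set complete: print sums, then reset
        printf("%s %s", id1, id2)
        for (i = 5; i <= NF; i++) { printf(" %6.4f", s[i]); s[i] = 0 }
        print ""
        l = 0
    }
    ' "$@"
}
```

Call it as sum_groups input, or pipe data into it with no arguments. The only real difference from the script above is the l == 0 rule, which saves fields 1 and 3 before the running line count is incremented.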

# 5  
Old 04-24-2013
Many thanks Don !! This certainly is more complex than what I was trying to do - but it does the job perfectly.

Thanks again!!