

Sum specified values (columns) per row


 
# 1  

Hello out there,

file.txt:
Code:
comp51820_c1_seq1	42	N	0:0:0:0:0:0	1:0:0:0:0:0	0:0:0:0:0:0	3:0:0:0:0:0	0:0:0:0:0:0
comp51820_c1_seq1	43	N	0:0:0:0:0:0	0:1:0:0:0:0	0:0:0:0:0:0	0:3:0:0:0:0	0:0:0:0:0:0
comp51820_c1_seq1	44	N	0:0:4:0:3:1	0:0:1:9:0:0	10:0:0:0:0:0	0:3:3:2:2:6	2:2:2:5:60:3
comp51820_c1_seq1	45	N	0:4:0:0:5:0	0:1:8:0:0:0	0:0:0:0:9:0	0:3:0:0:6:0	0:0:0:0:13:0

I would like to compare the sums of the colon-separated values in columns 4-8. Specifically, I would like to print the lines for which the sum of each column (4 through 8) is at least 8. Caveat: I need to ignore the fifth value in each column (this excludes line four in file.txt).

Desired output:
Code:
comp51820_c1_seq1	44	N	0:0:4:0:3:1	0:0:1:9:0:0	10:0:0:0:0:0	0:3:3:2:2:6	2:2:2:5:60:3

So far I have tried to use awk to sum columns, but I'm not sure how to compare multiple sums per line. It also seems there must be a better way, but Google hasn't given me an answer so far.
Code:
awk 'BEGIN { FS = ":|[ \t]" } { col1 += $4+$5+$6+$7+$8+$9 } END { print col1 }' file.txt

# 2  
Run awk -f path.awk myInputFile, where path.awk is:
Code:
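{
  f=0                            # becomes non-zero once any column reaches 8
  for(i=4;i<=NF;i++) {           # columns 4 through the last field
    n=split($i,a,":")            # a[1..n] holds the colon-separated values
    s=0
    for(j=1;j<n;j++) s+=a[j]     # j<n stops before the last value
    if (s>=8) {
       f++
       break
    }
  }
}
f                                # print the line when f is non-zero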

# 3  
If I'm not mistaken, this seems to print all lines with at least one column whose sum is >=8. I get the output:
Code:
comp51820_c1_seq1	44	N	0:0:4:0:3:1	0:0:1:9:0:0	10:0:0:0:0:0	0:3:3:2:2:6	2:2:2:5:60:3
comp51820_c1_seq1	45	N	0:4:0:0:5:0	0:1:8:0:0:0	0:0:0:0:9:0	0:3:0:0:6:0	0:0:0:0:13:0

However I would like to print only lines where ALL columns 4-8 have sum >=8, EXCLUDING the fifth value.
# 4  
ah, ok:
Code:
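{
    f=0                          # number of columns whose sum is >= 8
    for(i=4;i<=NF;i++) {
       n=split($i,a,":")
       s=0
       for(j=1;j<n;j++) s+=a[j]  # j<n stops before the last value
       if (s>=8) f++
    }
}
f==(NF-4)                        # print the line when f equals NF-4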

# 5  
Thank you, this works. Would you mind also showing how I would do it if I did not want to exclude the fifth value? So just print lines where the sum of ALL values in each of columns 4-8 is at least 8? I can't tell which part of the code addresses this part of it.
# 6  
Code:
for(j=1;j<=n;j++) s+=a[j]
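
A small sketch, not from the thread, of how the loop bound changes the sum for a single column. The helper name sum_variants is made up for illustration; the sample value is column 4 of line 44 in file.txt. It suggests that j<n leaves out the last (sixth) value rather than the fifth, which may matter for the original requirement.

```shell
# Sum one colon-separated column three ways.
# sum_variants is a hypothetical helper, not part of the thread.
sum_variants() {
    printf '%s\n' "$1" | awk '{
        n = split($1, a, ":")
        for (j = 1; j <= n; j++) {
            all += a[j]                    # j<=n: every value
            if (j < n)  but_last  += a[j]  # j<n: stops before the last value
            if (j != 5) but_fifth += a[j]  # skips only the fifth value
        }
        print all, but_last, but_fifth
    }'
}

sum_variants "0:0:4:0:3:1"   # prints: 8 7 5
```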

# 7  
This works for the example file, but when I try it with my larger file there are some issues. For example, if these lines are the input:
Code:
comp51820_c1_seq1	405	N	0:29:0:0:0:0	0:51:0:0:0:0	0:57:0:0:0:0	0:6:0:0:0:0	0:37:0:0:0:0
comp51820_c1_seq1	406	N	0:0:0:29:0:0	0:0:0:51:0:0	0:0:0:57:0:0	0:0:0:6:0:0	0:0:0:37:0:0
comp51820_c1_seq1	407	N	0:0:0:31:0:0	0:0:0:48:0:0	0:0:0:59:0:0	0:0:0:8:0:0	0:0:0:45:0:0
comp51820_c1_seq1	408	N	0:31:0:0:0:0	0:51:0:0:0:0	0:60:0:0:0:0	0:9:0:0:0:0	0:48:0:0:0:0
comp51820_c1_seq1	409	N	0:1:0:0:0:0	0:51:0:0:0:0	0:60:0:0:0:0	0:9:0:0:0:0	0:48:0:0:0:0

The output I get is:
Code:
-bash-4.1$ cat path.awk 
{
	f=0
	for(i=4;i<=NF;i++) {
		n=split($i,a,":")
		s=0
		for(j=1;j<=n;j++) s+=a[j]
		if (s>=8) f++
	}
}
f==(NF-4)
-bash-4.1$ 
-bash-4.1$ awk -f path.awk file.txt
comp51820_c1_seq1	405	N	0:29:0:0:0:0	0:51:0:0:0:0	0:57:0:0:0:0	0:6:0:0:0:0	0:37:0:0:0:0
comp51820_c1_seq1	406	N	0:0:0:29:0:0	0:0:0:51:0:0	0:0:0:57:0:0	0:0:0:6:0:0	0:0:0:37:0:0
comp51820_c1_seq1	409	N	0:1:0:0:0:0	0:51:0:0:0:0	0:60:0:0:0:0	0:9:0:0:0:0	0:48:0:0:0:0

This output is incorrect because the sum of $7 is < 8 in the first two lines, and the sum of $4 is < 8 in the third line.

Expected output is:
Code:
comp51820_c1_seq1	407	N	0:0:0:31:0:0	0:0:0:48:0:0	0:0:0:59:0:0	0:0:0:8:0:0	0:0:0:45:0:0
comp51820_c1_seq1	408	N	0:31:0:0:0:0	0:51:0:0:0:0	0:60:0:0:0:0	0:9:0:0:0:0	0:48:0:0:0:0

The latest version also doesn't seem to work with the original example file:
Code:
-bash-4.1$ cat file2.txt 
comp51820_c1_seq1	42	N	0:0:0:0:0:0	1:0:0:0:0:0	0:0:0:0:0:0	3:0:0:0:0:0	0:0:0:0:0:0
comp51820_c1_seq1	43	N	0:0:0:0:0:0	0:1:0:0:0:0	0:0:0:0:0:0	0:3:0:0:0:0	0:0:0:0:0:0
comp51820_c1_seq1	44	N	0:0:4:0:3:1	0:0:1:9:0:0	10:0:0:0:0:0	0:3:3:2:2:6	2:2:2:5:60:3
comp51820_c1_seq1	45	N	0:4:0:0:5:0	0:1:8:0:0:0	0:0:0:0:9:0	0:3:0:0:6:0	0:0:0:0:13:0
-bash-4.1$ 
-bash-4.1$ awk -f path.awk file2.txt
-bash-4.1$ awk -f path2.awk file2.txt
comp51820_c1_seq1	44	N	0:0:4:0:3:1	0:0:1:9:0:0	10:0:0:0:0:0	0:3:3:2:2:6	2:2:2:5:60:3
-bash-4.1$ 
-bash-4.1$ cat path2.awk 
{
	f=0
	for(i=4;i<=NF;i++) {
		n=split($i,a,":")
		s=0
		for(j=1;j<n;j++) s+=a[j]
		if (s>=8) f++
	}
}
f==(NF-4)
-bash-4.1$ 
-bash-4.1$ cat path.awk 
{
	f=0
	for(i=4;i<=NF;i++) {
		n=split($i,a,":")
		s=0
		for(j=1;j<=n;j++) s+=a[j]
		if (s>=8) f++
	}
}
f==(NF-4)
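
One possible explanation for the output above, offered as a sketch rather than a confirmed fix: columns 4 through NF number NF-3, not NF-4, so with eight fields `f==(NF-4)` holds only when exactly four of the five columns reach the threshold. That would match lines 405, 406 and 409 (one failing column each) being printed while 407 and 408 are not. The version below avoids counting altogether by rejecting a line at the first column that falls short. The function name filter_rows and the variables min and skip are assumptions introduced here for illustration.

```shell
# Print lines where every column from 4 to NF has a colon-separated sum >= min.
# skip is the position of a value to ignore in each column (0 = ignore nothing).
# filter_rows, min and skip are illustrative names, not from the thread.
# Usage: filter_rows FILE [MIN] [SKIP]
filter_rows() {
    awk -v min="${2:-8}" -v skip="${3:-0}" '
    {
        ok = 1
        for (i = 4; i <= NF; i++) {
            n = split($i, a, ":")
            s = 0
            for (j = 1; j <= n; j++)
                if (j != skip) s += a[j]
            if (s < min) { ok = 0; break }   # one failing column rejects the line
        }
    }
    ok && NF >= 4
    ' "$1"
}
```

With the five lines from post #7 saved as file.txt, `filter_rows file.txt` should print only lines 407 and 408, and `filter_rows file.txt 8 5` ignores the fifth value in each column instead. Note that on the original example, column 4 of line 44 (0:0:4:0:3:1) sums to 5 once the fifth value is dropped, so that line would not pass with skip set to 5.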

