awk to calculate fields only if match is found


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting awk to calculate fields only if match is found
# 1  
Old 05-21-2016
awk to calculate fields only if match is found

Trying to combine the matching $5 values between file1 and file2. If a match is found then the last $6 value in the match and the sum of $7 are outputted to a new file. The awk below I hope is a good start. Thank you Smilie.

file1
Code:
chr12    9221325    9221448    chr12:9221325-9221448    A2M    1    125
chr12    9221325    9221448    chr12:9221325-9221448    A2M    2    125
chr12    9221800    9221999    chr12:9221800-9221999    A2M1    1    120
chr12    9221800    9221999    chr12:9221800-9221999    A2M1    2    120
chr12    9221800    9221999    chr12:9221800-9221999    A2M1    3    130

file2
Code:
chr12    9221325    9221448    chr12:9221325-9221448    A2M    1    120
chr12    9221325    9221448    chr12:9221325-9221448    A2M    2    120
chr12    9221800    9221999    chr12:9221800-9221999    A2M1    1    125
chr12    9221800    9221999    chr12:9221800-9221999    A2M1    2    125
chr12    9221800    9221999    chr12:9221800-9221999    A2M1    3    135

awk tried with error
Code:
awk '{sum[$4]+=$6; count[$7]++} 
>     END{for(k in sum) printf "%s %.1f\n",  k, sum[k]/count[k]}' file1 file2 > out
awk: cmd. line:2: (FILENAME=file2 FNR=5) fatal: division by zero attempted

desired output
Code:
chr12    9221325    9221448    chr12:9221325-9221448    A2M    2    250
chr12    9221800    9221999    chr12:9221800-9221999     A2M1    3    385


Last edited by cmccabe; 05-21-2016 at 01:02 PM.. Reason: fixed format
# 2  
Old 05-21-2016
Hello cmccabe,

Could you please try following and let me know if this helps.
1st: If you need to have exact count like file2 of 2nd last field(where I am assuming like it could be anything/any number, though your Input_file shows only in increment order or each occurrence of the field 4th). So always it will print the latest value of the 4th field's 2nd last field here, without taking care what it is.
Code:
awk 'FNR==NR{A[$4];next} ($4 in A){Q=$NF;P=$(NF-1);$(NF-1)=$NF="";E[$4]=$0 OFS P OFS (W[$4]+=Q)} END{for(i in E){print E[i]}}' file1 file2

Output will be as follows.
Code:
chr12 9221325 9221448 chr12:9221325-9221448 A2M   2 240
chr12 9221800 9221999 chr12:9221800-9221999 A2M1   3 385

2nd: In case your 2nd last field shows the number of occurrences of 4th field, then following could help you.
Code:
awk 'FNR==NR{A[$4];next} ($4 in A){Q=$NF;$(NF-1)=$NF="";++S[$4];E[$4]=$0 OFS S[$4] OFS (W[$4]+=Q)} END{for(i in E){print E[i]}}' file1 file2

Output will be as follows.
Code:
chr12 9221325 9221448 chr12:9221325-9221448 A2M   2 240
chr12 9221800 9221999 chr12:9221800-9221999 A2M1   3 385

Now coming on to the confusion which I have by reading your post, if you see carefully you have shown Output_file's last field(which is SUM of the last fields), first line it shows 250 so it looks you are taking sum from file1 BUT on same time second line shows 385 which shows SUM should be from file2. So above solutions are taking SUM of values in file2 not in file1. Please try above ones and let me know how it goes then.

Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
# 3  
Old 05-21-2016
The first awk is always the case. The only thing that may change is the output (depending on the case). How can I print only $5, $6, $7 ? Thank you for your help, iworks great Smilie.

Code:
A2M   2 250
A2M1   3 370

# 4  
Old 05-21-2016
Maybe something like:
Code:
awk '
FNR == NR {
	a[$5]
	next
}
$5 in a {
	last6[$5] = $6
	sum7[$5] += $7
}
END {	for(i in last6)
		print i, last6[i], sum7[i]
}' file[12]

will do what you want, producing the output:
Code:
A2M1 3 385
A2M 2 240

(Note that doing it this way the order of the output lines is random.)

And, as always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
This User Gave Thanks to Don Cragun For This Post:
# 5  
Old 05-22-2016
Quote:
Originally Posted by cmccabe
The first awk is always the case. The only thing that may change is the output (depending on the case). How can I print only $5, $6, $7 ? Thank you for your help, iworks great Smilie.
Code:
A2M   2 250
A2M1   3 370

Hello cmccabe,

A little change in my previous code, could you please try following and let me know if this helps.
Code:
awk 'FNR==NR{A[$4];next} ($4 in A){Q=$NF;P=$(NF-1);$(NF-1)=$NF="";E[$4]=$5 OFS P OFS (W[$4]+=Q)} END{for(i in E){print E[i]}}' file1 file2

Output will be as follows.
Code:
A2M 2 240
A2M1 3 385

Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
# 6  
Old 05-22-2016
Thank you both very much, works great Smilie.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

awk match two fields in two files

Hi, I have two TEST files t.xyz and a.xyz which have three columns each. a.xyz have more rows than t.xyz. I will like to output rows at which $1 and $2 of t.xyz match $1 and $2 of a.xyz. Total number of output rows should be equal to that of t.xyz. It works fine, but when I apply it to large... (6 Replies)
Discussion started by: geomarine
6 Replies

2. Shell Programming and Scripting

awk to update file based on match in 3 fields

Trying to use awk to store the value of $5 in file1 in array x. That array x is then used to search $4 of file1 to find aa match (I use x to skip the header in file1). Since $4 can have multiple strings in it seperated by a , (comma), I split them and iterate througn each split looking for a match.... (2 Replies)
Discussion started by: cmccabe
2 Replies

3. UNIX for Beginners Questions & Answers

Match Fields between two files, print portions of each file together when matched in ([g]awk)'

I've written an awk script to compare two fields in two different files and then print portions of each file on the same line when matched. It works reasonably well, but every now and again, I notice some errors and cannot seem to figure out what the issue may be and am turning to you for help. ... (2 Replies)
Discussion started by: jvoot
2 Replies

4. Shell Programming and Scripting

awk to print match or non-match and select fields/patterns for non-matches

In the awk below I am trying to output those lines that Match between file1 and file2, those Missing in file1, and those missing in file2. Using each $1,$2,$4,$5 value as a key to match on, that is if those 4 fields are found in both files the match, but if those 4 fields are not found then missing... (0 Replies)
Discussion started by: cmccabe
0 Replies

5. Shell Programming and Scripting

awk to output match and mismatch with count using specific fields

In the below awk I am trying output to one file those lines that match between $2,$3,$4 of file1 and file2 with the count in (). I am also trying to output those lines that are missing between $2,$3,$4 of file1 and file2 with the count of in () each. Both input files are tab-delimited, but the... (7 Replies)
Discussion started by: cmccabe
7 Replies

6. Shell Programming and Scripting

awk partial string match and add specific fields

Trying to combine strings that are a partial match to another in $1 (usually below it). If a match is found than the $2 value is added to the $2 value of the match and the $3 value is added to the $3 value of the match. I am not sure how to do this and need some expert help. Thank you :). file ... (2 Replies)
Discussion started by: cmccabe
2 Replies

7. Shell Programming and Scripting

awk to match keyword and return matches and unique fields

Trying to use awk to find a keyword and return the matches in the row, but also $1 and $2, which are the unique id's, but they only appear once. Thank you :). file name 31 Index Chromosomal Position Gene Inheritance 122 2106725 TSC2 AD 124 2115481 TSC2 AD 121 2105400 TSC2 AD... (6 Replies)
Discussion started by: cmccabe
6 Replies

8. Shell Programming and Scripting

awk Help -- If match found return the count

Hi All, I need to get the count of records in the file, if the passing parameter matches with the list of records in the file. Below is my example source file: Test1.dat 20120913 20120913 20120912 20120912 20120912 20120912 20120912 20120913 20120913 20120912 In my script I am... (5 Replies)
Discussion started by: bbc17484
5 Replies

9. UNIX for Dummies Questions & Answers

awk display the match and 2 lines after the match is found.

Hello, can someone help me how to find a word and 2 lines after it and then send the output to another file. For example, here is myfile1.txt. I want to search for "Error" and 2 lines below it and send it to myfile2.txt I tried with grep -A but it's not supported on my system. I tried with awk,... (4 Replies)
Discussion started by: eurouno
4 Replies

10. Shell Programming and Scripting

AWK break string into fields + pattern match

I am trying to break a string into separate fields and print the field that matches a pattern. I am using awk at the moment and have gotten this far: awk '{for(i=1;i<=NF;++i)print "\t" $i}' longstring This breaks the string into fields and prints each field on a separate line. I want to add... (2 Replies)
Discussion started by: Moxy
2 Replies
Login or Register to Ask a Question