awk to print array that occurs the most with matching value in another field


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting awk to print array that occurs the most with matching value in another field
# 1  
Old 06-15-2017
awk to print array that occurs the most with matching value in another field

In the below awk I am splitting $7 on the : and then counting each line or NM_xxxx. If the $1 value is the same for each line then print the $7 that occurs the most with the matching $1 value. The awk seems close but I am not sure what is going on. I included a description as well as to what I think is going on. Thank you Smilie.

awk
Code:
awk -F'[\t:]' '{count[$1 "\t" $7]++} END {for (word in count) print word, count[word]}' file

description
Code:
awk -F'[\t:]'   ---- regex for FS `\t` and split `:`
'{count[$7]++}  ---- count each `line in $7` and read into array count
{for (word in count)   ---- start loop using array count and read each line in array word
print $1, word, count[word]}    ---- print desired fields `$1, [word] (I only printed count[word] to confirm, it is not needed)

file
Code:
A2M 2   18171   33210   coding  na  NM_000014.5:c.2998A>G   c.2998A>G
A2M 2   18172   33211   coding  na  NM_000014.5:c.2915G>A   c.2915G>A
A2M 2   18173   33212   coding  na  NM_000014.4:c.2125+1_2126-1del  c.2125+1_2126-1del
A2M 2   18174   33213   coding  na  NM_000014.5:c.2111G>A   c.2111G>A
A2M 2   402328  390084  coding  na  NM_000014.5:c.2126-6_2126-2delCCATA
A4GALT  53947   2692    17731   coding  na  NM_017436.5:c.548T>A    c.548T>A
A4GALT  53947   2693    17732   coding  na  NM_017436.5:c.752C>T    c.752C>T
A4GALT  53947   2694    17733   coding  na  NM_017436.6:c.783G>A    c.783G>A
A4GALT  53947   2695    17734   coding  na  NM_017436.6:c.560G>A    c.560G>A
A4GALT  53947   2696    17735   coding  na  NM_017436.6:c.240_242delCTT
A4GALT  53947   2697    17736   coding  na  NM_017436.6:c.1029dupC  c.1029dupC
A4GALT  53947   39437   48036   coding  na  NM_017436.6:c.631C>G    c.631C>G

current output
Code:
A2M	NM_000014.4 1
A2M	NM_000014.5 4
	 3
A4GALT	NM_017436.5 2
A4GALT	NM_017436.6 5

desired output
Code:
A2M NM_000014.5
A4GALT NM_017436.6


Last edited by cmccabe; 06-15-2017 at 11:30 AM.. Reason: fixed format
# 2  
Old 06-15-2017
Code:
awk '
{if (++c[$7 ":" $8] > c[$7]) {c[$7]=c[$7 ":" $8] ; o[$7]=$1 " " $7 "." $8}}
END {
   for (i in o) print o[i];
}
' FS="[\t.:]" infile


Last edited by rdrtx1; 06-15-2017 at 03:12 PM..
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk to average field if matching string in another

In the awk below I am trying to get the average of the sum of $7 if the string in $4 matches in the line below it. The --- in the desired out is not needed, it is just to illustrate the calculation. The awk executes and produces the current out. I am not sure why the middle line is skipped and the... (2 Replies)
Discussion started by: cmccabe
2 Replies

2. Shell Programming and Scripting

awk to add text to matching pattern in field

In the awk I am trying to add :p.=? to the end of each $9 that matches the pattern NM_. The below executes andis close but I can not seem to figure out why the :p.=? repeats in the split as in the green in the current output. I have added comments as well. Thank you :). file ... (4 Replies)
Discussion started by: cmccabe
4 Replies

3. Shell Programming and Scripting

awk to update field using matching value in file1 and substring in field in file2

In the awk below I am trying to set/update the value of $14 in file2 in bold, using the matching NM_ in $12 or $9 in file2 with the NM_ in $2 of file1. The lengths of $9 and $12 can be variable but what is consistent is the start pattern will always be NM_ and the end pattern is always ;... (2 Replies)
Discussion started by: cmccabe
2 Replies

4. Shell Programming and Scripting

awk to combine all matching fields in input but only print line with largest value in specific field

In the below I am trying to use awk to match all the $13 values in input, which is tab-delimited, that are in $1 of gene which is just a single column of text. However only the line with the greatest $9 value in input needs to be printed. So in the example below all the MECP2 and LTBP1... (0 Replies)
Discussion started by: cmccabe
0 Replies

5. Shell Programming and Scripting

awk Array matching and replacing from master file.

I have an awk related question that I was hoping you all could help with. I am given 2 input files named OLDFILE and NEWFILE, and a Master file named MASTERFILE. They can be seen below. OLDFILE: a a a a a f g 4 5 7 8 1 2 3 (1 Reply)
Discussion started by: tiktak292
1 Replies

6. Shell Programming and Scripting

awk help - matching a field with certail values

Hello there, I have a file with few fields separated by ":". I wrote a below awk to manipulate this file: awk 'BEGIN { FS=OFS=":" }\ NR != 1 && $2 !~ /^98/ && $8 !~ /^6/{print $0}' $in_file > $out_file What I wanted was that if $8 field contains any of the values - 6100, 6110, 6200 -... (2 Replies)
Discussion started by: juzz4fun
2 Replies

7. Shell Programming and Scripting

Printing entire field, if at least one row is matching by AWK

Dear all, I have been trying to print an entire field, if the first line of the field is matching. For example, my input looks something like this. aaa ddd zzz 123 987 126 24 0.650 985 354 9864 0.32 0.333 4324 000 I am looking for a pattern,... (5 Replies)
Discussion started by: Chulamakuri
5 Replies

8. Shell Programming and Scripting

AWK : Add Fields of lines with matching field

Dear All, I would like to add values of a field, if the lines match in a certain field. Then I would like to divide the sum though the number of lines that have a matched field. This is the Input: Input: Test1 5 Test1 10 Test2 2 Test2 5 Test2 13 Test3 4 Output: Test1 7.5 Test1 7.5... (6 Replies)
Discussion started by: DerSeb
6 Replies

9. UNIX for Dummies Questions & Answers

[awk] print from field n to field x

Hi, I'm trying to print every line from first field to the fourth from a file containing more. $ cat input a b c d e f g a b c d e f gI'm trying awk '{for (i=1; i <= NF-3; i++) print $i}' awkTest.datbut it printsa b c d a b c dSo, I easily guess I'm wrong. :) Of course, I want:a b... (5 Replies)
Discussion started by: daPeach
5 Replies

10. Shell Programming and Scripting

Print matching field using awk

Hi All, I have a string like below: str="Hold=True Map=False 'This will map the data' Run=Yes Modify=False" I want to print the field Run=Yes and retrive the value "Yes". I cannot use simple awk command because the position of the "Run" will be different at different times. Is there a way... (6 Replies)
Discussion started by: deepakgang
6 Replies
Login or Register to Ask a Question