awk to print array that occurs the most with matching value in another field

cmccabe, 06-15-2017

In the awk below I split each line on tabs and on the `:` in $7, then count how often each NM_xxxx transcript occurs for each $1 value. For lines that share the same $1, I want to print only the $7 transcript that occurs most often. The awk seems close, but it prints every $1/transcript pair with its count instead of just the most frequent one, and I am not sure what is going on. I have included a description of what I think each part does. Thank you.


awk -F'[\t:]' '{count[$1 "\t" $7]++} END {for (word in count) print word, count[word]}' file


awk -F'[\t:]'   ---- FS is a regex: split fields on a tab or a `:`
'{count[$1 "\t" $7]++}  ---- count each `$1` plus `$7` pair (the NM_xxxx part before the `:`) in array count
END {for (word in count)   ---- after the last line, loop over every key (word) in array count
print word, count[word]}    ---- print the key (`$1` and `$7`) and its count (I only printed count[word] to confirm, it is not needed)
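
To check the field splitting described above, here is a minimal sketch that feeds one record through the same -F'[\t:]' and prints $1 and $7. The helper name show_fields is made up for illustration, and the record assumes the real file is tab-delimited.

```shell
# show_fields is a hypothetical helper: print the gene ($1) and the
# transcript ($7) after splitting each line on a tab or a colon.
show_fields() {
    awk -F'[\t:]' '{ print $1, $7 }'
}

# One sample record, written with explicit tabs (an assumption about the file).
printf 'A2M\t2\t18171\t33210\tcoding\tna\tNM_000014.5:c.2998A>G\tc.2998A>G\n' | show_fields
# prints: A2M NM_000014.5
```

Because the `:` is part of FS, the c.2998A>G part after it lands in $8 and $7 holds only the NM_xxxx transcript.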


A2M 2   18171   33210   coding  na  NM_000014.5:c.2998A>G   c.2998A>G
A2M 2   18172   33211   coding  na  NM_000014.5:c.2915G>A   c.2915G>A
A2M 2   18173   33212   coding  na  NM_000014.4:c.2125+1_2126-1del  c.2125+1_2126-1del
A2M 2   18174   33213   coding  na  NM_000014.5:c.2111G>A   c.2111G>A
A2M 2   402328  390084  coding  na  NM_000014.5:c.2126-6_2126-2delCCATA
A4GALT  53947   2692    17731   coding  na  NM_017436.5:c.548T>A    c.548T>A
A4GALT  53947   2693    17732   coding  na  NM_017436.5:c.752C>T    c.752C>T
A4GALT  53947   2694    17733   coding  na  NM_017436.6:c.783G>A    c.783G>A
A4GALT  53947   2695    17734   coding  na  NM_017436.6:c.560G>A    c.560G>A
A4GALT  53947   2696    17735   coding  na  NM_017436.6:c.240_242delCTT
A4GALT  53947   2697    17736   coding  na  NM_017436.6:c.1029dupC  c.1029dupC
A4GALT  53947   39437   48036   coding  na  NM_017436.6:c.631C>G    c.631C>G

current output

A2M	NM_000014.4 1
A2M	NM_000014.5 4
A4GALT	NM_017436.5 2
A4GALT	NM_017436.6 5

desired output

A2M NM_000014.5
A4GALT NM_017436.6

Last edited by cmccabe; 06-15-2017 at 10:30 AM.. Reason: fixed format
rdrtx1, 06-15-2017

awk '
# FS splits on tab, dot and colon, so $7 is the accession (NM_000014) and $8 its version (5)
{if (++c[$7 ":" $8] > c[$7]) {c[$7]=c[$7 ":" $8] ; o[$7]=$1 " " $7 "." $8}}
END {for (i in o) print o[i]}
' FS="[\t.:]" infile

Last edited by rdrtx1; 06-15-2017 at 02:12 PM..
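
A variant of the same running-maximum idea, keyed on $1 and keeping the -F'[\t:]' from the question, might look like the sketch below. most_common_transcript is a hypothetical helper name. The sketch assumes a tab-delimited file, and on a tie it keeps whichever transcript reached the top count first.

```shell
# most_common_transcript FILE (hypothetical helper)
# Assumes FILE is tab-separated and column 7 looks like NM_xxxx.v:c.change
most_common_transcript() {
    awk -F'[\t:]' '
        {
            n = ++count[$1 "\t" $7]    # occurrences of this gene/transcript pair
            if (n > max[$1]) {         # strictly higher count: new leader for this gene
                max[$1]  = n
                best[$1] = $7
            }
        }
        END { for (gene in best) print gene, best[gene] }
    ' "$1" | sort                      # for-in order is unspecified, so sort the result
}
```

Called as `most_common_transcript file`, it prints one line per $1 value with its most frequent transcript.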