

awk to print array that occurs the most with matching value in another field

06-15-2017 - Original Discussion by cmccabe

In the awk below I split on the `:` so that each NM_xxxx accession ends up in $7, and then count the lines for each accession. For each group of lines that share the same $1 value, I want to print the $7 accession that occurs most often with that $1. The awk seems close, but I am not sure what is going on. I have also included a description of what I think each part does. Thank you.


awk -F'[\t:]' '{count[$1 "\t" $7]++} END {for (word in count) print word, count[word]}' file


awk -F'[\t:]'   ---- FS is a regex: a tab `\t` or a colon `:` ends a field, so the NM_ accession is $7
'{count[$1 "\t" $7]++}   ---- for every line, add 1 to the count stored under the key `$1 <tab> $7` (gene plus accession)
END {for (word in count)   ---- after all lines are read, loop over every key in array count
print word, count[word]}'   ---- print each key and its count (count[word] is only printed to confirm, it is not needed)
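To check which field holds what, here is a quick sketch that feeds one row from the sample through the same field separator and prints NF, $1, $7 and $8. It assumes the real file is tab-delimited, as the `-F'[\t:]'` setting implies (the forum display shows spaces), so the row is rebuilt with real tabs; `line` and `fields` are just illustrative variable names.

```shell
# Assumption: the real file is tab-delimited, so rebuild one sample row with tabs.
line=$(printf 'A2M\t2\t18171\t33210\tcoding\tna\tNM_000014.5:c.2998A>G\tc.2998A>G')

# [\t:] is a bracket expression: a tab OR a colon ends a field, so the
# accession before the colon becomes $7 and the variant after it becomes $8.
fields=$(printf '%s\n' "$line" | awk -F'[\t:]' '{print NF, $1, $7, $8}')
printf '%s\n' "$fields"
```

The row splits into 9 fields, with the accession (`NM_000014.5`) in $7, which is why the counting key uses $7.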


A2M 2   18171   33210   coding  na  NM_000014.5:c.2998A>G   c.2998A>G
A2M 2   18172   33211   coding  na  NM_000014.5:c.2915G>A   c.2915G>A
A2M 2   18173   33212   coding  na  NM_000014.4:c.2125+1_2126-1del  c.2125+1_2126-1del
A2M 2   18174   33213   coding  na  NM_000014.5:c.2111G>A   c.2111G>A
A2M 2   402328  390084  coding  na  NM_000014.5:c.2126-6_2126-2delCCATA
A4GALT  53947   2692    17731   coding  na  NM_017436.5:c.548T>A    c.548T>A
A4GALT  53947   2693    17732   coding  na  NM_017436.5:c.752C>T    c.752C>T
A4GALT  53947   2694    17733   coding  na  NM_017436.6:c.783G>A    c.783G>A
A4GALT  53947   2695    17734   coding  na  NM_017436.6:c.560G>A    c.560G>A
A4GALT  53947   2696    17735   coding  na  NM_017436.6:c.240_242delCTT
A4GALT  53947   2697    17736   coding  na  NM_017436.6:c.1029dupC  c.1029dupC
A4GALT  53947   39437   48036   coding  na  NM_017436.6:c.631C>G    c.631C>G

current output

A2M	NM_000014.4 1
A2M	NM_000014.5 4
A4GALT	NM_017436.5 2
A4GALT	NM_017436.6 5

desired output

A2M NM_000014.5
A4GALT NM_017436.6
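Since the counting command already produces the current output, one possible way to get the desired output is to pipe it through `sort` and keep only the top accession per gene. This is a sketch, assuming a tab-delimited file; `sample` is a hypothetical helper that rebuilds the rows shown above with tabs.

```shell
# Hypothetical helper (assumption: the real file is tab-delimited): rebuild the
# sample rows from the question, turning the displayed spaces back into tabs.
sample() {
  tr -s ' ' '\t' <<'EOF'
A2M 2 18171 33210 coding na NM_000014.5:c.2998A>G c.2998A>G
A2M 2 18172 33211 coding na NM_000014.5:c.2915G>A c.2915G>A
A2M 2 18173 33212 coding na NM_000014.4:c.2125+1_2126-1del c.2125+1_2126-1del
A2M 2 18174 33213 coding na NM_000014.5:c.2111G>A c.2111G>A
A2M 2 402328 390084 coding na NM_000014.5:c.2126-6_2126-2delCCATA
A4GALT 53947 2692 17731 coding na NM_017436.5:c.548T>A c.548T>A
A4GALT 53947 2693 17732 coding na NM_017436.5:c.752C>T c.752C>T
A4GALT 53947 2694 17733 coding na NM_017436.6:c.783G>A c.783G>A
A4GALT 53947 2695 17734 coding na NM_017436.6:c.560G>A c.560G>A
A4GALT 53947 2696 17735 coding na NM_017436.6:c.240_242delCTT
A4GALT 53947 2697 17736 coding na NM_017436.6:c.1029dupC c.1029dupC
A4GALT 53947 39437 48036 coding na NM_017436.6:c.631C>G c.631C>G
EOF
}

# 1. count gene/accession pairs exactly as in the question
# 2. sort by gene, then by count (field 3) numerically, highest first
# 3. keep only the first line seen for each gene, i.e. its most frequent accession
best=$(sample |
  awk -F'[\t:]' '{count[$1 "\t" $7]++} END {for (word in count) print word, count[word]}' |
  LC_ALL=C sort -k1,1 -k3,3nr |
  awk '!seen[$1]++ {print $1, $2}')
printf '%s\n' "$best"
```

Because `for (word in count)` visits keys in no particular order, the `sort` step is what guarantees that the first line for each gene is the one with the highest count.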

Last edited by cmccabe; 06-15-2017 at 11:30 AM. Reason: fixed format
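A single-pass alternative that groups on $1, as the question asks, is to keep a running count per gene/accession pair and remember whichever accession has the highest count so far for each gene. Again this is only a sketch under the same tab-delimited assumption, with the same hypothetical `sample` helper; `count`, `top` and `acc` are illustrative array names.

```shell
# Hypothetical helper (assumption: the real file is tab-delimited).
sample() {
  tr -s ' ' '\t' <<'EOF'
A2M 2 18171 33210 coding na NM_000014.5:c.2998A>G c.2998A>G
A2M 2 18172 33211 coding na NM_000014.5:c.2915G>A c.2915G>A
A2M 2 18173 33212 coding na NM_000014.4:c.2125+1_2126-1del c.2125+1_2126-1del
A2M 2 18174 33213 coding na NM_000014.5:c.2111G>A c.2111G>A
A2M 2 402328 390084 coding na NM_000014.5:c.2126-6_2126-2delCCATA
A4GALT 53947 2692 17731 coding na NM_017436.5:c.548T>A c.548T>A
A4GALT 53947 2693 17732 coding na NM_017436.5:c.752C>T c.752C>T
A4GALT 53947 2694 17733 coding na NM_017436.6:c.783G>A c.783G>A
A4GALT 53947 2695 17734 coding na NM_017436.6:c.560G>A c.560G>A
A4GALT 53947 2696 17735 coding na NM_017436.6:c.240_242delCTT
A4GALT 53947 2697 17736 coding na NM_017436.6:c.1029dupC c.1029dupC
A4GALT 53947 39437 48036 coding na NM_017436.6:c.631C>G c.631C>G
EOF
}

best=$(sample | awk -F'[\t:]' '
  {
    n = ++count[$1, $7]     # occurrences of this gene/accession pair so far
    if (n > top[$1]) {      # this accession now leads for this gene
      top[$1] = n
      acc[$1] = $7
    }
  }
  END { for (g in acc) print g, acc[g] }' | LC_ALL=C sort)
printf '%s\n' "$best"
```

The final `sort` is only there to give a stable output order, since `for (g in acc)` is unordered. On a tie, the accession that reached the top count first is kept, because the comparison is a strict `>`.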
06-15-2017 - Reply by rdrtx1

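awk '
# count each accession.version pair ($7 ":" $8); when one overtakes the current
# best for its bare accession ($7), store "gene accession.version" in o[$7]
{if (++c[$7 ":" $8] > c[$7]) {c[$7]=c[$7 ":" $8] ; o[$7]=$1 " " $7 "." $8}}
# once all input is read, print the stored line for each accession
END {for (i in o) print o[i]}
' FS="[\t.:]" infile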

Last edited by rdrtx1; 06-15-2017 at 03:12 PM.
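The key trick in this answer is the field separator: adding `.` to FS splits the version number off the accession, so $7 is the bare accession, $8 is its version, and `c[$7 ":" $8]` counts each version separately. A small sketch on one sample row, again rebuilt with tabs on the assumption that the file is tab-delimited; `row` and `parts` are illustrative names.

```shell
# Assumption: the real file is tab-delimited, so rebuild one sample row with tabs.
row=$(printf 'A2M\t2\t18171\t33210\tcoding\tna\tNM_000014.5:c.2998A>G\tc.2998A>G')

# Same separator as the answer: a tab, a dot or a colon ends a field, so the
# version "5" lands in $8 and the answer builds the count key "$7:$8".
parts=$(printf '%s\n' "$row" | awk -F'[\t.:]' '{print $7, $8, $7 ":" $8}')
printf '%s\n' "$parts"
```

Because the answer keys `o[]` on $7 rather than $1, it picks the top version per accession. In the sample each gene has a single accession (NM_000014 for A2M, NM_017436 for A4GALT), so this gives the same result as grouping by gene.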

