Output counts of all matching strings less than a number using awk


 
# 1  
06-11-2016

The awk below is supposed to count, for each $5 string, how many of its $7 values are less than 20. I don't think I need the portion in bold, as I do not need any decimal point or formatting, but I cannot seem to get the correct counts. Thank you.

file
Code:
chr5    77316500    77316628    chr5:77316500-77316628    AP3B1    62    152
chr5    77316500    77316628    chr5:77316500-77316628    AP3B1    63    153
chr16    14041460    14042214    chr16:14041460-14042214    ERCC4    333    19
chr16    14041460    14042214    chr16:14041460-14042214    ERCC4    334    19
chr16    14041460    14042214    chr16:14041460-14042214    ERCC4    335    19
chr15    31196856    31198110    chr15:31196856-31198110    FAN1    5    62
chr15    31196856    31198110    chr15:31196856-31198110    FAN1    6    62

desired output
Code:
AP3B1 0
ERCC4 3
FAN1 0

awk with current output
Code:
awk '{sum[$5]+=$7 < 20; count[$5]++}  
    END{for(k in sum) printf "%s %.1f\n",  k, sum[k]/count[k]}' file
AP3B1 0.0
ERCC4 1.0
FAN1 0.0
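Why ERCC4 shows 1.0: in awk a relational operator binds more tightly than `+=`, so `sum[$5]+=$7 < 20` already adds 1 for every row whose $7 is below 20, giving 3 for ERCC4. Dividing by count[k] (also 3 for ERCC4) then turns that count into a fraction. A minimal sketch of the fix, assuming only the count is wanted, drops the division and prints with `%d`. The function name `count_below` and the threshold variable `t` are hypothetical, added only for illustration.

```shell
#!/bin/sh
# count_below (hypothetical helper): for each distinct $5, print how many
# rows on stdin have $7 below the threshold (default 20).
count_below() {
    awk -v t="${1:-20}" '
        BEGIN { t += 0 }              # force the threshold to be numeric
        { sum[$5] += ($7 < t) }       # a comparison yields 1 or 0
        END { for (k in sum) printf "%s %d\n", k, sum[k] }
    '
}

# Sample rows from the question. "for (k in sum)" visits keys in an
# unspecified order, so the result is sorted for a stable listing.
count_below 20 <<'EOF' | sort
chr5 77316500 77316628 chr5:77316500-77316628 AP3B1 62 152
chr5 77316500 77316628 chr5:77316500-77316628 AP3B1 63 153
chr16 14041460 14042214 chr16:14041460-14042214 ERCC4 333 19
chr16 14041460 14042214 chr16:14041460-14042214 ERCC4 334 19
chr16 14041460 14042214 chr16:14041460-14042214 ERCC4 335 19
chr15 31196856 31198110 chr15:31196856-31198110 FAN1 5 62
chr15 31196856 31198110 chr15:31196856-31198110 FAN1 6 62
EOF
# prints:
# AP3B1 0
# ERCC4 3
# FAN1 0
```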

# 2  
06-11-2016
Code:
awk \
'count[$5]=="" { count[$5]=0 } 
       $7 < 20 { count[$5]++ } 
END{
              for(k in count) 
                 printf "%s %d\n",  k, count[k]
}' file

Posted by stomp.
# 3  
06-11-2016
Thank you very much.

How does each $5 with a $7 of 20 or more get distinguished from a $5 with a $7 less than 20, so that only the latter are counted? The code works great; I am just trying to learn. Thank you.
# 4  
06-11-2016
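On line 2 it checks whether $7 < 20. If it is, it adds one to count[$5], the number of values below 20 for the corresponding $5 value. Line 1 sets count[$5] to 0 the first time a $5 value is seen, so a $5 with no values below 20 is still printed, with a count of 0.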
Posted by stomp.
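To watch the two rules from post #2 at work, a sketch that applies the same two patterns and also prints one line per input row may help. The helper name `trace_counts` and the threshold variable `t` are hypothetical. The first rule is what makes AP3B1 and FAN1 appear with 0 even though none of their rows pass the test.

```shell
#!/bin/sh
# trace_counts (hypothetical helper): apply the two rules from post #2 and
# report, for every row, whether it was counted and the running total.
trace_counts() {
    awk -v t="${1:-20}" '
        BEGIN { t += 0 }
        count[$5] == "" { count[$5] = 0 }   # first time this $5 is seen
        $7 < t          { count[$5]++ }     # only rows below the threshold
        {
            printf "%s %s -> %s (count now %d)\n", $5, $7,
                   ($7 < t ? "counted" : "skipped"), count[$5]
        }
    '
}

trace_counts 20 <<'EOF'
chr5 77316500 77316628 chr5:77316500-77316628 AP3B1 62 152
chr16 14041460 14042214 chr16:14041460-14042214 ERCC4 333 19
chr16 14041460 14042214 chr16:14041460-14042214 ERCC4 334 19
EOF
# prints:
# AP3B1 152 -> skipped (count now 0)
# ERCC4 19 -> counted (count now 1)
# ERCC4 19 -> counted (count now 2)
```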
# 5  
06-11-2016
Hello cmccabe,

Could you please try the following and let me know if it helps.
Code:
awk 'FNR==NR{if($7<20){B[$5]++};C[$5]=B[$5];next}  ($5 in C){printf("%s %01d\n",$5,C[$5]);delete C[$5];}' Input_file  Input_file

Output will be as follows.
Code:
AP3B1 0
ERCC4 3
FAN1 0

Thanks,
R. Singh
Posted by RavinderSingh13.
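The `FNR==NR` test in R. Singh's post is the usual awk idiom for telling the first input file from the rest: FNR restarts at 1 for each file while NR keeps counting, so the two are equal only while the first copy of Input_file is being read. Below is a commented sketch of the same two-pass idea, not his exact code; the helper name `count_in_file_order` and the array names are hypothetical. Its benefit is that each $5 comes out in the order it first appears in the file, rather than in the unspecified `for (k in ...)` order.

```shell
#!/bin/sh
# count_in_file_order (hypothetical helper): read the same file twice.
# Pass 1 counts rows with $7 below the threshold; pass 2 prints each $5
# once, in the order it first appears.
count_in_file_order() {
    awk -v t="${2:-20}" '
        BEGIN { t += 0 }
        FNR == NR {                  # true only while reading the first copy
            if ($7 < t) below[$5]++
            seen[$5] = 1
            next                     # skip the printing rule on pass 1
        }
        ($5 in seen) {               # second copy: first row of each $5
            printf "%s %d\n", $5, below[$5]   # unset below[$5] prints as 0
            delete seen[$5]          # so later rows of this $5 are ignored
        }
    ' "$1" "$1"
}

# Usage, with the question's data saved as "file":
# count_in_file_order file
```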
# 6  
06-11-2016
Another modification to the first post:
Code:
awk '{count[$5]+=($7 < 20)}
    END{for(k in count) printf "%s %d\n",  k, count[k]}' file

Posted by Scrutinizer.
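Scrutinizer's version works because an awk comparison evaluates to a number, 1 when true and 0 when false. So `count[$5]+=($7 < 20)` adds one only for rows below 20, while still creating the entry (with value 0) for a $5 that never matches. A tiny sketch of that behaviour, with a hypothetical function name:

```shell
#!/bin/sh
# bool_demo (hypothetical): awk comparisons evaluate to 1 (true) or 0 (false).
bool_demo() {
    awk 'BEGIN {
        a = (19 < 20); b = (152 < 20)
        print a, b                        # prints "1 0"
        n = 0
        n += (19 < 20); n += (19 < 20); n += (62 < 20)
        print n                           # prints "2"
        m["FAN1"] += (62 < 20)            # adds 0 but still creates the entry
        has = ("FAN1" in m)
        print has, m["FAN1"]              # prints "1 0"
    }'
}

bool_demo
```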
# 7  
06-12-2016
All the awk code works great, and thank you for the explanations. I really appreciate it.