Code for exact match to count occurrence


 
# 1  
Old 04-08-2016

Hi all,

I have an input file as below. For each line, I would like to count how many of the first seven fields exactly match the 8th field.

Input:
Code:
field_01        field_02        field_03        field_04        field_05        field_06        field_07        field_08
TA      T       TA      T       TA      TA      TA      T
CAA     CAA     CAA     C       CAA     CAA     CAA     C
CG      C       C       CG      C       C       C       C
T       T       T       T       T       T       T       T
AT      AT      AT      AT      A       A       A       AT
AT      A       A       A       AT      AT      AT      A
CGA     CGA     C       CGA     C       CGA     CGA     C
GTTAC   GTTAC   G       GTTAC   GTTAC   GTTAC   G       G
CAT     CAT     C       CAT     C       C       C       C
AG      AG      AG      AG      AG      AG      AG      A
AG      A       A       A       AG      AG      AG      A

The command I used is:
Code:
awk 'BEGIN{print "count"} NR>1  {print gsub($8,"")-1}' input.txt

The output I got was as follows:
Code:
count
7
7
7
7
4
7
7
7
7
7
7

But the output I want is:
Code:
count
2
1
5
7
4
3
2
2
4
0
3

What did I do wrong? I have tried modifying the command several times, but none of the attempts worked.

Thanks in advance.
# 2  
Old 04-08-2016
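The gsub() call treats $8 as a regular expression and counts every occurrence of it anywhere in the line, including inside longer fields (the T in TA, for example). It does not look at fields, nor at whether the pattern is the entire field...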
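To see the effect on individual rows, here is a minimal sketch that runs the original command on one line at a time. The function name count_regex_hits is made up for illustration; the two sample rows are copied from the input above.

```shell
# Hypothetical helper: reproduces the original gsub() command on a single line.
count_regex_hits() {
  # gsub() returns the number of substitutions made in $0.
  # $8 is used as a regular expression, not as a whole-field value.
  printf '%s\n' "$1" | awk '{ print gsub($8, "") - 1 }'
}

# Row 1: only two of the first seven fields equal "T",
# but "T" also matches inside every "TA".
count_regex_hits 'TA T TA T TA TA TA T'

# Row 5: "AT" never occurs inside a longer field here,
# so this row happens to come out right.
count_regex_hits 'AT AT AT AT A A A AT'
```

This prints 7 and 4, the same values as in the unwanted output for those rows.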

Try a loop instead:
Code:
awk 'BEGIN{print "count"} NR>1{t=0; for(i=1; i<=7; i++) if($i==$8) t++; print t}'  file

or a more general version using the last field rather than the 8th field:
Code:
awk 'BEGIN{print "count"} NR>1{t=0; for(i=1; i<=NF-1; i++) if($i==$NF) t++; print t}' file

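If the reference column is neither the 8th nor the last one, the same loop can take the column number as a variable. This is only a sketch: the variable name ref and the function name count_exact are assumptions of mine, and the header line in the sample is invented.

```shell
# Hypothetical wrapper: count the fields that are equal to field number $1.
# Reads data on stdin; the first line is treated as a header.
count_exact() {
  awk -v ref="$1" '
    BEGIN { print "count" }
    NR > 1 {
      t = 0
      for (i = 1; i <= NF; i++)
        if (i != ref && $i == $ref) t++   # compares whole fields, not substrings
      print t
    }'
}

printf '%s\n' 'f1 f2 f3 f4 f5 f6 f7 f8' \
              'TA T TA T TA TA TA T' \
              'AG AG AG AG AG AG AG A' | count_exact 8
```

For these two data rows it prints 2 and 0 after the header, matching the wanted output for the same rows.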


--
To use gsub() with GNU awk you could try using word boundaries:
Code:
awk 'BEGIN{print "count"} NR>1 {print gsub("\\<" $8 "\\>","")-1}' file

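With regular awk, you would need something like this. Note that each match also consumes the blank on either side, so two matching fields separated by only a single blank cannot both be counted: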
Code:
awk 'BEGIN{print "count"} NR>1 {print gsub("(^|[[:blank:]])" $8 "([[:blank:]]|$)","")-1}' file

But perhaps a simple loop would be a cleaner approach.
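One caveat with a gsub() pattern that includes the surrounding blanks: each match uses up the blank on both sides, so two matching neighbours separated by a single space or tab cannot both be counted. A possible workaround, sketched here on the assumption that fields are separated by spaces or tabs only, is to double every separator in a copy of the line first. The variable s and the function name are illustrative, not from the thread.

```shell
# Hypothetical wrapper around a padded gsub() count; reads data on stdin.
count_with_gsub() {
  awk '
    BEGIN { print "count" }
    NR > 1 {
      s = " " $0 " "             # pad both ends of a copy of the line
      gsub(/[ \t]+/, "  ", s)    # every field now has its own blank on each side
      print gsub("[ \t]" $8 "[ \t]", "", s) - 1   # minus 1 for field 8 itself
    }'
}

printf '%s\n' 'f1 f2 f3 f4 f5 f6 f7 f8' \
              'T T T T T T T T' \
              'CG C C CG C C C C' | count_with_gsub
```

This prints 7 and 5 after the header. Like the other gsub() variants it still treats $8 as a regular expression, so it is only safe for values without regex metacharacters.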

Last edited by Scrutinizer; 04-08-2016 at 01:49 AM..
# 3  
Old 04-08-2016
Dear Scrutinizer,

Thanks for your reply.

I am now trying to count, as two separate counts, how many of the first seven fields match the 8th field and how many match the 9th field.

Input:
Code:
field_01	field_02	field_03	field_04	field_05	field_06	field_07	field_08	field_09
TA	T	TA	T	TA	TA	TA	T	TA
CAA	CAA	CAA	C	CAA	CAA	CAA	C	CAA
CG	C	C	CG	C	C	C	C	CG
T	T	T	T	T	T	T	T	TA
AT	AT	AT	AT	A	A	A	AT	A
AT	A	A	A	AT	AT	AT	A	AT
CGA	CGA	C	CGA	C	CGA	CGA	C	CGA
GTTAC	GTTAC	G	GTTAC	GTTAC	GTTAC	G	G	GTTAC
CAT	CAT	C	CAT	C	C	C	C	CAT
AG	AG	AG	AG	AG	AG	AG	A	AG
AG	A	A	A	AG	AG	AG	A	AG

I used this command:
Code:
 awk 'BEGIN{print "count1" "\t" "count2"} NR>1 
{t=0; for(i=1; i<=7; i++) 
if($i==$8) t++; 
else if($i==$9) t++; printf("%s\t",t)} print ""}' input.txt

It does not work. I get these errors:
Code:
awk: cmd. line:1: BEGIN{print "count1" "\t" "count2"} NR>1 {t=0; for(i=1; i<=7; i++) if($i==$8) t++; else if($i==$9) t++; printf("%s\t",t)} print ""}
awk: cmd. line:1:                                                                                                                           ^ syntax error
awk: cmd. line:1: BEGIN{print "count1" "\t" "count2"} NR>1 {t=0; for(i=1; i<=7; i++) if($i==$8) t++; else if($i==$9) t++; printf("%s\t",t)} print ""}
awk: cmd. line:1:

The output I want is:
Code:
count1      count2
2       5
1       6
5       2
7       0
4       3
3       4
2       5
2       5
4       3
0       7
3       4

What did I do wrong? I have been trying to find the error.
# 4  
Old 04-08-2016
Hi,
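  • You need a separate counter for each comparison (one for $8 and one for $9); with a single variable t the two counts are added together.
  • The syntax error comes from an extra } that should not be there (there should be a semicolon or a newline): printf("%s\t",t)} -> printf("%s\t",t);
  • The opening brace is in the wrong place. It must be on the same line as the pattern; otherwise NR>1 becomes a rule of its own that prints the line, and the block runs for every line, including the header: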
Code:
NR>1 
{t=0

should be:
Code:
NR>1 {
  t=0

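Why the brace position matters: in awk a newline ends a rule, so a pattern alone on a line gets the default action (print the line), and the block on the next line is a separate rule without a pattern. A small sketch with made-up input; the function names are only there for illustration.

```shell
# Pattern and action on separate lines: two independent rules.
split_rules() {
  printf 'header\nrow\n' | awk 'NR>1
{ print "action: " $0 }'
}

# Opening brace on the pattern line: one rule, applied only when NR>1.
one_rule() {
  printf 'header\nrow\n' | awk 'NR>1 {
  print "action: " $0 }'
}

split_rules   # the action fires for "header" too, and "row" is also echoed as-is
one_rule      # prints only "action: row"
```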
Try:
Code:
awk '
  BEGIN {
    OFS="\t"
    print "count1", "count2"
  } 
  NR>1 {
    t=u=0
    for(i=1; i<=7; i++) {
      if($i==$8) t++
      if($i==$9) u++
    } 
    print t,u 
  }
' file

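The same idea extends to any number of reference columns. A sketch, assuming the first n columns hold the data and every later column is a reference to count against; the names n, c, out and count_per_reference are mine, and the header line in the sample is invented.

```shell
# Hypothetical wrapper: $1 is the number of data columns; reads data on stdin.
count_per_reference() {
  awk -v n="$1" '
    BEGIN { OFS = "\t" }
    NR == 1 { next }                      # skip the header line
    {
      out = ""
      for (r = n + 1; r <= NF; r++) {     # one count per reference column
        c = 0
        for (i = 1; i <= n; i++)
          if ($i == $r) c++
        out = out (r > n + 1 ? OFS : "") c
      }
      print out
    }'
}

printf '%s\n' 'f1 f2 f3 f4 f5 f6 f7 f8 f9' \
              'TA T TA T TA TA TA T TA' \
              'T T T T T T T T TA' | count_per_reference 7
```

For these two rows it prints the tab-separated pairs 2 5 and 7 0, as in the wanted output.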

Last edited by Scrutinizer; 04-08-2016 at 04:04 AM..