Code for exact match to count occurrence

04-08-2016

Hi all,

I have an input file as below. For each line, I would like to count how many of the first seven fields exactly match the 8th field.

Input:

Code:

field_01        field_02        field_03        field_04        field_05        field_06        field_07        field_08
TA      T       TA      T       TA      TA      TA      T
CAA     CAA     CAA     C       CAA     CAA     CAA     C
CG      C       C       CG      C       C       C       C
T       T       T       T       T       T       T       T
AT      AT      AT      AT      A       A       A       AT
AT      A       A       A       AT      AT      AT      A
CGA     CGA     C       CGA     C       CGA     CGA     C
GTTAC   GTTAC   G       GTTAC   GTTAC   GTTAC   G       G
CAT     CAT     C       CAT     C       C       C       C
AG      AG      AG      AG      AG      AG      AG      A
AG      A       A       A       AG      AG      AG      A

The command I used is:

Code:

awk 'BEGIN{print "count"} NR>1  {print gsub($8,"")-1}' input.txt

The output I got was as follows:

Code:

But the output I want is:

Code:

count
2
1
5
7
4
3
2
2
4
0
3

What did I do wrong? I have tried modifying the command a few times, but none of the attempts worked.

Thanks in advance.

huiyee1

04-08-2016

The gsub() call treats $8 as a regular expression and counts every match anywhere in the line, including matches inside longer fields (for example the T in TA), so it does not check whether the pattern is the entire field.
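As a quick illustration of the difference, here is a small sketch using the first data line of the sample input (the shell variable names are only for this demo). It compares the raw gsub() count with a field-by-field comparison:

```shell
# First data line of the sample input; field 8 is "T".
line='TA T TA T TA TA TA T'
# gsub() with no target works on the whole line and counts every "T",
# including the one inside each "TA".
substring_hits=$(printf '%s\n' "$line" | awk '{ print gsub($8, "") }')
# Count only whole fields among the first seven that equal field 8.
field_hits=$(printf '%s\n' "$line" | awk '{ n = 0; for (i = 1; i <= 7; i++) if ($i == $8) n++; print n }')
echo "gsub: $substring_hits, exact: $field_hits"
```

This should print "gsub: 8, exact: 2"; the original command subtracts 1 from the first number to discount field 8 itself.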

Try a loop instead:

Code:

awk 'BEGIN{print "count"} NR>1{t=0; for(i=1; i<=7; i++) if($i==$8) t++; print t}'  file

or a more general version using the last field rather than the 8th field:

Code:

awk 'BEGIN{print "count"} NR>1{t=0; for(i=1; i<=NF-1; i++) if($i==$NF) t++; print t}' file

--
To use gsub() with GNU awk you could try using word boundaries:

Code:

awk 'BEGIN{print "count"} NR>1 {print gsub("\\<" $8 "\\>","")-1}' file

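With regular awk, a match consumes the blanks on both sides of the field, so two adjacent matching fields cannot both be counted. Doubling the separators first gives every field its own pair of delimiters:

Code:

awk 'BEGIN{print "count"} NR>1 {gsub(/[[:blank:]]+/,"  "); print gsub("(^| )" $8 "( |$)","")-1}' file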

But perhaps a simple loop would be a cleaner approach.
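One more reason to prefer the loop, sketched here with a made-up line that is not from the sample data: gsub() interprets the field as a regular expression, so a field containing a metacharacter such as a dot would match more than itself, while the == comparison in the loop compares the text as it is.

```shell
# Hypothetical line: the last field "A." is data, but as a regex the dot
# matches any character, so "AB" and "AC" match too.
line='AB A. AC A.'
regex_hits=$(printf '%s\n' "$line" | awk '{ print gsub($NF, "") - 1 }')
# The == comparison treats both fields as plain strings here.
exact_hits=$(printf '%s\n' "$line" | awk '{ n = 0; for (i = 1; i < NF; i++) if ($i == $NF) n++; print n }')
echo "regex: $regex_hits, exact: $exact_hits"
```

This should print "regex: 3, exact: 1". The sample input contains only letters, so it is not affected.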

Last edited by Scrutinizer; 04-08-2016 at 01:49 AM..

Scrutinizer

04-08-2016

Dear Scrutinizer,

Thanks for your reply.

Now I am trying to count, among the first seven fields, the occurrences that match the 8th field and the 9th field separately.

Input:

Code:

field_01	field_02	field_03	field_04	field_05	field_06	field_07	field_08	field_09
TA	T	TA	T	TA	TA	TA	T	TA
CAA	CAA	CAA	C	CAA	CAA	CAA	C	CAA
CG	C	C	CG	C	C	C	C	CG
T	T	T	T	T	T	T	T	TA
AT	AT	AT	AT	A	A	A	AT	A
AT	A	A	A	AT	AT	AT	A	AT
CGA	CGA	C	CGA	C	CGA	CGA	C	CGA
GTTAC	GTTAC	G	GTTAC	GTTAC	GTTAC	G	G	GTTAC
CAT	CAT	C	CAT	C	C	C	C	CAT
AG	AG	AG	AG	AG	AG	AG	A	AG
AG	A	A	A	AG	AG	AG	A	AG

I used this command:

Code:

 awk 'BEGIN{print "count1" "\t" "count2"} NR>1 
{t=0; for(i=1; i<=7; i++) 
if($i==$8) t++; 
else if($i==$9) t++; printf("%s\t",t)} print ""}' input.txt

It does not work; I get these errors:

Code:

awk: cmd. line:1: BEGIN{print "count1" "\t" "count2"} NR>1 {t=0; for(i=1; i<=7; i++) if($i==$8) t++; else if($i==$9) t++; printf("%s\t",t)} print ""}
awk: cmd. line:1:                                                                                                                           ^ syntax error
awk: cmd. line:1: BEGIN{print "count1" "\t" "count2"} NR>1 {t=0; for(i=1; i<=7; i++) if($i==$8) t++; else if($i==$9) t++; printf("%s\t",t)} print ""}
awk: cmd. line:1:

The output I want is:

Code:

count1      count2
2       5
1       6
5       2
7       0
4       3
3       4
2       5
2       5
4       3
0       7
3       4

What did I do wrong? I have been trying to find the error.

huiyee1

04-08-2016

Hi,

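You need a separate counter variable for each reference field.
The syntax error comes from an extra } that should not be there (it should be a semicolon or a newline): printf("%s\t",t)} -> printf("%s\t",t);
The opening brace is also in the wrong place. With the brace on the next line, awk reads NR>1 as a complete rule with the default print action and runs the braced block for every line, including the header: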

Code:
NR>1 
{t=0

should be:
Code:
NR>1 {
  t=0
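To see the effect of the brace position in isolation, here is a minimal sketch with a two-line dummy input (the input and the "action:" label are made up for the demo):

```shell
# Pattern and action split across lines: awk reads two separate rules.
# NR>1 alone prints matching lines; the block runs for every line.
split_out=$(printf 'a\nb\n' | awk 'NR>1
{ print "action:", $0 }')
# Brace on the same line as the pattern: one rule, applied only when NR>1.
joined_out=$(printf 'a\nb\n' | awk 'NR>1 {
  print "action:", $0 }')
printf '%s\n---\n%s\n' "$split_out" "$joined_out"
```

The first variant should print "action: a", "b" and "action: b"; the second should print only "action: b".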

Try:

Code:

awk '
  BEGIN {
    OFS="\t"
    print "count1", "count2"
  } 
  NR>1 {
    t=u=0
    for(i=1; i<=7; i++) {
      if($i==$8) t++
      if($i==$9) u++
    } 
    print t,u 
  }
' file
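If more reference columns are added later, the same idea can be generalized. The following is only a sketch: the function name count_matches, its two arguments and the awk variables refs and last are made up for illustration, and it assumes the compared fields are always fields 1 to 7 as in the sample.

```shell
# Hypothetical helper: count_matches FILE REFS
# REFS is a comma-separated list of reference field numbers, e.g. "8,9".
count_matches() {
  awk -v refs="$2" -v last=7 '
    BEGIN {
      OFS = "\t"
      n = split(refs, ref, ",")      # ref[1..n] hold the reference field numbers
    }
    NR == 1 {                        # replace the header with count1, count2, ...
      line = ""
      for (k = 1; k <= n; k++) line = line (k > 1 ? OFS : "") "count" k
      print line
      next
    }
    {
      line = ""
      for (k = 1; k <= n; k++) {
        c = 0                        # matches for reference field ref[k]
        for (i = 1; i <= last; i++) if ($i == $(ref[k])) c++
        line = line (k > 1 ? OFS : "") c
      }
      print line
    }
  ' "$1"
}
```

Calling it as count_matches input.txt 8,9 should give the same two columns as the command above. Unlike that command, the header is printed when the first line is read, so an empty file produces no output.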

Last edited by Scrutinizer; 04-08-2016 at 04:04 AM..

Scrutinizer
