Different counts between programs and commands


 
# 1  
Old 01-18-2016

In the attached file, if I count "aaaaaaaaaaaa" in either Notepad++ or Excel, I get 118,456.

However, when I do either
Code:
 grep -o aaaaaaaaaaaa 12A.txt | wc -w

or
Code:
awk '{ 
     for (i=1;i<=NF;i++)
         if ( $i == "aaaaaaaaaaaa")
         c++
     }
END{
print c}' 12A.txt

I get 116,441. I'm not sure which is right, or if there is a better way? Thank you :)

The search string varies (aaaaaaaaaaaa) but the input format (file to count) is always the same.

Example of the file:
Code:
>hg19_refGene_NM_000016 range=chr1:76190032-76229363 5'pad=0 3'pad=0 strand=+ repeatMasking=none
aaaaaaaaaaaa
aaaaaaaaaaaa
aaaaaaaaaaaa
aaaaaaaaaaaa
aaaaaaaaaaaa
aaaaaaaaaaaa
aaaaaaaaaaaa
aaaaaaaaaaaa
aaaaaaaaaaaa
aaaaaaaaaaaa
>hg19_refGene_NM_000028 range=chr1:100316045-100389579 5'pad=0 3'pad=0 strand=+ repeatMasking=none
aaaaaaaaaaaa
aaaaaaaaaaaa
aaaaaaaaaaaa
aaaaaaaaaaaa
aaaaaaaaaaaa
aaaaaaaaaaaa
aaaaaaaaaaaa
>hg19_refGene_NM_000029 range=chr1:230838272-230850336 5'pad=0 3'pad=0 strand=- repeatMasking=none
>hg19_refGene_NM_000036 range=chr1:115215720-115238239 5'pad=0 3'pad=0 strand=- repeatMasking=none
aaaaaaaaaaaa
aaaaaaaaaaaa
aaaaaaaaaaaa

edit:

There are 2,015 AAAAAAAAAAAA
and 116,441 aaaaaaaaaaaa

for a total of 118,456. The awk and grep commands take case into account, whereas the programs do not.
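
The breakdown can be checked from the shell. This is only a minimal sketch: it builds a small made-up sample file (a hypothetical stand-in for 12A.txt, not the real data) and assumes each match sits alone on its own line, as in the example above. Mixed-case lines such as aaaaAAAAaaaa would appear in the total but in neither of the per-case counts.

```shell
# Build a tiny stand-in for 12A.txt (hypothetical data, not the real file).
sample=$(mktemp)
printf '%s\n' '>header_1' 'aaaaaaaaaaaa' 'AAAAAAAAAAAA' 'aaaaaaaaaaaa' \
              '>header_2' 'AAAAAAAAAAAA' > "$sample"

lower=$(grep -c '^aaaaaaaaaaaa$' "$sample")    # lines that are all lowercase
upper=$(grep -c '^AAAAAAAAAAAA$' "$sample")    # lines that are all uppercase
total=$(grep -ci '^aaaaaaaaaaaa$' "$sample")   # lines matching in any case

echo "lower=$lower upper=$upper total=$total"
rm -f "$sample"
```

On the sample data this prints lower=2 upper=2 total=4. On the real file the same three commands should give the 116,441 / 2,015 / 118,456 split if the assumption about one match per line holds.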

I use:
Code:
perl -076 -nE 'chomp; s/(.+)// && say qq{>$1}; s/\s//g; say $1 while /(a{12})/gi' sequences.txt > 12A.txt

to make the file that is counted. Since that is case-insensitive, I guess I need a command that counts regardless of case. Maybe I can pipe into a count? Thank you :)

Last edited by cmccabe; 01-18-2016 at 03:54 PM.. Reason: added edit
# 2  
Old 01-18-2016
Hi,
Excel and Notepad++ are case-insensitive by default, and your file contains 'AAAAAAAAAAAA'.

Regards.
# 3  
Old 01-18-2016
Hello cmccabe,

You were close with your awk script. Try setting IGNORECASE to true as follows; that should give you the exact count.
Code:
awk 'BEGIN {
	IGNORECASE = 1   # GNU awk only: string comparisons ignore case
     }
     { 
     for (i=1;i<=NF;i++)
         if ( $i == "aaaaaaaaaaaa")
         c++
     }
END{
print c}' 12A.txt

Note that IGNORECASE works only in GNU awk. If you don't have that, you could do the following instead.
Code:
awk '
     { 
     for (i=1;i<=NF;i++)
         if ( tolower($i) == "aaaaaaaaaaaa")
         c++
     }
END{
print c}' 12A.txt
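
A pipeline into a count is another option. The sketch below is not from the original replies: it assumes a grep that supports -o (GNU and BSD grep both do, although it is not in POSIX) and uses a made-up sample file in place of 12A.txt. It counts with wc -l because grep -o prints one match per line.

```shell
# Hypothetical stand-in for 12A.txt; point grep at the real file instead.
sample=$(mktemp)
printf '%s\n' '>header_1' 'aaaaaaaaaaaa' 'AAAAAAAAAAAA' \
              '>header_2' 'aaaaaaaaaaaa' > "$sample"

# -o prints each match on its own line, -i ignores case, wc -l counts the lines.
# tr strips the padding that some wc implementations add around the number.
count=$(grep -oi 'aaaaaaaaaaaa' "$sample" | wc -l | tr -d '[:space:]')

echo "$count"
rm -f "$sample"
```

On the sample data this prints 3. Unlike the awk versions, which compare whole fields, grep -o would also count a 12-letter match embedded in a longer run of the letter, so the two approaches agree only when every match is a field of its own.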

Thanks,
R. Singh
# 4  
Old 01-18-2016
Thank you both :)