Data counting


 
# 1  
Old 01-26-2012

I have a large tab-delimited text file with 10 columns, for example:
Code:
chrM  412  A  A  75   0  25  2     ..,AGAATt       II
chrM  413  G  G  72   0  25  4     ..t,,Aag     IIIH
chrM  414  C  C  75   0  25  4     ...a,..     III2
chrM  415  C  T  75  75  25  4     TTTt,,,ATC     III7

In column 9, I want to count the occurrences of each character on every line. The characters to count are . (period), , (comma), A/a, T/t, C/c and G/g.

The result should be a file like the one below.
Code:
Value at column 2  Count of "."  Count of ","  Count of "A/a"  Count of "G/g"  Count of "C/c"  Count of "T/t"
412                2             1             3               1               0               2
413                2             2             2               1               0               1
414
415
...
and so on for every value in column 2 of the input file

Please let me know the best way to do this using awk or sed.
# 2  
Old 01-26-2012
Try:
Code:
perl -anle '$,=" ";print $F[1],map {$_?$_:"0"}($F[8]=~s/\.//g,$F[8]=~s/,//g,$F[8]=~s/a//ig,$F[8]=~s/g//ig,$F[8]=~s/c//ig,$F[8]=~s/t//ig)' file

Shorter:
Code:
perl -anle '$,=" ";$_=$F[8];print $F[1],map {$_?$_:"0"}(s/\.//g,s/,//g,s/a//ig,s/g//ig,s/c//ig,s/t//ig)' file


Last edited by bartus11; 01-26-2012 at 07:40 PM..
# 3  
Old 01-26-2012
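Try this in awk. gsub() returns the number of replacements it made, so each call doubles as a counter. The counts are printed in the requested order: . , A/a G/g C/c T/t.

Code:
awk '{d=gsub(/[.]/,"",$9); co=gsub(/,/,"",$9); a=gsub(/[Aa]/,"",$9); g=gsub(/[Gg]/,"",$9); c=gsub(/[Cc]/,"",$9); t=gsub(/[Tt]/,"",$9); print $2, d, co, a, g, c, t}' file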

# 4  
Old 01-26-2012
Something similar...
Code:
awk 'BEGIN{
        l=split(". , A G C T",arr," ")
        printf "Column 2 ";
        for(i=1;i<=l;i++)
                printf("Count of \"%s/%s\" ", arr[i], tolower(arr[i]))
        printf("\n")
        
}
{
        c=gsub(/\./,"",$9); printf("%-9s%-15s",$2,c);
        for(i=2;i<=l;i++){
                x=$9;c=gsub(arr[i],"",x);
                c+=gsub(tolower(arr[i]),"",x)
                printf("%-15s", c)
        } printf("\n") 
}'  infile

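If you have more characters to count, just append them to the string passed to split() in the BEGIN block. Each entry is used as a regular expression, so a character that is special in a regex would need escaping, which is why "." is handled separately above.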

--ahamed

Last edited by ahamed101; 01-26-2012 at 08:20 PM..
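
The idea in post #4 of keeping the characters in one list can be pushed a little further. The following is only a sketch, not any poster's method: it assumes a POSIX shell and awk, and the function name count_chars and its single argument (the list of characters to count) are made up for illustration. It walks column 9 with substr() instead of gsub(), so no character needs regex escaping.

```shell
# count_chars CHARS: hypothetical helper, not taken from the thread.
# Reads rows on stdin and prints column 2 followed by the
# case-insensitive count of each character of CHARS found in column 9.
count_chars() {
    awk -v chars="$1" '
    BEGIN { chars = toupper(chars); n = length(chars) }
    {
        s = toupper($9)
        # Reset the per-line tallies, one key per requested character.
        for (i = 1; i <= n; i++) cnt[substr(chars, i, 1)] = 0
        # Walk column 9 one character at a time; plain string comparison
        # means "." and similar characters are never read as regex syntax.
        for (j = 1; j <= length(s); j++) {
            ch = substr(s, j, 1)
            if (ch in cnt) cnt[ch]++
        }
        line = $2
        for (i = 1; i <= n; i++) line = line "\t" cnt[substr(chars, i, 1)]
        print line
    }'
}
```

It would be called as count_chars '.,AGCT' < infile. Counting one more character (say a hypothetical N) only means adding it to that argument.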
# 5  
Old 01-27-2012
Code:
awk 'BEGIN{print "Value\t.\t,\tA/a\tG/g\tC/c\tT/t"}
    { s=tolower($9);l=split(s,a,"");
      for (i=1;i<=l;i++) b[a[i]]++;
      print $2,b["."]+0,b[","]+0,b["a"]+0,b["g"]+0,b["c"]+0,b["t"]+0;
      delete a;delete b}' OFS="\t" infile

Value   .       ,       A/a     G/g     C/c     T/t
412     2       1       3       1       0       2
413     2       2       2       1       0       1
414     5       1       1       0       0       0
415     0       3       1       0       1       5
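
One portability note on the script above: splitting on an empty separator with split(s,a,"") and deleting a whole array with "delete a" work in gawk, but may not behave the same in every awk. A variant that should avoid both, sketched under the assumption of a POSIX shell and awk, is below. The wrapper name count_bases is invented here, and the approach (substr() to read characters, split("", b) to clear the array) is a swapped-in technique rather than the poster's own.

```shell
# count_bases: hypothetical helper name, not from the thread.
# Reads rows on stdin; prints a header, then column 2 plus the counts of
# . , A/a G/g C/c T/t found in column 9, tab-separated.
count_bases() {
    awk 'BEGIN { OFS = "\t"; print "Value", ".", ",", "A/a", "G/g", "C/c", "T/t" }
    {
        s = tolower($9)
        # split("", b) empties array b without needing "delete b".
        split("", b)
        # Tally every character of column 9 by position.
        for (i = 1; i <= length(s); i++) b[substr(s, i, 1)]++
        # Adding 0 turns an unset tally into a printed 0.
        print $2, b["."] + 0, b[","] + 0, b["a"] + 0, b["g"] + 0, b["c"] + 0, b["t"] + 0
    }'
}
```

Usage would be count_bases < infile, which should give the same table as shown above for the four sample rows.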
