Count specific characters at specific column positions


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Count specific characters at specific column positions
# 1  
Old 12-04-2012
Count specific characters at specific column positions

Hi all, I need help.

I have an input text file (input.txt) like this:

Code:
21	GTGCAACACCGTCTTGAGAGG	50
21	GACCGAGACAGAATGAAAATC	73
21	CGGGTCTGTAGTAGCAAACGC	108
21	CGAAAAATGAACCCCTTTATC	220
21	CGTGATCCTGTTGAAGGGTCG	259

Now I need to count A/T/G/C numbers at each character location in column 2, in this case is always 21 characters, but can be variable.

Output (output.txt) will need to be:

Code:
A	0	1	1	1	3	3	1	2	0	3	1	1	2	1	1	2	3	2	3	0	0
T	0	0	1	0	1	1	1	1	2	0	1	2	0	1	0	1	1	1	1	2	0
G	2	3	2	2	1	0	1	1	1	1	3	0	1	1	1	2	1	2	0	2	2
C	3	0	1	2	0	1	2	1	2	1	0	1	2	1	2	0	0	0	1	1	3

I can do this in Excel, but my file is way bigger than Excel can handle.

Thanks!

Last edited by Scott; 12-04-2012 at 01:07 PM.. Reason: Please use code tags
# 2  
Old 12-04-2012
awk -f thie.awk myFile
where thie.awk is:
Code:
BEGIN {
  if (!chars) chars="A T G C"
  nchars=split(chars, charsA, FS)
}
{
  width=length($2)
  for(i=1;i<=width;i++)
   arr[substr($2,i,1),i]++
}
END {
  for(i=1;i<=nchars;i++) {
    printf("%s", charsA[i])
    for(j=1;j<=width;j++)
      printf("%s%d%s", OFS, arr[charsA[i],j], (j==width)?ORS:"")
  }
}

This User Gave Thanks to vgersh99 For This Post:
# 3  
Old 12-04-2012
Quote:
Originally Posted by vgersh99
awk -f thie.awk myFile
where thie.awk is:
Code:
BEGIN {
  if (!chars) chars="A T G C"
  nchars=split(chars, charsA, FS)
}
{
  width=length($2)
  for(i=1;i<=width;i++)
   arr[substr($2,i,1),i]++
}
END {
  for(i=1;i<=nchars;i++) {
    printf("%s", charsA[i])
    for(j=1;j<=width;j++)
      printf("%s%d%s", OFS, arr[charsA[i],j], (j==width)?ORS:"")
  }
}

Hi vgersh99,

You solved my problem.

I tested your codes and compared them to my Excel count with a file of 800K rows. Both had same output.

Really appreciated your help.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Search and replace specific positions of specific lines

Hi, I have a file with hundreds of lines. I want to search for particular lines starting with 4000, search and replace the 137-139 position characters; which will be '000', with '036'. Can all of this be done without opening a temp file and then moving that temp file to the original file name. ... (7 Replies)
Discussion started by: dsid
7 Replies

2. Shell Programming and Scripting

Overwrite specific column in xml file with the specific column from adjacent line

I have an xml file dumped from rrd file, that I want to "patch" so the xml file doesn't contain any blank hole in the resulting graph of the rrd file. Here is the file. <!-- 2015-10-12 14:00:00 WIB / 1444633200 --> <row><v> 4.0419731265e+07 </v><v> 4.5045912770e+06... (2 Replies)
Discussion started by: rk4k
2 Replies

3. Shell Programming and Scripting

Count frequency of unique values in specific column

Hi, I have tab-deliminated data similar to the following: dot is-big 2 dot is-round 3 dot is-gray 4 cat is-big 3 hot in-summer 5 I want to count the frequency of each individual "unique" value in the 1st column. Thus, the desired output would be as follows: dot 3 cat 1 hot 1 is... (5 Replies)
Discussion started by: owwow14
5 Replies

4. Shell Programming and Scripting

Count specific column values

Hi all: quick question! I have the following data that resembles some thing like this: i am tired tired am i what is up hello people cool I want to count (or at least isolate) all of the unique elements in the 2nd column. I have tried this: cut -f 2 | uniq 'input' which does... (3 Replies)
Discussion started by: owwow14
3 Replies

5. Shell Programming and Scripting

Can't figure out how to find specific characters in specific columns

I am trying to find a specific set of characters in a long file. I only want to find the characters in column 265 for 4 bytes. Is there a search for that? I tried cut but couldn't get it to work. Ex. I want to find '9999' in column 265 for 4 bytes. If it is in there, I want it to print... (12 Replies)
Discussion started by: Drenhead
12 Replies

6. Shell Programming and Scripting

How to count occurrences in a specific column

Hi, I need help to count the number of occurrences in $3 of file1.txt. I only know how to count by checking one by one and the code is like this: awk '$3 ~ /aku hanya poyo/ {++c} END {print c}' FS="\t" file1.txt But this is not wise to do as i have hundreds of different occurrences in that... (10 Replies)
Discussion started by: redse171
10 Replies

7. UNIX for Dummies Questions & Answers

Unix command to count the number of files with specific characters in name

Hey all, I'm looking for a command that will search a directory (and all subdirectories) and give me a file count for the number of files that contain specific characters within its filename. e.g. I want to find the number of files that contain "-a.jpg" in their name. All the searching I've... (6 Replies)
Discussion started by: murphysm
6 Replies

8. Shell Programming and Scripting

Assigning a specific format to a specific column in a text file using awk and printf

Hi, I have the following text file: 8 T1mapping_flip02 ok 128 108 30 1 665000-000008-000001.dcm 9 T1mapping_flip05 ok 128 108 30 1 665000-000009-000001.dcm 10 T1mapping_flip10 ok 128 108 30 1 665000-000010-000001.dcm 11 T1mapping_flip15 ok 128 108 30... (2 Replies)
Discussion started by: goodbenito
2 Replies

9. Shell Programming and Scripting

Insert a text from a specific row into a specific column using SED or AWK

Hi, I am having trouble converting a text file. I have been working for this whole day now, still i couldn't make it. Here is how the text file looks: _______________________________________________________ DEVICE STATUS INFORMATION FOR LOCATION 1: OPER STATES: Disabled E:Enabled ... (5 Replies)
Discussion started by: Issemael
5 Replies

10. Shell Programming and Scripting

count characters in specific records

I have a text file which represents a http flow like this: HTTP/1.1 200 OK Date: Fri, 23 Jan 2009 17:16:24 GMT Server: Apache Last-Modified: Fri, 23 Jan 2009 17:08:03 GMT Accept-Ranges: bytes Cache-Control: max-age=540 Expires: Fri, 23 Jan 2009 17:21:31 GMT Vary: Accept-Encoding ... (1 Reply)
Discussion started by: littleboyblu
1 Replies
Login or Register to Ask a Question