Count specific characters at specific column positions | Unix Linux Forums | Shell Programming and Scripting

#1  thienxho, 12-04-2012

Hi all, I need help.

I have an input text file (input.txt) like this:


Code:
21	GTGCAACACCGTCTTGAGAGG	50
21	GACCGAGACAGAATGAAAATC	73
21	CGGGTCTGTAGTAGCAAACGC	108
21	CGAAAAATGAACCCCTTTATC	220
21	CGTGATCCTGTTGAAGGGTCG	259

Now I need to count how many times each of A, T, G and C occurs at each character position in column 2. In this case the sequences are always 21 characters long, but the length can vary.

The output (output.txt) needs to be:


Code:
A	0	1	1	1	3	3	1	2	0	3	1	1	2	1	1	2	3	2	3	0	0
T	0	1	1	0	1	1	1	1	2	0	1	3	0	2	1	1	1	1	1	2	0
G	2	3	2	2	1	0	1	1	1	1	3	0	1	1	1	2	1	2	0	2	2
C	3	0	1	2	0	1	2	1	2	1	0	1	2	1	2	0	0	0	1	1	3
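
To make the request concrete, here is a minimal sketch that tallies a single position with standard tools. It assumes the file is tab-separated as shown above; `count_at_position` is a hypothetical helper name, not something from this thread.

```shell
# count_at_position FILE POS
# Hypothetical helper: tally the characters found at position POS of column 2.
# Assumes FILE is tab-separated (the default delimiter for cut -f).
count_at_position() {
  cut -f2 "$1" |   # keep only the sequence column
    cut -c"$2" |   # keep only the character at position POS
    sort |         # group identical characters together
    uniq -c        # count each group
}

# Usage (file name taken from the post):
#   count_at_position input.txt 1
# For the five sample rows this reports 3 for C and 2 for G
# (the exact spacing of uniq -c output varies between systems).
```

Doing this once per position gets slow on a large file, which is why a single-pass approach is what is really needed here.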

I can do this in Excel, but my file is way bigger than Excel can handle.

Thanks!

#2  vgersh99, 12-04-2012
Run awk -f thie.awk myFile (add -v OFS='\t' if you want tab-separated output; the default separator is a single space),
where thie.awk is:

Code:
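BEGIN {
  # Characters to report; override with -v chars="..." (space-separated)
  if (!chars) chars="A T G C"
  nchars=split(chars, charsA, " ")
}
{
  # Count each character of column 2 under the key (character, position)
  n=length($2)
  if (n>width) width=n   # remember the longest sequence seen
  for(i=1;i<=n;i++)
   arr[substr($2,i,1),i]++
}
END {
  # One output row per character, one column per position
  for(i=1;i<=nchars;i++) {
    printf("%s", charsA[i])
    for(j=1;j<=width;j++)
      printf("%s%d%s", OFS, arr[charsA[i],j], (j==width)?ORS:"")
  }
}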
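
For anyone who prefers a single command over a separate script file, the same counting idea can be wrapped in a shell function. This is only a sketch: `position_counts` is a hypothetical name, the tab output separator is an assumption based on the requested output, and the awk program is inlined so the snippet runs on its own.

```shell
# position_counts FILE [CHARS]
# Hypothetical wrapper around the same counting idea as thie.awk.
# CHARS is a space-separated list of characters to report (default "A T G C").
# Output is tab-separated (an assumption, to match the requested output.txt).
position_counts() {
  awk -v OFS='\t' -v chars="${2:-A T G C}" '
    BEGIN { nchars = split(chars, charsA, " ") }
    {
      n = length($2)
      if (n > width) width = n          # longest sequence seen so far
      for (i = 1; i <= n; i++)
        arr[substr($2, i, 1), i]++      # tally the character at position i
    }
    END {
      for (i = 1; i <= nchars; i++) {
        printf("%s", charsA[i])
        for (j = 1; j <= width; j++)
          printf("%s%d", OFS, arr[charsA[i], j])
        printf("%s", ORS)
      }
    }
  ' "$1"
}

# Usage (file names are the ones from post #1):
#   position_counts input.txt > output.txt
#   position_counts input.txt "A T G C N"   # also report N
```

Because the counts are keyed by (character, position), the whole file is read only once and memory use depends on the sequence length, not on the number of rows.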

#3  thienxho, 12-04-2012
Hi vgersh99,

You solved my problem.

I tested your code on a file of 800K rows and compared the result with my Excel count. Both gave the same output.

I really appreciate your help.
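
When the file is too big to cross-check in Excel, one rough sanity check is that the counts at every position add up to the number of input rows. The sketch below assumes all sequences have the same length and contain only the counted characters; `check_column_sums` is a hypothetical helper name.

```shell
# check_column_sums COUNTS ROWS
# Hypothetical check: succeed only if the counts at every position add up to ROWS.
# Assumes equal-length sequences made up only of the counted characters.
check_column_sums() {
  awk -v rows="$2" '
    {
      if (NF > nf) nf = NF
      for (j = 2; j <= NF; j++) sum[j] += $j   # field 1 is the character label
    }
    END {
      bad = 0
      for (j = 2; j <= nf; j++)
        if (sum[j] != rows) {
          printf("position %d sums to %d, expected %d\n", j - 1, sum[j], rows)
          bad = 1
        }
      exit bad
    }
  ' "$1"
}

# Usage (file names are the ones from post #1):
#   check_column_sums output.txt "$(wc -l < input.txt)"
```

A non-zero exit status, together with the listed positions, points at rows that were skipped or at characters outside the counted set.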