Count specific characters at specific column positions

Shell Programming and Scripting
    #1  
12-04-2012, thienxho

Hi all, I need help.

I have an input text file (input.txt) like this:


Code:
21	GTGCAACACCGTCTTGAGAGG	50
21	GACCGAGACAGAATGAAAATC	73
21	CGGGTCTGTAGTAGCAAACGC	108
21	CGAAAAATGAACCCCTTTATC	220
21	CGTGATCCTGTTGAAGGGTCG	259

Now I need to count how many times each of A, T, G and C occurs at each character position in column 2. In this example column 2 is always 21 characters long, but the length can vary.

The output (output.txt) needs to be:


Code:
A	0	1	1	1	3	3	1	2	0	3	1	1	2	1	1	2	3	2	3	0	0
T	0	1	1	0	1	1	1	1	2	0	1	3	0	2	1	1	1	1	1	2	0
G	2	3	2	2	1	0	1	1	1	1	3	0	1	1	1	2	1	2	0	2	2
C	3	0	1	2	0	1	2	1	2	1	0	1	2	1	2	0	0	0	1	1	3

I can do this in Excel, but my file is way bigger than Excel can handle.

Thanks!

    #2  
12-04-2012, vgersh99 (Forum Advisor)
awk -f thie.awk input.txt > output.txt
where thie.awk is:

Code:
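BEGIN {
  OFS = "\t"                          # tab-separated output, as in the requested table
  if (!chars) chars = "A T G C"       # override with: awk -v chars="A C G T N" ...
  nchars = split(chars, charsA, " ")  # letters to report, in output order
}
{
  n = length($2)
  if (n > width) width = n            # remember the longest sequence seen so far
  for (i = 1; i <= n; i++)
    arr[substr($2, i, 1), i]++        # count of each letter at each position
}
END {
  for (i = 1; i <= nchars; i++) {
    printf("%s", charsA[i])
    for (j = 1; j <= width; j++)
      printf("%s%d", OFS, arr[charsA[i], j])
    printf("%s", ORS)
  }
}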
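
To try the approach on the five sample rows from post #1 without saving a script file, something like the following should work. It is only a sketch: sample.txt is a made-up file name, the counting logic is a condensed inline version of the script above (with tab-separated output), and the letters to report are passed with -v chars so that the list can be changed, for example to include N.

```shell
# Recreate the five tab-separated sample rows from post #1.
# sample.txt is a hypothetical name; printf reuses the format for each pair of arguments.
printf '21\t%s\t%s\n' \
  GTGCAACACCGTCTTGAGAGG 50 \
  GACCGAGACAGAATGAAAATC 73 \
  CGGGTCTGTAGTAGCAAACGC 108 \
  CGAAAAATGAACCCCTTTATC 220 \
  CGTGATCCTGTTGAAGGGTCG 259 > sample.txt

# Count each letter at each position of column 2 and keep the table in $counts.
counts=$(awk -v chars="A T G C" '
BEGIN { OFS = "\t"; nchars = split(chars, charsA, " ") }
{
  n = length($2)
  if (n > width) width = n                 # longest sequence seen so far
  for (i = 1; i <= n; i++)
    arr[substr($2, i, 1), i]++             # key is (letter, position)
}
END {
  for (i = 1; i <= nchars; i++) {
    printf("%s", charsA[i])
    for (j = 1; j <= width; j++)
      printf("%s%d", OFS, arr[charsA[i], j])   # unseen combinations print as 0
    printf("%s", ORS)
  }
}' sample.txt)

printf '%s\n' "$counts"
```

The comma in arr[letter, position] is awk's way of simulating a two-dimensional array, so only one pass over the file is needed and memory use depends on the sequence length, not on the number of rows.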

    #3  
12-04-2012, thienxho

Hi vgersh99,

You solved my problem.

I tested your code on a file of 800K rows and compared the result with my Excel counts. Both gave the same output.

I really appreciate your help.
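
When no spreadsheet count is available to compare against, one cheap self-check is that the A, T, G and C counts at each position should add up to the number of input rows. This assumes that every sequence has the same length and contains only those four letters. The sketch below is an illustration only: counts.txt is a hypothetical file name and the small table in it is made up (3 sequences of length 4), not taken from the thread.

```shell
# counts.txt stands in for the count table; the values here are invented
# for illustration (3 sequences of length 4).
printf 'A\t1\t0\t2\t0\nT\t0\t1\t0\t1\nG\t2\t1\t0\t1\nC\t0\t1\t1\t1\n' > counts.txt
rows=3   # number of input rows, e.g. obtained with: wc -l < input.txt

# Sum every position across the four letter rows and list any position
# whose total differs from the number of input rows.
bad=$(awk -v rows="$rows" '
  { for (j = 2; j <= NF; j++) sum[j] += $j; if (NF > nf) nf = NF }
  END {
    for (j = 2; j <= nf; j++)
      if (sum[j] != rows)
        printf("position %d: total %d, expected %d\n", j - 1, sum[j], rows)
  }' counts.txt)

if [ -z "$bad" ]; then
  echo "all positions add up"
else
  printf '%s\n' "$bad"
fi
```

If the sequences vary in length or contain other letters such as N, the totals at some positions will legitimately be smaller than the row count, so a mismatch is a hint to look closer and not proof of a bug.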