replacement and "reading frames"


 
# 1  
Old 06-29-2010

Hi,

I'm working with DNA sequences, which, as you might know, are made up of "codons": the "words" of the sequence, each exactly 3 letters long. So something I frequently need in my scripts is pattern searching that respects this word size.

E.g., searching for "TAC" in the string "ACGTACGTACGT" with sed would find 2 matches (at positions 3 and 7), while only the first one is a codon "TAC". Respecting the "reading frame" means that words can only start at positions 0, 3, 6, 9, etc.

As the sequences can be enormously long, is there an efficient way for command-line tools like sed or awk to handle this kind of reading? GAWK, for example, has a FIELDWIDTHS variable, but it needs a list of field widths; just specifying "3" doesn't work, does it?

Otherwise, I have to turn to more specific bioinformatic software, which is less easy to incorporate in my scripts.

Thank you in advance for any thoughts on this.
Jos
# 2  
Old 06-29-2010
Do you just need the count?
Code:
$ echo "ACGTACGTACGT" | perl -nle 'while(/(.{3})/g){ $total++ if($1 eq "TAC");} END{print $total}'
1
$ echo "ACGTACGTACGTTAC" | perl -nle 'while(/(.{3})/g){ $total++ if($1 eq "TAC");} END{print $total}'
2
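
If you would rather stay in awk, the same count can be sketched without FIELDWIDTHS by stepping through the line with substr() three characters at a time. This is only a sketch: count_codon is a name made up for illustration, and it assumes one uppercase sequence per line with the frame starting at the first letter.

```shell
# count_codon CODON: count in-frame occurrences of CODON read from stdin.
# Hypothetical helper; assumes one sequence per line, frame starting at letter 1.
count_codon() {
    awk -v codon="$1" '
        {
            # Step through the line three characters at a time.
            for (i = 1; i <= length($0); i += 3)
                if (substr($0, i, 3) == codon)
                    n++
        }
        END { print n + 0 }   # n + 0 prints 0 when nothing matched
    '
}

echo "ACGTACGTACGT" | count_codon TAC
echo "ACGTACGTACGTTAC" | count_codon TAC
```

The frame restarts on every input line here, so a sequence wrapped over several lines would need to be joined first.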

# 3  
Old 06-29-2010
Code:
$ echo "ACGTACGTACGT" | fold -w3 | grep -c TAC
1

From a file:
Code:
fold -w3 file | grep -c TAC
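
One caveat, assuming the file wraps the sequence over several lines: fold restarts its column count on every input line, so the frame only carries across lines when each line length is a multiple of 3. A sketch that joins the lines first (count_joined is a made-up helper name, and it assumes the file holds nothing but sequence letters, no header lines):

```shell
# count_joined CODON FILE: hypothetical helper.
# Removes the newlines first, so the reading frame runs across line breaks,
# then folds into codons and counts the lines that contain CODON.
count_joined() {
    tr -d '\n' < "$2" | fold -w3 | grep -c "$1"
}
```

For a file containing ACGT on each of three lines, plain fold -w3 would split every line into ACG and T and never see TAC, while count_joined TAC file sees the joined codons ACG TAC GTA CGT.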


Last edited by Scrutinizer; 06-29-2010 at 09:54 AM..
# 4  
Old 06-29-2010
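This might work too (the \G anchor makes each match start where the previous one ended, so only multiples of 3 letters are skipped before TAC):
Code:
perl -0777 -ne '$n = () = /\G(?:.{3})*?TAC/g; print "$n\n"' file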

# 5  
Old 06-29-2010
Thanks, folks!

Actually, counting is just one of the things I'd like to do frequently; replacing is another.

I can see quite easily how I could do this using the "fold" command (thanks for bringing it to my attention!)

Code:
echo "ACGTACGTACGT" | fold -w3 | sed -e 's/TAC/NNN/g' | tr -d '\n'
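
The same replacement might also be done in a single awk process, without splitting the sequence into lines and gluing it back together. A sketch under the same assumptions (one sequence per line, frame starting at the first letter); replace_codon is only an illustrative name:

```shell
# replace_codon OLD NEW: replace in-frame occurrences of OLD with NEW.
# Hypothetical helper; reads sequences from stdin, one per line.
replace_codon() {
    awk -v old="$1" -v new="$2" '
        {
            out = ""
            # Rebuild the line codon by codon, swapping matching codons.
            for (i = 1; i <= length($0); i += 3) {
                c = substr($0, i, 3)
                if (c == old)
                    c = new
                out = out c
            }
            print out
        }
    '
}

echo "ACGTACGTACGT" | replace_codon TAC NNN
```

Unlike the fold pipeline, this keeps the newline at the end of each sequence.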

Maybe the Perl solutions are quicker? I'll have to test that, but first I'll need to know Perl a bit better.

Thanks again!
jos
# 6  
Old 06-29-2010
Depending on what your overall goal is, you might want to take a look at BioPerl.
# 7  
Old 06-29-2010
Yes, I know BioPerl exists; I even installed it once.

For the moment, I hardly know Perl, and simple tasks like pattern matching, changing file formats, etc. are things one can do with awk and sed. But yes, eventually I'll get around to trying BioPerl.

jos