Loop with Perl (string search)


 
# 1  
Old 03-22-2016

I am using a Perl script to reverse and complement sequences when a given string is found. The script works as expected on its own, but I would like to use it in my bash script. However, I am not getting the expected result.
My test.txt file:
Code:
>Sample_72
CCCCTCGCGACCTGGATGTGTCGGCGTTTGTCATGTTTTCGTCGATGGCCGGGCTGGTCGGATCGTCSREVERGGGCCAGGCCAAAA
>Sample_2
CCCCTCGCGAJOHNCCTGGATGTGTCGGCGTTTGTCATGTTTTCGTCGATGGCCGGGCTGGTCGGATCGTCSREVERGGGCCAGGCCAAAA

Standalone Perl script:
Code:
perl -ple '/JOHN/ and y/ACGT/TGCA/ and $_ = reverse unless /^>/' test.txt

I get the expected output:
Code:
>Sample_72
CCCCTCGCGACCTGGATGTGTCGGCGTTTGTCATGTTTTCGTCGATGGCCGGGCTGGTCGGATCGTCSREVERGGGCCAGGCCAAAA
>Sample_2
TTTTGGCCTGGCCCREVERSGACGATCCGACCAGCCCGGCCATCGACGAAAACATGACAAACGCCGACACATCCAGGNHOJTCGCGAGGGG

Bash file:
Code:
#!/bin/bash
Primers=( NULL 'JOHN' )
for y in {1..1}
do
	perl -ple '/${Primers[$y]}/ and y/ACGT/TGCA/ and $_ = reverse unless /^>/' test.txt
done

Unexpected output:
Code:
>Sample_72
TTTTGGCCTGGCCCREVERSGACGATCCGACCAGCCCGGCCATCGACGAAAACATGACAAACGCCGACACATCCAGGTCGCGAGGGG
>Sample_2
TTTTGGCCTGGCCCREVERSGACGATCCGACCAGCCCGGCCATCGACGAAAACATGACAAACGCCGACACATCCAGGNHOJTCGCGAGGGG

In the bash script, the command reverses and complements the sequences regardless of whether the string (JOHN) is present.
I guess I still do not understand the proper use of slashes.
# 2  
Old 03-23-2016
try:
Code:
#!/bin/bash
Primers=( NULL 'JOHN' )
for y in {1..1}
do
        perl -ple '/'"${Primers[$y]}"'/ and y/ACGT/TGCA/ and $_ = reverse unless /^>/' test.txt
done
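
To see why this works, it can help to print the program text that the shell hands to perl under each quoting style. A minimal sketch (the trailing ... just stands in for the rest of the one-liner):

```shell
#!/bin/bash
# Show the exact program text that each quoting style passes on.
Primers=( NULL 'JOHN' )
y=1

# Single quotes only: the shell expands nothing inside them.
single='/${Primers[$y]}/ and ...'

# Close the single quotes, expand inside double quotes, then reopen.
mixed='/'"${Primers[$y]}"'/ and ...'

printf '%s\n' "$single"   # prints /${Primers[$y]}/ and ...
printf '%s\n' "$mixed"    # prints /JOHN/ and ...
```

Only the second form puts JOHN into the pattern that perl compiles.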


Last edited by rdrtx1; 03-23-2016 at 12:33 AM..
# 3  
Old 03-23-2016
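The single quotes prevent the shell from expanding the shell array variable ${Primers[$y]}, so Perl receives that text literally. Perl then interpolates its own (undefined) variable of that name, the pattern ends up empty, and every sequence line is treated as a match.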
You can try:
Code:
perl -ple "/${Primers[$y]}/"' and y/ACGT/TGCA/ and $_ = reverse unless /^>/' test.txt

or pass the variable:
Code:
perl -sple '$_=~$match and y/ACGT/TGCA/ and $_ = reverse unless /^>/' -- -match="${Primers[$y]}" test.txt

--
Note that repeatedly calling an external program such as awk or perl from inside a shell loop is inefficient, because every iteration starts a new process and rereads the file. It is usually faster to do the looping inside the program itself.
# 4  
Old 03-23-2016
Thanks a lot!
# 5  
Old 03-23-2016
Quote:
Originally Posted by rdrtx1
try:
Code:
#!/bin/bash
Primers=( NULL 'JOHN' )
for y in {1..1}
do
        perl -ple '/'${Primers[$y]}'/ and y/ACGT/TGCA/ and $_ = reverse unless /^>/' test.txt
done

Note that by doing this, anything in ${Primers[$y]} is left unprotected by quotes and is therefore subject to field splitting and globbing by the shell.

To illustrate (in an empty directory):
Code:
$ touch foo bar
$ var="a   b*"
$ echo "$var"
a   b*
$ echo ''$var''
a bar
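
The same effect on the perl command line can be sketched by counting how many arguments the shell produces. count_args is a hypothetical helper written for this illustration, and the primer value with a space in it is contrived to trigger field splitting:

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { printf '%s\n' "$#"; }

primer='JO HN'   # contrived value containing a space

# Unquoted expansion: the shell splits the program text at the space.
unquoted=$(count_args '/'$primer'/ and y/ACGT/TGCA/')

# Quoted expansion: the program text stays one argument.
quoted=$(count_args '/'"$primer"'/ and y/ACGT/TGCA/')

echo "unquoted: $unquoted"   # unquoted: 2
echo "quoted: $quoted"       # quoted: 1
```

In the unquoted case perl would receive /JO as its program and the remainder as a separate argument, which is not what was intended.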


Last edited by Scrutinizer; 03-23-2016 at 12:42 AM..
# 6  
Old 03-23-2016
Hi Xterra,

Perl is worth learning, especially if you work with scientific data such as FASTA files. It does not need the shell's help to do what is required, and to do it efficiently.
If you share a bit more of the "big picture" of what you are trying to do as a whole, maybe we can help you do it in Perl alone.
# 7  
Old 03-23-2016
Aia

Thanks! I am chasing genes. I sequenced whole genomes and built large contigs (Example.txt). I am fishing out the contigs (sequences with > as identifiers) containing the genes of interest using conserved DNA stretches (Bait). Then, I am trimming the sequences using conserved flanking regions (Primers) to keep exclusively the nucleotide sequences coding for those genes (Gene). The sequences can be in the right direction (forward) or reversed (antisense), depending on the raw sequences generated by our MySeq sequencer. I am using a modified version of your perl one-liner script to determine if the sequence is forward or reverse (the script in this thread). If the sequence is in the right direction nothing is done; however, if the sequence is in the opposite direction, your script will reverse and complement it, outputting a forward sequence every time. Then I just proceed to name the sequence and files. Thus, this bash script outputs four different genes from the original fasta file.
My bash script is complete and works like a charm, but I am interested in learning better ways to use awk and/or Perl to improve it.
Thanks!
PS. My Example.txt file was trimmed to fit the size limit.
 