Loop with Perl (string search)


 
# 8  
03-24-2016
Hi, Xterra,

Quote:
Originally Posted by Xterra
[...]
My bash file is complete and working like a charm but I am interested in learning better ways to use awk and/or perl to improve my script [...]
I am going to take your word for it. I took a look at your script and ran it against Example.txt. It processes just one entry in pykA-Example.txt. You are opening and closing files quite a bit, and overwriting them.

Anyway, for whatever it's worth, here's an easy-to-understand Perl snippet that produces the same output without calling external programs or continually opening and closing files. The output is sent to stdout instead of to multiple files, so you may want to redirect it.

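Instead of hard-coding gene, bait, primers and fwd, I created a separate settings file that contains these gene entries and patterns, one gene per line with whitespace-separated fields in the order gene, primers, bait, fwd. It is read before the FASTA file is processed.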

Here's what it looks like:

Code:
$ cat patterns.txt
pncA ^.*CCCGGGCAGTCGCCCGAACGTATGGTGGACGT|TGATGGCACCGCCGAACCGGGATGAACTGTTGGCGG.*$|^.*CCGCCAACAGTTCATCCCGGTTCGGCGGTGCCATCA|ACGTCCACCATACGTTCGGGCGACTGCCCGGG.*$ GTCTGGACACGTCGGCAATC|GATTGCCGACGTGTCCAGAC GTCTGGACACGTCGGCAATC
panD ^.*CACCAGGCTGCTGGACAACATTGCGATTGA|TAGCCGTGCTGCTGGCGATTGACGTCCGCAACACCCA.*$|^.*TGGGTGTTGCGGACGTCAATCGCCAGCAGCACGGCTA|TCAATCGCAATGTTGTCCAGCAGCCTGGTG.*$ TCAACGGTTCCGGTCGGCTGCT|AGCAGCCGACCGGAACCGTTGA TGGTCACCTACGCGATCACCGGCGAACGCGGCA
glpK ^.*GCTGCGGTGGACCATCATGGACGATTACATGCAGTGTCC|GACGTGTCCTAGCTTTCGCTGTGCGCCTGAACATGTCCGCA.*$|^.*TGCGGACATGTTCAGGCGCACAGCGAAAGCTAGGACACGTC|GGACACTGCATGTAATCGTCCATGATGGTCCACCGCAGC.*$ CGGCAAGCTGCAGTGGATCCTGGAA|TTCCAGGATCCACTGCAGCTTGCCG CGGCAAGCTGCAGTGGATCCTGGAA
pykA ^.*CGGCCCTACCGCCGTCGCGACTATGCTGAGTCGTCGTG|GACGTCTAGCCGGGTCGTGCCGGACGGTAAACCCATGTCC.*$|^.*GGACATGGGTTTACCGTCCGGCACGACCCGGCTAGACGTC|CACGACGACTCAGCATAGTCGCGACGGCGGTAGGGCCG.*$ CGTTGCCCGGAATGAACGTG|CACGTTCATTCCGGGCAACG CGTTGCCCGGAATGAACGTG
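
The primers field is what trims each contig. In that alternation, a ^.*FORWARD branch deletes everything up to and including a forward primer, and a REVERSE.*$ branch deletes everything from a reverse primer to the end; the other pair of branches are the same primers reverse-complemented, for contigs in the opposite orientation. Below is a minimal standalone sketch (not part of xterra.pl) that applies the pncA line to an invented contig. The AAAA/TTTT flanks are made up, and the bait sequence merely stands in for a real amplicon.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pncA line from patterns.txt: gene, primers, bait, fwd.
my $line = 'pncA ^.*CCCGGGCAGTCGCCCGAACGTATGGTGGACGT|TGATGGCACCGCCGAACCGGGATGAACTGTTGGCGG.*$|^.*CCGCCAACAGTTCATCCCGGTTCGGCGGTGCCATCA|ACGTCCACCATACGTTCGGGCGACTGCCCGGG.*$ GTCTGGACACGTCGGCAATC|GATTGCCGACGTGTCCAGAC GTCTGGACACGTCGGCAATC';
my ($gene, $primers, $bait, $fwd) = split ' ', $line;

# Invented forward-orientation contig: flank, forward primer,
# stand-in amplicon (just the bait sequence), reverse primer, flank.
my $contig = 'AAAA'
    . 'CCCGGGCAGTCGCCCGAACGTATGGTGGACGT'        # forward primer
    . 'GTCTGGACACGTCGGCAATC'                    # stand-in amplicon
    . 'TGATGGCACCGCCGAACCGGGATGAACTGTTGGCGG'    # reverse primer
    . 'TTTT';

die "bait not found\n" unless $contig =~ /$bait/;

# Same substitution find_gene uses: delete the flanks together with the primers.
(my $trimmed = $contig) =~ s/$primers//g;
my $orientation = $trimmed =~ /$fwd/ ? 'forward' : 'reverse';
print "$gene: $trimmed ($orientation)\n";   # pncA: GTCTGGACACGTCGGCAATC (forward)
```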


Here's the code that reads it:
Code:
#!/usr/bin/perl
# xterra.pl
use strict;
use warnings;

# $! is only meaningful after a failed system call, so it is not used for missing arguments
my $settingsfile = shift or die "Missing settings file\n";
open my $sfh, '<', $settingsfile or die "Could not open $settingsfile: $!\n";
my $fastafile = shift or die "Missing fasta file\n";
open my $fh, '<', $fastafile or die "Could not open $fastafile: $!\n";
my %pattern;
my %genes;

while (<$sfh>) {
    next unless /\S/;    # skip blank lines
    chomp;
    # each line: gene, primers, bait, fwd (whitespace-separated)
    my ($gene, $primers, $bait, $fwd) = split;
    $pattern{$gene} = [$primers, $bait, $fwd];
    $genes{$gene}   = undef;
}
close $sfh;

my %contig;
while (get_contig($fh, \%contig)) {
    find_gene(\%pattern, \%contig);
}
close $fh;

save_gene(\%genes);


sub save_gene {
    my ($results) = @_;
    # sort so the output order does not depend on hash ordering
    for my $g (sort keys %$results) {
        print $results->{$g} if $results->{$g};
    }
}

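sub find_gene {
    my ($test, $seq) = @_;
    return unless defined $seq->{dna};
    for my $gene (keys %$test) {
        # copy the fields into plain scalars so they interpolate cleanly in the regexes
        my ($primers, $bait, $fwd) = @{ $test->{$gene} };

        # skip this gene unless its bait occurs in the contig
        next unless $seq->{dna} =~ /$bait/;

        # trim everything outside the primers
        (my $working_on = $seq->{dna}) =~ s/$primers//g;

        # no forward pattern: the contig is reversed, so reverse-complement it
        unless ($working_on =~ /$fwd/) {
            $working_on =~ tr/ACGT/TGCA/;
            $working_on = reverse $working_on;
        }
        # %genes and $fastafile are the file-scoped lexicals declared above
        $genes{$gene} = ">$gene-$fastafile\n$working_on\n";
    }
}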

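sub get_contig {
    my ($fh, $seq) = @_;
    # Start a new record: reuse the header read on the previous call
    # and discard the previous contig's sequence.
    $seq->{label}      = $seq->{next_label};
    $seq->{next_label} = undef;
    $seq->{dna}        = '';
    my $done = 1;
    while (<$fh>) {
        $done = 0;
        next if /^\s*$/;
        chomp;
        if (/^>\w+/) {
            if ($seq->{label}) {
                # header of the next contig: keep it and return this one
                $seq->{next_label} = $_;
                return $seq;
            }
            else {
                $seq->{label} = $_;
            }
        }
        else {
            s/\s+//g;    # remove all whitespace, not just the first run
            $seq->{dna} .= $_;
        }
    }
    unless ($done) {
        return $seq;
    }
    else {
        $seq->{label} = $seq->{dna} = $seq->{next_label} = undef;
        return;
    }
}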

Save as xterra.pl.
Run as: perl xterra.pl patterns.txt Example.txt > results.output
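
The orientation step in find_gene is an ordinary reverse complement: tr/// (y/// is a synonym) swaps A with T and C with G, then reverse flips the string. Here is a minimal standalone sketch with a hypothetical revcomp helper, assuming uppercase ACGT input only. Applied to the first pncA primer, it yields the fourth alternative of the pncA primers field, which is why patterns.txt can list both orientations.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reverse complement of an uppercase ACGT string,
# using the same tr/// + reverse steps as find_gene.
sub revcomp {
    my ($dna) = @_;
    (my $rc = $dna) =~ tr/ACGT/TGCA/;   # complement each base
    return scalar reverse $rc;          # 'scalar' matters: in list context reverse
                                        # would reverse a one-element list, i.e. return $rc unchanged
}

my $primer = 'CCCGGGCAGTCGCCCGAACGTATGGTGGACGT';   # first pncA primer
print revcomp($primer), "\n";   # ACGTCCACCATACGTTCGGGCGACTGCCCGGG
```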

It is trivial to write a separate output file for each gene instead, if necessary.
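
A minimal sketch of that change, assuming one FASTA file per gene named <gene>.fasta in a directory you choose. The save_gene_files name and the naming scheme are hypothetical, not from the original post. It takes the same hash that save_gene receives, so in xterra.pl you would call save_gene_files(\%genes, '.') in place of save_gene(\%genes).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical variant of save_gene(): write each gene's FASTA record to
# "<dir>/<gene>.fasta" instead of printing everything to STDOUT.
sub save_gene_files {
    my ($results, $dir) = @_;
    $dir //= '.';                                # default: current directory
    for my $gene (sort keys %$results) {
        next unless defined $results->{$gene};   # gene not found in any contig
        my $out = "$dir/$gene.fasta";
        open my $ofh, '>', $out or die "Could not write $out: $!\n";
        print {$ofh} $results->{$gene};
        close $ofh or die "Could not close $out: $!\n";
    }
}

# Example with made-up records; a temporary directory keeps the demo tidy.
my $dir = tempdir(CLEANUP => 1);
my %demo = (
    pncA => ">pncA-Example.txt\nGTCTGGACACGTCGGCAATC\n",
    panD => undef,                               # not found, so no file is written
);
save_gene_files(\%demo, $dir);
```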

Here's a resource link to learn more about Perl and bioinformatics, in case it is useful.

Last edited by Aia; 03-24-2016 at 12:27 AM. Reason: Corrects wrong indentation.
# 9  
03-25-2016
Aia

Thanks! I will analyze your script. I know Perl is great; however, at this point I am trying to learn more about sed and awk. Once I am really familiar with both, I am planning to start with Perl.
 