Thanks! I am chasing genes. I sequenced whole genomes and assembled large contigs (Example.txt). I fish out the contigs (FASTA records, whose header lines start with >) that contain the genes of interest using conserved DNA stretches (Bait). Then I trim the sequences at conserved flanking regions (Primers) to keep only the nucleotide sequence coding for each gene (Gene). A contig can be in the right direction (forward) or reversed (antisense), depending on the raw reads generated by our MiSeq sequencer. I use a modified version of your Perl one-liner (the script in this thread) to determine whether a sequence is forward or reverse. If it is already forward, nothing is done; if it is in the opposite direction, the script reverse-complements it, so the output is always a forward sequence. Then I just name the sequences and files. In this way, the bash script outputs four different genes from the original FASTA file.
My bash script is complete and works like a charm, but I am interested in learning better ways to use awk and/or Perl to improve it.
Thanks!
PS. My Example.txt file was trimmed to fit the size limit.
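For anyone following along, the fish-and-orient step described above could be sketched in a single awk pass. This is only a rough illustration, not the script from this thread: the function name `fish_contigs` is made up, the bait is assumed to be an exact literal string rather than a degenerate pattern, and primer trimming is left out.

```shell
#!/usr/bin/env bash
# Sketch only: fish_contigs and its exact-match bait are illustrative assumptions.

# Usage: fish_contigs BAIT < contigs.fasta
fish_contigs() {
    awk -v bait="$1" '
        BEGIN {
            comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
            bait = toupper(bait)
            bait_rc = revcomp(bait)
        }
        # Reverse-complement s; anything other than A/C/G/T becomes N.
        function revcomp(s,    i, c, out) {
            out = ""
            for (i = length(s); i >= 1; i--) {
                c = substr(s, i, 1)
                out = out ((c in comp) ? comp[c] : "N")
            }
            return out
        }
        # Print the buffered record if it carries the bait in either orientation.
        function emit(    s) {
            if (header == "") return
            s = toupper(seq)
            if (index(s, bait))         print header "\n" s
            else if (index(s, bait_rc)) print header "\n" revcomp(s)
        }
        /^>/ { emit(); header = $0; seq = ""; next }   # a new record starts
             { seq = seq $0 }                          # join wrapped sequence lines
        END  { emit() }
    '
}
```

Called as `fish_contigs ACGTT < Example.txt`, it prints only the records that contain the bait, each as a header line plus one upper-case sequence line in the forward orientation. Doing the orientation check inside the same awk pass avoids a second trip through the file.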
Hi All,
I have a file that I need to be able to find a pattern match on one line then parse data on the next or subsequent lines - I will know which line needs to be parsed beforehand.
This is what I currently have:
while (<COMMAND_OUT>) {
if ($_ =~ m/TEST/) {
... (4 Replies)
Hi All,
I have a file that I need to be able to find a pattern match on a line, search that line for a text pattern, and replace that text.
An example of 4 lines in my file is:
1. MatchText_randomNumberOfText moreData ReplaceMe moreData
2. MatchText_randomNumberOfText moreData moreData... (4 Replies)
Hi,
I have a line where I need to get a certain part of it, for example:
text txt tt: 1909
That is exactly how it looks, and all spaces are to be counted. I need to retrieve 1909.
Thanks (11 Replies)
OK, so what I am trying to do is search through 200k files with the extension .000 or .702 for *@yahoo.com.tw and, if a file contains it, remove that file. This is my code... what am I doing wrong? It seems it will only find asdflkajsdf@yahoo.com.tw as a string and not *@yahoo.com.tw so it... (5 Replies)
Hi All
I want to search a string from an array in Perl. If a match occurs, assign that string to a variable else assign 'No match'. I tried writing the script as follows but it's in vain. Please help me..
#!/usr/bin/perl
use strict;
my $NER;
my @text=("ORG","PER");
... (4 Replies)
Hello All,
I'm a hardware engineer and I have written this script to automate my job. I got stuck at the following location.
CODE:
..
..
...
foreach $key(keys %arr_hash) {
my ($loc,$ind,$add) = split /,/, $arr_hash{$key};
&create_verilog($key, $loc, $ind ,$add);
}
sub create_verilog{... (2 Replies)
Hi,
Need some help...
I want to execute a sequence of commands, like below
test1.sh
test2.sh
...etc
test1.sh will generate a log file. We need to search for the 'complete' string in the test1.sh log file and, once that condition succeeds, move on to test2.sh. Each .sh script will take... (5 Replies)
Hello,
I want to search for two strings in a file and print them to a new file using a Perl script.
Can anyone suggest how to do this?
The file looks like below:
<UML:ModelElement.requirement>
<UML:Dependency name="Row_MainColumn_FW_0009"> <UML:ModelElement.taggedValue>... (3 Replies)
Hello,
Operating System Environment: HP Unix B.11.31 U
I am looking for a script that, for a specific list of folders and a specific list of files, searches for a given string.
For Example :
r48_buildlib.txt contains
wpr480.0_20161027
wpr480.0_20161114
wpr481.0_20161208
wpr482.0_20161222... (4 Replies)
Discussion started by: Siva SQL
bp_gccalc
BP_GCCALC(1p) User Contributed Perl Documentation BP_GCCALC(1p)
NAME
gccalc - GC content of nucleotide sequences
SYNOPSIS
gccalc [-f/--format FORMAT] [-h/--help] filename
or
gccalc [-f/--format FORMAT] < filename
or
gccalc [-f/--format FORMAT] -i filename
DESCRIPTION
This script prints out the GC content of every nucleotide sequence in the input file.
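As a rough illustration of the calculation only (this is not the gccalc source, and its real output format may differ), GC content per FASTA record could be sketched as below. The function name `gc_content`, the tab-separated output, and counting only G and C against the full sequence length are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: gc_content is a hypothetical helper, not part of Bioperl.

# Usage: gc_content < sequences.fasta
# Prints "<header without the leading >"<TAB>"<GC percentage>" for each record.
gc_content() {
    awk '
        # Report the buffered record; gsub returns how many G/C letters it removed.
        function report(    n, gc) {
            if (header == "") return
            n  = length(seq)
            gc = gsub(/[GCgc]/, "", seq)
            printf "%s\t%.2f\n", substr(header, 2), (n ? 100 * gc / n : 0)
        }
        /^>/ { report(); header = $0; seq = ""; next }   # a new record starts
             { seq = seq $0 }                            # join wrapped sequence lines
        END  { report() }
    '
}
```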
OPTIONS
The default sequence format is fasta.
The sequence input can be provided using any of the three methods:
    unnamed argument
        gccalc filename
    named argument
        gccalc -i filename
    standard input
        gccalc < filename
FEEDBACK
Mailing Lists
User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to the
Bioperl mailing list. Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
http://bioperl.org/wiki/Mailing_lists - About the mailing lists
Reporting Bugs
Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the
web:
https://redmine.open-bio.org/projects/bioperl/
AUTHOR - Jason Stajich
Email jason@bioperl.org
HISTORY
Based on script code (see bottom) submitted by cckim@stanford.edu
Submitted as part of bioperl script project 2001/08/06
perl v5.14.2 2012-03-02 BP_GCCALC(1p)