A developer of mine has this requirement. I couldn't tell her quickly how to do it with UNIX commands or a quick script, so she's writing a quick program to do it, but that got my curiosity up and I thought I'd ask here for advice.
In a text file, there are some records (about half of them)... (4 Replies)
Hi. I have separate chromosome sequences and I want to parse some regions of a chromosome based on start site and end site. How can I achieve this?
For Example Chr 1 is in following format
I need regions by position: 2 - 10 should give me AATTCCAAA
and in a similar way 15- 25 should give... (8 Replies)
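A minimal sketch of the coordinate slicing asked about here, assuming the positions are 1-based and inclusive (which is consistent with 2 - 10 yielding nine bases). The sequence below and the function name `extract_region` are made up for illustration; the poster's actual chromosome data is not shown in the excerpt.

```python
def extract_region(sequence: str, start: int, end: int) -> str:
    """Return the bases from start to end, counted 1-based and inclusive."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Python slices are 0-based and end-exclusive, so only the start shifts.
    return sequence[start - 1:end]


# Hypothetical sequence, not the poster's data.
chromosome = "GAATTCCAAAGGTTCCGGAATTCCGG"
print(extract_region(chromosome, 2, 10))
```

If the sequence is wrapped over several lines in the file, the lines would need to be joined (newlines removed) before slicing, otherwise the positions drift.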
Hi all !
I have a fasta file that looks like that:
>Sequence1
RTYIPLCASQHKLCPITFLAVK
(it's just an example, obviously in reality I have several pairs of lines like that)
Using UNIX command(s), would it be possible to replace all the characters except the "C" of the second line only by... (7 Replies)
Hi
I have an alignment file (.fasta) with ~80 sequences. They look like this-
>JV101.contig00066(+):25302-42404|sequence_index=0|block_index=4|species=JV101|JV101_4_0
GAGGTTAATTATCGATAACGTTTAATTAAAGTGTTTAGGTGTCATAATTT
TAAATGACGATTTCTCATTACCATACACCTAAATTATCATCAATCTGAAT... (2 Replies)
I have fasta files with multiple sequences in each. I need to change the sequence name headers from:
>accD:_59176-60699
ATGGAAAAGTGGAGGATTTATTCGTTTCAGAAGGAGTTCGAACGCA
>atpA_(reverse_strand):_showing_revcomp_of_10525-12048
ATGGTAACCATTCAAGCCGACGAAATTAGTAATCTTATCCGGGAAC... (2 Replies)
I need assistance with the following requirement; I am new to Unix.
I want to do the following task but am stuck on the file creation date (sysdate).
The requirement is as follows:
I need to create a script that will read the abc/xyz/klm folder and look for *.err files for that day’s date and then send an... (4 Replies)
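One possible sketch of the file-selection step, under a stated assumption: most Unix filesystems do not expose a portable creation date, so this uses the last-modification time as a stand-in for "that day's date". The function name `err_files_for_day` is hypothetical, and the notification step is left out because the excerpt is cut off before describing it.

```python
from datetime import date, datetime
from pathlib import Path
from typing import List, Optional


def err_files_for_day(folder: str, day: Optional[date] = None) -> List[Path]:
    """List *.err files in folder whose modification date equals day (default: today)."""
    day = day or date.today()
    return sorted(
        p
        for p in Path(folder).glob("*.err")
        # Assumption: modification time approximates the "creation date".
        if p.is_file() and datetime.fromtimestamp(p.stat().st_mtime).date() == day
    )


# Usage, with the folder named in the post:
#     for path in err_files_for_day("abc/xyz/klm"):
#         print(path)
```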
Hi,
I want to match the sequence ID (a substring of the line starting with '>') and extract the information up to the next '>' line. Please help.
input
> fefrwefrwef X900
AGAGGGAATTGG
AGGGGCCTGGAG
GGTTCTCTTC
> fefrwefrwef X932
AGAGGGAATTGG
AGGAGGTGGAG
GGTTCTCTTC
> fefrwefrwef X937... (2 Replies)
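A rough sketch of one way to do this, assuming the ID is a whitespace-separated token on the '>' line (as with X900 and X932 in the sample). The function names are illustrative only, and the expected output format is not shown in the excerpt, so the sketch simply returns the header and its sequence lines.

```python
from typing import Iterable, Iterator, List, Tuple


def fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (header, sequence_lines) for each '>' record."""
    header = None
    body: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, body
            header, body = line[1:].strip(), []
        elif header is not None and line:
            body.append(line)
    if header is not None:
        yield header, body


def records_matching(lines: Iterable[str], wanted_id: str) -> List[Tuple[str, List[str]]]:
    """Keep records whose header contains wanted_id as a whole token."""
    return [(h, b) for h, b in fasta_records(lines) if wanted_id in h.split()]


sample = """> fefrwefrwef X900
AGAGGGAATTGG
AGGGGCCTGGAG
GGTTCTCTTC
> fefrwefrwef X932
AGAGGGAATTGG
AGGAGGTGGAG
GGTTCTCTTC
""".splitlines()

for header, seq_lines in records_matching(sample, "X932"):
    print(">" + header)
    print("\n".join(seq_lines))
```

Matching on whole tokens rather than on any substring avoids X93 accidentally selecting X932 and X937.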
Hello,
I have 10 fasta files of sequenced reads, with read sizes from 15 to 35. I have combined the reads, collapsed them into unique reads, and filtered for unique reads 18 - 26 bp long. Now I want to count the appearances of each unique read in all the fasta files and make a table... (5 Replies)
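A small sketch of the counting step, assuming the reads from each file have already been loaded into a list of strings. The file names, the reads, and the function name `count_table` are invented for illustration; the 18 - 26 length filter comes from the post.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_table(
    samples: Dict[str, Iterable[str]], min_len: int = 18, max_len: int = 26
) -> List[List[str]]:
    """Build rows of [read, count_in_file_1, count_in_file_2, ...]."""
    counts = {name: Counter(reads) for name, reads in samples.items()}
    names = list(samples)
    unique = sorted(
        {r for c in counts.values() for r in c if min_len <= len(r) <= max_len}
    )
    table = [["read"] + names]
    for read in unique:
        # A Counter returns 0 for reads that never occur in a file.
        table.append([read] + [str(counts[name][read]) for name in names])
    return table


# Hypothetical reads standing in for two of the ten files.
samples = {
    "a.fa": ["ACGTACGTACGTACGTAC", "ACGTACGTACGTACGTAC", "GGGGCCCCAAAATTTTGGGG"],
    "b.fa": ["ACGTACGTACGTACGTAC", "TTTT"],
}
for row in count_table(samples):
    print("\t".join(row))
```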
Hi
This is my first post and I'm just a beginner. So please be nice to me.
I have a couple of html files where a pattern beginning with "http://www.site.com" and ending with "/resource.dat" is present on every 241st line. How do I extract this to a new text file?
I have tried sed -n 241,241p... (13 Replies)
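For reference, `sed -n 241,241p` prints only line 241 itself, not every 241st line. The sketch below shows one way to pick out lines 241, 482, and so on, then keep just the matching URL. It assumes the URL contains no whitespace; the function name and the sample lines are made up.

```python
import re
from typing import Iterable, List

# Starts with http://www.site.com and ends with /resource.dat, as in the post.
PATTERN = re.compile(r"http://www\.site\.com\S*?/resource\.dat")


def extract_every_nth(lines: Iterable[str], step: int = 241) -> List[str]:
    """Return the first pattern match found on each step-th line."""
    found = []
    for number, line in enumerate(lines, start=1):
        if number % step == 0:
            match = PATTERN.search(line)
            if match:
                found.append(match.group(0))
    return found


# Hypothetical page: filler lines with a link on lines 241 and 482.
page = ["<p>filler</p>"] * 482
page[240] = '<a href="http://www.site.com/a/resource.dat">one</a>'
page[481] = '<a href="http://www.site.com/b/c/resource.dat">two</a>'
print("\n".join(extract_every_nth(page)))
```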
Discussion started by: dejavo
bp_mask_by_search
BP_MASK_BY_SEARCH(1p)    User Contributed Perl Documentation    BP_MASK_BY_SEARCH(1p)
NAME
mask_by_search - mask sequence(s) based on its alignment results
SYNOPSIS
mask_by_search.pl -f blast genomefile blastfile.bls > maskedgenome.fa
DESCRIPTION
Mask sequence based on significant alignments of another sequence. You need to provide the report file and the entire sequence data which
you want to mask. By default this will assume you have done a TBLASTN (or TFASTY) and try and mask the hit sequence assuming you've
provided the sequence file for the hit database. If you would like to do the reverse and mask the query sequence specify the -t/--type
query flag.
This reads the whole sequence file into memory, so it may fall over for large genomes. I'm using DB_File to avoid keeping everything in
memory; another solution is to split the genome into pieces (do this BEFORE you run the DB search, though, because you want to use the
exact file you BLASTed with as input to this program).
Below, the double-dash (--) options take the form --format=fasta or --format fasta, or you can just say -f fasta.
By -f/--format I mean that either form is acceptable. The =s, =n, or =c suffix indicates that the option expects an argument (for example, =s expects a 'string').
Options:
  -f/--format=s            Search report format (fasta,blast,axt,hmmer,etc)
  -sf/--sformat=s          Sequence format (fasta,genbank,embl,swissprot)
  --hardmask               (boolean) Hard mask the sequence
                           with the maskchar [default is lowercase mask]
  --maskchar=c             Character to mask with [default is N], change
                           to 'X' for protein sequences
  -e/--evalue=n            Evalue cutoff for HSPs and Hits, only
                           mask sequence if alignment has specified evalue
                           or better
  -o/--out/--outfile=file  Output file to save the masked sequence to.
  -t/--type=s              Alignment seq type you want to mask, the
                           'hit' or the 'query' sequence. [default is 'hit']
  --minlen=n               Minimum length of an HSP for it to be used
                           in masking [default 0]
  -h/--help                See this help information
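To illustrate the difference between the default lowercase mask and --hardmask, here is a toy sketch in Python. It is not the script's own code (the script is Perl), and the function name, the sample sequence, and the assumption that regions are 1-based inclusive coordinates are all made up for the example.

```python
from typing import Iterable, Tuple


def mask_regions(
    sequence: str,
    regions: Iterable[Tuple[int, int]],
    hardmask: bool = False,
    maskchar: str = "N",
) -> str:
    """Mask each (start, end) region; 1-based inclusive coordinates are assumed."""
    chars = list(sequence)
    for start, end in regions:
        # Clamp to the sequence so out-of-range coordinates are ignored.
        for i in range(max(start, 1) - 1, min(end, len(chars))):
            chars[i] = maskchar if hardmask else chars[i].lower()
    return "".join(chars)


# Hypothetical sequence with one aligned region at positions 3-6.
print(mask_regions("ACGTACGTAC", [(3, 6)]))                 # soft (lowercase) mask
print(mask_regions("ACGTACGTAC", [(3, 6)], hardmask=True))  # hard mask with N
```

For protein sequences the options above suggest changing the mask character to 'X', which here would be `maskchar="X"`.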
AUTHOR - Jason Stajich
Jason Stajich, jason-at-bioperl-dot-org.
perl v5.14.2 2012-03-02 BP_MASK_BY_SEARCH(1p)