fasta file

Unix and Linux Discussions Tagged with fasta file
	Thread / Thread Starter	Last Post	Replies	Views	Forum
	How to add specific bases at the beginning and ending of all the fasta sequences? dineshkumarsrk	01-22-2020 by RudiC	1	2,706	UNIX for Beginners Questions & Answers
	How to find a specific sequence pattern in a fasta file? dineshkumarsrk	11-29-2019 by Scrutinizer	3	6,416	UNIX for Beginners Questions & Answers
	Is it possible to rename fasta headers based on its position specified in another file? dineshkumarsrk	11-13-2019 by Scrutinizer	5	2,518	UNIX for Beginners Questions & Answers
	How to append two fasta files? dineshkumarsrk	06-13-2019 by MadeInGermany	11	6,046	UNIX for Beginners Questions & Answers
	How to extract the partial matching strings among two files? dineshkumarsrk	06-11-2019 by anbu23	2	2,753	UNIX for Beginners Questions & Answers
	How to count the length of fasta sequences? dineshkumarsrk	04-21-2019 by drl	14	23,924	UNIX for Beginners Questions & Answers
	Search for a particular word and replace the first character Fahmida	06-05-2014 by Fahmida	4	1,823	UNIX for Dummies Questions & Answers

bp_mask_by_search

BP_MASK_BY_SEARCH(1p)					User Contributed Perl Documentation				     BP_MASK_BY_SEARCH(1p)

NAME

       mask_by_search - mask sequence(s) based on their alignment results

SYNOPSIS

	 mask_by_search.pl -f blast genomefile blastfile.bls > maskedgenome.fa

DESCRIPTION

       Mask a sequence based on its significant alignments to another sequence.  You need to provide the search report file and the complete
       sequence data that you want to mask.  By default the script assumes you have run TBLASTN (or TFASTY) and masks the hit sequence, so
       the sequence file you provide should be the one used as the hit database.  To do the reverse and mask the query sequence instead,
       specify -t/--type query.

       This reads in the whole sequence file, so for large genomes it may fall over.  I'm using DB_File to avoid keeping everything in
       memory.  Another solution is to split the genome into pieces (do this BEFORE you run the DB search, though, because the input to this
       program must be the exact file you BLASTed against).
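
       The splitting step mentioned above could be sketched as follows.  This is a minimal illustration and not part of the script:
       the function names, the piece_NNN.fa file naming, and the records-per-piece parameter are all assumptions.  Each piece would then
       have to be searched separately, and the same piece passed to this program together with its own report.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path: Path, out_dir: Path, per_file: int) -> List[Path]:
    """Write the records of `path` into numbered pieces of at most `per_file` records.

    The piece names (piece_001.fa, ...) are a hypothetical convention.
    Sequences are written on a single line, so line wrapping is not preserved.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    pieces: List[Path] = []
    out = None
    for index, (header, seq) in enumerate(read_fasta(path)):
        if index % per_file == 0:
            # Start a new piece every `per_file` records.
            if out is not None:
                out.close()
            piece = out_dir / f"piece_{len(pieces) + 1:03d}.fa"
            pieces.append(piece)
            out = open(piece, "w")
        out.write(f">{header}\n{seq}\n")
    if out is not None:
        out.close()
    return pieces
```

       For example, split_fasta(Path("genome.fa"), Path("pieces"), per_file=100) would write pieces/piece_001.fa, pieces/piece_002.fa,
       and so on, where genome.fa is a placeholder file name.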

       In the option list below, double-dash (--) options can be written as --format=fasta or --format fasta, or you can use the short form
       -f fasta.

       A listing such as -f/--format means either spelling is accepted.  The =s, =n, or =c suffix means the option expects a string, number,
       or character argument respectively.
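
       As an analogy only, the three equivalent spellings can be demonstrated with Python's argparse.  This is not how the Perl script
       parses its options; the program name and the dest name below are made up for the illustration.

```python
import argparse

# Hypothetical stand-in parser: one option with a short and a long spelling.
parser = argparse.ArgumentParser(prog="option_forms_demo")
parser.add_argument("-f", "--format", dest="report_format")

# The three spellings described in the text above.
forms = [["--format=fasta"], ["--format", "fasta"], ["-f", "fasta"]]
parsed = [parser.parse_args(argv).report_format for argv in forms]
print(parsed)  # all three entries are the same value
```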

       Options:
	   -f/--format=s    Search report format (fasta,blast,axt,hmmer,etc)
	   -sf/--sformat=s  Sequence format (fasta,genbank,embl,swissprot)
	   --hardmask	    (boolean) Hard mask the sequence
			    with the maskchar [default is lowercase mask]
	   --maskchar=c     Character to mask with [default is N], change
			    to 'X' for protein sequences
	   -e/--evalue=n    E-value cutoff for HSPs and Hits; only
			    mask sequence if the alignment has the specified
			    E-value or better
	   -o/--out/
	   --outfile=file   Output file to save the masked sequence to.
	   -t/--type=s	    Alignment seq type you want to mask, the
			    'hit' or the 'query' sequence. [default is 'hit']
	   --minlen=n	    Minimum length of an HSP for it to be used
			    in masking [default 0]
	   -h/--help	    See this help information
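
       The combined effect of --hardmask, --maskchar, -e/--evalue and --minlen can be sketched as below.  This is a simplified model
       written for illustration, not the script's actual code: the Hsp record, the mask_sequence function and the 1-based inclusive
       coordinates are assumptions, and the HSPs are taken to be already parsed from the report.

```python
from typing import Iterable, NamedTuple, Optional


class Hsp(NamedTuple):
    """Hypothetical HSP record with 1-based inclusive coordinates on the sequence to mask."""
    start: int
    end: int
    evalue: float


def mask_sequence(
    seq: str,
    hsps: Iterable[Hsp],
    hardmask: bool = False,
    maskchar: str = "N",
    evalue: Optional[float] = None,
    minlen: int = 0,
) -> str:
    """Return seq with the regions covered by qualifying HSPs masked."""
    chars = list(seq)
    for hsp in hsps:
        lo, hi = sorted((hsp.start, hsp.end))  # tolerate reversed coordinates
        if evalue is not None and hsp.evalue > evalue:
            continue  # keep only alignments with the cutoff E-value or better
        if hi - lo + 1 < minlen:
            continue  # HSP is shorter than the minimum length
        for i in range(max(lo, 1) - 1, min(hi, len(chars))):
            # Hard mask replaces the residue; the default soft mask lowercases it.
            chars[i] = maskchar if hardmask else chars[i].lower()
    return "".join(chars)


seq = "ACGTACGTACGT"
hsps = [Hsp(3, 6, 1e-10), Hsp(9, 10, 0.5)]
soft = mask_sequence(seq, hsps, evalue=1e-5)
hard = mask_sequence(seq, hsps, hardmask=True, evalue=1e-5)
print(soft)  # ACgtacGTACGT
print(hard)  # ACNNNNGTACGT
```

       In this model the second HSP is ignored because its E-value of 0.5 is worse than the 1e-5 cutoff.  Passing maskchar="X" would
       correspond to the protein setting suggested for --maskchar.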

AUTHOR - Jason Stajich
       Jason Stajich, jason-at-bioperl-dot-org.

perl v5.14.2							    2012-03-02						     BP_MASK_BY_SEARCH(1p)