bp_mask_by_search(1p) [debian man page]

BP_MASK_BY_SEARCH(1p)					User Contributed Perl Documentation				     BP_MASK_BY_SEARCH(1p)

NAME

       mask_by_search - mask sequence(s) based on their alignment results

SYNOPSIS

	 mask_by_search.pl -f blast genomefile blastfile.bls > maskedgenome.fa

DESCRIPTION

       Mask sequence(s) based on significant alignments to another sequence.  You need to provide the search report file and the entire sequence
       data that you want to mask.  By default the program assumes you have run TBLASTN (or TFASTY) and tries to mask the hit sequence, assuming
       you have provided the sequence file for the hit database.  To do the reverse and mask the query sequence, specify -t/--type query.

       This reads the whole sequence file into memory, so it may fail on large genomes.  I'm using DB_File to avoid keeping everything in
       memory; one solution is to split the genome into pieces (do this BEFORE you run the DB search, though: you want to use the exact file
       you BLASTed with as input to this program).
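
       The splitting step mentioned above can be sketched in Python.  This is only an illustration and is not part of mask_by_search.pl: the
       helper names (read_fasta, split_records, to_fasta) and the choice to split by record count are assumptions made for the example.
       Remember that each piece must be the file you actually search against.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records: Iterable[Record], per_piece: int) -> List[List[Record]]:
    """Group records into consecutive pieces of at most per_piece records."""
    if per_piece < 1:
        raise ValueError("per_piece must be at least 1")
    records = list(records)
    return [records[i:i + per_piece] for i in range(0, len(records), per_piece)]


def to_fasta(piece: Iterable[Record]) -> str:
    """Serialize one piece back to FASTA text (one line per sequence)."""
    return "".join(f">{header}\n{seq}\n" for header, seq in piece)


# Hypothetical three-record genome, split into pieces of two records.
genome = ">chr1\nACGT\nACGT\n>chr2\nGGCC\n>chr3\nTTAA\n"
pieces = split_records(read_fasta(genome), per_piece=2)
for number, piece in enumerate(pieces, 1):
    print(f"piece {number}: {[header for header, _ in piece]}")
```

       Each piece would then be written to its own file, searched separately, and passed to this program together with its own report.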

       Below, the double-dash (--) options take the form --format=fasta or --format fasta, or you can just say -f fasta.

       By -f/--format I mean that either form is acceptable.  The =s, =n, or =c suffix indicates the kind of value the argument expects: a
       string, a number, or a character.

       Options:
	   -f/--format=s    Search report format (fasta,blast,axt,hmmer,etc)
	   -sf/--sformat=s  Sequence format (fasta,genbank,embl,swissprot)
	   --hardmask	    (boolean) Hard mask the sequence
			    with the maskchar [default is lowercase mask]
	   --maskchar=c     Character to mask with [default is N], change
			    to 'X' for protein sequences
	   -e/--evalue=n    E-value cutoff for HSPs and hits; only
			    mask sequence if the alignment has the
			    specified e-value or better
	   -o/--out/
	   --outfile=file   Output file to save the masked sequence to.
	   -t/--type=s	    Alignment seq type you want to mask, the
			    'hit' or the 'query' sequence. [default is 'hit']
	   --minlen=n	    Minimum length of an HSP for it to be used
			    in masking [default 0]
	   -h/--help	    See this help information
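
       How the masking options interact can be sketched in Python.  This is an illustration of the documented behaviour, not the script's
       actual code: the Hsp tuple, the mask_sequence function, and the use of 1-based inclusive coordinates are assumptions made for the
       example.

```python
from typing import Iterable, NamedTuple, Optional


class Hsp(NamedTuple):
    start: int      # 1-based, inclusive (assumed coordinate convention)
    end: int        # 1-based, inclusive
    evalue: float


def mask_sequence(seq: str,
                  hsps: Iterable[Hsp],
                  evalue_cutoff: Optional[float] = None,
                  minlen: int = 0,
                  hardmask: bool = False,
                  maskchar: str = "N") -> str:
    """Mask the regions of seq covered by HSPs that pass the filters."""
    chars = list(seq)
    for hsp in hsps:
        lo, hi = sorted((hsp.start, hsp.end))
        # -e/--evalue: skip HSPs worse than the cutoff.
        if evalue_cutoff is not None and hsp.evalue > evalue_cutoff:
            continue
        # --minlen: skip HSPs shorter than the minimum length.
        if hi - lo + 1 < minlen:
            continue
        lo, hi = max(lo, 1), min(hi, len(chars))
        for i in range(lo - 1, hi):
            # --hardmask replaces with maskchar; the default is lowercase.
            chars[i] = maskchar if hardmask else chars[i].lower()
    return "".join(chars)


seq = "ACGTACGTACGT"
hsps = [Hsp(3, 6, 1e-10), Hsp(9, 12, 0.5)]

soft = mask_sequence(seq, hsps, evalue_cutoff=1e-5)
hard = mask_sequence(seq, hsps, evalue_cutoff=1e-5, hardmask=True)
print(soft)  # ACgtacGTACGT
print(hard)  # ACNNNNGTACGT
```

       For protein sequences the same call would pass maskchar="X", as the option description suggests.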

AUTHOR - Jason Stajich
       Jason Stajich, jason-at-bioperl-dot-org.

perl v5.14.2							    2012-03-02						     BP_MASK_BY_SEARCH(1p)

Check Out this Related Man Page

PSI-CD-HIT.PL(1)						   User Commands						  PSI-CD-HIT.PL(1)

NAME

       psi-cd-hit.pl - runs an algorithm similar to CD-HIT, but uses BLAST to calculate similarities

DESCRIPTION

       Usage psi-cd-hit [Options]

       Options

       -i     in_dbname, required

       -o     out_dbname, required

       -c     clustering threshold (sequence identity), default 0.3

       -ce    clustering threshold (blast expect), default -1

	      By default no expect threshold is used; with a positive value, the program clusters sequences if their similarity meets either the
	      identity threshold or the expect threshold

       -L     coverage of shorter sequence ( aligned / full), default 0.0

       -M     coverage of longer sequence ( aligned / full), default 0.0

       -R     (1/0) use psi-blast profile? default 0; if set, perform a psi-blast / pdb-blast type search

       -G     (1/0) use global identity? default 1; sequence identity is calculated as

	      total identical residues of local alignments / length of shorter seq

	      If you prefer to use -G 0, it is suggested that you also use -L, such as -L 0.8, to prevent very short matches.

       -d     length of description line in the .clstr file, default 30; if set to 0, it takes the fasta defline and stops at the first space

       -l     length_of_throw_away_sequences, default 10

       -p     profile search parameters, default

	      "-a 2 -d nr80 -j 3 -F F -e 0.001 -b 500 -v 500"

       -bfdb profile database, default nr80

       -s     blast search parameters, default

	      "-F F -e 0.000001 -b 100000 -v 100000"

       -be blast expect cutoff, default 0.000001

       -b     filename of a list of hosts on which to run this program in parallel with ssh calls; you need to provide the list of hosts

       -pbs number of jobs to send each time through the PBS queuing system

	      you cannot use both ssh and pbs at the same time

       -k (1/0) keep blast raw output file, default 1

       -rs interval for saving the restart file and clustering output, default 5000

	      each time 5000 sequences have been processed, the program writes a restart file and the current clustering information

       -restart restart file, reads in a restart file

	      if the program crashes, is stopped, or is terminated, you can restart it by adding the option "-restart sth.restart"

       -rf interval for reformatting the blast database, default 200,000

	      once the program has clustered 200,000 seqs, it removes them from the seq pool and reformats the blast db to save time

       -local dir of local blast db

	      when running in parallel with ssh (not pbs), I can copy blast dbs to local drives on each node to save blast db reading time, BUT
	      IT MAY NOT BE FASTER

       -J     job, job_file, execute specific jobs such as parsing blast output only. DON'T use it; it is only used by this program itself

       -single file of ids that you know are singletons

	      so I won't run them as queries
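
	      The identity and expect thresholds described under -c, -ce and -G can be sketched in Python.  This is an illustration of the
	      documented rules only, not the program's own code: the function names and the example numbers are assumptions.

```python
from typing import Iterable


def global_identity(identical_per_alignment: Iterable[int],
                    len_a: int, len_b: int) -> float:
    """-G 1: total identical residues of local alignments / length of shorter seq."""
    shorter = min(len_a, len_b)
    if shorter <= 0:
        raise ValueError("sequence lengths must be positive")
    return sum(identical_per_alignment) / shorter


def meets_threshold(identity: float, expect: float,
                    c: float = 0.3, ce: float = -1.0) -> bool:
    """Cluster if identity >= c, or (only when ce is positive) if expect <= ce."""
    if identity >= c:
        return True
    return ce > 0 and expect <= ce


# Two local alignments with 40 and 20 identical residues; the shorter
# sequence is 100 residues long (hypothetical numbers).
identity = global_identity([40, 20], len_a=100, len_b=250)
print(identity)                                   # 0.6
print(meets_threshold(identity, expect=1e-3))     # True: identity alone suffices
print(meets_threshold(0.2, expect=1e-8))          # False: expect unused by default
print(meets_threshold(0.2, expect=1e-8, ce=1e-6)) # True: expect threshold met
```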

	      ============================== by Weizhong Li, liwz@sdsc.edu ==============================

	      If you find cd-hit useful, please kindly cite:

	      "Clustering of highly homologous sequences to reduce the size of large protein database", Weizhong Li, Lukasz Jaroszewski & Adam
	      Godzik, Bioinformatics, (2001) 17:282-283

	      "Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences", Weizhong Li & Adam Godzik,
	      Bioinformatics, (2006) 22:1658-1659

psi-cd-hit.pl 4.6-2012-04-25					    April 2012							  PSI-CD-HIT.PL(1)
