bio::tools::run::alignment::tcoffee(3pm) [debian man page]

Bio::Tools::Run::Alignment::TCoffee(3pm)		User Contributed Perl Documentation		  Bio::Tools::Run::Alignment::TCoffee(3pm)

NAME

       Bio::Tools::Run::Alignment::TCoffee - Object for the calculation of a multiple sequence alignment from a set of unaligned sequences or
       alignments using the TCoffee program

SYNOPSIS

	 # Build a tcoffee alignment factory
	 @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
	 $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params);

	 # Pass the factory a list of sequences to be aligned.
	 $inputfilename = 't/cysprot.fa';
	 # $aln is a SimpleAlign object.
	 $aln = $factory->align($inputfilename);

	 # or where @seq_array is an array of Bio::Seq objects
	 $seq_array_ref = \@seq_array;
	 $aln = $factory->align($seq_array_ref);

	 # Or one can pass the factory a pair of (sub)alignments
	 # to be aligned against each other, e.g.:

	 # where $aln1 and $aln2 are Bio::SimpleAlign objects.
	 $aln = $factory->profile_align($aln1,$aln2);

	 # Or one can pass the factory an alignment and one or more
	 # unaligned sequences to be added to the alignment. For example:

	 # $seq is a Bio::Seq object.
	 $aln = $factory->profile_align($aln1,$seq);

	 # There are various additional options and input formats available.
	 # See the DESCRIPTION section that follows for additional details.

DESCRIPTION

       Note: this DESCRIPTION only documents the (Bio)perl interface to TCoffee.

   Helping the module find your executable
       You will need to enable TCoffee to find the t_coffee program. This can be done in (at least) three ways:

	1. Make sure the t_coffee executable is in your path so that
	   which t_coffee returns a t_coffee executable on your system.

	2. Define an environment variable TCOFFEEDIR that points to the
	   directory containing the t_coffee executable:
	   In bash
	   export TCOFFEEDIR=/home/username/progs/T-COFFEE_distribution_Version_1.37/bin
	   In csh/tcsh
	   setenv TCOFFEEDIR /home/username/progs/T-COFFEE_distribution_Version_1.37/bin

	3. Include a definition of an environmental variable TCOFFEEDIR in
	   every script that will use this TCoffee wrapper module.
	   BEGIN { $ENV{TCOFFEEDIR} = '/home/username/progs/T-COFFEE_distribution_Version_1.37/bin' }
	   use Bio::Tools::Run::Alignment::TCoffee;

       If you are running an application on a web server, make sure the web server environment has the proper PATH set, or use option 2 or 3 to
       set the variable.
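
       The lookup described above can be approximated with a short check before the wrapper is used. This is only a sketch: find_t_coffee
       is a hypothetical helper, not part of the module, and the order in which it searches (TCOFFEEDIR first, then PATH) is an assumption
       rather than the module's documented behaviour.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of the module): return the first executable
# called $name found in $ENV{TCOFFEEDIR} or in a directory listed in PATH.
sub find_t_coffee {
    my ($name) = @_;
    $name = 't_coffee' unless defined $name;
    my @dirs;
    push @dirs, $ENV{TCOFFEEDIR} if defined $ENV{TCOFFEEDIR};
    push @dirs, File::Spec->path;    # directories listed in PATH
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;    # nothing found
}

my $exe = find_t_coffee();
print defined $exe
    ? "Found t_coffee at $exe\n"
    : "t_coffee not found; set TCOFFEEDIR or PATH\n";
```

       Running this in the same environment as the script (for example under the web server account) shows whether option 1 or 2 is
       in effect there.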

PARAMETERS FOR ALIGNMENT COMPUTATION

       There are a number of possible parameters one can pass to TCoffee.  The online manual gives the best explanation of
       all the features.  See http://igs-server.cnrs-mrs.fr/~cnotred/Documentation/t_coffee/t_coffee_doc.html

       These can be specified as parameters when instantiating a new TCoffee object, or through get/set methods of the same name (lowercase).
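
       A minimal sketch of both ways of setting a parameter, reusing the ktuple and matrix values from the SYNOPSIS. The wrapper is loaded
       inside an eval so that the sketch also runs where the module is not installed; that guard is an addition for illustration only.

```perl
use strict;
use warnings;

# Parameter names and values are the ones used in the SYNOPSIS.
my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');

# Load the wrapper only if it is installed, so the sketch runs either way.
if (eval { require Bio::Tools::Run::Alignment::TCoffee; 1 }) {
    my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params);
    $factory->ktuple(3);                         # set through the accessor
    print "ktuple: ", $factory->ktuple, "\n";    # read it back
    print "matrix: ", $factory->matrix, "\n";
} else {
    print "Bio::Tools::Run::Alignment::TCoffee is not installed\n";
}
```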

   IN
	Title	    : IN
	Description : (optional) input filename. This is specified when
		      calling align, so it should not be used directly unless
		      one understands the TCoffee program very well.

   TYPE
	Title	    : TYPE
	Args	    : [string] DNA, PROTEIN
	Description : (optional) set the sequence type. It is guessed
		      automatically, so it should not normally be set directly.

   PARAMETERS
	Title	    : PARAMETERS
	Description : (optional) Indicates a file containing extra parameters

   EXTEND
	Title	    : EXTEND
	Args	    : 0, 1, or positive value
	Default     : 1
	Description : Flag indicating that library extension should be
		      carried out when performing multiple alignments, if set
		      to 0 then extension is not made, if set to 1 extension
		      is made on all pairs in the library.  If extension is
		      set to another positive value, the extension is only
		      carried out on pairs having a weight value superior to
		      the specified limit.

   DP_NORMALISE
	Title	    : DP_NORMALISE
	Args	    : 0 or positive value
	Default     : 1000
	Description : When using a value different from 0, this flag sets the
		      score of the highest scoring pair to 1000.

   DP_MODE
	Title	    : DP_MODE
	Args	    : [string] gotoh_pair_wise, myers_miller_pair_wise,
		      fasta_pair_wise, cfasta_pair_wise
	Default     : cfasta_pair_wise
	Description : Indicates the type of dynamic programming used by
		      the program

	   gotoh_pair_wise : implementation of the gotoh algorithm
	   (quadratic in memory and time)

	   myers_miller_pair_wise : implementation of the Myers and Miller
	   dynamic programming algorithm (quadratic in time and linear in
	   space). This algorithm is recommended for very long sequences. It
	   is about 2 times slower than gotoh. It only accepts tg_mode=1.

	   fasta_pair_wise: implementation of the fasta algorithm. The
	   sequence is hashed, looking for ktuples words. Dynamic programming
	   is only carried out on the ndiag best scoring diagonals. This is
	   much faster but less accurate than the two previous modes.

	   cfasta_pair_wise : c stands for checked. It is the same
	   algorithm as fasta_pair_wise, except that the dynamic programming
	   is made on the ndiag best diagonals, then on 2*ndiag, and so on
	   until the scores converge. Complexity will depend on the level of
	   divergence of the sequences, but will usually be L*log(L), with an
	   accuracy comparable to the first two modes (this was checked on
	   BaliBase).
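
       The "checked" iteration of cfasta_pair_wise can be pictured as the loop below. It is a sketch of the idea only, not T-Coffee's
       code: checked_fasta_score and its scoring callback are hypothetical, "and so on" is read here as repeated doubling, and
       convergence is taken to mean two successive identical scores.

```perl
use strict;
use warnings;

# $score_with is a hypothetical callback: given a number of diagonals, it
# returns the alignment score obtained when only those diagonals are used.
sub checked_fasta_score {
    my ($score_with, $ndiag, $max_diag) = @_;
    my $previous = $score_with->($ndiag);
    while ($ndiag * 2 <= $max_diag) {
        $ndiag *= 2;                                   # widen the search
        my $current = $score_with->($ndiag);
        return ($current, $ndiag) if $current == $previous;    # converged
        $previous = $current;
    }
    return ($previous, $ndiag);    # ran out of diagonals before converging
}

# Toy scoring function: the score stops improving once 8 diagonals are used.
my $toy = sub { my $n = shift; return $n < 8 ? 10 * $n : 80 };
my ($score, $used) = checked_fasta_score($toy, 2, 64);
print "score $score reached with $used diagonals\n";
```

       With the toy function the loop tries 2, 4, 8 and 16 diagonals and stops at 16, the first step at which the score no longer
       changes.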

   KTUPLE
	Title	    : KTUPLE
	Args	    : numeric value
	Default     : 1 or 2 (1 for protein, 2 for DNA )

	Description : Indicates the ktuple size for cfasta_pair_wise dp_mode
		      and fasta_pair_wise. It is set to 1 for proteins, and 2
		      for DNA. The alphabet used for protein is not the 20
		      letter code, but a mildly degenerated version, where
		      some residues are grouped under one letter, based on
		      physicochemical properties:
		      rk, de, qh, vilm, fy (the other residues are
		      not degenerated).
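
       The degenerate alphabet and the ktuple words can be illustrated in a few lines of Perl. The groups are the ones listed above;
       representing each group by its first residue is an arbitrary choice made for this sketch, and degenerate and ktuples are
       hypothetical helper names.

```perl
use strict;
use warnings;

# Groups taken from the KTUPLE description; each group is represented here
# by its first residue (an arbitrary choice for this sketch).
my @groups = qw(rk de qh vilm fy);
my %class;
for my $group (@groups) {
    my $rep = substr($group, 0, 1);
    $class{$_} = $rep for split //, $group;
}

# Replace every residue with the representative of its group.
sub degenerate {
    my ($seq) = @_;
    return join '', map { $class{$_} // $_ } split //, lc $seq;
}

# List the overlapping words of length $k in a sequence.
sub ktuples {
    my ($seq, $k) = @_;
    return map { substr($seq, $_, $k) } 0 .. length($seq) - $k;
}

my $reduced = degenerate('MKVLYD');
print "$reduced\n";                              # vrvvfd
print join(' ', ktuples($reduced, 2)), "\n";     # vr rv vv vf fd
```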

   NDIAGS
	Title	    : NDIAGS
	Args	    : numeric value
	Default     : 0
	Description : Indicates the number of diagonals used by the
		      fasta_pair_wise algorithm. When set to 0,
		      n_diag=Log (length of the smallest sequence)

   DIAG_MODE
	Title	    : DIAG_MODE
	Args	    : numeric value
	Default     : 0

	Description : Indicates the manner in which diagonals are scored
		     during the fasta hashing.

		     0 indicates that the score of a diagonal is equal to the
		     sum of the scores of the exact matches it contains.

		     1 indicates that this score is set equal to the score of
		     the best uninterrupted segment

		     1 can be useful when dealing with fragments of sequences.
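
       The two scoring modes differ as in the sketch below, which scores a single diagonal of a pairwise comparison. It assumes a
       ktuple of 1 and a score of 1 per exact match; diagonal_score is a hypothetical helper, not a T-Coffee function.

```perl
use strict;
use warnings;

# Score one diagonal of the comparison of $x and $y. $offset is the shift of
# $y relative to $x; every exact match scores 1 (an assumption).
sub diagonal_score {
    my ($x, $y, $offset, $mode) = @_;
    my ($sum, $run, $best) = (0, 0, 0);
    for my $i (0 .. length($x) - 1) {
        my $j = $i + $offset;
        next if $j < 0 || $j >= length($y);
        if (substr($x, $i, 1) eq substr($y, $j, 1)) {
            $sum++;                           # mode 0: every match counts
            $run++;                           # mode 1: length of current run
            $best = $run if $run > $best;
        } else {
            $run = 0;                         # a mismatch interrupts the run
        }
    }
    return $mode == 1 ? $best : $sum;
}

print diagonal_score('ACGTACGT', 'ACGAACGT', 0, 0), "\n";    # 7
print diagonal_score('ACGTACGT', 'ACGAACGT', 0, 1), "\n";    # 4
```

       The single mismatch costs one point in mode 0 but splits the diagonal into runs of 3 and 4 in mode 1, so only the longer run
       counts.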

   SIM_MATRIX
	Title	    : SIM_MATRIX
	Args	    : string
	Default     : vasiliky
	Description : Indicates the manner in which the amino acid alphabet
		      is degenerated when hashing. Any substitution matrix is
		      acceptable. Categories will be defined as sub-groups
		      of residues all having a positive substitution score
		      (they can overlap).

		      If you wish to keep the non degenerated amino acid
		      alphabet, use 'idmat'

   MATRIX
	Title	    : MATRIX
	Args	    :
	Default     :
	Description : This flag is provided for compatibility with
		      ClustalW. Setting matrix = 'blosum' is equivalent to
		      -in=Xblosum62mt, and -matrix=pam is equivalent to
		      -in=Xpam250mt. Apart from this, the rules are similar
		      to those applying when declaring a matrix with the
		      -in=X flag.

   GAPOPEN
	Title	    : GAPOPEN
	Args	    : numeric
	Default     : 0
	Description : Indicates the penalty applied for opening a gap. The
		      penalty must be negative. If you provide a positive
		      value, it will automatically be turned into a negative
		      number. We recommend a value of 10 with pam matrices,
		      and a value of 0 when a library is used.

   GAPEXT
	Title	    : GAPEXT
	Args	    : numeric
	Default     : 0
	Description : Indicates the penalty applied for extending a gap.

   COSMETIC_PENALTY
	Title	    : COSMETIC_PENALTY
	Args	    : numeric
	Default     : 100
	Description : Indicates the penalty applied for opening a gap. This
		      penalty is set to a very low value. It will only have
		      an influence on the portions of the alignment that are
		      unalignable. It will not make them more correct, but
		      only more pleasing to the eye (i.e. it avoids stretches of
		      lonely residues).

		      The cosmetic penalty is automatically turned off if a
		      substitution matrix is used rather than a library.

   TG_MODE
	Title	    : TG_MODE
	Args	    : 0,1,2
	Default     : 1
	Description : (Terminal Gaps)
		      0: indicates that terminal gaps must be penalized with
			 a gapopen and a gapext penalty.
		      1: indicates that terminal gaps must be penalized only
			 with a gapext penalty
		      2: indicates that terminal gaps must not be penalized.
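
       The three modes can be compared with a small calculation. The sketch assumes the usual affine form, gapopen + length * gapext,
       and forces both penalties to be negative (the GAPOPEN entry states this for gapopen; applying it to gapext as well is an
       assumption). terminal_gap_penalty is a hypothetical helper.

```perl
use strict;
use warnings;

# Penalty charged for one terminal gap of $length positions under a given
# tg_mode. The affine formula is an assumption made for this sketch.
sub terminal_gap_penalty {
    my ($length, $gapopen, $gapext, $tg_mode) = @_;
    ($gapopen, $gapext) = (-abs($gapopen), -abs($gapext));    # force negative
    return $gapopen + $length * $gapext if $tg_mode == 0;     # open + extend
    return $length * $gapext            if $tg_mode == 1;     # extend only
    return 0;                                                 # tg_mode 2
}

for my $mode (0, 1, 2) {
    print "tg_mode $mode: ", terminal_gap_penalty(3, 10, 1, $mode), "\n";
}
```

       For a terminal gap of length 3 with gapopen 10 and gapext 1 this prints -13, -3 and 0.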

   WEIGHT
	Title	    : WEIGHT
	Args	    : sim or sim_<matrix_name or matrix_file> or integer value
	Default     : sim

	Description : Weight defines the way alignments are weighted when
		      turned into a library.

		      sim indicates that the weight equals the average
			  identity within the matched residues.

		      sim_matrix_name indicates the average identity with two
			  residues regarded as identical when their
			  substitution value is positive. The valid matrix
			  names are in matrices.h (pam250mt). Matrices not
			  found in this header are considered to be
			  filenames. See the format section for matrices. For
			  instance, -weight=sim_pam250mt indicates that the
			  grouping used for similarity will be the set of
			  classes with positive substitutions. Other groups
			  include

			      sim_clustalw_col ( categories of clustalw
			      marked with :)

			      sim_clustalw_dot ( categories of clustalw
			      marked with .)

		      Value indicates that all the pairs found in the
		      alignments must be given the same weight equal to
		      value. This is useful when the alignment one wishes to
		      turn into a library must be given a pre-specified score
		      (for instance if they come from a structure
		      super-imposition program). Value is an integer:

			      -weight=1000

	 Note	    : Weight only affects methods that return an alignment to
		      T-Coffee, such as ClustalW. By contrast, the
		      version of Lalign we use here returns a library where
		      weights have already been applied, and is therefore
		      insensitive to the -weight flag.
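
       The sim weight can be illustrated for one pair of aligned rows. This sketch assumes two rows of equal length with '-' as the
       gap character and expresses identity as a percentage; the scale T-Coffee actually uses is not stated here, and sim_weight is a
       hypothetical helper.

```perl
use strict;
use warnings;

# Average identity over the columns where both rows carry a residue.
sub sim_weight {
    my ($row_a, $row_b) = @_;
    my ($matched, $identical) = (0, 0);
    for my $i (0 .. length($row_a) - 1) {
        my ($p, $q) = (substr($row_a, $i, 1), substr($row_b, $i, 1));
        next if $p eq '-' || $q eq '-';    # skip columns containing a gap
        $matched++;
        $identical++ if uc $p eq uc $q;
    }
    return $matched ? 100 * $identical / $matched : 0;
}

print sim_weight('ACG-TA', 'ACCGTA'), "\n";    # 80
```

       Five columns have a residue in both rows and four of them are identical, which gives 80.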

   SEQ_TO_ALIGN
	Title	    : SEQ_TO_ALIGN
	Args	    : filename
	Default     : no file - align all the sequences

	Description : You may not wish to align all the sequences brought in
		      by the -in flag. The seq_to_align flag allows for
		      this: the file is simply a list of names in Fasta
		      format.

		      However, note that library extension will be carried out
		      on all the sequences.
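
       Such a name list could be produced as follows. The sketch reads "a list of names in Fasta format" as one ">name" header line per
       sequence, which is an interpretation rather than something stated above; write_name_list and the names seq1 and seq3 are
       placeholders.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write one ">name" line per sequence name to a
# temporary file and return the file name.
sub write_name_list {
    my (@names) = @_;
    my ($fh, $filename) = tempfile(SUFFIX => '.fa', UNLINK => 1);
    print {$fh} ">$_\n" for @names;
    close $fh or die "Cannot close $filename: $!";
    return $filename;
}

my $list_file = write_name_list(qw(seq1 seq3));
print "Name list written to $list_file\n";
```

       The resulting file name is what would be supplied as the seq_to_align value.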

PARAMETERS FOR TREE COMPUTATION AND OUTPUT

   NEWTREE
	Title	    : NEWTREE
	Args	    : treefile
	Default     : no file
	Description : Indicates the name of the new tree to compute. The
		      default will be <sequence_name>.dnd, or <run_name>.dnd.
		      Format is Phylip/Newick tree format

   USETREE
	Title	    : USETREE
	Args	    : treefile
	Default     : no file specified
	Description : This flag indicates that rather than computing a new
		      dendrogram, t_coffee can use a pre-computed one. The
		      tree files are in Phylip format and compatible with
		      ClustalW. In most cases, using a pre-computed tree will
		      halve the computation time required by t_coffee. It is
		      also possible to use trees output by ClustalW or
		      Phylip. Format is Phylip tree format.

   TREE_MODE
	Title	    : TREE_MODE
	Args	    : slow, fast, very_fast
	Default     : very_fast
	Description : This flag indicates the method used for computing the
		      dendrogram.
		      slow : the chosen dp_mode using the extended library.
		      fast : the fasta dp_mode using the extended library.
		      very_fast : the fasta dp_mode using pam250mt.

   QUICKTREE
	Title	    : QUICKTREE
	Args	    :
	Default     :
	Description : This flag is kept for compatibility with ClustalW.
		      It indicates that:  -tree_mode=very_fast

PARAMETERS FOR ALIGNMENT OUTPUT

   OUTFILE
	Title	    : OUTFILE
	Args	    : out_aln file, default, no
	Default     : default ( yourseqfile.aln)
	Description : indicates name of output alignment file

   OUTPUT
	Title	    : OUTPUT
	Args	    : format1, format2
	Default     : clustalw
	Description : Indicates the format of the output file.
		      Supported formats are:

		      clustalw_aln, clustalw: ClustalW format.
		      gcg, msf_aln : Msf alignment.
		      pir_aln : pir alignment.
		      fasta_aln : fasta alignment.
		      phylip : Phylip format.
		      pir_seq : pir sequences (no gap).
		      fasta_seq : fasta sequences (no gap).
	   As well as:
		       score_html : causes the output to be a reliability
				    plot in HTML
		       score_pdf : idem in PDF.
		       score_ps : idem in postscript.

	   More than one format can be indicated:
		       -output=clustalw,gcg,score_html

   CASE
	Title	    : CASE
	Args	    : upper, lower
	Default     : upper
	Description : triggers choice of the case for output

   CPU
	Title	    : CPU
	Args	    : value
	Default     : 0
	Description : Indicates the cpu time (micro seconds) that must be
		      added to the t_coffee computation time.

   OUT_LIB
	Title	    : OUT_LIB
	Args	    : name of library, default, no
	Default     : default
	Description : Sets the name of the library output. Default implies
		      <run_name>.tc_lib

   OUTORDER
	Title	    : OUTORDER
	Args	    : input or aligned
	Default     : input
	Description : Sets the order of the sequences in the output
		      alignment: input keeps the order of the input file,
		      aligned follows the order of the guide tree.

   SEQNOS
	Title	    : SEQNOS
	Args	    : on or off
	Default     : off
	Description : Causes the output alignment to contain residue numbers
		      at the end of each line.

PARAMETERS FOR GENERIC OUTPUT

   RUN_NAME
	Title	    : RUN_NAME
	Args	    : your run name
	Default     :
	Description : This flag causes the prefix <your sequences> to be
		      replaced by <your run name> when renaming the default
		      files.

   ALIGN
	Title	    : ALIGN
	Args	    :
	Default     :
	Description : Indicates that the program must produce the
		      alignment. This flag is here for compatibility with
		      ClustalW

   QUIET
	Title	    : QUIET
	Args	    : stderr, stdout, or filename, or nothing
	Default     : stderr
	Description : Redirects the standard output to stderr, stdout, or a
		      file. -quiet on its own redirects the output to /dev/null.

   CONVERT
	Title	    : CONVERT
	Args	    :
	Default     :
	Description : Indicates that the program must not compute the
		      alignment but simply convert all the sequences,
		      alignments and libraries into the format indicated with
		      -output. This flag can also be used if you simply want
		      to compute a library (i.e. you have an alignment and
		      you want to turn it into a library).

FEEDBACK

   Mailing Lists
       User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one
       of the Bioperl mailing lists.  Your participation is much appreciated.

	 bioperl-l@bioperl.org			- General discussion
	 http://bioperl.org/wiki/Mailing_lists	- About the mailing lists

   Support
       Please direct usage questions or support issues to the mailing list:

       bioperl-l@bioperl.org

       rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address
       it. Please include a thorough description of the problem with code and data examples if at all possible.

   Reporting Bugs
       Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution.  Bug reports can be submitted via the
       web:

	http://redmine.open-bio.org/projects/bioperl/

AUTHOR -  Jason Stajich, Peter Schattner
       Email jason-at-bioperl-dot-org, schattner@alum.mit.edu

APPENDIX

       The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _

   program_name
	Title	: program_name
	Usage	: $factory->program_name()
	Function: holds the program name
	Returns:  string
	Args	: None

   program_dir
	Title	: program_dir
	Usage	: $factory->program_dir(@params)
	Function: returns the program directory, obtained from ENV variable.
	Returns:  string
	Args	:

   error_string
	Title	: error_string
	Usage	: $obj->error_string($newval)
	Function: Where the output from the last analysis run is stored.
	Returns : value of error_string
	Args	: newvalue (optional)

   version
	Title	: version
	Usage	: exit if $prog->version() < 1.8
	Function: Determine the version number of the program
	Example :
	Returns : float or undef
	Args	: none

   run
	Title	: run
	Usage	: my $output = $application->run(-seq	  => $seq,
						 -profile => $profile,
						 -type	  => 'profile-aln');
	Function: Generic run of an application
	Returns : Bio::SimpleAlign object
	Args	: key-value parameters allowed for TCoffee runs AND
		  -type     => profile-aln or alignment for profile alignments or
			       just multiple sequence alignment
		  -seq	    => either Bio::PrimarySeqI object OR
			       array ref of Bio::PrimarySeqI objects OR
			       filename of sequences to run with
		  -profile  => profile to align to, if this is an array ref
			       will specify the first two entries as the two
			       profiles to align to each other

   align
	Title	: align
	Usage	:
	       $inputfilename = 't/data/cysprot.fa';
	       $aln = $factory->align($inputfilename);
       or
	       $seq_array_ref = \@seq_array;
	       # @seq_array is array of Seq objs
	       $aln = $factory->align($seq_array_ref);
	Function: Perform a multiple sequence alignment
	Returns : Reference to a SimpleAlign object containing the
		  sequence alignment.
	Args	: Name of a file containing a set of unaligned fasta sequences
		  or else a reference to an array of Bio::Seq objects.

	Throws an exception if argument is not either a string (eg a
	filename) or a reference to an array of Bio::Seq objects.  If
	argument is string, throws exception if file corresponding to string
	name cannot be found. If argument is Bio::Seq array, throws
	exception if fewer than two sequence objects are in array.
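
       Because align throws on bad input, callers that must not die can wrap the call in an eval. The sketch below is one way to do
       that: try_align is a hypothetical helper, 'no_such_file.fa' is a placeholder for a file that does not exist, and the wrapper is
       loaded conditionally so the example also runs where the module is absent.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): call align() and return the
# exception text instead of letting it end the script.
sub try_align {
    my ($factory, $input) = @_;
    my $aln = eval { $factory->align($input) };
    return (undef, $@ || "align returned nothing\n") unless defined $aln;
    return ($aln, undef);
}

if (eval { require Bio::Tools::Run::Alignment::TCoffee; 1 }) {
    my $factory = Bio::Tools::Run::Alignment::TCoffee->new();
    # 'no_such_file.fa' is a placeholder name for a file that does not exist.
    my ($aln, $error) = try_align($factory, 'no_such_file.fa');
    print defined $aln ? "alignment computed\n" : "align failed: $error";
} else {
    print "Bio::Tools::Run::Alignment::TCoffee is not installed\n";
}
```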

   profile_align
	Title	: profile_align
	Usage	:
	Function: Perform an alignment of 2 (sub)alignments
	Example :
	Returns : Reference to a SimpleAlign object containing the (super)alignment.
	Args	: Names of 2 files containing the subalignments
		  or references to 2 Bio::SimpleAlign objects.
	Note	: Needs to be updated to run with newer TCoffee code, which
		  allows more than two profile alignments.

       Throws an exception if arguments are not either strings (eg filenames) or references to SimpleAlign objects.

   _run
	Title	:  _run
	Usage	:  Internal function, not to be called directly
	Function:  makes actual system call to tcoffee program
	Example :
	Returns : nothing; tcoffee output is written to a
		  temporary file OR specified output file
	Args	: Name of a file containing a set of unaligned fasta sequences
		  and hash of parameters to be passed to tcoffee

   _setinput
	Title	:  _setinput
	Usage	:  Internal function, not to be called directly
	Function:  Create input file for tcoffee program
	Example :
	Returns : name of file containing tcoffee data input AND
		  type of file (if known, S for sequence, L for sequence library,
		  A for sequence alignment)
	Args	: Seq or Align object reference or input file name

   _setparams
	Title	:  _setparams
	Usage	:  Internal function, not to be called directly
	Function:  Create parameter inputs for tcoffee program
	Example :
	Returns : parameter string to be passed to tcoffee
		  during align or profile_align
	Args	: name of calling object

   aformat
	Title	: aformat
	Usage	: my $alignmentformat = $self->aformat();
	Function: Get/Set alignment format
	Returns : string
	Args	: string

   methods
	Title	: methods
	Usage	: my @methods = $self->methods()
	Function: Get/Set Alignment methods - NOT VALIDATED
	Returns : array of strings
	Args	: arrayref of strings

Bio::Tools::Run::BaseWrapper methods
   no_param_checks
	Title	: no_param_checks
	Usage	: $obj->no_param_checks($newval)
	Function: Boolean flag as to whether or not we should
		  trust the sanity checks for parameter values
	Returns : value of no_param_checks
	Args	: newvalue (optional)

   save_tempfiles
	Title	: save_tempfiles
	Usage	: $obj->save_tempfiles($newval)
	Function:
	Returns : value of save_tempfiles
	Args	: newvalue (optional)

   outfile_name
	Title	: outfile_name
	Usage	: my $outfile = $tcoffee->outfile_name();
	Function: Get/Set the name of the output file for this run
		  (if you wanted to do something special)
	Returns : string
	Args	: [optional] string to set value to

   tempdir
	Title	: tempdir
	Usage	: my $tmpdir = $self->tempdir();
	Function: Retrieve a temporary directory name (which is created)
	Returns : string which is the name of the temporary directory
	Args	: none

   cleanup
	Title	: cleanup
	Usage	: $tcoffee->cleanup();
	Function: Will cleanup the tempdir directory
	Returns : none
	Args	: none

   io
	Title	: io
	Usage	: $obj->io($newval)
	Function:  Gets a Bio::Root::IO object
	Returns : Bio::Root::IO
	Args	: none

perl v5.12.3							    2011-06-18				  Bio::Tools::Run::Alignment::TCoffee(3pm)