kalign(1) [debian man page]

KALIGN(1)							Kalign User Manual							 KALIGN(1)

NAME
kalign - performs multiple alignment of biological sequences.

SYNOPSIS
kalign [infile.fasta] [outfile.fasta] [Options]
kalign [-i infile.fasta] [-o outfile.fasta] [Options]
kalign [< infile.fasta] [> outfile.fasta] [Options]

DESCRIPTION
Kalign is a command line tool to perform multiple alignment of biological sequences. It employs the Muth-Manber string-matching algorithm to improve both the accuracy and speed of the alignment. It uses a global, progressive alignment approach, enriched by employing an approximate string-matching algorithm to calculate sequence distances and by incorporating local matches into the otherwise global alignment.

OPTIONS
-s -gpo -gapopen -gap_open x
    Gap open penalty.

-e -gpe -gap_ext -gapextension x
    Gap extension penalty.

-t -tgpe -terminal_gap_extension_penalty x
    Terminal gap penalties.

-m -bonus -matrix_bonus x
    A constant added to the substitution matrix.

-c -sort <input, tree, gaps>
    The order in which the sequences appear in the output alignment.

-g -feature
    Selects feature mode and specifies which features are to be used: e.g. all, maxplp, STRUCT, PFAM-A...

-same_feature_score
    Score for aligning same features.

-diff_feature_score
    Penalty for aligning different features.

-d -distance <wu, pair>
    Distance method.

-b -tree -guide-tree <nj, upgma>
    Guide tree method.

-z -zcutoff
    Parameter used in the wu-manber based distance calculation.

-i -in -input
    Name of the input file.

-o -out -output
    Name of the output file.

-a -gap_inc
    Increases gap penalties depending on the number of existing gaps.

-f -format <fasta, msf, aln, clu, macsim>
    The output format.

-q -quiet
    Print nothing to STDERR. Read nothing from STDIN.

REFERENCES
o Timo Lassmann and Erik L. L. Sonnhammer (2005) Kalign - an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics 6:298.

o Timo Lassmann, Oliver Frings and Erik L. L. Sonnhammer (2009) Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. Nucleic Acids Research 37(3):858-865.

AUTHORS
Timo Lassmann <timolassmann@gmail.com>
    Upstream author of Kalign.

Charles Plessy <plessy@debian.org>
    Wrote the manpage.

COPYRIGHT
Copyright (C) 2004, 2005, 2006, 2007, 2008 Timo Lassmann

Kalign is free software. You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation.

This manual page was written by Charles Plessy <plessy@debian.org> for the Debian(TM) system (but may be used by others). Permission is granted to copy, distribute and/or modify this document under the same terms as kalign itself.

On Debian systems, the complete text of the GNU General Public License version 2 can be found in /usr/share/common-licenses/GPL-2.

kalign 2.04                    February 25, 2009                    KALIGN(1)
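
The kalign options documented under OPTIONS can be combined into a single invocation. The following is a minimal Python sketch of one way to assemble such a command line; it only builds the argument list and does not run kalign. The helper name build_kalign_command, the file names, and the gap open value of 60.0 are hypothetical and chosen purely for illustration; only the flags -i, -o, -f, -gpo, -gpe and -q and the format names are taken from this page.

```python
import shlex

# Output formats listed for -f/-format on this page.
KALIGN_FORMATS = {"fasta", "msf", "aln", "clu", "macsim"}


def build_kalign_command(infile, outfile, fmt="fasta",
                         gap_open=None, gap_ext=None, quiet=False):
    """Return an argument list for kalign (hypothetical helper).

    Only flags documented in the OPTIONS section are used.
    """
    if fmt not in KALIGN_FORMATS:
        raise ValueError(f"unsupported output format: {fmt}")
    cmd = ["kalign", "-i", infile, "-o", outfile, "-f", fmt]
    if gap_open is not None:
        cmd += ["-gpo", str(gap_open)]   # gap open penalty
    if gap_ext is not None:
        cmd += ["-gpe", str(gap_ext)]    # gap extension penalty
    if quiet:
        cmd.append("-q")                 # print nothing to STDERR
    return cmd


# Illustrative values only; the penalty is not a recommended setting.
cmd = build_kalign_command("seqs.fasta", "aligned.msf",
                           fmt="msf", gap_open=60.0, quiet=True)
print(shlex.join(cmd))
```

On a system where kalign is installed, the resulting list could be passed to subprocess.run(cmd, check=True). Building a list rather than a single string avoids shell quoting problems with unusual file names.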


SIGMA(1)							  Manual of Sigma							  SIGMA(1)

NAME
sigma - Simple greedy multiple alignment of non-coding DNA sequences

SYNOPSIS
sigma [options] [inputfile.fasta] [inputfile2.fasta ...]

Each fasta file may contain a single sequence or multiple sequences; all sequences will be aligned together.

DESCRIPTION
Sigma ("Simple greedy multiple alignment") is an alignment program with a new algorithm and scoring scheme designed specifically for non-coding DNA sequence. It uses a strategy of seeking the best possible gapless local alignments, at each step making the best possible alignment consistent with existing alignments, and scores the significance of the alignment based on the lengths of the aligned fragments and a background model which may be supplied or estimated from an auxiliary file of intergenic DNA.

With real data, while "correctness" can't be directly quantified for the alignment, running the PhyloGibbs motif finder on pre-aligned sequence suggests that Sigma's alignments are superior.

OPTIONS
-A --aligned_output
    Aligned, pretty-printed output (compare with -F option) (default: only this). See also -C.

-b --bgprobfile filename
    Auxiliary file (in fasta format) from which to read background sequences (overridden by -B). Typically this is a file containing large quantities of similar non-coding sequence, from which background probabilities of single- and di-nucleotides may be estimated.

-B --bgseqfile filename
    File containing background probabilities. The format is described further below.

-C --caps_only
    Use only upper-case letters in output sequence, for compatibility with output of some other programs like ClustalW and MLagan. By default, output is mixed-case (as in Dialign), and lower-case bases are treated as not aligned.

-F --fasta_output
    Multi-fasta output (can use both -A and -F in either order). See also -C.

-n --ncorrel number
    Background correlation (default 2=dinucleotide; 1=single-site basecounts, 0=0.25 per base).

-x, --significance number
    Set limit for how probable the match is by chance (default 0.002, smaller=more stringent).

-h, --help
    Displays this list of options.

MORE HELP
The "significance" parameter (-x) determines whether local alignments are accepted or rejected. The default at present is 0.002. Experiments on synthetic data (described in the paper) suggest that 0.002 is about the threshold where sigma fails to align phylogenetically-unrelated data that has moderate (yeast-like) dinucleotide correlation.

Using a "background model" appropriate to the sequences being aligned greatly reduces spurious alignments on synthetic data (and, one hopes, on real data too). The simplest way to ensure this is to supply, via the -b parameter, a FASTA-format file containing large quantities of similar sequence data (e.g., if one is aligning yeast sequences, supply a file containing all intergenic yeast sequence).

Alternatively, if the single-site and dinucleotide frequencies are already known, they may be supplied in a file via the -B option. The file format should be: one entry per line, with the mononucleotide or dinucleotide (case-insensitive) followed by the frequency (e.g., "A 0.3", "AT 0.16", etc. on successive lines). A sample file is in the "Background" subdirectory of the source distribution (on Debian systems, this file can be found in the /usr/share/doc/sigma-align/Background directory). A file like "yeast.nc.3.freq" in the "tests" subdirectory of the MEME source distribution works fine (trinucleotide counts are ignored).

REFERENCE
Please cite Sigma:

Rahul Siddharthan (2006) Multiple alignment of weakly-conserved non-coding DNA sequence. BMC Bioinformatics 7:143. doi:10.1186/1471-2105-7-143. Published 16 March 2006, available online at http://www.biomedcentral.com/1471-2105/7/143/

AUTHORS
Rahul Siddharthan <rsidd@imsc.res.in>
    Wrote sigma. If you're using Sigma for actual research, please let the author know so that he can alert you to bug fixes or new releases.

Charles Plessy <charles-debian-nospam@plessy.org>
    Wrote the manpage in DocBook XML for the Debian distribution.

COPYRIGHT
Copyright (C) 2006-2007 Rahul Siddharthan
Copyright (C) 2006-2007 Charles Plessy

Sigma is free software. You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation.

On Debian systems, the complete text of the GNU General Public License can be found in /usr/share/common-licenses/GPL.

sigma 1.1                    2007-04-07                    SIGMA(1)
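
The background-probability file accepted by sigma's -B option is described under MORE HELP as one entry per line: a mononucleotide or dinucleotide followed by its frequency. The following is a minimal Python sketch of how such lines might be produced from a set of sequences. The helper name background_lines and the example sequences are hypothetical. The normalisation is also an assumption: single-site frequencies are made to sum to 1 and dinucleotide frequencies are made to sum to 1 separately, which this page does not specify, so compare the result against the sample file shipped with sigma before relying on it.

```python
from collections import Counter

BASES = set("ACGT")


def background_lines(sequences):
    """Return 'X freq' lines for mono- and dinucleotides (hypothetical helper).

    Assumption: each group is normalised independently so that its
    frequencies sum to 1; characters other than A, C, G, T are skipped.
    """
    mono, di = Counter(), Counter()
    for seq in sequences:
        s = seq.upper()
        for i, base in enumerate(s):
            if base not in BASES:
                continue
            mono[base] += 1
            nxt = s[i + 1:i + 2]      # empty string at the end of the sequence
            if nxt in BASES:
                di[base + nxt] += 1
    lines = []
    for counts in (mono, di):
        total = sum(counts.values())
        for key in sorted(counts):
            lines.append(f"{key} {counts[key] / total:.4f}")
    return lines


# Illustrative sequences only; real use would read a large FASTA file.
print("\n".join(background_lines(["ACGTTGCA", "ttgacc"])))
```

The printed lines can be saved to a file and passed to sigma with -B. When only raw background sequence is available, supplying the FASTA file directly via -b, as described above, avoids this step altogether.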