mummer(1) [debian man page]

MUMMER(1)						      General Commands Manual							 MUMMER(1)

NAME

       mummer - package for sequence alignment of multiple genomes

SYNOPSIS

       mummer-annotate <gapfile><datafile>
       combineMUMs <RefSequence><MatchSequences><GapsFile>
       delta-filter [options]<deltafile>
       dnadiff [options]<reference><query> or [options]-d<deltafile>
       exact-tandems <file><min-match-len>
       gaps
       mapview [options]<coordsfile>[UTRcoords][CDScoords]
       mgaps [-d<DiagDiff>][-f<DiagFactor>][-l<MatchLen>][-s<MaxSeparation>]
       mummer [options]<reference-file><query-files>
       mummerplot [options]<matchfile>
       nucmer [options]<Reference><Query>
       nucmer2xfig
       promer [options]<Reference><Query>
       repeat-match [options]<genome-file>
       run-mummer1 <fastareference><fastaquery><prefix>[-r]
       run-mummer3 <fastareference><multi-fastaquery><prefix>
       show-aligns [options]<deltafile><refID><qryID>

       Input is the .delta output of either the "nucmer" or the "promer" program passed on the command line.

       Output is to stdout, and consists of all the alignments between the query and reference sequences identified on the command line.

       NOTE: No sorting is done by default, therefore the alignments will be ordered as found in the <deltafile> input.
       show-coords [options]<deltafile>
       show-snps [options]<deltafile>
       show-tiling [options]<deltafile>

DESCRIPTION

OPTIONS

       All tools (except for gaps) accept the -h, --help, -V and --version options as one would expect. The built-in help is thorough and makes these man
       pages largely redundant.
       combineMUMs Combines MUMs in <GapsFile> by extending matches off ends and between MUMs. <RefSequence> is a fasta file of the reference
       sequence. <MatchSequences> is a multi-fasta file of the sequences matched against the reference.

	 -D	 Only output to stdout the difference positions
		 and characters
	 -n	 Allow matches only between nucleotides, i.e., ACGTs
	 -N num  Break matches at <num> or more consecutive non-ACGTs
	 -q tag  Used to label query match
	 -r tag  Used to label reference match
	 -S	 Output all differences in strings
	 -t	 Label query matches with query fasta header
	 -v num  Set verbose level for extra output
	 -W file Reset the output filename (default witherrors.gaps)
	 -x	 Don't output .cover files
	 -e	 Set error-rate cutoff to e (e.g. 0.02 is two percent)
       dnadiff	Run  comparative  analysis  of two sequence sets using nucmer and its associated utilities with recommended parameters. See MUMmer
       documentation for a more detailed description of the output. Produces the following output files:

	   .report  - Summary of alignments, differences and SNPs
	   .delta   - Standard nucmer alignment output
	   .1delta  - 1-to-1 alignment from delta-filter -1
	   .mdelta  - M-to-M alignment from delta-filter -m
	   .1coords - 1-to-1 coordinates from show-coords -THrcl .1delta
	   .mcoords - M-to-M coordinates from show-coords -THrcl .mdelta
	   .snps    - SNPs from show-snps -rlTHC .1delta
	   .rdiff   - Classified ref breakpoints from show-diff -rH .mdelta
	   .qdiff   - Classified qry breakpoints from show-diff -qH .mdelta
	   .unref   - Unaligned reference IDs and lengths (if applicable)
	   .unqry   - Unaligned query IDs and lengths (if applicable)

       MANDATORY:
	   reference	   Set the input reference multi-FASTA filename
	   query	   Set the input query multi-FASTA filename
	     or
	   delta file	   Unfiltered .delta alignment file from nucmer

       OPTIONS:
	   -d|delta	   Provide precomputed delta file for analysis
	   -h
	   --help	   Display help information and exit
	   -p|prefix	   Set the prefix of the output files (default "out")
	   -V
	   --version	   Display the version information and exit
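
       As a rough illustration of the naming scheme above, the following Python sketch lists the files one would expect for a given prefix. It
       assumes each file is named by appending the listed extension to the -p prefix (for example out.report). The helper name dnadiff_outputs
       is hypothetical and is not part of MUMmer.

```python
# Extensions taken from the dnadiff output list above.
DNADIFF_EXTENSIONS = [
    "report", "delta", "1delta", "mdelta", "1coords", "mcoords",
    "snps", "rdiff", "qdiff", "unref", "unqry",
]


def dnadiff_outputs(prefix="out"):
    """Return the filenames a dnadiff run is expected to produce.

    Assumption: files are named <prefix>.<extension>. The .unref and
    .unqry files are only written when applicable, so they may be absent.
    """
    return [f"{prefix}.{ext}" for ext in DNADIFF_EXTENSIONS]


if __name__ == "__main__":
    for name in dnadiff_outputs("out"):
        print(name)
```

       A script could use such a list to check which files exist after a run before parsing them.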

       delta-filter
	 -e float    For switches -g -r -q, keep repeats within e percent
		     of the best LIS score [0, 100], no repeats by default
	 -g	     Global alignment using length*identity weighted LIS.
		     For every reference-query pair, leave only the aligns
		     which form the longest mutually consistent set
	 -h	     Display help information
	 -i float    Set the minimum alignment identity [0, 100], default 0
	 -l int      Set the minimum alignment length, default 0
	 -q	     Query alignment using length*identity weighted LIS.
		     For each query, leave only the aligns which form the
		     longest consistent set for the query
	 -r	     Reference alignment using length*identity weighted LIS.
		     For each reference, leave only the aligns which form
		     the longest consistent set for the reference
	 -u float    Set the minimum alignment uniqueness, i.e. percent of
		     the alignment matching to unique reference AND query
		     sequence [0, 100], default 0
	 -o float    Set the maximum alignment overlap for -r and -q options
		     as a percent of the alignment length [0, 100], default 100

	 Reads a delta alignment file from either nucmer or promer and filters the alignments based on the command-line switches, leaving only the
       desired alignments which are output to stdout in the same delta format as the input. For multiple switches, order of operations is as
       follows: -i -l -u -q -r -g. If an alignment is excluded by a preceding operation, it will be ignored by the succeeding operations.

	 An important distinction between the -g option and the -r -q options is that -g requires the alignments  to  be  mutually  consistent	in
       their  order,  while  the  -r -q options are not required to be mutually consistent and therefore tolerate translocations, inversions, etc.
       Thus, -r provides a one-to-many, -q a many-to-one, -r -q a one-to-one local mapping, and -g a one-to-one global mapping	of  reference  and
       query bases respectively.
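
       The idea behind the -g option can be sketched in Python as a weighted longest increasing subsequence. This is a simplified illustration,
       not the delta-filter implementation: it assumes forward-strand alignments only, ignores overlaps between neighbouring alignments, and
       the names Aln and weighted_lis are hypothetical.

```python
from typing import List, NamedTuple


class Aln(NamedTuple):
    ref_start: int
    qry_start: int
    length: int
    identity: float  # percent, 0-100


def weighted_lis(alns: List[Aln]) -> List[Aln]:
    """Return the heaviest chain whose ref and query starts both increase.

    Each alignment weighs length * identity. Simplification: forward
    strand only, and overlaps between chained alignments are not penalised.
    """
    order = sorted(alns, key=lambda a: (a.ref_start, a.qry_start))
    n = len(order)
    if n == 0:
        return []
    best = [a.length * a.identity for a in order]  # best chain score ending at i
    prev = [-1] * n                                # back-pointer for traceback
    for i in range(n):
        for j in range(i):
            consistent = (order[j].ref_start < order[i].ref_start
                          and order[j].qry_start < order[i].qry_start)
            cand = best[j] + order[i].length * order[i].identity
            if consistent and cand > best[i]:
                best[i] = cand
                prev[i] = j
    i = max(range(n), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    return chain[::-1]


if __name__ == "__main__":
    alns = [
        Aln(1, 1, 500, 99.0),
        Aln(600, 5000, 300, 98.0),  # out of order in the query (translocation)
        Aln(1000, 700, 800, 97.0),
    ]
    for a in weighted_lis(alns):
        print(a)
```

       In this example the middle alignment is dropped because it is out of order in the query, which is the kind of rearrangement that -g
       rejects and that -r and -q tolerate.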
       mapview
	 -h
	 --help   Display help information and exit
	 -m|mag   Set the magnification at which the figure is rendered,
		  this is an option for fig2dev which is used to generate
		  the PDF and PS files (default 1.0)
	 -n|num   Set the number of output files used to partition the
		  output, this is to avoid generating files that are too
		  large to display (default 10)
	 -p|prefix  Set the output file prefix
		  (default "PROMER_graph" or "NUCMER_graph")
	 -v
	 --verbose  Verbose logging of the processed files
	 -V
	 --version  Display the version information and exit
	 -x1 coord  Set the lower coordinate bound of the display
	 -x2 coord  Set the upper coordinate bound of the display
	 -g|ref     If the input file is provided by 'mgaps', set the
		    reference sequence ID (as it appears in the first column
		    of the UTR/CDS coords file)
	 -I	    Display the name of query sequences
	 -Ir	    Display the name of reference genes
       mummer Find and output (to stdout) the positions and length of all sufficiently long maximal matches of a substring in <query-file> and
       <reference-file>.

	 -mum		compute maximal matches that are unique in both sequences
	 -mumcand	same as -mumreference
	 -mumreference	compute maximal matches that are unique in
			the reference-sequence but not necessarily in the query-sequence (default)
	 -maxmatch	compute all maximal matches regardless of their uniqueness
	 -n		match only the characters a, c, g, or t
			they can be in upper or in lower case
	 -l		set the minimum length of a match
			if not set, the default value is 20
	 -b		compute forward and reverse complement matches
	 -r		only compute reverse complement matches
	 -s		show the matching substrings
	 -c		report the query-position of a reverse complement match
			relative to the original query sequence
	 -F		force 4 column output format regardless of the number of
			reference sequence inputs
	 -L		show the length of the query sequences on the header line
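
       The difference between -maxmatch and -mum can be illustrated with a brute-force Python sketch. It is far slower than mummer and is not
       how mummer works internally. Positions are 0-based here as a convention of the sketch, "unique" is read as "the matched substring occurs
       exactly once", and the function names are hypothetical.

```python
def maximal_matches(ref, qry, min_len=20):
    """Brute-force (ref_pos, qry_pos, length) maximal exact matches, 0-based."""
    out = []
    for i in range(len(ref)):
        for j in range(len(qry)):
            # Skip pairs that could still be extended to the left.
            if i > 0 and j > 0 and ref[i - 1] == qry[j - 1]:
                continue
            k = 0  # extend to the right as far as the characters agree
            while i + k < len(ref) and j + k < len(qry) and ref[i + k] == qry[j + k]:
                k += 1
            if k >= min_len:
                out.append((i, j, k))
    return out


def count_occurrences(text, pattern):
    """Count possibly overlapping occurrences of pattern in text."""
    last = len(text) - len(pattern)
    return sum(text.startswith(pattern, p) for p in range(last + 1))


def unique_matches(ref, qry, min_len=20, in_query_too=True):
    """Keep matches whose substring occurs once in ref (and once in qry)."""
    keep = []
    for i, j, k in maximal_matches(ref, qry, min_len):
        s = ref[i:i + k]
        if count_occurrences(ref, s) != 1:
            continue
        if in_query_too and count_occurrences(qry, s) != 1:
            continue
        keep.append((i, j, k))
    return keep


if __name__ == "__main__":
    print(maximal_matches("ACGTTACGT", "ACGT", min_len=4))  # like -maxmatch
    print(unique_matches("ACGTTACGT", "ACGT", min_len=4))   # like -mum
```

       Calling unique_matches with in_query_too=False corresponds to the reading of -mumreference given above.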
       nucmer
	   nucmer generates nucleotide alignments between two multi-FASTA input
	   files. Two output files are generated. The .cluster output file lists
	   clusters of matches between each sequence. The .delta file lists the
	   distance between insertions and deletions that produce maximal scoring
	   alignments between each sequence.

       MANDATORY:
	   Reference	 Set the input reference multi-FASTA filename
	   Query	 Set the input query multi-FASTA filename

	 --mum		 Use anchor matches that are unique in both the reference
			 and query
	 --mumcand	 Same as --mumreference
	 --mumreference  Use anchor matches that are unique in the reference
			 but not necessarily unique in the query (default behavior)
	 --maxmatch	 Use all anchor matches regardless of their uniqueness

	 -b|breaklen	 Set the distance an alignment extension will attempt to
			 extend poor scoring regions before giving up (default 200)
	 -c|mincluster	 Sets the minimum length of a cluster of matches (default 65)
	 --[no]delta	 Toggle the creation of the delta file (default --delta)
	 --depend	 Print the dependency information and exit
	 -d|diagfactor	 Set the clustering diagonal difference separation factor
			 (default 0.12)
	 --[no]extend	 Toggle the cluster extension step (default --extend)
	 -f
	 --forward	 Use only the forward strand of the Query sequences
	 -g|maxgap	 Set the maximum gap between two adjacent matches in a
			 cluster (default 90)
	 -h
	 --help 	 Display help information and exit
	 -l|minmatch	 Set the minimum length of a single match (default 20)
	 -o
	 --coords	 Automatically generate the original NUCmer1.1 coords
			 output file using the 'show-coords' program
	 --[no]optimize  Toggle alignment score optimization, i.e. if an alignment
			 extension reaches the end of a sequence, it will backtrack
			 to optimize the alignment score instead of terminating the
			 alignment at the end of the sequence (default --optimize)
	 -p|prefix	 Set the prefix of the output files (default "out")
	 -r
	 --reverse	 Use only the reverse complement of the Query sequences
	 --[no]simplify  Simplify alignments by removing shadowed clusters. Turn
			 this option off if aligning a sequence to itself to look
			 for repeats (default --simplify)

       promer
	   promer generates amino acid alignments between two multi-FASTA DNA input
	   files. Two output files are generated. The .cluster output file lists
	   clusters of matches between each sequence. The .delta file lists the
	   distance between insertions and deletions that produce maximal scoring
	   alignments between each sequence. The DNA input is translated into all 6
	   reading frames in order to generate the output, but the output coordinates
	   reference the original DNA input.
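
       The six-frame translation mentioned above can be sketched in Python as follows. This is an illustration only: it assumes the standard
       genetic code (the man page does not say which table promer uses), numbers the frames 1..3 and -1..-3 as a convention of the sketch, and
       uses hypothetical function names.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Reverse-complement an upper-case DNA string; other letters pass through."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; codons with a non-ACGT base become X."""
    return "".join(
        CODON_TABLE.get(dna[p:p + 3], "X") for p in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return a dict mapping frame number to its amino acid translation."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])    # forward strand
        frames[-(offset + 1)] = translate(rc[offset:])  # reverse strand
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frames("ATGGCCTAA").items()):
        print(frame, protein)
```
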

       MANDATORY:
	   Reference	 Set the input reference multi-FASTA DNA file
	   Query	 Set the input query multi-FASTA DNA file

	 --mum		 Use anchor matches that are unique in both the reference
			 and query
	 --mumcand	 Same as --mumreference
	 --mumreference  Use anchor matches that are unique in the reference
			 but not necessarily unique in the query (default behavior)
	 --maxmatch	 Use all anchor matches regardless of their uniqueness

	 -b|breaklen	 Set the distance an alignment extension will attempt to
			 extend poor scoring regions before giving up, measured in
			 amino acids (default 60)
	 -c|mincluster	 Sets the minimum length of a cluster of matches, measured in
			 amino acids (default 20)
	 --[no]delta	 Toggle the creation of the delta file (default --delta)
	 --depend	 Print the dependency information and exit
	 -d|diagfactor	 Set the clustering diagonal difference separation factor
			 (default .11)
	 --[no]extend	 Toggle the cluster extension step (default --extend)
	 -g|maxgap	 Set the maximum gap between two adjacent matches in a
			 cluster, measured in amino acids (default 30)
	 -l|minmatch	 Set the minimum length of a single match, measured in amino
			 acids (default 6)
	 -m|masklen	 Set the maximum bookend masking length, measured in amino
			 acids (default 8)
	 -o
	 --coords	 Automatically generate the original PROmer1.1 ".coords"
			 output file using the "show-coords" program
	 --[no]optimize  Toggle alignment score optimization, i.e. if an alignment
			 extension reaches the end of a sequence, it will backtrack
			 to optimize the alignment score instead of terminating the
			 alignment at the end of the sequence (default --optimize)

	 -p|prefix	 Set the prefix of the output files (default "out")
	 -x|matrix	 Set the alignment matrix number to 1 [BLOSUM 45],
			 2 [BLOSUM 62] or 3 [BLOSUM 80] (default 2)
       repeat-match Find all maximal exact matches in <genome-file>
	 -E    Use exhaustive (slow) search to find matches
	 -f    Forward strand only, don't use reverse complement
	 -n #  Set minimum exact match length to #
	 -t    Only output tandem repeats
	 -V #  Set level of verbose (debugging) printing to #
       show-aligns
	 -h	 Display help information
	 -q	 Sort alignments by the query start coordinate
	 -r	 Sort alignments by the reference start coordinate
	 -w int  Set the screen width - default is 60
	 -x int  Set the matrix type - default is 2 (BLOSUM 62),
		 other options include 1 (BLOSUM 45) and 3 (BLOSUM 80)
		 note: only has effect on amino acid alignments
       show-coords
	 -b	     Merges overlapping alignments regardless of match dir
		     or frame and does not display any identity information.
	 -B	     Switch output to btab format
	 -c	     Include percent coverage information in the output
	 -d	     Display the alignment direction in the additional
		     FRM columns (default for promer)
	 -g	     Deprecated option. Please use 'delta-filter' instead
	 -h	     Display help information
	 -H	     Do not print the output header
	 -I float    Set minimum percent identity to display
	 -k	     Knockout (do not display) alignments that overlap
		     another alignment in a different frame by more than 50%
		     of their length, AND have a smaller percent similarity
		     or are less than 75% of the size of the other alignment
		     (promer only)
	 -l	     Include the sequence length information in the output
	 -L long     Set minimum alignment length to display
	 -o	     Annotate maximal alignments between two sequences, i.e.
		     overlaps between reference and query sequences
	 -q	     Sort output lines by query IDs and coordinates
	 -r	     Sort output lines by reference IDs and coordinates
	 -T	     Switch output to tab-delimited format

	 Input is the .delta output of either the "nucmer" or the "promer" program passed on the command line.

	 Output is to stdout, and consists of a list of coordinates, percent identity, and other useful information regarding the  alignment  data
       contained in the .delta file used as input.

	 NOTE: No sorting is done by default, therefore the alignments will be ordered as found in the <deltafile> input.
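
       One way to chain nucmer, delta-filter and show-coords is sketched below in Python. It only builds and prints the command lines and uses
       only options documented on this page. Several things are assumptions: that nucmer writes its alignments to <prefix>.delta, that the
       stdout of delta-filter is redirected to a file called <prefix>.filter (a name chosen for this sketch), and the input filenames
       ref.fasta and qry.fasta.

```python
import shlex


def alignment_pipeline(reference, query, prefix="out"):
    """Return argv lists for nucmer, delta-filter and show-coords, in order.

    Assumptions: nucmer writes <prefix>.delta, and the caller redirects
    the stdout of delta-filter into <prefix>.filter before show-coords runs.
    """
    return [
        # Align the query against the reference.
        ["nucmer", "-p", prefix, reference, query],
        # Keep a one-to-one local mapping (-r -q); output goes to stdout.
        ["delta-filter", "-r", "-q", f"{prefix}.delta"],
        # Tab-delimited coordinates sorted by reference, with coverage
        # and sequence length columns.
        ["show-coords", "-r", "-c", "-l", "-T", f"{prefix}.filter"],
    ]


if __name__ == "__main__":
    for argv in alignment_pipeline("ref.fasta", "qry.fasta", prefix="run1"):
        print(shlex.join(argv))
```

       Each list could be passed to subprocess.run, with the stdout argument set to an open file for the delta-filter step.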
       show-snps
	 -C	       Do not report SNPs from alignments with an ambiguous
		       mapping, i.e. only report SNPs where the [R] and [Q]
		       columns equal 0 and do not output these columns
	 -h	       Display help information
	 -H	       Do not print the output header
	 -I	       Do not report indels
	 -l	       Include sequence length information in the output
	 -q	       Sort output lines by query IDs and SNP positions
	 -r	       Sort output lines by reference IDs and SNP positions
	 -S	       Specify which alignments to report by passing
		       'show-coords' lines to stdin
	 -T	       Switch to tab-delimited format
	 -x int        Include x characters of surrounding SNP context in the
		       output, default 0

	 Input is the .delta output of either the nucmer or promer program passed on the command line.

	 Output is to stdout, and consists of a list of SNPs (or amino acid substitutions for promer) with positions and other useful info.
       Output will be sorted with -r by default and the [BUFF] column will always refer to the sequence whose positions have been sorted. The
       [BUFF] value specifies the distance from this SNP to the nearest mismatch (end of alignment, indel, SNP, etc) in the same alignment, while
       the [DIST] column specifies the distance from this SNP to the nearest sequence end. SNPs for which the [R] and [Q] columns are greater
       than 0 should be evaluated with caution, as these columns specify the number of other alignments which overlap this position. Use -C to
       ensure SNPs are only reported from unique alignment regions.

       show-tiling
	 -a	     Describe the tiling path by printing the tab-delimited
		     alignment region coordinates to stdout
	 -c	     Assume the reference sequences are circular, and allow
		     tiled contigs to span the origin
	 -g int      Set maximum gap between clustered alignments [-1, INT_MAX]
		     A value of -1 will represent infinity
		     (nucmer default = 1000)
		     (promer default = -1)
	 -i float    Set minimum percent identity to tile [0.0, 100.0]
		     (nucmer default = 90.0)
		     (promer default = 55.0)
	 -l int      Set minimum length contig to report [-1, INT_MAX]
		     A value of -1 will represent infinity
		     (common default = 1)
	 -p file     Output a pseudo molecule of the query contigs to 'file'
	 -R	     Deal with repetitive contigs by randomly placing them
		     in one of their copy locations (implies -V 0)
	 -t file     Output a TIGR style contig list of each query sequence
		     that sufficiently matches the reference (non-circular)
	 -u file     Output the tab-delimited alignment region coordinates
		     of the unusable contigs to 'file'
	 -v float    Set minimum contig coverage to tile [0.0, 100.0]
		     (nucmer default = 95.0) sum of individual alignments
		     (promer default = 50.0) extent of syntenic region
	 -V float    Set minimum contig coverage difference [0.0, 100.0]
		     i.e. the difference needed to determine one alignment
		     is 'better' than another alignment
		     (nucmer default = 10.0) sum of individual alignments
		     (promer default = 30.0) extent of syntenic region
	 -x	     Describe the tiling path by printing the XML contig
		     linking information to stdout

	 Input is the .delta output of the nucmer program, run on very similar sequence data, or the .delta output of the promer program,  run	on
       divergent sequence data.

	 Output  is  to  stdout, and consists of the predicted location of each aligning query contig as mapped to the reference sequences.  These
       coordinates reference the extent of the entire query contig, even when only a certain percentage of the contig was actually aligned (unless
       the -a option is used). Columns are, start in ref, end in ref, distance to next contig, length of this contig, alignment coverage,
       identity, orientation, and ID respectively.
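
       A record with the eight columns listed above could be read with a short Python sketch like the one below. It assumes the fields are
       separated by whitespace and that any header lines naming the reference are skipped by the caller. The Tile type, the parse_tile
       function and the sample line are hypothetical.

```python
from typing import NamedTuple


class Tile(NamedTuple):
    ref_start: int
    ref_end: int
    gap_to_next: int
    contig_len: int
    coverage: float
    identity: float
    orientation: str
    contig_id: str


def parse_tile(line):
    """Parse one show-tiling record into a Tile.

    Assumption: exactly eight whitespace-separated fields, in the column
    order given in the description above.
    """
    f = line.split()
    if len(f) != 8:
        raise ValueError(f"expected 8 fields, got {len(f)}")
    return Tile(int(f[0]), int(f[1]), int(f[2]), int(f[3]),
                float(f[4]), float(f[5]), f[6], f[7])


if __name__ == "__main__":
    # Made-up record for illustration only.
    print(parse_tile("1\t5000\t120\t5000\t98.50\t99.10\t+\tcontig_7"))
```
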

SEE ALSO

       http://mummer.sourceforge.net/

       Open source MUMmer 3.0 is described in
       Versatile and open software for comparing large genomes.  S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M.  Shumway,  C.  Antonescu,  and
       S.L. Salzberg, Genome Biology (2004), 5:R12.

AUTHOR

       mummer was written by S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg.

								   May 21, 2005 							 MUMMER(1)