sfetch(1) [debian man page]

sfetch(1)							  Biosquid Manual							 sfetch(1)

NAME

       sfetch - get a sequence from a flatfile database.

SYNOPSIS

       sfetch [options] seqname

DESCRIPTION

       sfetch retrieves the sequence named seqname from a sequence database.

       Which database is used is controlled by the -d option ("little databases") and the -D option ("big databases"). A "little database"
       is any sequence file, and its complete file path must be specified. The directory location of each "big database" is taken from an
       environment variable, such as $SWDIR for Swissprot and $GBDIR for Genbank (see -D for the complete list). By default, if neither
       option is specified and seqname looks like a Swissprot identifier (e.g. it contains a _ character), sfetch uses the $SWDIR
       environment variable to try to retrieve seqname from Swissprot.
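       The selection rules above can be sketched as a small Python function. This is a hypothetical illustration, not sfetch's source:
       the name resolve_source, the assumption that -d takes precedence if both -d and -D are given, and the example paths are all
       assumptions. Only the code-to-variable table comes from the -D entry in OPTIONS.

```python
import os
from typing import Mapping, Optional

# -D codes and their environment variables, as listed under -D.
DB_ENV_VARS = {
    "sw": "SWDIR",     # Swissprot
    "pir": "PIRDIR",   # PIR
    "em": "EMBLDIR",   # EMBL
    "gb": "GBDIR",     # Genbank
    "wp": "WORMDIR",   # Wormpep
    "owl": "OWLDIR",   # OWL
}


def resolve_source(seqname: str,
                   seqfile: Optional[str] = None,
                   dbcode: Optional[str] = None,
                   env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: return the file or directory to search, or None."""
    if seqfile is not None:
        # -d: a "little database", given by its complete path.
        # Assumption: -d wins if both -d and -D are given.
        return seqfile
    if dbcode is None and "_" in seqname:
        # Default: a name containing "_" looks like a Swissprot identifier.
        dbcode = "sw"
    if dbcode is None:
        return None
    # -D: a "big database"; None if its environment variable is unset.
    return env.get(DB_ENV_VARS[dbcode])


# Hypothetical environment and identifier, for illustration only.
example_env = {"SWDIR": "/db/swissprot", "GBDIR": "/db/genbank"}
print(resolve_source("EXAMPLE_HUMAN", env=example_env))  # /db/swissprot
```

       Passing the environment in as a mapping keeps the sketch testable without touching the real process environment.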

       Other options allow retrieval of subsequences (-f, -t), retrieval by accession number instead of by name (-a), reformatting of the
       extracted sequence into a variety of other formats (-F), and so on.

       If the database has been SSI indexed, sequence retrieval is very fast; otherwise it may be painfully slow, because the entire
       database may have to be read into memory to find seqname. SSI indexing is recommended for all large or permanent databases. The
       sindex program creates SSI indexes for any sequence file.
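       Why an index helps can be seen in a toy Python sketch. It builds an in-memory table from each sequence name to the byte offset of
       its FASTA header line, then seeks straight to the requested record instead of scanning the whole file. This only illustrates the
       idea: the real SSI index is an on-disk file written by sindex, and build_offset_index and fetch_record are hypothetical names.

```python
import io
from typing import BinaryIO, Dict


def build_offset_index(fh: BinaryIO) -> Dict[str, int]:
    """Map each FASTA record name to the byte offset of its '>' header line."""
    index: Dict[str, int] = {}
    fh.seek(0)
    offset = 0
    for line in iter(fh.readline, b""):
        if line.startswith(b">"):
            fields = line[1:].split()
            if fields:
                index[fields[0].decode()] = offset
        offset += len(line)
    return index


def fetch_record(fh: BinaryIO, index: Dict[str, int], name: str) -> str:
    """Seek directly to one record and return its residues as a string."""
    fh.seek(index[name])  # raises KeyError if the name is not indexed
    fh.readline()         # skip the header line
    residues = []
    for line in iter(fh.readline, b""):
        if line.startswith(b">"):
            break         # reached the next record
        residues.append(line.strip().decode())
    return "".join(residues)


data = io.BytesIO(b">seq1 first\nACGT\nAC\n>seq2\nGGGG\n")
index = build_offset_index(data)
print(fetch_record(data, index, "seq2"))  # GGGG
```

       The index is built once with a single pass; each later lookup costs one seek plus reading the record itself.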

       sfetch was originally named getseq, and was renamed because it clashed with a GCG program of the same name.

OPTIONS

       -a     Interpret seqname as an accession number, not an identifier.

       -d <seqfile>
	      Retrieve the sequence from a sequence file named <seqfile>. If an SSI index <seqfile>.ssi exists (see sindex(1)), it is used to
	      speed up the retrieval.

       -f <from>
	      Extract a subsequence starting from position <from>, rather than from 1. See -t. If <from> is greater than <to> (as specified by
	      the -t option), then the sequence is extracted as its reverse complement (it is assumed to be a nucleic acid sequence).

       -h     Print brief help; includes version number and summary of all options, including expert options.

       -o <outfile>
	      Direct the output to a file named <outfile>. By default, output goes to stdout.

       -r <newname>
	      After extraction, rename the sequence to <newname> in the output. By default, the original sequence identifier is retained. This
	      is useful, for instance, when retrieving a sequence fragment: the coordinates of the fragment can be added to the name (this is
	      what Pfam does).

       -t <to>
	      Extract a subsequence that ends at position <to>, rather than at the end of the sequence. See -f. If <to> is less than <from> (as
	      specified by the -f option), then the sequence is extracted as its reverse complement (it is assumed to be a nucleic acid sequence).

       -D <database>
	      Retrieve the sequence from the main sequence database coded <database>. For each code, an environment variable specifies the
	      directory path to that database. Recognized codes and their corresponding environment variables are -Dsw (Swissprot, $SWDIR);
	      -Dpir (PIR, $PIRDIR); -Dem (EMBL, $EMBLDIR); -Dgb (Genbank, $GBDIR); -Dwp (Wormpep, $WORMDIR); and -Dowl (OWL, $OWLDIR).
	      Each database is read in its native flatfile format.

       -F <format>
	      Reformat the extracted sequence into a different format. (By default, the sequence is extracted from the database in the same
	      format as the database.) Available formats are embl, fasta, genbank, gcg, strider, zuker, ig, pir, squid, and raw.
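       The coordinate rules for -f and -t, together with a fragment name of the kind -r could supply, can be sketched in Python.
       Assumptions: coordinates are 1-based and inclusive, only A, C, G and T (either case) are complemented while other characters
       such as N are copied unchanged, and the name/from-to label produced by the hypothetical fragment_name helper is only one possible
       Pfam-style convention, not necessarily what sfetch or Pfam emits.

```python
from typing import Optional

# Assumption: only A, C, G, T (either case) are complemented;
# any other character (e.g. N) is copied unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract(seq: str, start: int = 1, end: Optional[int] = None) -> str:
    """Return residues start..end (1-based, inclusive).

    A missing end means "to the end of the sequence". If start > end,
    return the reverse complement of residues end..start.
    """
    if end is None:
        end = len(seq)
    if start <= end:
        return seq[start - 1:end]
    return seq[end - 1:start].translate(_COMPLEMENT)[::-1]


def fragment_name(name: str, start: int, end: int) -> str:
    """Hypothetical -r value: append the coordinates as name/start-end."""
    return f"{name}/{start}-{end}"


seq = "ACGTTTGA"
print(fragment_name("seq1", 5, 2), extract(seq, 5, 2))  # seq1/5-2 AACG
```

       Keeping the original from/to order in the name records that the fragment was taken from the reverse strand.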

EXPERT OPTIONS

       --informat <s>
	      Specify  that the sequence file is in format <s>, rather than the default FASTA format.  Common examples include Genbank, EMBL, GCG,
	      PIR, Stockholm, Clustal, MSF, or PHYLIP; see the printed documentation for a complete list of accepted format  names.   This  option
	      overrides the default format (FASTA) and the -B Babelfish autodetection option.

SEE ALSO

       afetch(1), alistat(1), compalign(1), compstruct(1), revcomp(1), seqsplit(1), seqstat(1), shuffle(1), sindex(1), sreformat(1),
       stranslate(1), weight(1).

AUTHOR

       Biosquid and its documentation are Copyright (C) 1992-2003 HHMI/Washington University School of Medicine. Freely distributed under
       the GNU General Public License (GPL). See COPYING in the source code distribution for more details, or contact me.

       Sean Eddy
       HHMI/Department of Genetics
       Washington University School of Medicine
       4444 Forest Park Blvd., Box 8510
       St Louis, MO 63108 USA
       Phone: 1-314-362-7666
       FAX  : 1-314-362-2157
       Email: eddy@genetics.wustl.edu

Biosquid 1.9g							   January 2003 							 sfetch(1)