Extract sequence from fasta file

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

How to find a specific sequence pattern in a fasta file?

I have to mine the following sequence pattern from a large fasta file, gene.fasta (which contains multiple fasta sequences), along with flanking sequences of 5 bases at the start and end positions: AAGCZ-N16-AAGCZ. Z represents A, C or G (anything except T); N16 represents any of the four...

2. Shell Programming and Scripting

Extract distinct sequence of letters

Hello, I need to extract a distinct sequence of letters, for example from 136 to 193. The files are quite big, so I would prefer not to use "fold -w1". Thank you very much. The input file looks like this: 1 cttttacctt catgtgtttt tgcagatatt tgttcataat aacatcttct ttttaagtta 61 ttaaaatctt...

3. Shell Programming and Scripting

Count and search by sequence in multiple fasta file

Hello, I have 10 fasta files with sequenced read information, with read sizes from 15 to 35. I have combined the reads, collapsed them into unique reads, and filtered for unique reads 18 to 26 bp long. Now I want to count how often each unique read appears in all the fasta files and make a table...

4. Shell Programming and Scripting

Extract sequences from a FASTA file based on another file

5. UNIX for Dummies Questions & Answers

Change sequence names in fasta file

I have fasta files with multiple sequences in each. I need to change the sequence name headers from: >accD:_59176-60699 ATGGAAAAGTGGAGGATTTATTCGTTTCAGAAGGAGTTCGAACGCA >atpA_(reverse_strand):_showing_revcomp_of_10525-12048 ATGGTAACCATTCAAGCCGACGAAATTAGTAATCTTATCCGGGAAC...

6. UNIX for Dummies Questions & Answers

How to change sequence name in a long fasta file?

Hi, I have an alignment file (.fasta) with ~80 sequences. They look like this: >JV101.contig00066(+):25302-42404|sequence_index=0|block_index=4|species=JV101|JV101_4_0 GAGGTTAATTATCGATAACGTTTAATTAAAGTGTTTAGGTGTCATAATTT TAAATGACGATTTCTCATTACCATACACCTAAATTATCATCAATCTGAAT...

7. Shell Programming and Scripting

Parsing a fasta sequence with start and end coordinates

Hi, I have separate chromosome sequences and I want to parse some regions of a chromosome based on start site and end site. How can I achieve this? For example, Chr 1 is in the following format. I need regions from 2 - 10, which should give me AATTCCAAA, and in a similar way 15 - 25 should give...

8. Shell Programming and Scripting

Extract sequence blocks

Hi, I have a one-line file consisting of a sequence of 660 letters. I would like to extract 9-letter blocks iteratively: ASDFGHJKLQWERTYUIOPZXCVBNM first block: ASDFGHJKL 2nd block: SDFGHJKLQ What I have so far only gives me the first block. Can anyone please explain why? cat...

9. Shell Programming and Scripting

Extract Pattern Sequence

Dear Colleagues, I have to extract some patterns from a raw text file using Perl. The input will be raw text. Pattern to get: sequences of capitalized words (e.g. he is working in Center for Perl Studies. He will come tomorrow...). From this I have to extract sequences like "Center for Perl...

10. Shell Programming and Scripting

How to extract a sequence of n lines from a file

Hi, I want to be able to extract a sequence of n lines from a file. Ideas, commands and suggestions would be highly appreciated. Thanks.
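
Several of the threads above (2, 7 and 8) come down to the same two operations: cutting a region out of a sequence by start and end position, and sliding a fixed-width window along it. Below is a minimal Python sketch of both, not the solution posted in any of those threads. It assumes 1-based, inclusive coordinates (consistent with thread 7, where region 2 - 10 yields nine bases) and FASTA records whose sequence may wrap over several lines. The function names `read_fasta`, `region` and `windows`, and the sample record, are made up for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def region(seq, start, end):
    """Return seq[start..end], 1-based and inclusive at both ends (assumed)."""
    if start < 1 or end < start:
        raise ValueError("need 1 <= start <= end")
    return seq[start - 1:end]


def windows(seq, width):
    """Yield every overlapping block of `width` letters, stepping by one."""
    for i in range(len(seq) - width + 1):
        yield seq[i:i + width]


# Hypothetical wrapped record, not taken from any of the threads.
sample = [">chr_demo", "GAATTCCAAAGG", "TTCC"]
records = dict(read_fasta(sample))

print(region(records["chr_demo"], 2, 10))
print(list(windows("ASDFGHJKLQWE", 9)))
```

For a real file, `read_fasta(open("gene.fasta"))` would stream the records one at a time. Slicing the joined string avoids splitting the sequence into single characters, which is the cost thread 2 wants to avoid with "fold -w1".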

fastacmd

FASTACMD(1)						     NCBI Tools User's Manual						       FASTACMD(1)

NAME

       fastacmd - retrieve FASTA sequences from a BLAST database

SYNOPSIS

       fastacmd [-] [-D N] [-I] [-L start,stop] [-P N] [-S N] [-T] [-a] [-c] [-d str] [-i str] [-l N] [-o filename] [-p type] [-s str] [-t]

DESCRIPTION

       fastacmd retrieves FASTA formatted sequences from a blast(1) database formatted using the `-o' option.  An example fastacmd call would be

              fastacmd -d nr -s p38398

OPTIONS

       A summary of options is included below.

       -      Print usage message

       -D N   Dump the entire database in some format:
	      1      fasta
	      2      GI list
	      3      Accession.version list

       -I     Print database information only (overrides all other options)

       -L start,stop
	      Range of sequence to extract (0 in start is beginning of sequence, 0 in stop is end of sequence, default is whole sequence)

       -P N   Retrieve sequences with Protein Identification Group (PIG) N.

       -S N   Strand on subsequence (nucleotide only):
	      1      top (default)
	      2      bottom

       -T     Print taxonomic information for requested sequence(s)

       -a     Retrieve duplicate accessions

       -c     Use ^A (\001) as non-redundant defline separator

       -d str Database (default is nr)

       -i str Input file with GIs/accessions/loci for batch retrieval

       -l N   Line length for sequence (default = 80)

       -o filename
	      Output file (default = stdout)

       -p type
	      Type of file:
	      G      guess (default): look for protein, then nucleotide
	      T      protein
	      F      nucleotide

       -s str Comma-delimited search string(s).  GIs, accessions, loci, or full Seq-id strings may be used, e.g., 555, AC147927, 'gnl|dbname|tag'

       -t     Definition line should contain target GI only
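
When fastacmd is driven from a script, it can help to assemble the argument list in one place. The following Python sketch builds such a list using only flags documented above (-d, -s, -L, -S, -l, -o). The helper name `build_fastacmd_args` and the database name `mydb` are hypothetical, and the helper only constructs the arguments; it does not run the program.

```python
def build_fastacmd_args(search, database="nr", seq_range=None,
                        strand=None, line_length=None, output=None):
    """Return an argv list for fastacmd (hypothetical helper)."""
    args = ["fastacmd", "-d", database, "-s", search]
    if seq_range is not None:
        start, stop = seq_range
        # Per the -L description, 0 as start means the beginning of the
        # sequence and 0 as stop means the end of the sequence.
        args += ["-L", f"{start},{stop}"]
    if strand is not None:
        if strand not in (1, 2):
            raise ValueError("strand must be 1 (top) or 2 (bottom)")
        args += ["-S", str(strand)]
    if line_length is not None:
        args += ["-l", str(line_length)]
    if output is not None:
        args += ["-o", output]
    return args


# The example call from DESCRIPTION.
print(build_fastacmd_args("p38398"))

# From position 100 to the end of the sequence, bottom strand;
# "mydb" is a made-up nucleotide database name.
print(build_fastacmd_args("AC147927", database="mydb",
                          seq_range=(100, 0), strand=2))
```

Passing the result to `subprocess.run` as a list, rather than as one shell string, keeps search strings such as 'gnl|dbname|tag' from being interpreted by the shell.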

EXIT STATUS

	      0      Completed successfully.
	      1      An error (other than those below) occurred.
	      2      The BLAST database was not found.
	      3      A search (accession, GI, or taxonomy info) failed.
	      4      No taxonomy database was found.
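
A calling script can turn these statuses into readable messages. The Python sketch below copies the table above into a dictionary; `describe_exit` and `run_fastacmd` are hypothetical names, and `run_fastacmd` assumes a `fastacmd` executable is on the PATH (if it is not, `subprocess.run` raises `FileNotFoundError`).

```python
import subprocess

# Exit statuses as listed in the EXIT STATUS section.
EXIT_MEANINGS = {
    0: "Completed successfully.",
    1: "An error (other than those below) occurred.",
    2: "The BLAST database was not found.",
    3: "A search (accession, GI, or taxonomy info) failed.",
    4: "No taxonomy database was found.",
}


def describe_exit(code):
    """Map a fastacmd exit status to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"Unexpected exit status {code}.")


def run_fastacmd(args):
    """Run fastacmd with the given option list; return (stdout, message)."""
    proc = subprocess.run(["fastacmd", *args],
                          capture_output=True, text=True)
    return proc.stdout, describe_exit(proc.returncode)


print(describe_exit(2))

# Usage, assuming fastacmd and the nr database are installed:
# fasta_text, message = run_fastacmd(["-d", "nr", "-s", "p38398"])
```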

AUTHOR

       The National Center for Biotechnology Information.

SEE ALSO

       blast(1), /usr/share/doc/blast2/fastacmd.html.

NCBI
								    2005-11-04							       FASTACMD(1)
