Select distinct sequences from fasta file and list

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

select distinct row from a file

Hi, buddies out there. I have a text file (only one column) which I created using the vi editor. The file contains duplicate rows and I would like to select the distinct rows. How can I do this with a unix command? File content: apple apple orange watermelon apple orange Can it be done...

2. Shell Programming and Scripting

Select distinct values from a flat file

Hi, I have a similar problem. Can anyone please help me with a shell or Perl script? I have a flat file like this: fruit country apple germany apple india banana pakistan banana saudi mango india I want to get an output like: fruit country apple ...

3. Shell Programming and Scripting

Select distinct rows in a file by last column

Hi, I have the following file: LOG:015608::ERR:2310:map_spsrec:Invalid parameter LOG:015608::ERR:2471:map_dgdrec:Invalid parameter LOG:015608::ERR:2487:map_nnmrec:Invalid number LOG:015608::ERR:2310:map_nmrec:Invalid number LOG:015608::ERR:2438:map_nmrec:Invalid number As a delimiter I...

4. Shell Programming and Scripting

Shell script for changing the accession number of DNA sequences in a FASTA file

Hi, I have a file of DNA sequences in FASTA format which looks like this: >admin_1_45 atatagcaga >admin_1_46 atatagcagaatatatat with many thousands of such sequences in a single file. I want to replace the accession ID "admin_1_45" similarly in following sequences to...

5. Shell Programming and Scripting

Extract sequences from a FASTA file based on another file

6. Shell Programming and Scripting

Shorten header of protein sequences in fasta file

I have a fasta file as follows >sp|O15090|FABP4_HUMAN Fatty acid-binding protein, adipocyte OS=Homo sapiens GN=FABP4 PE=1 SV=3 MCDAFVGTWKLVSSENFDDYMKEVGVGFATRKVAGMAKPNMIISVNGDVITIKSESTFKN TEISFILGQEFDEVTADDRKVKSTITLDGGVLVHVQKWDGKSTTIKRKREDDKLVVECVM KGVTSTRVYERA >sp|L18484|AP2A2_RAT AP-2...

7. Shell Programming and Scripting

Getting unique sequences from multiple fasta file

Hi, I have a fasta file with multiple sequences. How can I get only the unique sequences from the file? For example, my_file.fasta: >seq1 TCTCAAAGAAAGCTGTGCTGCATACTGTACAAAACTTTGTCTGGAGAGATGGAGAATCTCATTGACTTTACAGGTGTGGACGGTCTTCAGAGATGGCTCAAGCTAACATTCCCTGACACACCTATAGGGAAAGAGCTAAC >seq2...

8. UNIX for Beginners Questions & Answers

How to count the length of fasta sequences?

I could calculate the length of all the fasta sequences with the following command: awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' unique.fasta But I need to calculate the length of a particular fasta sequence specified/listed in another txt file. The results to be...

9. Shell Programming and Scripting

Shorten header of protein sequences in fasta file to only organism name

I have a fasta file as follows >sp|Q8WWQ8|STAB2_HUMAN Stabilin-2 OS=Homo sapiens OX=9606 GN=STAB2 PE=1 SV=3 MMLQHLVIFCLGLVVQNFCSPAETTGQARRCDRKSLLTIRTECRSCALNLGVKCPDGYTM ITSGSVGVRDCRYTFEVRTYSLSLPGCRHICRKDYLQPRCCPGRWGPDCIECPGGAGSPC NGRGSCAEGMEGNGTCSCQEGFGGTACETCADDNLFGPSCSSVCNCVHGVCNSGLDGDGT...

10. UNIX for Beginners Questions & Answers

How to add specific bases at the beginning and ending of all the fasta sequences?

Hi, I have to add 7 bases of specific nucleotide at the beginning and ending of all the fasta sequences of a file. For example, I have a multi fasta file namely test.fasta as given below test.fasta >TalAA18_Xoo_CIAT_NZ_CP033194.1:_2936369-2939570:+1...
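
For the task named in the thread title, selecting distinct sequences from a FASTA file, one possible approach is a small awk filter. The sketch below is only an illustration under stated assumptions: headers start with `>`, sequence lines may be wrapped, and sequences are compared exactly as written (case-sensitive). The function name `fasta_unique` is hypothetical.

```shell
# Hypothetical helper: print the first record for each distinct sequence.
# Assumes every header line starts with '>' and that sequence lines may wrap.
fasta_unique() {
    awk '
        # Print the pending record if its sequence has not been seen yet.
        function emit() {
            if (header != "" && !(seq in seen)) {
                seen[seq] = 1
                print header
                print seq
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }  # new record starts
        { seq = seq $0 }                              # join wrapped lines
        END { emit() }                                # flush the last record
    ' "$1"
}
```

Calling `fasta_unique my_file.fasta > unique.fasta` would keep the first header seen for each sequence and write every sequence on a single line.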

fastacmd

FASTACMD(1)						     NCBI Tools User's Manual						       FASTACMD(1)

NAME

       fastacmd - retrieve FASTA sequences from a BLAST database

SYNOPSIS

       fastacmd [-] [-D N] [-I] [-L start,stop] [-P N] [-S N] [-T] [-a] [-c] [-d str] [-i str] [-l N] [-o filename] [-p type] [-s str] [-t]

DESCRIPTION

       fastacmd retrieves FASTA-formatted sequences from a blast(1) database that was formatted (with formatdb) using the `-o' option.  An example fastacmd call would be:

								fastacmd -d nr -s p38398

OPTIONS

       A summary of options is included below.

       -      Print usage message

       -D N   Dump the entire database in some format:
	      1      fasta
	      2      GI list
	      3      Accession.version list

       -I     Print database information only (overrides all other options)

       -L start,stop
	      Range of sequence to extract (0 in start is beginning of sequence, 0 in stop is end of sequence, default is whole sequence)

       -P N   Retrieve sequences with Protein Identification Group (PIG) N.

       -S N   Strand on subsequence (nucleotide only):
	      1      top (default)
	      2      bottom

       -T     Print taxonomic information for requested sequence(s)

       -a     Retrieve duplicate accessions

       -c     Use ^A (\001) as non-redundant defline separator

       -d str Database (default is nr)

       -i str Input file with GIs/accessions/loci for batch retrieval

       -l N   Line length for sequence (default = 80)

       -o filename
	      Output file (default = stdout)

       -p type
	      Type of file:
	      G      guess (default): look for protein, then nucleotide
	      T      protein
	      F      nucleotide

       -s str Comma-delimited search string(s).  GIs, accessions, loci, or full Seq-id strings may be used, e.g., 555, AC147927, 'gnl|dbname|tag'

       -t     Definition line should contain target GI only
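
The options above can be combined in a single call. The following is a minimal sketch, not an official recipe: the wrapper names `fetch_batch` and `fetch_range` are hypothetical, as are any file and database names passed to them. Only flags documented in the list above are used.

```shell
# Hypothetical wrapper: batch-retrieve protein sequences listed in an ID file.
# $1 = file with GIs/accessions/loci, $2 = output FASTA file.
fetch_batch() {
    fastacmd -d nr -p T -i "$1" -o "$2"
}

# Hypothetical wrapper: extract the range start..stop of one nucleotide
# sequence from the bottom strand.
# $1 = database name, $2 = accession, $3 = start, $4 = stop.
fetch_range() {
    fastacmd -d "$1" -p F -s "$2" -L "$3,$4" -S 2
}
```

For example, `fetch_batch ids.txt out.fasta` would expand to `fastacmd -d nr -p T -i ids.txt -o out.fasta`, assuming an nr database is available locally.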

EXIT STATUS

	      0      Completed successfully.
	      1      An error (other than those below) occurred.
	      2      The BLAST database was not found.
	      3      A search (accession, GI, or taxonomy info) failed.
	      4      No taxonomy database was found.
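
A script that calls fastacmd can branch on these exit codes. The sketch below is one way to do that; the function name `describe_fastacmd_status` is hypothetical, and the messages simply restate the table above.

```shell
# Hypothetical helper: translate a fastacmd exit status into a message.
# $1 = exit status, e.g. the value of $? right after a fastacmd call.
describe_fastacmd_status() {
    case "$1" in
        0) echo "completed successfully" ;;
        1) echo "an error occurred" ;;
        2) echo "BLAST database not found" ;;
        3) echo "search (accession, GI, or taxonomy info) failed" ;;
        4) echo "no taxonomy database found" ;;
        *) echo "unrecognized exit status: $1" ;;
    esac
}
```

A typical use would be to run `fastacmd -d nr -s p38398`, then pass `$?` to `describe_fastacmd_status` to log why a retrieval did not succeed.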

AUTHOR

       The National Center for Biotechnology Information.

SEE ALSO

       blast(1), /usr/share/doc/blast2/fastacmd.html.

NCBI
								    2005-11-04							       FASTACMD(1)