Extract sequences from a FASTA file based on another file

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Is it possible to rename fasta headers based on its position specified in another file?

I have 5 sequences in a fasta file namely gene1.fasta as follows, gene1.fasta >1256 ATGTAGC >GEP TAGAG >GTY578 ATGCATA >67_iga ATGCTGA >90_ld ATGCTG I need to rename the gene1.fasta file based on the sequence position specified in list.txt as follows, list.txt position1=org5...

2. Shell Programming and Scripting

Shorten header of protein sequences in fasta file to only organism name

I have a fasta file as follows >sp|Q8WWQ8|STAB2_HUMAN Stabilin-2 OS=Homo sapiens OX=9606 GN=STAB2 PE=1 SV=3 MMLQHLVIFCLGLVVQNFCSPAETTGQARRCDRKSLLTIRTECRSCALNLGVKCPDGYTM ITSGSVGVRDCRYTFEVRTYSLSLPGCRHICRKDYLQPRCCPGRWGPDCIECPGGAGSPC NGRGSCAEGMEGNGTCSCQEGFGGTACETCADDNLFGPSCSSVCNCVHGVCNSGLDGDGT...

3. Shell Programming and Scripting

Getting unique sequences from multiple fasta file

Hi, I have a fasta file with multiple sequences. How can i get only unique sequences from the file. For example my_file.fasta >seq1 TCTCAAAGAAAGCTGTGCTGCATACTGTACAAAACTTTGTCTGGAGAGATGGAGAATCTCATTGACTTTACAGGTGTGGACGGTCTTCAGAGATGGCTCAAGCTAACATTCCCTGACACACCTATAGGGAAAGAGCTAAC >seq2...

4. UNIX for Dummies Questions & Answers

Select distinct sequences from fasta file and list

Hi How can I extract sequences from a fasta file with respect a certain criteria? The beginning of my file (containing in total more than 1000 sequences) looks like this: >H8V34IS02I59VP SDACNDLTIALLQIAREVRVCNPTFSFRWHPQVKDEVMRECFDCIRQGLG YPSMRNDPILIANCMNWHGHPLEEARQWVHQACMSPCPSTKHGFQPFRMA...

5. Shell Programming and Scripting

Shorten header of protein sequences in fasta file

I have a fasta file as follows >sp|O15090|FABP4_HUMAN Fatty acid-binding protein, adipocyte OS=Homo sapiens GN=FABP4 PE=1 SV=3 MCDAFVGTWKLVSSENFDDYMKEVGVGFATRKVAGMAKPNMIISVNGDVITIKSESTFKN TEISFILGQEFDEVTADDRKVKSTITLDGGVLVHVQKWDGKSTTIKRKREDDKLVVECVM KGVTSTRVYERA >sp|L18484|AP2A2_RAT AP-2...

6. Shell Programming and Scripting

Shell script for changing the accession number of DNA sequences in a FASTA file

Hi, I am having a file of dna sequences in fasta format which look like this: >admin_1_45 atatagcaga >admin_1_46 atatagcagaatatatat with many such thousands of sequences in a single file. I want to the replace the accession Id "admin_1_45" similarly in following sequences to...

7. Shell Programming and Scripting

Extract the part of sequences from a file

I have a text file, input.fasta contains some protein sequences. input.fasta is shown below. >P02649 MKVLWAALLVTFLAGCQAKVEQAVETEPEPELRQQTEWQSGQRWELALGRFWDYLRWVQT LSEQVQEELLSSQVTQELRALMDETMKELKAYKSELEEQLTPVAEETRARLSKELQAAQA RLGADMEDVCGRLVQYRGEVQAMLGQSTEELRVRLASHLRKLRKRLLRDADDLQKRLAVY...

8. Shell Programming and Scripting

Extract sequence from fasta file

Hi, I want to match the sequence id (sub-string of line starting with '>' and extract the information upto next '>' line ). Please help . input > fefrwefrwef X900 AGAGGGAATTGG AGGGGCCTGGAG GGTTCTCTTC > fefrwefrwef X932 AGAGGGAATTGG AGGAGGTGGAG GGTTCTCTTC > fefrwefrwef X937...

9. Shell Programming and Scripting

Extract length wise sequences from fastq file

I have a fastq file from small RNA sequencing with sequence lengths between 15 - 30. I wanted to filter sequence lengths between 21-25 and write to another fastq file. how can i do that?

10. Shell Programming and Scripting

Extract sequences based on the list

Hi, I have a file with more than 28000 records and it looks like below.. >mm10_refflat_ABCD range=chr1:1234567-2345678 tgtgcacactacacatgactagtacatgactagac....so on >mm10_refflat_BCD range=chr1:3234567-4545678... tgtgcacactacacatgactagtatgtgcacactacacatgactagta . . . . . so on ...

LEARN ABOUT DEBIAN

fastx_clipper

FASTX_CLIPPER(1)						   User Commands						  FASTX_CLIPPER(1)

NAME

       fastx_clipper - FASTA/Q Clipper

DESCRIPTION

       usage:  fastx_clipper  [-h]  [-a  ADAPTER]  [-D] [-l N] [-n] [-d N] [-c] [-C] [-o] [-v] [-z] [-i INFILE] [-o OUTFILE] Part of FASTX Toolkit
       0.0.13.2 by A. Gordon (gordon@cshl.edu)

       [-h]   = This helpful help screen.

	      [-a ADAPTER] = ADAPTER string. default is CCTTAAGG (dummy adapter).  [-l N]	= discard sequences shorter  than  N  nucleotides.
	      default is 5.  [-d N]	  = Keep the adapter and N bases after it.

	      (using '-d 0' is the same as not using '-d' at all. which is the default).

       [-c]   = Discard non-clipped sequences (i.e. - keep only sequences which contained the adapter).

       [-C]   = Discard clipped sequences (i.e. - keep only sequences which did not contained the adapter).

       [-k]   = Report Adapter-Only sequences.

       [-n]   = keep sequences with unknown (N) nucleotides. default is to discard such sequences.

       [-v]   = Verbose - report number of sequences.

       If [-o] is specified,
	      report will be printed to STDOUT.

	      If [-o] is not specified (and output goes to STDOUT), report will be printed to STDERR.

       [-z]   = Compress output with GZIP.

       [-D]   = DEBUG output.

       [-M N] = require minimum adapter alignment length of N.

       If less than N nucleotides aligned with the adapter - don't clip it.
	      [-i INFILE]  = FASTA/Q input file. default is STDIN.

	      [-o OUTFILE] = FASTA/Q output file. default is STDOUT.

SEE ALSO

       The quality of this automatically generated manpage might be insufficient.  It is suggested to visit

	      http://hannonlab.cshl.edu/fastx_toolkit/commandline.html

       to get a better layout as well as an overview about connected FASTX tools.

fastx_clipper 0.0.13.2						     May 2012							  FASTX_CLIPPER(1)

Shell Programming and Scripting