Extract sequences from a FASTA file based on another file Post: 302881700

awk -F"[>|]"	split each input line into fields, using ">" and "|" as delimiters
'NR==FNR{	run the block that follows only while reading the first file (File2)
sub(".$",":&",$0);	insert ":" before the last character of the line ($0)
a[$0]=1}	store the modified line as a key in the associative array "a"
/^>/&&($2 in a){p=1}	set "p" to 1 if the line starts with ">" and its second field is a key in "a"
/^>/&&!($2 in a){p=0}	set "p" to 0 if the line starts with ">" and its second field is not a key in "a"
p'	print the line while "p" is non-zero, i.e. a matching header and the sequence lines that follow it
File2 File1	process File2 first, then File1
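
For anyone who wants to try the pieces above as a single command, here is a minimal sketch. The sample IDs, descriptions and sequences are invented for illustration. It also assumes that File1 headers look like ">ID|description" and that File2 holds one ID per line without the ":" before the last character. Only the awk command itself comes from the breakdown above.

```shell
#!/bin/sh
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# File2 (hypothetical data): IDs to keep, written without the ":".
printf '%s\n' 'seqA1' 'seqC2' > File2

# File1 (hypothetical data): FASTA with headers of the form ">ID|description".
cat > File1 <<'EOF'
>seqA:1|first record
ACGTACGT
TTGA
>seqB:1|second record
GGGCCC
>seqC:2|third record
TTTTAAAA
EOF

# The command from the breakdown above, joined into one line.
# File2 is read first: "seqA1" becomes "seqA:1" and is stored in array a.
# File1 is read second: with ">" and "|" as separators, $2 of a header is the ID.
result=$(awk -F"[>|]" 'NR==FNR{sub(".$",":&",$0);a[$0]=1} /^>/&&($2 in a){p=1} /^>/&&!($2 in a){p=0} p' File2 File1)

printf '%s\n' "$result"
```

With this sample input the seqA:1 and seqC:2 records are printed in full (header plus sequence lines) and the seqB:1 record is skipped. If your headers use a different layout, the field separator and the "$2" test would need adjusting.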
