Sponsored Content
Top Forums Shell Programming and Scripting Extract sequences from a FASTA file based on another file Post 302881691 by bartus11 on Tuesday 31st of December 2013 08:21:57 AM
Old 12-31-2013
Try:
Code:
awk -F"[>|]" 'NR==FNR{sub(".$",":&",$0);a[$0]=1}/^>/&&($2 in a){p=1}/^>/&&!($2 in a){p=0}p' File2 File1

This User Gave Thanks to bartus11 For This Post:
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Extract sequences based on the list

Hi, I have a file with more than 28000 records and it looks like below.. >mm10_refflat_ABCD range=chr1:1234567-2345678 tgtgcacactacacatgactagtacatgactagac....so on >mm10_refflat_BCD range=chr1:3234567-4545678... tgtgcacactacacatgactagtatgtgcacactacacatgactagta . . . . . so on ... (2 Replies)
Discussion started by: Diya123
2 Replies

2. Shell Programming and Scripting

Extract length wise sequences from fastq file

I have a fastq file from small RNA sequencing with sequence lengths between 15 - 30. I wanted to filter sequence lengths between 21-25 and write to another fastq file. how can i do that? (4 Replies)
Discussion started by: empyrean
4 Replies

3. Shell Programming and Scripting

Extract sequence from fasta file

Hi, I want to match the sequence id (sub-string of line starting with '>' and extract the information upto next '>' line ). Please help . input > fefrwefrwef X900 AGAGGGAATTGG AGGGGCCTGGAG GGTTCTCTTC > fefrwefrwef X932 AGAGGGAATTGG AGGAGGTGGAG GGTTCTCTTC > fefrwefrwef X937... (2 Replies)
Discussion started by: ritakadm
2 Replies

4. Shell Programming and Scripting

Extract the part of sequences from a file

I have a text file, input.fasta contains some protein sequences. input.fasta is shown below. >P02649 MKVLWAALLVTFLAGCQAKVEQAVETEPEPELRQQTEWQSGQRWELALGRFWDYLRWVQT LSEQVQEELLSSQVTQELRALMDETMKELKAYKSELEEQLTPVAEETRARLSKELQAAQA RLGADMEDVCGRLVQYRGEVQAMLGQSTEELRVRLASHLRKLRKRLLRDADDLQKRLAVY... (8 Replies)
Discussion started by: rahim42
8 Replies

5. Shell Programming and Scripting

Shell script for changing the accession number of DNA sequences in a FASTA file

Hi, I am having a file of dna sequences in fasta format which look like this: >admin_1_45 atatagcaga >admin_1_46 atatagcagaatatatat with many such thousands of sequences in a single file. I want to the replace the accession Id "admin_1_45" similarly in following sequences to... (5 Replies)
Discussion started by: margarita
5 Replies

6. Shell Programming and Scripting

Shorten header of protein sequences in fasta file

I have a fasta file as follows >sp|O15090|FABP4_HUMAN Fatty acid-binding protein, adipocyte OS=Homo sapiens GN=FABP4 PE=1 SV=3 MCDAFVGTWKLVSSENFDDYMKEVGVGFATRKVAGMAKPNMIISVNGDVITIKSESTFKN TEISFILGQEFDEVTADDRKVKSTITLDGGVLVHVQKWDGKSTTIKRKREDDKLVVECVM KGVTSTRVYERA >sp|L18484|AP2A2_RAT AP-2... (3 Replies)
Discussion started by: alexypaul
3 Replies

7. UNIX for Dummies Questions & Answers

Select distinct sequences from fasta file and list

Hi How can I extract sequences from a fasta file with respect a certain criteria? The beginning of my file (containing in total more than 1000 sequences) looks like this: >H8V34IS02I59VP SDACNDLTIALLQIAREVRVCNPTFSFRWHPQVKDEVMRECFDCIRQGLG YPSMRNDPILIANCMNWHGHPLEEARQWVHQACMSPCPSTKHGFQPFRMA... (6 Replies)
Discussion started by: Marion MPI
6 Replies

8. Shell Programming and Scripting

Getting unique sequences from multiple fasta file

Hi, I have a fasta file with multiple sequences. How can i get only unique sequences from the file. For example my_file.fasta >seq1 TCTCAAAGAAAGCTGTGCTGCATACTGTACAAAACTTTGTCTGGAGAGATGGAGAATCTCATTGACTTTACAGGTGTGGACGGTCTTCAGAGATGGCTCAAGCTAACATTCCCTGACACACCTATAGGGAAAGAGCTAAC >seq2... (3 Replies)
Discussion started by: Ibk
3 Replies

9. Shell Programming and Scripting

Shorten header of protein sequences in fasta file to only organism name

I have a fasta file as follows >sp|Q8WWQ8|STAB2_HUMAN Stabilin-2 OS=Homo sapiens OX=9606 GN=STAB2 PE=1 SV=3 MMLQHLVIFCLGLVVQNFCSPAETTGQARRCDRKSLLTIRTECRSCALNLGVKCPDGYTM ITSGSVGVRDCRYTFEVRTYSLSLPGCRHICRKDYLQPRCCPGRWGPDCIECPGGAGSPC NGRGSCAEGMEGNGTCSCQEGFGGTACETCADDNLFGPSCSSVCNCVHGVCNSGLDGDGT... (3 Replies)
Discussion started by: jerrild
3 Replies

10. UNIX for Beginners Questions & Answers

Is it possible to rename fasta headers based on its position specified in another file?

I have 5 sequences in a fasta file namely gene1.fasta as follows, gene1.fasta >1256 ATGTAGC >GEP TAGAG >GTY578 ATGCATA >67_iga ATGCTGA >90_ld ATGCTG I need to rename the gene1.fasta file based on the sequence position specified in list.txt as follows, list.txt position1=org5... (5 Replies)
Discussion started by: dineshkumarsrk
5 Replies
FASTA_FORMATTER(1)						   User Commands						FASTA_FORMATTER(1)

NAME
fasta_formatter - changes the width of sequences line in a FASTA file DESCRIPTION
usage: fasta_formatter [-h] [-i INFILE] [-o OUTFILE] [-w N] [-t] [-e] Part of FASTX Toolkit 0.0.13.2 by gordon@cshl.edu [-h] = This helpful help screen. [-i INFILE] = FASTA/Q input file. default is STDIN. [-o OUTFILE] = FASTA/Q output file. default is STDOUT. [-w N] = max. sequence line width for output FASTA file. When ZERO (the default), sequence lines will NOT be wrapped - all nucleotides of each sequences will appear on a single line (good for scripting). [-t] = Output tabulated format (instead of FASTA format). Sequence-Identifiers will be on first column, Nucleotides will appear on second column (as single line). [-e] = Output empty sequences (default is to discard them). Empty sequences are ones who have only a sequence identifier, but not actual nucleotides. Input Example: >MY-ID AAAAAGGGGG CCCCCTTTTT AGCTN Output example with unlimited line width [-w 0]: >MY-ID AAAAAGGGGGCCCCCTTTTTAGCTN Output example with max. line width=7 [-w 7]: >MY-ID AAAAAGG GGGTTTT TCCCCCA GCTN Output example with tabular output [-t]: MY-ID AAAAAGGGGGCCCCCTTTTAGCTN example of empty sequence: (will be discarded unless [-e] is used) >REGULAR-SEQUENCE-1 AAAGGGTTTCCC >EMPTY-SEQUENCE >REGULAR-SEQUENCE-2 AAGTAGTAGTAGTAGT GTATTTTATAT SEE ALSO
The quality of this automatically generated manpage might be insufficient. It is suggested to visit http://hannonlab.cshl.edu/fastx_toolkit/commandline.html to get a better layout as well as an overview about connected FASTX tools. fasta_formatter 0.0.13.2 May 2012 FASTA_FORMATTER(1)
All times are GMT -4. The time now is 10:27 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy