Sponsored Content
Top Forums Shell Programming and Scripting Extract sequences from a FASTA file based on another file Post 302881703 by kurumi on Tuesday 31st of December 2013 10:31:58 AM
Old 12-31-2013
if you have Ruby
Code:
f2 = File.open("file2").readlines.map(&:strip)      # read file2 , store in array
File.open("file1").read.split(">").each do |record|  
  if not record.eql?("") and 
    f2.include?( record[0..3] + record[5] )         # if array contains first 4 chars and the 6th char, print
     puts ">#{record}"   
  end
end

result
Code:
# ruby test.rb
>153L:B|PDBID|CHAIN|SEQUENCE
RTDCYGNVNRIDTTGASCKTAKPEGLSYCGVSASKKIAERDLQAMDRYKTIIKKVGEKLCVEPAVIAGIISRESHAGKVL
KNGWGDRGNGFGLMQVDKRSHKPQGTWNGEVHITQGTTILINFIKTIQKKFPSWTKDQQLKGGISAYNAGAGNVRSYARM
DIGTTHDDYANDVVARAQYYKQHGY
>16VP:A|PDBID|CHAIN|SEQUENCE
SRMPSPPMPVPPAALFNRLLDDLGFSAGPALCTMLDTWNEDLFSALPTNADLYRECKFLSTLPSDVVEWGDAYVPERTQI
DIRAHGDVAFPTLPATRDGLGLYYEALSRFFHAELRAREESYRTVLANFCSALYRYLRASVRQLHRQAHMRGRDRDLGEM
LRATIADRYYRETARLARVLFLHLYLFLTREILWAAYAEQMMRPDLFDCLCCDLESWRQLAGLFQPFMFVNGALTVRGVP
IEARRLRELNHIREHLNLPLVRSAATEEPGAPLTTPPTLHGNQARASGYFMVLIRAKLDSYSSFTTSPSEAVMREHAYSR
APTKNNYGSTIEGLLDLPDDDAPEEAGLAAPRLSFLPAGHTRRLST
>1A04:B|PDBID|CHAIN|SEQUENCE
SNQEPATILLIDDHPMLRTGVKQLISMAPDITVVGEASNGEQGIELAESLDPDLILLDLNMPGMNGLETLDKLREKSLSG
RIVVFSVSNHEEDVVTALKRGADGYLLKDMEPEDLLKALHQAAAGEMVLSEALTPVLAASLRANRATTERDVNQLTPRER
DILKLIAQGLPNKMIARRLDITESTVKVHVKHMLKKMKLKSRVEAAVWVHQERIF

 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Extract sequences based on the list

Hi, I have a file with more than 28000 records and it looks like below.. >mm10_refflat_ABCD range=chr1:1234567-2345678 tgtgcacactacacatgactagtacatgactagac....so on >mm10_refflat_BCD range=chr1:3234567-4545678... tgtgcacactacacatgactagtatgtgcacactacacatgactagta . . . . . so on ... (2 Replies)
Discussion started by: Diya123
2 Replies

2. Shell Programming and Scripting

Extract length wise sequences from fastq file

I have a fastq file from small RNA sequencing with sequence lengths between 15 - 30. I wanted to filter sequence lengths between 21-25 and write to another fastq file. how can i do that? (4 Replies)
Discussion started by: empyrean
4 Replies

3. Shell Programming and Scripting

Extract sequence from fasta file

Hi, I want to match the sequence id (sub-string of line starting with '>' and extract the information upto next '>' line ). Please help . input > fefrwefrwef X900 AGAGGGAATTGG AGGGGCCTGGAG GGTTCTCTTC > fefrwefrwef X932 AGAGGGAATTGG AGGAGGTGGAG GGTTCTCTTC > fefrwefrwef X937... (2 Replies)
Discussion started by: ritakadm
2 Replies

4. Shell Programming and Scripting

Extract the part of sequences from a file

I have a text file, input.fasta contains some protein sequences. input.fasta is shown below. >P02649 MKVLWAALLVTFLAGCQAKVEQAVETEPEPELRQQTEWQSGQRWELALGRFWDYLRWVQT LSEQVQEELLSSQVTQELRALMDETMKELKAYKSELEEQLTPVAEETRARLSKELQAAQA RLGADMEDVCGRLVQYRGEVQAMLGQSTEELRVRLASHLRKLRKRLLRDADDLQKRLAVY... (8 Replies)
Discussion started by: rahim42
8 Replies

5. Shell Programming and Scripting

Shell script for changing the accession number of DNA sequences in a FASTA file

Hi, I am having a file of dna sequences in fasta format which look like this: >admin_1_45 atatagcaga >admin_1_46 atatagcagaatatatat with many such thousands of sequences in a single file. I want to the replace the accession Id "admin_1_45" similarly in following sequences to... (5 Replies)
Discussion started by: margarita
5 Replies

6. Shell Programming and Scripting

Shorten header of protein sequences in fasta file

I have a fasta file as follows >sp|O15090|FABP4_HUMAN Fatty acid-binding protein, adipocyte OS=Homo sapiens GN=FABP4 PE=1 SV=3 MCDAFVGTWKLVSSENFDDYMKEVGVGFATRKVAGMAKPNMIISVNGDVITIKSESTFKN TEISFILGQEFDEVTADDRKVKSTITLDGGVLVHVQKWDGKSTTIKRKREDDKLVVECVM KGVTSTRVYERA >sp|L18484|AP2A2_RAT AP-2... (3 Replies)
Discussion started by: alexypaul
3 Replies

7. UNIX for Dummies Questions & Answers

Select distinct sequences from fasta file and list

Hi How can I extract sequences from a fasta file with respect a certain criteria? The beginning of my file (containing in total more than 1000 sequences) looks like this: >H8V34IS02I59VP SDACNDLTIALLQIAREVRVCNPTFSFRWHPQVKDEVMRECFDCIRQGLG YPSMRNDPILIANCMNWHGHPLEEARQWVHQACMSPCPSTKHGFQPFRMA... (6 Replies)
Discussion started by: Marion MPI
6 Replies

8. Shell Programming and Scripting

Getting unique sequences from multiple fasta file

Hi, I have a fasta file with multiple sequences. How can i get only unique sequences from the file. For example my_file.fasta >seq1 TCTCAAAGAAAGCTGTGCTGCATACTGTACAAAACTTTGTCTGGAGAGATGGAGAATCTCATTGACTTTACAGGTGTGGACGGTCTTCAGAGATGGCTCAAGCTAACATTCCCTGACACACCTATAGGGAAAGAGCTAAC >seq2... (3 Replies)
Discussion started by: Ibk
3 Replies

9. Shell Programming and Scripting

Shorten header of protein sequences in fasta file to only organism name

I have a fasta file as follows >sp|Q8WWQ8|STAB2_HUMAN Stabilin-2 OS=Homo sapiens OX=9606 GN=STAB2 PE=1 SV=3 MMLQHLVIFCLGLVVQNFCSPAETTGQARRCDRKSLLTIRTECRSCALNLGVKCPDGYTM ITSGSVGVRDCRYTFEVRTYSLSLPGCRHICRKDYLQPRCCPGRWGPDCIECPGGAGSPC NGRGSCAEGMEGNGTCSCQEGFGGTACETCADDNLFGPSCSSVCNCVHGVCNSGLDGDGT... (3 Replies)
Discussion started by: jerrild
3 Replies

10. UNIX for Beginners Questions & Answers

Is it possible to rename fasta headers based on its position specified in another file?

I have 5 sequences in a fasta file namely gene1.fasta as follows, gene1.fasta >1256 ATGTAGC >GEP TAGAG >GTY578 ATGCATA >67_iga ATGCTGA >90_ld ATGCTG I need to rename the gene1.fasta file based on the sequence position specified in list.txt as follows, list.txt position1=org5... (5 Replies)
Discussion started by: dineshkumarsrk
5 Replies
DBASE_ADD_RECORD(3)							 1						       DBASE_ADD_RECORD(3)

dbase_add_record - Adds a record to a database

SYNOPSIS
bool dbase_add_record (int $dbase_identifier, array $record) DESCRIPTION
Adds the given data to the database. PARAMETERS
o $dbase_identifier - The database link identifier, returned by dbase_open(3) or dbase_create(3). o $record - An indexed array of data. The number of items must be equal to the number of fields in the database, otherwise dbase_add_record(3) will fail. Note If you're using dbase_get_record(3) return value for this parameter, remember to reset the key named deleted. RETURN VALUES
Returns TRUE on success or FALSE on failure. EXAMPLES
Example #1 Inserting a record in a dBase database <?php // open in read-write mode $db = dbase_open('/tmp/test.dbf', 2); if ($db) { dbase_add_record($db, array( date('Ymd'), 'Maxim Topolov', '23', 'max@example.com', 'T')); dbase_close($db); } ?> SEE ALSO
dbase_delete_record(3), dbase_replace_record(3). PHP Documentation Group DBASE_ADD_RECORD(3)
All times are GMT -4. The time now is 06:50 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy