Get string of sequence from other file

# 1  
Old 10-10-2013

Hi guys,

Does anyone know how to extract sequence records from one file based on a list of names in another file? Should I use awk? Please see the example below. Thanks!

LIST_FILE:
Code:
>NAME1
>NAME3
>NAME5
>NAME7
>NAME8

SEQ_FILE:
Code:
>NAME1 LEN75
100100101001010001010
>NAME2 LEN90
111010101010101101101
>NAME3 LEN27
101000101001010010101
>NAME4 LEN61
101001001010010010100
>NAME5 LEN25
010010001001010100101
>NAME6 LEN78
010010100101010010111
>NAME7 LEN49
010010101101111010100
>NAME8 LEN66
010101011111101001000


OUTPUT:
Code:
>NAME1 LEN75
100100101001010001010
>NAME3 LEN27
101000101001010010101
>NAME5 LEN25
010010001001010100101
>NAME7 LEN49
010010101101111010100
>NAME8 LEN66
010101011111101001000


Last edited by Franklin52; 10-11-2013 at 02:54 AM.. Reason: Please use code tags
# 2  
Old 10-10-2013
Try:
Code:
 awk 'NR==FNR{a[$0]=1;next}$1 in a{print;getline;print}' LIST_FILE SEQ_FILE
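
For readers new to the NR==FNR idiom, here is the same one-liner spelled out with comments, as a rough sketch. The scratch directory, the shortened sample data, the array name `wanted`, and the guard around `getline` are additions for illustration and are not part of the original answer.

```shell
# Recreate a shortened version of the sample files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '>NAME1' '>NAME3' > LIST_FILE
printf '%s\n' '>NAME1 LEN75' '100100101001010001010' \
              '>NAME2 LEN90' '111010101010101101101' \
              '>NAME3 LEN27' '101000101001010010101' > SEQ_FILE

result=$(awk '
    # First file (LIST_FILE): NR equals FNR only here, so store each
    # wanted header as an array key and move on to the next line.
    NR == FNR { wanted[$0] = 1; next }
    # Second file (SEQ_FILE): $1 is the header without the LEN field.
    $1 in wanted {
        print                       # the header line
        if ((getline) > 0) print    # exactly one following line
    }
' LIST_FILE SEQ_FILE)
printf '%s\n' "$result"
```

Because only one line is read after each matching header, this form assumes every sequence sits on a single line.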

# 3  
Old 10-10-2013
Hi bartus11,

I tried the code, but unfortunately it only extracts the first line of each sequence. My sequences are actually much longer than that. For example:
Code:
>NAME1 LEN75
100100101001010001010
001101010001001001000
110101001000100100010
010010001000010001000
>NAME3 LEN27
101000101001010010101
010010001001001000100
010010001001001010000
101001010100110010101

The code that you wrote only gives the first line of each sequence. How do I get the whole sequence for each ">NAME# LEN##" header, up to the next ">" sign?

Last edited by Franklin52; 10-11-2013 at 02:54 AM.. Reason: Please use code tags
# 4  
Old 10-10-2013
Try:
Code:
awk 'NR==FNR{a[$0]=1;next}$1 in a{p=1;print;getline;}$0~"^>"{p=0}p' LIST_FILE SEQ_FILE
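
An alternative sketch of the same flag idea, not the poster's code: decide on every header line whether to print, and avoid `getline` altogether. This should also cope with a header that is immediately followed by another header. The variable names `wanted` and `keep`, the scratch directory, and the shortened sample data are assumptions made for illustration.

```shell
# Recreate a shortened multi-line version of the sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '>NAME1' '>NAME3' > LIST_FILE
printf '%s\n' '>NAME1 LEN75' '100100101001010001010' '001101010001001001000' \
              '>NAME2 LEN90' '111010101010101101101' '010010100101010010111' \
              '>NAME3 LEN27' '101000101001010010101' '010010001001001000100' > SEQ_FILE

result=$(awk '
    NR == FNR { wanted[$1]; next }        # LIST_FILE: store each name as an array key
    /^>/      { keep = ($1 in wanted) }   # header line: switch printing on or off
    keep                                  # print every line while the record is wanted
' LIST_FILE SEQ_FILE)
printf '%s\n' "$result"
```

Each header resets the flag, so all lines up to the next ">" are printed for a wanted name and skipped otherwise.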

# 5  
Old 10-11-2013
Code:
$ grep -A1 -f file1 file2
>NAME1 LEN75
100100101001010001010
--
>NAME3 LEN27
101000101001010010101
--
>NAME5 LEN25
010010001001010100101
--
>NAME7 LEN49
010010101101111010100
>NAME8 LEN66
010101011111101001000

# 6  
Old 10-11-2013
Thanks guys!