Help with merge data with a reference sequence


 
# 8  
Old 07-31-2016
Hi,

It does not seem to work.
I believe it returns the header together with the length of the FASTA sequence.
Code:
awk 'NR==FNR {a[$1,$2]=$2; b[$1,$2]=$3; c[$1]=$1; next} /^>/ {w=$0; sub(".*> *", "", w)} ! /^>/ && c[w] {for (i in a) $(a[i])=b[i]} 1 ' file2 FS= OFS= file1

>Data_1
2421442

Thanks.
# 9  
Old 07-31-2016
Probably this is because of what I mentioned in the note in post #5 about FS=
# 10  
Old 07-31-2016
Hi Scrutinizer,

I tried your awk code, but it returns a syntax error:
Code:
awk:  NR==FNR { R[$1,$2]=$3 next } FNR>1 { s=x for(i=1; i<=length($2); i++) s=s (($1,i) in R ? R[$1,i] : substr($2,i,1)) print RS $1 FS s }
awk:                        ^ syntax error
awk:  NR==FNR { R[$1,$2]=$3 next } FNR>1 { s=x for(i=1; i<=length($2); i++) s=s (($1,i) in R ? R[$1,i] : substr($2,i,1)) print RS $1 FS s }
awk:                                           ^ syntax error
awk:  NR==FNR { R[$1,$2]=$3 next } FNR>1 { s=x for(i=1; i<=length($2); i++) s=s (($1,i) in R ? R[$1,i] : substr($2,i,1)) print RS $1 FS s }
awk:                                                                                                                     ^ syntax error

I just typed the command below at my terminal:
Code:
awk ' NR==FNR { R[$1,$2]=$3 next } FNR>1 { s=x for(i=1; i<=length($2); i++) s=s (($1,i) in R ? R[$1,i] : substr($2,i,1)) print RS $1 FS s } ' file2 RS=\> FS='\n' file1

I typed it as one long awk command.
My operating system is "x86_64 x86_64 x86_64 GNU/Linux" and my awk is "GNU Awk 3.1.7".

Could that be the main reason it returns a syntax error?
Thanks a lot again for your advice.

---------- Post updated at 05:01 AM ---------- Previous update was at 05:00 AM ----------

Hi,

I believe so.
Do you have any advice regarding my concern?

Sorry, I am still quite new to awk, perl, shell scripting and programming in general.
# 11  
Old 07-31-2016
You did not turn it into a one-liner properly; watch the semicolons. Try:
Code:
awk 'NR==FNR{R[$1,$2]=$3; next} FNR>1{s=x; for(i=1; i<=length($2); i++) s=s (($1,i) in R ? R[$1,i] : substr($2,i,1)); print RS $1 FS s}' file2 RS=\> FS='\n' file1

But you do not have to turn it into a one-liner, you can also paste multiple lines or put it in a file and execute that.
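For instance, the same program could be kept in a file and run with `awk -f`. This is only a sketch: the file names `merge.awk`, `demo_changes.tsv` and `demo_ref.fa` and the sample data are made up for illustration, and it assumes the second file's columns are header, position, replacement, as the one-liner above reads them.

```shell
# Hypothetical demo files, named so they do not clash with your real data
printf 'Data_1\t2\tX\nData_1\t5\tY\n' > demo_changes.tsv   # header, position, replacement
printf '>Data_1\nACGTACGT\n' > demo_ref.fa                  # header line + one sequence line

cat > merge.awk <<'EOF'
# First file: remember the replacement ($3) for header $1 at position $2
NR == FNR {
    R[$1, $2] = $3
    next
}
# Second file: records are split on ">", fields on newlines.
# FNR > 1 skips the empty record in front of the first ">".
FNR > 1 {
    s = ""
    for (i = 1; i <= length($2); i++)
        s = s (($1, i) in R ? R[$1, i] : substr($2, i, 1))
    print RS $1 FS s
}
EOF

awk -f merge.awk demo_changes.tsv RS='>' FS='\n' demo_ref.fa > demo_merged.fa
cat demo_merged.fa
# prints:
# >Data_1
# AXGTYCGT
```

Written over several lines, each statement sits on its own line, so no semicolons are needed between them.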

---
With your version of awk the other script should work too. You probably forgot to put semicolons in there as well.
# 12  
Old 07-31-2016
Hi Scrutinizer,

Thanks again.
It worked perfectly with the sample sequence I provided.
However, I noticed that when I replace it with my real data set,
it just prints out the original File 1.

Could the length of the FASTA sequence be the issue?
The sequence in my original file is around 2 million characters long and is on a single line.

My File 1 has 2 lines: the first line is the header description and the second line is one very long word (around 2 million characters).

My File 2 is a tab-delimited file:
The first column is the header of File 1;
The second column is the word to replace in File 1;
The third column is the position of the word to replace in File 1.
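
One thing that may be worth checking, if the columns really are in the order listed above: the one-liner from post #11 stores `R[$1,$2]=$3`, so it takes the position from the second column and the replacement from the third. If the real file has them the other way round, no position would match and File 1 would come out unchanged. A minimal sketch for swapping the two columns first, using made-up file names and sample data:

```shell
# Hypothetical file in the order: header, replacement, position
printf 'Data_1\tX\t2\nData_1\tY\t5\n' > demo_changes_raw.tsv

# Swap columns 2 and 3 so that the position comes before the replacement
awk 'BEGIN { FS = OFS = "\t" } { print $1, $3, $2 }' demo_changes_raw.tsv > demo_changes_swapped.tsv

cat demo_changes_swapped.tsv
```

The swapped file could then be passed to the awk command in place of file2. Whether this is the actual cause cannot be told without seeing the real data.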
# 13  
Old 07-31-2016
I think that is too long. The FASTA format allows wrapping of the sequence over multiple lines. That should be an option in the program you used to generate the file.

Please indicate if you would like to go that route; then I can adjust my suggestion so that it works for that format as well.
# 14  
Old 07-31-2016
Many thanks for your help.

I have now split the long sequence into lines of 100 characters each.
Unfortunately, the output file only contains the header and the first 100 characters of the sequence.
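
That behaviour follows from the one-liner in post #11: with `FS='\n'` every wrapped line becomes its own field, and the script only reads `$2`, which is the first sequence line. A possible adjustment is sketched below. It joins all sequence lines, applies the replacements, and wraps the result again at the width of the first input line. The file names and sample data are made up, and it assumes that the second file's columns are header, position, replacement, that each replacement is a single character, and that positions count from 1 across the whole sequence.

```shell
# Hypothetical demo files, named so they do not clash with your real data
printf 'Data_1\t2\tX\nData_1\t6\tY\n' > demo_changes.tsv   # header, position, replacement
printf '>Data_1\nACGT\nACGT\n' > demo_wrapped.fa            # sequence wrapped at 4 characters

awk '
# First file: remember the replacement for each header and position
NR == FNR { R[$1, $2] = $3; next }
# Second file: one record per ">" entry, one field per line
FNR > 1 {
    seq = ""
    for (i = 2; i <= NF; i++) seq = seq $i      # join the wrapped lines
    w = length($2); if (w < 1) w = 60           # reuse the input line width
    n = length(seq); out = ""
    for (i = 1; i <= n; i++)
        out = out (($1, i) in R ? R[$1, i] : substr(seq, i, 1))
    print RS $1                                 # header line
    for (i = 1; i <= length(out); i += w) print substr(out, i, w)
}' demo_changes.tsv RS='>' FS='\n' demo_wrapped.fa > demo_merged.fa

cat demo_merged.fa
# prints:
# >Data_1
# AXGT
# AYGT
```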