Help with merge data with a reference sequence


 
# 15  
Old 07-31-2016
Quote:
Originally Posted by cpp_beginner
.
.
.
My File 2 is tab-delimited file.
First column is the header of File 1;
Second column is the word to replace in File 1;
Third column is position of word to replace in File 1;

This is NOT what you specified in post #1:

Code:
Data_1 2 Z
Data_1 3 T
Data_1 10 A
Data_1 11 T

These 2 Users Gave Thanks to RudiC For This Post:
# 16  
Old 07-31-2016
Yes, as I mentioned, it will only work with a single-line FASTA sequence.
Try this instead, which should now work with a wrapped (multi-line) FASTA sequence:
Code:
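awk '
  # First file (file2): store the replacement letter, keyed by header and position
  NR==FNR {
    R[$1,$2]=$3
    next
  }

  # Second file (file1): with RS=">" each record is one FASTA entry;
  # $1 is the header and $2..$NF are the wrapped sequence lines
  FNR>1 {
    h=$1
    len=length($2)            # wrap width, taken from the first sequence line
    print RS h
    for(i=2; i<=NF; i++) {
      s=""
      n=length($i)            # the last line may be shorter than the wrap width
      for(j=1; j<=n; j++) {
        pos=j+(i-2)*len       # absolute position in the unwrapped sequence
        s=s ((h,pos) in R ? R[h,pos] : substr($i,j,1))
      }
      print s
    }
  }
' FS='\t' file2 FS=" " RS=\> file1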
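
As a quick check, here is a minimal sketch that runs the same logic on a tiny made-up sample. The file contents, the scratch directory and the merge.awk file name are assumptions for illustration only, not the real input files. It assumes file2 holds header, position and replacement letter, in that order, as in the sample quoted in post #15:

```shell
# Hypothetical sample data in a scratch directory (not the real input files).
tmp=$(mktemp -d)

# file1: one FASTA record, sequence wrapped at 8 characters.
printf '>Data_1\nACGTACGT\nACGT\n' > "$tmp/file1"

# file2: TAB-delimited; columns are header, position, replacement letter.
printf 'Data_1\t2\tZ\nData_1\t3\tT\nData_1\t10\tA\nData_1\t11\tT\n' > "$tmp/file2"

# Keeping the awk program in its own file sidesteps shell quoting problems.
cat > "$tmp/merge.awk" <<'EOF'
# file2: remember the replacement letter per (header, position)
NR==FNR { R[$1,$2]=$3; next }

# file1: one record per FASTA entry because RS is ">"
FNR>1 {
  h=$1
  len=length($2)                 # wrap width from the first sequence line
  print RS h
  for(i=2; i<=NF; i++) {
    s=""
    n=length($i)
    for(j=1; j<=n; j++) {
      pos=j+(i-2)*len            # position in the unwrapped sequence
      s=s ((h,pos) in R ? R[h,pos] : substr($i,j,1))
    }
    print s
  }
}
EOF

result=$(awk -f "$tmp/merge.awk" FS='\t' "$tmp/file2" FS=" " RS=\> "$tmp/file1")
printf '%s\n' "$result"
```

With this sample the output should be the header ">Data_1" followed by "AZTTACGT" and "AATT": positions 2 and 3 change on the first line, and positions 10 and 11 land on the second line because the wrap width is 8.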


Last edited by Scrutinizer; 07-31-2016 at 08:36 AM..
This User Gave Thanks to Scrutinizer For This Post:
# 17  
Old 07-31-2016
Thanks for the reminder, RudiC.
Sorry for my mistake.

I have just edited post #1.
Thanks a lot.
# 18  
Old 07-31-2016
So that means the sample of file2 also changes?

Also, your sample file2 is not TAB-delimited.

I corrected post #16 so that it works for a TAB-delimited file2.
Could you check the column order and whether the file is indeed TAB-delimited?
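
One way to check the delimiter is sketched below. It is only an illustration: the two sample lines and the scratch path are made up, and the real file2 would be passed in their place. It reports every line that does not split into exactly 3 TAB-separated fields:

```shell
# Hypothetical two-line sample: line 1 uses TABs, line 2 uses spaces.
tmp=$(mktemp -d)
printf 'Data_1\t2\tZ\nData_1 3 T\n' > "$tmp/file2"

# Print the line number and field count of each line that is not 3 TAB-separated fields.
bad=$(awk -F'\t' 'NF != 3 { print FNR ": " NF " field(s)" }' "$tmp/file2")
printf '%s\n' "$bad"
```

Here the space-separated second line is reported as "2: 1 field(s)". No output at all would mean that every line has three TAB-separated fields.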

Last edited by Scrutinizer; 07-31-2016 at 08:29 AM..
This User Gave Thanks to Scrutinizer For This Post:
# 19  
Old 07-31-2016
Thanks, Scrutinizer.

May I know how to fix the syntax error again?
If I run it as one long awk command at the terminal, it returns a syntax error.

If I copy and paste the whole command into a file called "run.sh" and execute it as "sh run.sh", it still returns a syntax error.

Sorry, and thanks for your guidance and advice.

---------- Post updated at 06:28 AM ---------- Previous update was at 06:25 AM ----------

Hi Scrutinizer,

File 1 is a FASTA file with one long record (the first line is the header description and the second line is its corresponding nucleotide sequence).
File 2 is a file with 3 columns (tab-delimited).
First column is the header description (without ">") of File 1;
Second column is the word to replace in File 1;
Third column is the position of the word to replace in File 1;

Basically it is still the same as my original question.
I just forgot to mention that my File 2 is a tab-delimited file.

Sorry for the confusion.
I just edited my thread to clarify it.

---------- Post updated at 06:29 AM ---------- Previous update was at 06:28 AM ----------

My main objective is to replace specific words in File 1 based on the records in File 2 (each record gives the position and the new word to put there).
# 20  
Old 07-31-2016
Yes, but now your sample file2 does not match the description. Which one is right? If it is not the sample, could you correct the sample?

And you meant 3 fields presumably, not lines ...
This User Gave Thanks to Scrutinizer For This Post:
# 21  
Old 07-31-2016
Hi Scrutinizer,

I just edited post #1.
Does that make things clear now?

Basically I still have 2 input files.
The data I showed in post #1 is just the first few records, as the whole file is around 2 million words.

Thanks.