Help with merge data with a reference sequence


 
# 1  
Old 07-29-2016

I have two input files.

File 1 is a large reference sequence in FASTA format: the first line is the header (it starts with ">") and the line after it is the corresponding sequence, with positions counted from 1 to the end of the line.
Code:
>Data_1
ASWDADAQTWQQGSAAAAASDAFAFA
.
.

File 2 is a list of the letters I want to put at specific positions in File 1. It is tab-delimited with 3 columns:
Code:
Data_1 2 Z
Data_1 3 T
Data_1 10 A
Data_1 11 T
.
.

Desired Output File
Code:
>Data_1
AZTDADAQTATQGSAAAAASDAFAFA
.
.

To summarise:
File 1 is a long FASTA record (a header line starting with ">", followed by its sequence).
File 2 has 3 tab-delimited columns:
The first column is the header of the File 1 record (without the ">");
The second column is the position in the File 1 sequence that I want to replace;
The third column is the letter to put at that position.

My awk attempt:
Code:
awk -F "\t" '(FNR==1){x++} (x==1){a[$1][$2]=$3;next} (x==2){if($0~/>/){h=$0;sub(/^.*Data/,"",h);sub(/ .*/,"",h)} else{seq[h]=seq[h]$0}} END{for(i in a){s=0; for(j in a[i]){m=m substr(seq[i],s,j-1) a[i][j];s=j+1} m=m substr(seq[i],s); print ">Data"i"\n"m}}' File 2 File 1

My main objective is to replace the letter at each position listed in File 2 with the letter given there, in the matching File 1 sequence; the header line (>Data_1) should stay unchanged.

Thanks for any advice.

Last edited by cpp_beginner; 07-31-2016 at 12:04 PM.. Reason: Corrected code tags.
# 2  
Old 07-29-2016
Try this:-
Code:
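awk '
        # First file (file2): map ">header position" to the replacement letter.
        NR == FNR {
                A[">"$1 FS $2] = $3
                next
        }
        # Header line of file1: remember it and print it unchanged.
        /^>/ {
                T = $0
                print
                next
        }
        # Sequence line: print each character, or its replacement if one is listed.
        {
                for ( i = 1; i <= length; i++ )
                {
                        if ( ( T FS i ) in A )
                                printf "%s", A[T FS i]
                        else
                                printf "%s", substr( $0, i, 1 )
                }
                printf "\n"
        }
' file2 file1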
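
One thing to watch with the script above: the lookup key is the whole header line, so it only matches when the header is exactly `>Data_1`. If the header carries a description after the ID, a variant that keys on the first word only might look like the sketch below. The header text `example description` and the temporary file names are made up for the illustration; the sequence and the replacement list are the ones from post #1.

```shell
tmp=$(mktemp -d)
# Hypothetical header with a trailing description; sequence taken from post #1.
printf '>Data_1 example description\nASWDADAQTWQQGSAAAAASDAFAFA\n' > "$tmp/file1"
printf 'Data_1\t2\tZ\nData_1\t3\tT\nData_1\t10\tA\nData_1\t11\tT\n' > "$tmp/file2"

result=$(awk '
        # file2: key is "ID position", value is the replacement letter.
        NR == FNR { A[$1 FS $2] = $3; next }
        # file1 header: keep only the first word, minus the leading ">".
        /^>/ { T = substr($1, 2); print; next }
        # Sequence line: rebuild it character by character.
        {
                out = ""
                for ( i = 1; i <= length($0); i++ )
                        out = out ( ( ( T FS i ) in A ) ? A[T FS i] : substr( $0, i, 1 ) )
                print out
        }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With the sample data this should print the header unchanged, followed by the sequence shown under "Desired Output File" in post #1.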

# 3  
Old 07-29-2016
Can the sequences in your FASTA file be spread over multiple lines?
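
If they can, the positions in file2 have to be counted across the wrapped lines instead of per line. A minimal sketch of that idea, assuming the sequence from post #1 is wrapped at 10 characters (the wrapping and the temporary file names are assumptions for the example, not something from the original files):

```shell
tmp=$(mktemp -d)
# Same sequence as in post #1, but wrapped over three lines (assumed layout).
printf '>Data_1\nASWDADAQTW\nQQGSAAAAAS\nDAFAFA\n' > "$tmp/file1"
printf 'Data_1\t2\tZ\nData_1\t3\tT\nData_1\t10\tA\nData_1\t11\tT\n' > "$tmp/file2"

result=$(awk '
  # file2: (header, position) -> replacement letter.
  NR == FNR { R[$1, $2] = $3; next }
  # New record in file1: remember its ID and reset the running offset.
  /^>/ { name = substr($1, 2); off = 0; print; next }
  {
    out = ""
    for (i = 1; i <= length($0); i++) {
      p = off + i                       # position within the whole sequence
      out = out (((name, p) in R) ? R[name, p] : substr($0, i, 1))
    }
    off += length($0)                   # carry the offset over to the next line
    print out
  }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Here position 10 falls on the last character of the first sequence line and position 11 on the first character of the second, so both lines get changed while the line wrapping is preserved.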
# 4  
Old 07-29-2016
Code:
awk 'NR==FNR {a[$1,$2]=$2; b[$1,$2]=$3; c[$1]=$1; next}
/^>/ {w=$0; sub(".*> *", "", w)}
! /^>/ && c[w] {for (i in a) if (index(i, w SUBSEP) == 1) $(a[i])=b[i]}
1
' file2 FS= OFS= file1


Last edited by rdrtx1; 07-29-2016 at 02:09 PM..
# 5  
Old 07-29-2016
If fasta sequences are always only a single line:
Code:
awk '
  NR==FNR { 
    R[$1,$2]=$3
    next
  }
  FNR>1 {
    s=x
    for(i=1; i<=length($2); i++) s=s (($1,i) in R ? R[$1,i] : substr($2,i,1))
    print RS $1 FS s
  }
' file2 RS=\> FS='\n' file1



----
Note: FS= (the extension that if FS is equal to the empty string, each character becomes a separate field) is not part of POSIX and may or may not work with your version of awk.
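
For an awk where the empty FS does not split the line into characters, the same kind of in-place replacement can be sketched with substr() only. This is just one possible way, and it assumes every replacement in file2 is a single character, so positions never shift. The temporary file names are made up; the data is the sample from post #1.

```shell
tmp=$(mktemp -d)
printf '>Data_1\nASWDADAQTWQQGSAAAAASDAFAFA\n' > "$tmp/file1"
printf 'Data_1\t2\tZ\nData_1\t3\tT\nData_1\t10\tA\nData_1\t11\tT\n' > "$tmp/file2"

result=$(awk '
  # file2: remember every edit (header, position, letter) in input order.
  NR == FNR {
    n++; hdr[n] = $1; pos[n] = $2 + 0; chr[n] = $3
    next
  }
  /^>/ { name = substr($1, 2); print; next }
  {
    seq = $0
    # Splice each matching edit into the line: text before, new letter, text after.
    for (k = 1; k <= n; k++)
      if (hdr[k] == name && pos[k] >= 1 && pos[k] <= length(seq))
        seq = substr(seq, 1, pos[k] - 1) chr[k] substr(seq, pos[k] + 1)
    print seq
  }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Positions outside the sequence are skipped, so a stray entry in file2 does not lengthen the line.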
# 6  
Old 07-31-2016
Hi,

The FASTA sequence is only one very long single line.

---------- Post updated at 04:52 AM ---------- Previous update was at 04:47 AM ----------

Hi,

Sorry, could you tell me why I get a syntax error when I type it as a one-line awk command in my terminal?

Should I run your awk command as a shell script instead?
Thanks for any advice.
# 7  
Old 07-31-2016
Hi, which script are you referring to?
What is your OS?
How do you paste it?
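
Without the exact error message this is only a guess, but one common cause is joining a multi-line awk program into one line by hand: statements that were separated by newlines then need a `;` between them. One way around that is to keep the program in its own file and run it with `awk -f`. A small sketch, where `replace.awk` and the temporary directory are made-up names and the data is the sample from post #1:

```shell
tmp=$(mktemp -d)
printf '>Data_1\nASWDADAQTWQQGSAAAAASDAFAFA\n' > "$tmp/file1"
printf 'Data_1\t2\tZ\nData_1\t3\tT\nData_1\t10\tA\nData_1\t11\tT\n' > "$tmp/file2"

# Store the awk program in a file so that its line breaks are kept as written.
cat > "$tmp/replace.awk" <<'EOF'
# file2: (header, position) -> replacement letter.
NR == FNR { R[$1, $2] = $3; next }
# file1 header: remember the ID without the leading ">".
/^>/ { name = substr($1, 2); print; next }
{
  out = ""
  for (i = 1; i <= length($0); i++)
    out = out (((name, i) in R) ? R[name, i] : substr($0, i, 1))
  print out
}
EOF

result=$(awk -f "$tmp/replace.awk" "$tmp/file2" "$tmp/file1")
printf '%s\n' "$result"
rm -rf "$tmp"
```

The same file can also be called from a shell script; the only requirement is that file2 is given before file1 on the command line.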