Help comparing the contents of 2 columns and printing a corrected/replaced word


 
# 1  
Old 07-29-2016

Input File
Code:
CGGCGCCTCGCNNNCGAGCG    CGGCGCGCCGAATCCGTGCG
TCGCNGC GCGCCGC
ACGGCNNNNN     ACGGCCTCGCG
CGGCNGCCCGCCC   CGGCGCGCCGTCC

Desired Output File
Code:
CGGCGCCTCGCNNNCGAGCG    CGGCGCGCCGAATCCGTGCG CGGCGCCTCGCATCCGAGCG
TCGCNGC GCGCCGC TCGCCGC
ACGGCNNNNN     ACGGCCTCGCG ACGGCTCGCG
CGGCNGCCCGCCC   CGGCGCGCCGTCC CGGCGGCCCGCCC

The first and second columns always have the same number of characters. I want the third column to reproduce the first column exactly, but with every N replaced by the character at the corresponding position in the second column.

It seems a bit complicated.
Thanks for any advice.
# 2  
Old 07-29-2016
Hello perl_beginner,

Thank you for asking a good question; please keep it up. Coming to your requirement: of course the shell can't see your bold characters (which you added for our benefit), so assuming the N characters always occur as one contiguous run, the following may help.
Code:
awk '{split("ATC:C:TCGCG:G", array,":");$(NF+1)=$1;sub(/N*N/,array[NR],$NF);print}'   Input_file

Output will be as follows.
Code:
CGGCGCCTCGCNNNCGAGCG CGGCGCGCCGAATCCGTGCG CGGCGCCTCGCATCCGAGCG
TCGCNGC GCGCCGC TCGCCGC
ACGGCNNNNN ACGGCCTCGCG ACGGCTCGCG
CGGCNGCCCGCCC CGGCGCGCCGTCC CGGCGGCCCGCCC

Here you need to list all the replacement strings (for the newly created 3rd column) in split("ATC:C:TCGCG:G", array,":"), one per input line and in line order, and it should work. If you have more permutations/combinations, please let us know.

EDIT: We could move the split call into a BEGIN section so that the array is created only once. A minor change to the code above:
Code:
awk 'BEGIN{split("ATC:C:TCGCG:G", array,":")};{$(NF+1)=$1;sub(/N*N/,array[NR],$NF);print}'  Input_file
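
For anyone wondering why the hard-coded list only fits the four sample lines, here is a minimal sketch of what split builds and what array[NR] would return on a fifth line. The function name show_split_lookup is hypothetical, made up for this illustration.

```shell
# Hypothetical helper: list the array that split() builds, plus one index past its end.
show_split_lookup() {
  awk 'BEGIN {
    n = split("ATC:C:TCGCG:G", array, ":")   # n is 4; elements are indexed from 1
    for (i = 1; i <= n + 1; i++)             # index 5 was never set, so it is empty
      printf "%d [%s]\n", i, array[i]
  }'
}

show_split_lookup
```

On a fifth input line array[NR] is empty, so sub would simply delete the run of N instead of filling it in.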

Thanks,
R. Singh

Last edited by RavinderSingh13; 07-29-2016 at 03:24 AM. Reason: Added one more solution, which is a small change to the previous code.
# 3  
Old 07-29-2016
Hi R. Singh, thanks again for your prompt reply and help.

I tried your awk command with other records, and it does not seem to work.
I believe this is due to
Code:
split("ATC:C:TCGCG:G", array,":")

which only fits the sample records and not other records.

Is there any way to make awk automatically replace every N in the first column with the character at the corresponding position in the second column?

I was thinking of using split to break the words in the first and second columns into characters,
and then using
Code:
awk if else

to print the character from the second column wherever the first column has an "N".
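
The idea described above could be sketched roughly like this: walk the first column one character at a time with substr and take the character from the same position in the second column whenever an N turns up. The function name fill_n_by_position is hypothetical, and the sketch assumes whitespace-separated columns of equal length.

```shell
# Hypothetical helper: append a third column in which every N of column 1
# is replaced by the character at the same position in column 2.
fill_n_by_position() {
  awk '{
    out = ""
    for (i = 1; i <= length($1); i++) {
      c = substr($1, i, 1)                  # current character of column 1
      if (c == "N") c = substr($2, i, 1)    # take it from column 2 instead
      out = out c
    }
    print $1, $2, out
  }' "$@"
}

printf 'TCGCNGC GCGCCGC\n' | fill_n_by_position
# prints: TCGCNGC GCGCCGC TCGCCGC
```

Because each position is looked at exactly once, it does not matter how many N there are or whether they are adjacent.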

Thanks a lot again for your advice.
# 4  
Old 07-29-2016
Hello perl_beginner,

Sorry, I missed that the replacement should come from the same position in column 2. Could you please try the following?
Code:
awk '{$(NF+1)=$1;match($1,/N*N/);$NF=substr($NF,1,RSTART-1) substr($2,RSTART,RLENGTH) substr($NF,RSTART+RLENGTH);print}'   Input_file

Output will be as follows.
Code:
CGGCGCCTCGCNNNCGAGCG CGGCGCGCCGAATCCGTGCG CGGCGCCTCGCATCCGAGCG
TCGCNGC GCGCCGC TCGCCGC
ACGGCNNNNN ACGGCCTCGCG ACGGCCTCGC
CGGCNGCCCGCCC CGGCGCGCCGTCC CGGCGGCCCGCCC
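
To see what match leaves behind, this small sketch prints RSTART, RLENGTH and the piece of column 2 that gets spliced in for the first sample line. The function name show_match is hypothetical. It uses /N+/, which matches the same runs as /N*N/.

```shell
# Hypothetical helper: show where the first run of N starts, how long it is,
# and which characters of column 2 sit at those positions.
show_match() {
  awk '{
    if (match($1, /N+/))                       # sets RSTART and RLENGTH on success
      print RSTART, RLENGTH, substr($2, RSTART, RLENGTH)
  }' "$@"
}

printf 'CGGCGCCTCGCNNNCGAGCG CGGCGCGCCGAATCCGTGCG\n' | show_match
# prints: 12 3 ATC
```

Note that match reports only the leftmost run, so a line with several separate runs of N gets only the first one replaced by the one-liner above.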

Thanks,
R. Singh
# 5  
Old 07-29-2016
Thanks very much, R. Singh.
It works perfectly now.

---------- Post updated at 02:12 AM ---------- Previous update was at 01:44 AM ----------

Hi R. Singh,

Sorry to bother you again.
I just found one more interesting issue.

Is it possible to make your awk command keep searching through all the "N" in the first column and replace each one based on the corresponding position in the second column?

I noticed that if the first column has 3 N at different positions,
the awk command replaces only the first N and leaves the other N in the first string as they are.
E.g.:
Code:
NCGTNGGCGTCGGCGN      GCGTCGGCGTGGGCGT      GCGTNGGCGTCGGCGN

In the example above, it replaces only the first N in the first column and leaves the second and third N untouched.

Thanks a lot again.
# 6  
Old 07-29-2016
Try an adaptation of RavinderSingh13's fine proposal:

Code:
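awk '
        {$3 = $1                        # start from a copy of column 1
         # replace each run of N with the same positions taken from column 2
         while (match ($3, /N*N/)) $3 =  substr($3, 1, RSTART-1) substr($2, RSTART, RLENGTH) substr($3, RSTART+RLENGTH)
        }
1                                       # always-true pattern: print the record
' file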
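
A possible variation on the loop above, as a sketch rather than RudiC's exact code: it consumes column 1 from left to right and keeps a running offset into column 2, so it still terminates if column 2 happens to contain an N at a matched position (an input the thread does not cover; the loop above would match that N again). The function name fill_all_n and the variables rest, out and pos are made up for this example.

```shell
# Hypothetical helper: replace every run of N in column 1 with the characters
# at the same positions in column 2, scanning each position only once.
fill_all_n() {
  awk '{
    rest = $1; out = ""; pos = 0        # pos = characters of column 1 consumed so far
    while (match(rest, /N+/)) {
      out  = out substr(rest, 1, RSTART - 1) substr($2, pos + RSTART, RLENGTH)
      pos += RSTART + RLENGTH - 1       # advance past the run just handled
      rest = substr(rest, RSTART + RLENGTH)
    }
    $3 = out rest                       # append whatever follows the last run
    print
  }' "$@"
}

printf 'NCGTNGGCGTCGGCGN GCGTCGGCGTGGGCGT\n' | fill_all_n
# prints: NCGTNGGCGTCGGCGN GCGTCGGCGTGGGCGT GCGTCGGCGTCGGCGT
```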

# 7  
Old 07-29-2016
Thanks, RudiC.
It solves my question about more than one N at different positions in the first column.

Many thanks again.