Use awk to replace numbers in a file with a column from another file


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Use awk to replace numbers in a file with a column from another file
# 1  
Old 07-05-2018
Use awk to replace numbers in a file with a column from another file

Hello,

I am trying to make a awk code that will take 2 files, a txt file like this :


Code:
1   1   88                        c(1:38, 42, 102)
2   2  128 c(39:41, 43:101, 103:105, 153, 155:189, 292, 344:369)
3   3   84                     c(190:249, 603, 606:607, 609:629)
4   4   12                                   c(250:251, 253:262)
5   6   51                     c(263, 265:291, 293:313, 315:316)
6   8   28                                       c(314, 317:343)
7   9   60            c(370:385, 561:587, 589:602, 604:605, 608)
8  10   39                                               386:424

and if the numbers in blue match with the numbers in red of the 2nd column of a pdb file 2



Code:
ATOM      1  N   PRO   889      24.289  17.277 -19.912  1.00  0.00           N  
ATOM      2  CA  PRO   889      25.072  18.509 -19.702  1.00  0.00           C  
ATOM      3  C   PRO   889      24.200  19.747 -19.486  1.00  0.00           C  
ATOM      4  O   PRO   889      24.602  20.661 -18.749  1.00  0.00           O  
ATOM      5  N   THR   890      23.002  19.770 -20.124  1.00  0.00           N  
ATOM      6  CA  THR   890      22.044  20.878 -20.060  1.00  0.00           C  
ATOM      7  C   THR   890      21.613  21.209 -18.629  1.00  0.00           C  
ATOM      8  O   THR   890      21.429  20.303 -17.812  1.00  0.00           O  
ATOM      9  N   VAL   891      21.484  22.513 -18.332  1.00  0.00           N

will be replaced from the 5th column (in green) for example the output has to be like this:
Code:
1   1   88                        c(889:898 , 899, 914)

...
Thank you for your time SmilieSmilie

Moderator's Comments:
Mod Comment Edit: darker green

Last edited by jim mcnamara; 07-05-2018 at 11:24 PM..
# 2  
Old 07-05-2018
It's quite difficult understand/see how you arrived at your desired output given a sample input. Where did 914 come from, for example?
Could you give it another try explaining? Maybe with a more representative data files...
Also your choice of colors might not be optimal - I can hardly see this green
# 3  
Old 07-05-2018
On top of what vgersh99 already said, I can't find a number in blue in file1 that matches any number in red in file2 (except for the 1 in the first line).
# 4  
Old 07-05-2018
Changed color to darker green. As asked I cannot see how to arrive at a solution, either.
# 5  
Old 07-06-2018
Indeed it seems like the values required to arrive at the example output are missing from the pdb file.

Perhaps something like this is intended:
Code:
awk '
  NR==FNR {                    # Read the pdb file
    A[$2]=$5                   # Store the pdb values in Array Aa as a lookup table
    next
  }

  {                            # read the txt file, use the letter c as the field separator
    split($2,SEP,/[0-9]*/)     # put all field separators of field $2 in array SEP
    n=split($2,VAL,/[^0-9]*/)  # put all values of $2 in array VAL
    for(i in VAL)              # for every value
      if(VAL[i] in A)          # if it is in column 2 of the pdb file
        VAL[i]=A[VAL[i]]       # replace it with the corresponding value of column 5 in the pdb file
    $2=SEP[1]                  # replace $2 with the first separator
    for(i=2; i<n; i++)         # enumerate over fields and separators
      $2=$2 VAL[i] SEP[i]      # reassemble $2 with the fields and separators
    print
  }
' file.pdb FS=c OFS=c file.txt


Last edited by Scrutinizer; 07-06-2018 at 01:52 AM..
# 6  
Old 07-06-2018
I'm sorry, I will try to explain it better,


file n.1


Code:

1   1   88                        c(1:5, 7, 9) 

2   2  128 c(39:41, 43:101, 103:105, 153, 155:189, 292, 344:369) 

3   3   84                     c(190:249, 603, 606:607, 609:629) 

4   4   12                                   c(250:251, 253:262)

 5   6   51                     c(263, 265:291, 293:313, 315:316)
 6   8   28                                       c(314, 317:343)

where 1:5, 7, 9 is the atomic number,

file n.2


Code:
 ATOM      1  N   PRO   889      24.289  17.277 -19.912  1.00  0.00           N  

 ATOM      2  CA  PRO   889      25.072  18.509 -19.702  1.00  0.00           C   

ATOM      3   C   PRO   889      24.200  19.747 -19.486  1.00  0.00           C  

ATOM      4   O   PRO   889      24.602  20.661 -18.749  1.00  0.00           O  

 ATOM      5   N   THR   890      23.002  19.770 -20.124  1.00  0.00           N   

ATOM      6  CA  THR   890      22.044  20.878 -20.060  1.00  0.00           C  

 ATOM      7  C   THR   890      21.613  21.209 -18.629  1.00  0.00           C   

ATOM       8  O   THR   890      21.429  20.303 -17.812  1.00  0.00           O   

ATOM      9   N   VAL   891      21.484  22.513 -18.332  1.00  0.00           N

the second column is also the atomic number (blue), the fifth column is the residue number(red),
so i want the atomic number in file 1 to be replaced with the residue number taken from file 2. So as an output

before
1 1 88 c(1:5, 7, 9)

After
1 1 88 c(889:890, 890, 891)

Last edited by nastaziales; 07-06-2018 at 05:59 AM..
# 7  
Old 07-06-2018
Wouldn't it make sense to group the result numbers in ranges and single numbers, where applicable? Like 889:890, 890, 891 would be one range 889:891 only, and if there was a gap, it should read like 889:891,893:894? If so, try (stealing from scrutinizer's approach):

Code:
awk '
NR==FNR         {A[$2] = $5
                 next
                }

                {CNT = 0
                 TMP = ""
                 gsub (/[)(     ]/, _, $2)
                 n = split ($2, VAL, ",")
                 for (i=1; i<=n; i++)   {if (1 == split (VAL[i], LMT, ":")) LMT[2] = LMT[1]
                                         for (j=LMT[1]; j<=LMT[2]; j++) RES[++CNT] = A[j]
                                        }
                 $2 = "c(" RES[1]
                 for (i=2; i<=CNT; i++) {DLT = RES[i] - RES[i-1]
                                         if (DLT > 1)   {$2  = $2 TMP "," RES[i]
                                                         TMP = ""
                                                        }
                                           else if (DLT == 1) TMP = ":" RES[i]
                                        }
                 $2 = $2 TMP ")"
                 print
                }
' file2 FS=c file1

This User Gave Thanks to RudiC For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Match column 8 in file 1 with column 2 in file 2 and replace..

I am looking at the NR==FNR posts and trying to use them to achieve the following but I am not getting it. I have 2 files. I want to match column 8 in file 1 with column 2 in file 2. When they match I want to replace column 9 in file 1 with column 1 in file 2. This is and extract from file 1 ... (5 Replies)
Discussion started by: kieranfoley
5 Replies

2. Shell Programming and Scripting

awk compare column n replace with in one file

hi Friends need to compare columns in one file where the data looks like below laptop,IBM phone,samsung car,rental user1,laptop user2,laptop user3,phone want to get output as laptop,IBM phone,samsung car,rental user1,IBM user2,IBM user3,samsung need to seach $2 in array of $1 and... (4 Replies)
Discussion started by: arun1401
4 Replies

3. Shell Programming and Scripting

Awk: Need help replacing a specific column in a file by part of a column in another file

Hi, I have two input files as File1 : ABC:client1:project1 XYZ:client2-aa:project2 DEF:client4:proj File2 : client1:W-170:xx client2-aa:WT-04:yy client4:L-005A:zz Also, array of valid values can be hardcoded like Output : ABC:W:project1 XYZ:WT:project2 (1 Reply)
Discussion started by: aa2601
1 Replies

4. Shell Programming and Scripting

Replace column that matches specific pattern, with column data from another file

Can anyone please help with this? I have 2 files as given below. If 2nd column of file1 has pattern foo1@a, find the matching 1st column in file2 & replace 2nd column of file1 with file2's value. file1 abc_1 foo1@a .... abc_1 soo2@a ... def_2 soo2@a .... def_2 foo1@a ........ (7 Replies)
Discussion started by: prashali
7 Replies

5. Shell Programming and Scripting

Find in first column and replace the line with Awk, and output new file

Find in first column and replace the line with Awk, and output new file File1.txt"2011-11-02","Georgia","Atlanta","x","","" "2011-11-03","California","Los Angeles","x","","" "2011-11-04","Georgia","Atlanta","x","x","x" "2011-11-05","Georgia","Atlanta","x","x","" ... (4 Replies)
Discussion started by: charles33
4 Replies

6. Shell Programming and Scripting

Replace 2nd column of CSV file with numbers on line

I have a csv file with occasional multiple entries in the second column. 111111,104,07-24-2011,3.15,N, 222222,020 140,07-24-2011,10.00,N,I want the result 111111,104,07-24-2011,3.15,N, 222222,020,07-24-2011,10.00,N, 222222,140,07-24-2011,10.00,N, I know I can get the output of the second... (5 Replies)
Discussion started by: ffdstanley
5 Replies

7. Shell Programming and Scripting

Match column 3 in file1 to column 1 in file 2 and replace with column 2 from file2

Match column 3 in file1 to column 1 in file 2 and replace with column 2 from file2 file 1 sample SNDK 80004C101 AT XLNX 983919101 BB NETL 64118B100 BS AMD 007903107 CC KLAC 482480100 DC TER 880770102 KATS ATHR 04743P108 KATS... (7 Replies)
Discussion started by: rydz00
7 Replies

8. Shell Programming and Scripting

Need an awk for a global find/replace in a file, specific column

I am new to unix and awk/sed etc... using C-Shell. Basically, I have a fixed length file that has 4 different record types on it, H, D, V, W all in column 1. I need to change all the W's in column 1 to D's. in the entire file. The W's can be anywhere in the file and must remain in the same... (3 Replies)
Discussion started by: jclanc8
3 Replies

9. AIX

How to replace many numbers with one number in a file

How to replace many numbers with one number in a file. Many numbers like 444565,454678,443298,etc. i want to replace these with one number (300).Please halp me out. (2 Replies)
Discussion started by: vpandey
2 Replies

10. Shell Programming and Scripting

to replace one character by numbers in a file

suppose u have a file aas P-H 123 gdg O-U 223 hdy I-Y 12 fgd K-O 333 ssa L-P 32 output shud be like that aas P123H gdg O223U hdy I12Y fgd K333O ssa L32P thanks (7 Replies)
Discussion started by: cdfd123
7 Replies
Login or Register to Ask a Question