Use awk to replace numbers in a file with a column from another file

07-05-2018

Hello,

I am trying to write an awk script that takes two files. The first is a txt file like this:

Code:

1   1   88                        c(1:38, 42, 102)
2   2  128 c(39:41, 43:101, 103:105, 153, 155:189, 292, 344:369)
3   3   84                     c(190:249, 603, 606:607, 609:629)
4   4   12                                   c(250:251, 253:262)
5   6   51                     c(263, 265:291, 293:313, 315:316)
6   8   28                                       c(314, 317:343)
7   9   60            c(370:385, 561:587, 589:602, 604:605, 608)
8  10   39                                               386:424

If the numbers in blue match the numbers in red in the 2nd column of a pdb file (file 2):

Code:

ATOM      1  N   PRO   889      24.289  17.277 -19.912  1.00  0.00           N  
ATOM      2  CA  PRO   889      25.072  18.509 -19.702  1.00  0.00           C  
ATOM      3  C   PRO   889      24.200  19.747 -19.486  1.00  0.00           C  
ATOM      4  O   PRO   889      24.602  20.661 -18.749  1.00  0.00           O  
ATOM      5  N   THR   890      23.002  19.770 -20.124  1.00  0.00           N  
ATOM      6  CA  THR   890      22.044  20.878 -20.060  1.00  0.00           C  
ATOM      7  C   THR   890      21.613  21.209 -18.629  1.00  0.00           C  
ATOM      8  O   THR   890      21.429  20.303 -17.812  1.00  0.00           O  
ATOM      9  N   VAL   891      21.484  22.513 -18.332  1.00  0.00           N

they should be replaced with the values from the 5th column (in green). For example, the output has to look like this:

Code:

1   1   88                        c(889:898 , 899, 914)

...
Thank you for your time

Moderator's Comments:

Edit: darker green

Last edited by jim mcnamara; 07-05-2018 at 11:24 PM..

nastaziales

07-05-2018

It's quite difficult to see how you arrived at your desired output from the sample input. Where did 914 come from, for example?
Could you try explaining it again? Maybe with more representative data files...
Also, your choice of colors might not be optimal: I can hardly see this green.

vgersh99

07-05-2018

On top of what vgersh99 already said, I can't find a number in blue in file1 that matches any number in red in file2 (except for the 1 in the first line).

RudiC

07-05-2018

Changed the color to a darker green. As the question is asked, I cannot see how to arrive at a solution either.

jim mcnamara

07-06-2018

Indeed it seems like the values required to arrive at the example output are missing from the pdb file.

Perhaps something like this is intended:

Code:

awk '
  NR==FNR {                    # first file: read the pdb file
    A[$2]=$5                   # store the pdb values in array A as a lookup table
    next
  }

  NF<2 {                       # no "c" on this line, so there is no list to translate
    print
    next
  }

  {                            # txt file, read with the letter c as the field separator
    split($2,SEP,/[0-9]+/)     # put all separators between the numbers of $2 in array SEP
    n=split($2,VAL,/[^0-9]+/)  # put all numbers of $2 in array VAL (first and last element are empty)
    for(i in VAL)              # for every value
      if(VAL[i] in A)          # if it is in column 2 of the pdb file
        VAL[i]=A[VAL[i]]       # replace it with the corresponding value of column 5 in the pdb file
    $2=SEP[1]                  # start $2 again with the first separator
    for(i=2; i<n; i++)         # enumerate over values and separators
      $2=$2 VAL[i] SEP[i]      # reassemble $2 from the values and separators
    print
  }
' file.pdb FS=c OFS=c file.txt
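
For anyone who wants to try the idea quickly, here is a minimal, self-contained sketch of the same lookup. It is a variant, not the code above: it walks through the list with match() instead of the two split() calls. The function name map_atoms, the temporary file names and the shortened sample data are made up for illustration. It assumes the atom list is introduced by "c(" and that lines without it should pass through unchanged.

```shell
# Hypothetical helper (not from the thread): map_atoms PDB_FILE TXT_FILE
map_atoms() {
  awk '
    NR == FNR { res[$2] = $5; next }     # pdb file: atom number -> residue number
    {
      p = index($0, "c(")                # where the atom list starts, if there is one
      if (!p) { print; next }            # no list on this line: print it unchanged
      head = substr($0, 1, p + 1)        # everything up to and including "c("
      rest = substr($0, p + 2)           # the list itself
      out = ""
      while (match(rest, /[0-9]+/)) {    # take the numbers one at a time
        num = substr(rest, RSTART, RLENGTH)
        out = out substr(rest, 1, RSTART - 1) ((num in res) ? res[num] : num)
        rest = substr(rest, RSTART + RLENGTH)
      }
      print head out rest
    }
  ' "$1" "$2"
}

# Made-up sample files, shortened from the data in this thread
tmp=$(mktemp -d)
cat > "$tmp/file.pdb" <<'EOF'
ATOM      1  N   PRO   889      24.289  17.277 -19.912  1.00  0.00           N
ATOM      5  N   THR   890      23.002  19.770 -20.124  1.00  0.00           N
ATOM      7  C   THR   890      21.613  21.209 -18.629  1.00  0.00           C
ATOM      9  N   VAL   891      21.484  22.513 -18.332  1.00  0.00           N
EOF
cat > "$tmp/file.txt" <<'EOF'
1   1   88   c(1:5, 7, 9)
8  10   39   386:424
EOF

map_atoms "$tmp/file.pdb" "$tmp/file.txt"
```

Numbers that have no entry in the pdb file are left as they are, and the 386:424 line is printed untouched because it is not wrapped in c(...).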

Last edited by Scrutinizer; 07-06-2018 at 01:52 AM..

Scrutinizer

07-06-2018

I'm sorry, I will try to explain it better.

file n.1

Code:

1   1   88                        c(1:5, 7, 9)
2   2  128 c(39:41, 43:101, 103:105, 153, 155:189, 292, 344:369)
3   3   84                     c(190:249, 603, 606:607, 609:629)
4   4   12                                   c(250:251, 253:262)
5   6   51                     c(263, 265:291, 293:313, 315:316)
6   8   28                                       c(314, 317:343)

where 1:5, 7, 9 are the atomic numbers.

file n.2

Code:

ATOM      1  N   PRO   889      24.289  17.277 -19.912  1.00  0.00           N
ATOM      2  CA  PRO   889      25.072  18.509 -19.702  1.00  0.00           C
ATOM      3  C   PRO   889      24.200  19.747 -19.486  1.00  0.00           C
ATOM      4  O   PRO   889      24.602  20.661 -18.749  1.00  0.00           O
ATOM      5  N   THR   890      23.002  19.770 -20.124  1.00  0.00           N
ATOM      6  CA  THR   890      22.044  20.878 -20.060  1.00  0.00           C
ATOM      7  C   THR   890      21.613  21.209 -18.629  1.00  0.00           C
ATOM      8  O   THR   890      21.429  20.303 -17.812  1.00  0.00           O
ATOM      9  N   VAL   891      21.484  22.513 -18.332  1.00  0.00           N

The second column is also the atomic number (blue) and the fifth column is the residue number (red), so I want each atomic number in file 1 to be replaced with the residue number taken from file 2. The output would look like this:

Before:
1 1 88 c(1:5, 7, 9)

After:
1 1 88 c(889:890, 890, 891)

Last edited by nastaziales; 07-06-2018 at 05:59 AM..

nastaziales

07-06-2018

Wouldn't it make sense to group the resulting numbers into ranges and single numbers where applicable? For example, 889:890, 890, 891 would become the single range 889:891, and if there were a gap, it would read 889:891,893:894. If so, try this (stealing from Scrutinizer's approach):

Code:

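awk '
NR == FNR {                        # first file (pdb): atom number -> residue number
    A[$2] = $5
    next
}
NF < 2 { print; next }             # no "c(...)" list on this line: leave it unchanged
{
    CNT = 0
    TMP = ""
    gsub(/[)( \t]/, "", $2)        # strip parentheses and whitespace from the list
    n = split($2, VAL, ",")
    for (i = 1; i <= n; i++) {     # expand every item (single atom or from:to range)
        if (split(VAL[i], LMT, ":") == 1) LMT[2] = LMT[1]
        for (j = LMT[1] + 0; j <= LMT[2] + 0; j++) RES[++CNT] = A[j]
    }
    $2 = "c(" RES[1]
    for (i = 2; i <= CNT; i++) {   # collapse consecutive residue numbers into ranges
        DLT = RES[i] - RES[i-1]
        if (DLT > 1) {             # gap: close the open range, start a new item
            $2 = $2 TMP "," RES[i]
            TMP = ""
        } else if (DLT == 1)       # next residue: extend the open range
            TMP = ":" RES[i]
    }
    $2 = $2 TMP ")"
    print
}
' file2 FS=c file1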
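
The range-collapsing step is the less obvious part, so here it is on its own as a rough sketch. The function name collapse_ranges is made up for illustration. It assumes the residue numbers arrive on standard input, separated by whitespace and in ascending order; repeated numbers are skipped, consecutive numbers extend the current range, and anything else starts a new item.

```shell
# Hypothetical helper (not from the thread): reads residue numbers on stdin
collapse_ranges() {
  awk '
    { for (i = 1; i <= NF; i++) add($i + 0) }
    END { if (seen) print out (last != first ? ":" last : "") }

    function add(v) {
      if (!seen) { out = v; first = last = v; seen = 1; return }
      if (v == last) return                            # repeated residue: nothing to do
      if (v == last + 1) { last = v; return }          # consecutive: extend the open range
      out = out (last != first ? ":" last : "") "," v  # gap: close the range, start a new item
      first = last = v
    }
  '
}

echo "889 889 889 889 890 890 891" | collapse_ranges   # prints 889:891
echo "889 890 891 893 894" | collapse_ranges           # prints 889:891,893:894
```

The first call uses the residue numbers that c(1:5, 7, 9) expands to with the sample pdb file, which is why the whole list ends up as one range.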

RudiC
