Assign a particular value to all items in a group that have the same identifier

07-12-2017


I have a pdb file with the following format:

Code:

ATOM     11  N   PRO A   1      23.223  20.197  14.441  1.00 12.21           N
ATOM     12  CA  PRO A   1      21.881  20.749  14.227  1.00 11.37           C
ATOM     13  C   PRO A   1      21.929  21.556  12.903  1.00 10.73           C
ATOM     14  O   PRO A   1      22.872  22.308  12.668  1.00 12.80           O
ATOM     15  CB  PRO A   1      21.641  21.649  15.437  1.00 10.87           C
ATOM     16  CG  PRO A   1      22.595  21.065  16.473  1.00 11.96           C
ATOM     17  CD  PRO A   1      23.844  20.738  15.680  1.00 12.56           C
ATOM     18  N   GLN A   2      20.920  21.358  12.090  1.00 10.01           N
ATOM     19  CA  GLN A   2      20.848  22.096  10.790  1.00 10.36           C
ATOM     20  C   GLN A   2      20.523  23.577  11.104  1.00  9.48           C
ATOM     21  O   GLN A   2      19.483  23.784  11.751  1.00  9.88           O
ATOM     22  CB  GLN A   2      19.839  21.439   9.860  1.00  9.46           C
ATOM     23  CG  GLN A   2      19.997  22.014   8.451  1.00 10.39           C
ATOM     24  CD  GLN A   2      19.124  21.359   7.433  1.00 11.31           C
ATOM     25  OE1 GLN A   2      18.853  20.153   7.480  1.00 11.96           O
ATOM     26  NE2 GLN A   2      18.609  22.130   6.468  1.00 10.45           N
ATOM     27  N   ALA A   3      21.334  24.475  10.596  1.00  9.29           N
ATOM     28  CA  ALA A   3      21.051  25.904  10.815  1.00  9.15           C
ATOM     29  C   ALA A   3      20.012  26.344   9.800  1.00  9.68           C
ATOM     30  O   ALA A   3      20.253  26.155   8.562  1.00 11.79           O
ATOM     31  CB  ALA A   3      22.339  26.688  10.684  1.00 11.79           C
ATOM     32  N   ILE A   4      18.911  26.884  10.201  1.00  8.39           N
ATOM     33  CA  ILE A   4      17.818  27.322   9.338  1.00  8.72           C
ATOM     34  C   ILE A   4      17.469  28.769   9.682  1.00  8.88           C
ATOM     35  O   ILE A   4      17.202  29.056  10.870  1.00 10.24           O
ATOM     36  CB  ILE A   4      16.576  26.401   9.508  1.00  9.53           C
ATOM     37  CG1 ILE A   4      16.904  24.950   9.073  1.00 10.08           C
ATOM     38  CG2 ILE A   4      15.347  26.971   8.765  1.00 10.36           C
ATOM     39  CD1 ILE A   4      15.720  23.972   9.288  1.00 11.15           C

Columns 61-66 hold the B-factor;
columns 23-26 hold the residue number; and
columns 13-15 hold the atom name.
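
One way to confirm these column positions before changing anything is to print each field by character position. This is only a sketch, assuming standard fixed-width ATOM records; the two sample lines are rebuilt with printf for illustration, and the variable names `sample` and `fields` are hypothetical.

```shell
# Hypothetical sample: two ATOM records laid out in fixed PDB columns.
sample=$(mktemp)
printf 'ATOM  %5s %-4s %3s %1s%4s    %8s%8s%8s%6s%6s          %2s\n' \
  11 ' N'  PRO A 1 23.223 20.197 14.441 1.00 12.21 N \
  12 ' CA' PRO A 1 21.881 20.749 14.227 1.00 11.37 C > "$sample"

# Show atom name, residue number and B-factor by character position.
fields=$(awk '{
  printf "atom=[%s] res=[%s] bfac=[%s]\n",
         substr($0, 13, 3), substr($0, 23, 4), substr($0, 61, 6)
}' "$sample")
printf '%s\n' "$fields"
rm -f "$sample"
```

The brackets make the padding visible, so it is easy to see whether a field is shifted by a column.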

I need to take the B-factor (columns 61-66) for atom CA (columns 13-15) that corresponds to each residue number (columns 23-26), and write that value down in columns 68 to 73 for all rows with the matching residue number.

CA is always the second atom within each residue, but the complication is that the number of atoms per residue varies.

For example, for the data above, I need to have the following output:

Code:

ATOM     11  N   PRO A   1      23.223  20.197  14.441  1.00 12.21 11.37     N    
ATOM     12  CA  PRO A   1      21.881  20.749  14.227  1.00 11.37 11.37     C    
ATOM     13  C   PRO A   1      21.929  21.556  12.903  1.00 10.73 11.37     C    
ATOM     14  O   PRO A   1      22.872  22.308  12.668  1.00 12.80 11.37     O    
ATOM     15  CB  PRO A   1      21.641  21.649  15.437  1.00 10.87 11.37     C    
ATOM     16  CG  PRO A   1      22.595  21.065  16.473  1.00 11.96 11.37     C    
ATOM     17  CD  PRO A   1      23.844  20.738  15.680  1.00 12.56 11.37     C    
ATOM     18  N   GLN A   2      20.920  21.358  12.090  1.00 10.01 10.36     N    
ATOM     19  CA  GLN A   2      20.848  22.096  10.790  1.00 10.36 10.36     C    
ATOM     20  C   GLN A   2      20.523  23.577  11.104  1.00  9.48 10.36     C    
ATOM     21  O   GLN A   2      19.483  23.784  11.751  1.00  9.88 10.36     O    
ATOM     22  CB  GLN A   2      19.839  21.439   9.860  1.00  9.46 10.36     C    
ATOM     23  CG  GLN A   2      19.997  22.014   8.451  1.00 10.39 10.36     C    
ATOM     24  CD  GLN A   2      19.124  21.359   7.433  1.00 11.31 10.36     C    
ATOM     25  OE1 GLN A   2      18.853  20.153   7.480  1.00 11.96 10.36     O    
ATOM     26  NE2 GLN A   2      18.609  22.130   6.468  1.00 10.45 10.36     N    
ATOM     27  N   ALA A   3      21.334  24.475  10.596  1.00  9.29  9.15     N    
ATOM     28  CA  ALA A   3      21.051  25.904  10.815  1.00  9.15  9.15     C    
ATOM     29  C   ALA A   3      20.012  26.344   9.800  1.00  9.68  9.15     C    
ATOM     30  O   ALA A   3      20.253  26.155   8.562  1.00 11.79  9.15     O    
ATOM     31  CB  ALA A   3      22.339  26.688  10.684  1.00 11.79  9.15     C    
ATOM     32  N   ILE A   4      18.911  26.884  10.201  1.00  8.39  8.72     N    
ATOM     33  CA  ILE A   4      17.818  27.322   9.338  1.00  8.72  8.72     C    
ATOM     34  C   ILE A   4      17.469  28.769   9.682  1.00  8.88  8.72     C    
ATOM     35  O   ILE A   4      17.202  29.056  10.870  1.00 10.24  8.72     O    
ATOM     36  CB  ILE A   4      16.576  26.401   9.508  1.00  9.53  8.72     C    
ATOM     37  CG1 ILE A   4      16.904  24.950   9.073  1.00 10.08  8.72     C    
ATOM     38  CG2 ILE A   4      15.347  26.971   8.765  1.00 10.36  8.72     C    
ATOM     39  CD1 ILE A   4      15.720  23.972   9.288  1.00 11.15  8.72     C

Can anyone help me? Your time is very much appreciated.

Egy


07-13-2017


Hi, try:

Code:

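awk '
  # First pass (NR==FNR): remember the CA B-factor for each residue number
  NR==FNR {
    if($3=="CA")
      A[$6]=substr($0,61,6)
    next
  }
  # Second pass: insert the stored value after column 66
  {
    print substr($0,1,66) A[$6] substr($0,73)
  }
'  file.pdb file.pdb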

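--
Note: the file needs to be specified twice, because awk reads it in two passes: once to collect the CA values and once to print the modified lines.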
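
The command above splits on whitespace ($3, $6), which works for the sample shown. As a hedged variant, assuming the file might contain several chains or fields that run together (for example four-digit residue numbers), the same two-pass idea can key on character positions instead. The array name `bfac`, the chain-plus-residue key and the sample records are illustrative assumptions, not part of the original answer.

```shell
# Hypothetical sample: N and CA atoms of two residues, in fixed PDB columns.
pdb=$(mktemp)
printf 'ATOM  %5s %-4s %3s %1s%4s    %8s%8s%8s%6s%6s          %2s\n' \
  11 ' N'  PRO A 1 23.223 20.197 14.441 1.00 12.21 N \
  12 ' CA' PRO A 1 21.881 20.749 14.227 1.00 11.37 C \
  18 ' N'  GLN A 2 20.920 21.358 12.090 1.00 10.01 N \
  19 ' CA' GLN A 2 20.848 22.096 10.790 1.00 10.36 C > "$pdb"

result=$(awk '
  # First pass: store the CA B-factor, keyed by chain ID plus residue number
  NR == FNR {
    if (substr($0, 13, 4) == " CA ")
      bfac[substr($0, 22, 5)] = substr($0, 61, 6)
    next
  }
  # Second pass: splice the stored value into columns 67-72
  {
    print substr($0, 1, 66) bfac[substr($0, 22, 5)] substr($0, 73)
  }
' "$pdb" "$pdb")
printf '%s\n' "$result"
rm -f "$pdb"
```

Comparing the four-character atom name against " CA " also keeps a calcium ion, which PDB files write as "CA  ", from being mistaken for an alpha carbon. The sample contains only ATOM records; other record types would pass through the second block as well, just as in the original command.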

Scrutinizer


07-13-2017


Thanks very much! It works great.
Is it possible to have it write to a new file, rather than overwrite the current file? I tried to modify the command in the obvious ways, but couldn't get anything to work.
If it is not easy to do, that is fine; I can just make two files.
Very much appreciated.

Egy


07-13-2017


You are welcome. Yes, you can simply redirect the output to a new file:

Code:

  }
'  file.pdb file.pdb > new_file.pdb

Otherwise, I am not sure I understand what you mean. The awk script itself does not overwrite any file; it reads the same file twice, which is why the file name needs to be specified twice.
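
For reference, the complete command with the redirection might look like the sketch below. It assumes a scratch directory created with mktemp and a two-line sample standing in for file.pdb; `workdir` is a hypothetical name. The output must go to a different file, because redirecting onto file.pdb itself would make the shell truncate it before awk reads it.

```shell
# Hypothetical scratch area with a two-line stand-in for file.pdb.
workdir=$(mktemp -d)
printf 'ATOM  %5s %-4s %3s %1s%4s    %8s%8s%8s%6s%6s          %2s\n' \
  11 ' N'  PRO A 1 23.223 20.197 14.441 1.00 12.21 N \
  12 ' CA' PRO A 1 21.881 20.749 14.227 1.00 11.37 C > "$workdir/file.pdb"

# Same awk program as in the answer; only the output redirection is new.
awk '
  NR==FNR {
    if($3=="CA")
      A[$6]=substr($0,61,6)
    next
  }
  {
    print substr($0,1,66) A[$6] substr($0,73)
  }
' "$workdir/file.pdb" "$workdir/file.pdb" > "$workdir/new_file.pdb"

# The input is left untouched; only new_file.pdb carries the extra column.
cat "$workdir/new_file.pdb"
```

Once the new file has been checked, it can replace the original with mv if a single file is preferred.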

Scrutinizer


07-13-2017


That worked! Thanks very much.

Egy

