Please help to fix awk script


 
# 1  
Old 04-17-2013

Good morning, fellows. I need your help with editing my awk script. Here is the original version:

BEGIN { printf ("CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1\n")
maxatoms=1000
natom=0
found_struct = 0
found_bond = 0
}
{
if( NF == 5 )
{
foundff=0
natom++
fftype[natom]="UNKNOWN"
if ($1 ~ /CT/)
{
fftype[natom] = "C"
foundff=1
}
else if ($1 ~ /OH/)
{
fftype[natom] = "O"
foundff=1
}
else if ($1 ~ /HC/)
{
fftype[natom] = "H"
foundff=1
}
else if ($1 ~ /N/)
{
fftype[natom] = "N"
foundff=1
}

else if ($1 ~ /H1/)
{
fftype[natom] = "H"
foundff=1
}
else if ($1 ~ /HO/)
{
fftype[natom] = "H"
foundff=1
}
else if ($1 = "C")
{
fftype[natom] = "C"
foundff=1
}
else if ($1 = "O")
{
fftype[natom] = "O"
foundff=1
}

next

x[natom] = $1
y[natom] = $2
z[natom] = $3


if (foundff == 0)
printf("PROBLEM : Atom ff type %s not known\n", $6)
}

}

END {
for (iatom=1; iatom <= natom; iatom++)
{
printf("HETATM %d %2s %d %14.9f %14.9f %14.9f\n" ,
iatom, fftype[iatom], iatom, x[iatom], y[iatom], z[iatom])
}
printf ("END\n")
}


And this is the type of file I am working with:

0 3 186 200 75202
timestep 500 186 0 3 0.002000 1.000000
40.0000000000 0.0000000000 0.0000000000
-0.0000000034 40.0000000000 0.0000000000
-0.0000000034 -0.0000000034 40.0000000000
CT_1 1 12.011000 0.061000 1.087513
-1.961325738 1.828501682 -8.933652557
CT_1 2 12.011000 0.061000 0.789711
-3.851025437 3.495427316 -10.05849230
CT_1 3 12.011000 0.061000 0.581330
-5.804493575 4.589489777 -8.369482861

etc.

I would like to get this as an output:

CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1
HETATM 1 C 1 -1.961325738 1.828501682 -8.933652557
HETATM 2 C 2 -3.851025437 3.495427316 -10.05849230
HETATM 3 C 3 -5.804493575 4.589489777 -8.369482861

etc.

But the coordinates, which sit on the line after each atom line (for example, the line after CT_1 1 12.011000 0.061000 1.087513), are not being picked up. Could you please have a look and suggest a solution? Thanks.
# 2  
Old 04-17-2013
Please use code tags as required by forum rules!

Replace the next command with getline. As written, next jumps straight to the next input line, so the assignments to x, y and z that follow it are never executed. BTW, there are plenty of opportunities to improve / trim / prune your code. First, indent the code so the logic becomes clearer at first sight. Second, try to avoid that many if - else if branches. There may be better ways to code the logic.
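One way to avoid the long if - else if chain is a lookup table held in an awk array. This is only a sketch: the array names kv and elem are made up for illustration, the mapping is taken from the branches of the posted script, and both the H -> H entry and the rule that everything after an underscore is dropped (CT_1 -> CT) are assumptions about the data.

```shell
out=$(printf 'CT_1\nHC\nOH\nN\nXX\n' | awk '
BEGIN {
  # Assumed table: force-field type (suffix removed) followed by its element.
  n = split("CT C HC H H1 H HO H OH O N N C C O O H H", kv, " ")
  for (i = 1; i < n; i += 2) elem[kv[i]] = kv[i + 1]
}
{
  type = $1
  sub(/_.*/, "", type)                          # CT_1 -> CT
  e = (type in elem) ? elem[type] : "UNKNOWN"   # exact match, no regex
  print $1, e
}')
printf '%s\n' "$out"
```

Because the lookup is an exact match on the type, the order of the entries no longer matters, and a new type only needs one more pair in the table.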
This User Gave Thanks to RudiC For This Post:
# 3  
Old 04-17-2013
In the sample input, I don't see any of the patterns the script is looking for, like /OH/, /HC/ etc.
# 4  
Old 04-17-2013
Thanks a lot for your help. The thing is, I am just a newbie in programming, so I might use some over-complicated algorithm and ask stupid questions. One of them is how to use getline in this case, because I couldn't figure it out even after reading the manual on it.

As for the input file: it is huge, and there will be more lines with O, N and H.

---------- Post updated at 03:39 AM ---------- Previous update was at 03:26 AM ----------

Just "getline". Now it works. Thanks.
# 5  
Old 04-17-2013
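Both next and getline move on to the next record (the next line of input).

The difference is:
- next abandons the current record and restarts the script from the first rule with the new record, so any statements after next in the same block never run
- getline reads the new record into $0 (updating the fields, NF and NR) and continues with the statement that follows it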
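A small demonstration of the difference, using made-up two-line records (a name line followed by a value line). The input and the variable names are for illustration only.

```shell
input=$(printf 'atomA\n1.0 2.0\natomB\n3.0 4.0')

# next: the current record is abandoned, so the print after it never runs.
out_next=$(printf '%s\n' "$input" |
  awk 'NF == 1 { name = $1; next; print name, $0 }')

# getline: the following line is loaded into $0 and the same action carries on.
out_getline=$(printf '%s\n' "$input" |
  awk 'NF == 1 { name = $1; getline; print name, $0 }')

printf 'next:    [%s]\n' "$out_next"
printf 'getline: [%s]\n' "$out_getline"
```

The next version prints nothing at all, while the getline version prints each name together with the values from the line below it.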
This User Gave Thanks to hanson44 For This Post:
# 6  
Old 04-17-2013
Thanks. One last thing: I have multiple entries (frames) in the file:
Code:
timestep       500        25         0         3    0.001000    0.500000
       20.0000000000        0.0000000000        0.0000000000
       -0.0000000017       20.0000000000        0.0000000000
       -0.0000000017       -0.0000000017       20.0000000000
CT_1             1   12.011000   -0.019000    0.640298
    0.9849963161E-01    -3.415112416         1.363374178
HC               2    1.008000    0.080000    1.479956
   -0.2121662555        -3.863455405        0.3080721923
HC               3    1.008000    0.080000    1.523069
    -1.497545961        -4.274924074         2.443930026
HC               4    1.008000    0.080000    1.333436
   -0.2046782675E-01    -3.884045345         3.393900210
HC               5    1.008000    0.080000    0.475593
    0.6645083761E-01    -5.274859543         2.207872210
HC               6    1.008000    0.080000    2.148578
     2.012043588        -4.255526008         1.282247861
HC               7    1.008000    0.080000    2.318435
     2.030435317        -2.972730305         2.435165857
HC               8    1.008000    0.080000    2.579209
     1.986543784        -2.614081904        0.6651575328
HC               9    1.008000    0.080000    2.835455
    -4.516000060       -0.6390584439        0.1821449846
HC              10    1.008000    0.080000    2.900618
    -3.847187217        -2.017469231       -0.6873319288
HC              11    1.008000    0.080000    3.014232
    -4.558244417       -0.6516313832        -1.645346738
CT_2            12   12.011000   -0.240000    2.414372
    -4.027556611       -0.9305956823       -0.7567879539
CT_2            13   12.011000   -0.240000    1.889602
     1.660000184        -3.278731061         1.464017002
CT_2            14   12.011000   -0.240000    0.463260
   -0.3736403707        -4.330624723         2.450383850
C               15   12.011000    0.569000    0.560350
   -0.5366914029        -1.980928569         1.574219882
O               16   15.999000   -0.570000    1.607385
   -0.4278922248        -1.357753029         2.570890083
N               17   14.007000   -0.730000    0.140561
    -1.208805366        -1.532235679        0.5673463632
H               18    1.008000    0.370000    0.907631
    -1.136311398        -2.139234863       -0.2786154629
CT_3            19   12.011000    0.140000    0.388223
    -1.903616044       -0.2739084042        0.4907961156
CT_4            20   12.011000    0.200000    0.889058
    -2.713660887       -0.1945789844       -0.8233791846
H1              21    1.008000    0.080000    1.790695
    -2.144431000       -0.6193961442        -1.696199396
H1              22    1.008000    0.080000    1.251414
    -1.113333313        0.4597479795        0.4873591014
H1              23    1.008000    0.080000    1.401424
    -2.605232776       -0.2539148024         1.355851660
OH              24   15.999000   -0.680000    0.170902
    -2.883721417         1.170430901        -1.063007617
HO              25    1.008000    0.400000    0.802468
    -3.755698183         1.390412071       -0.7269985042
timestep      1500        25         0         3    0.001000    1.500000
       20.0000000000        0.0000000000        0.0000000000
       -0.0000000017       20.0000000000        0.0000000000
       -0.0000000017       -0.0000000017       20.0000000000
CT_1             1   12.011000   -0.019000    1.108827
    0.5930813433        -3.059647383        0.9093170534
HC               2    1.008000    0.080000    1.035396
    0.4325261101        -2.863254539         1.968758844
HC               3    1.008000    0.080000    4.510480
     1.943889135        -1.400824972        0.4446815567
HC               4    1.008000    0.080000    5.038050
     1.997587984        -2.723903724       -0.6330481166
HC               5    1.008000    0.080000    3.495500
     2.776704671        -2.790314949        0.9971734449
HC               6    1.008000    0.080000    2.646696
     1.303777680        -5.015186286         1.216356852
HC               7    1.008000    0.080000    4.899226
    0.5444482658        -4.839385307       -0.3926739911
HC               8    1.008000    0.080000    4.126721
   -0.3351590642        -4.945216690         1.087335425
HC               9    1.008000    0.080000    5.025144
    -2.335162001         1.703747125         1.766118970
HC              10    1.008000    0.080000    5.259478
    -1.485614704        0.5232332483         2.667235397
HC              11    1.008000    0.080000    6.747800
    -3.119713679         1.071097349         3.299766915
CT_2            12   12.011000   -0.240000    4.989691
    -2.493930014        0.8599303479         2.445446594
CT_2            13   12.011000   -0.240000    3.045725
    0.5382711509        -4.518100324        0.7268131019
CT_2            14   12.011000   -0.240000    3.428692
     1.873711712        -2.498659898        0.4361924427
C               15   12.011000    0.569000    1.644059
   -0.6081926085        -2.279925921        0.1863969525
O               16   15.999000   -0.570000    3.887154
   -0.7777844810        -2.458871627        -1.043373240
N               17   14.007000   -0.730000    0.487297
    -1.381080530        -1.599443295        0.9939790523
H               18    1.008000    0.370000    2.277922
    -1.248085227        -1.605021545         2.010060279
CT_3            19   12.011000    0.140000    0.355424
    -2.357481513       -0.7177380924        0.4916404910
CT_4            20   12.011000    0.200000    2.979709
    -3.243082218       -0.1626320330         1.574424959
H1              21    1.008000    0.080000    4.505487
    -3.471572515       -0.9883864713         2.248890966

And now it is sticking them all into one. What would be the best way to display each frame (starting from timestep) separately, resetting natom at the start of each new frame? Thanks.
# 7  
Old 04-17-2013
This may be close to what you want, though it still needs some modifications to better handle the initial part of the data in each timestep section.
Code:
$ cat atoms.awk
BEGIN { printf ("CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1\n")
  maxatoms = 1000;
  natom = 0
  found_struct = 0
  found_bond = 0
  }

{ if (NF == 5) {
  foundff = 0
  natom++
  fftype[natom] = "UNKNOWN"
  if ($1 ~ /CT/) {
    foundff=1; fftype[natom] = "C"
    }
  else if ($1 ~ /OH/) {
    foundff=1; fftype[natom] = "O"
    }
  else if ($1 ~ /HC/) {
    foundff=1; fftype[natom] = "H"
    }
  else if ($1 ~ /N/) {
    foundff=1; fftype[natom] = "N"
    }
  else if ($1 ~ /H1/) {
    foundff=1; fftype[natom] = "H"
    }
  else if ($1 ~ /HO/) {
    foundff=1; fftype[natom] = "H"
    }
  else if ($1 = "C") {
    foundff=1; fftype[natom] = "C"
    }
  else if ($1 = "O") {
    foundff=1; fftype[natom] = "O"
    }

  getline

  x[natom] = $1
  y[natom] = $2
  z[natom] = $3

  if (foundff == 0)
    printf("PROBLEM : Atom ff type %s not known\n", $6)
    }
  }

function report_frame (iatom) {
  for (iatom=1; iatom <= natom; iatom++) {
    printf ("HETATM %d %2s %d %14.9f %14.9f %14.9f\n",
    iatom, fftype[iatom], iatom, x[iatom], y[iatom], z[iatom])
    }
  printf ("END of frame\n")
  natom = 0
  }

END { report_frame() }

/timestep/ { report_frame() }

Code:
$ awk -f atoms.awk atoms.dat
CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1
HETATM 1  C 1    0.000000000  500.000000000   25.000000000
END of frame
HETATM 1  C 1    0.098499632   -3.415112416    1.363374178
HETATM 2  H 2   -0.212166256   -3.863455405    0.308072192
HETATM 3  H 3   -1.497545961   -4.274924074    2.443930026
HETATM 4  H 4   -0.020467827   -3.884045345    3.393900210
HETATM 5  H 5    0.066450838   -5.274859543    2.207872210
HETATM 6  H 6    2.012043588   -4.255526008    1.282247861
HETATM 7  H 7    2.030435317   -2.972730305    2.435165857
HETATM 8  H 8    1.986543784   -2.614081904    0.665157533
HETATM 9  H 9   -4.516000060   -0.639058444    0.182144985
HETATM 10  H 10   -3.847187217   -2.017469231   -0.687331929
HETATM 11  H 11   -4.558244417   -0.651631383   -1.645346738
HETATM 12  C 12   -4.027556611   -0.930595682   -0.756787954
HETATM 13  C 13    1.660000184   -3.278731061    1.464017002
HETATM 14  C 14   -0.373640371   -4.330624723    2.450383850
HETATM 15  C 15   -0.536691403   -1.980928569    1.574219882
HETATM 16  C 16   -0.427892225   -1.357753029    2.570890083
HETATM 17  N 17   -1.208805366   -1.532235679    0.567346363
HETATM 18  C 18   -1.136311398   -2.139234863   -0.278615463
HETATM 19  C 19   -1.903616044   -0.273908404    0.490796116
HETATM 20  C 20   -2.713660887   -0.194578984   -0.823379185
HETATM 21  H 21   -2.144431000   -0.619396144   -1.696199396
HETATM 22  H 22   -1.113333313    0.459747980    0.487359101
HETATM 23  H 23   -2.605232776   -0.253914802    1.355851660
HETATM 24  O 24   -2.883721417    1.170430901   -1.063007617
HETATM 25  H 25   -3.755698183    1.390412071   -0.726998504
END of frame
HETATM 1  C 1    0.593081343   -3.059647383    0.909317053
HETATM 2  H 2    0.432526110   -2.863254539    1.968758844
HETATM 3  H 3    1.943889135   -1.400824972    0.444681557
HETATM 4  H 4    1.997587984   -2.723903724   -0.633048117
HETATM 5  H 5    2.776704671   -2.790314949    0.997173445
HETATM 6  H 6    1.303777680   -5.015186286    1.216356852
HETATM 7  H 7    0.544448266   -4.839385307   -0.392673991
HETATM 8  H 8   -0.335159064   -4.945216690    1.087335425
HETATM 9  H 9   -2.335162001    1.703747125    1.766118970
HETATM 10  H 10   -1.485614704    0.523233248    2.667235397
HETATM 11  H 11   -3.119713679    1.071097349    3.299766915
HETATM 12  C 12   -2.493930014    0.859930348    2.445446594
HETATM 13  C 13    0.538271151   -4.518100324    0.726813102
HETATM 14  C 14    1.873711712   -2.498659898    0.436192443
HETATM 15  C 15   -0.608192609   -2.279925921    0.186396952
HETATM 16  C 16   -0.777784481   -2.458871627   -1.043373240
HETATM 17  N 17   -1.381080530   -1.599443295    0.993979052
HETATM 18  C 18   -1.248085227   -1.605021545    2.010060279
HETATM 19  C 19   -2.357481513   -0.717738092    0.491640491
HETATM 20  C 20   -3.243082218   -0.162632033    1.574424959
HETATM 21  H 21   -3.471572515   -0.988386471    2.248890966
END of frame
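Two things stand out in the output above. Atoms 16 (O) and 18 (H) are reported as C, because $1 = "C" is an assignment (always true) rather than the comparison $1 == "C". The very first HETATM line is not an atom at all: it appears to come from a 5-field header line, after which getline picked up the timestep line as "coordinates". What follows is a possible tidy-up, offered as a sketch rather than the definitive fix. It assumes that atom lines only occur after a timestep line, that a type suffix such as _1 can be dropped, and that H maps to H; the names flush_frame, in_frame, elem and el are made up for illustration, and the sample is a shortened extract of the data posted above.

```shell
sample='0 3 186 200 75202
timestep 500 2 0 3 0.001000 0.500000
20.0000000000 0.0000000000 0.0000000000
-0.0000000017 20.0000000000 0.0000000000
-0.0000000017 -0.0000000017 20.0000000000
CT_1 1 12.011000 -0.019000 0.640298
0.9849963161E-01 -3.415112416 1.363374178
O 16 15.999000 -0.570000 1.607385
-0.4278922248 -1.357753029 2.570890083
timestep 1500 1 0 3 0.001000 1.500000
20.0000000000 0.0000000000 0.0000000000
-0.0000000017 20.0000000000 0.0000000000
-0.0000000017 -0.0000000017 20.0000000000
H 18 1.008000 0.370000 2.277922
-1.248085227 -1.605021545 2.010060279'

out=$(printf '%s\n' "$sample" | awk '
function flush_frame(   i) {
  if (natom == 0) return              # nothing collected yet
  for (i = 1; i <= natom; i++)
    printf "HETATM %d %2s %d %14.9f %14.9f %14.9f\n", i, el[i], i, x[i], y[i], z[i]
  print "END of frame"
  natom = 0                           # start counting again for the next frame
}
BEGIN {
  # Assumed table: force-field type followed by its element symbol.
  n = split("CT C HC H H1 H HO H OH O N N C C O O H H", kv, " ")
  for (i = 1; i < n; i += 2) elem[kv[i]] = kv[i + 1]
  print "CRYST1 200.000 200.000 200.000 90.00 90.00 90.00 P 1 1"
}
$1 == "timestep" { flush_frame(); in_frame = 1; next }
in_frame && NF == 5 {                 # atom line; the header before the first timestep is skipped
  type = $1
  sub(/_.*/, "", type)                # CT_1 -> CT
  natom++
  if (type in elem) el[natom] = elem[type]
  else { el[natom] = "UNKNOWN"; printf "PROBLEM : Atom ff type %s not known\n", $1 }
  if ((getline) > 0) { x[natom] = $1; y[natom] = $2; z[natom] = $3 }
}
END { flush_frame() }')
printf '%s\n' "$out"
```

The 3-field cell lines after each timestep are ignored because no rule matches them, and the frame is only printed when at least one atom has been collected, so no empty frame appears before the first timestep.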

This User Gave Thanks to hanson44 For This Post:
 