AWK script - extracting min and max values from selected lines


 
# 1  
02-02-2012

Hi guys!
I'm new to scripting and I need to write a script in awk.

Here is an example of the file I'm working on:

Code:
ATOM   4688  HG1 PRO A 322      18.080  59.680 137.020  1.00  0.00
ATOM   4689  HG2 PRO A 322      18.850  61.220 137.010  1.00  0.00            
ATOM   4690  CD  PRO A 322      18.800  60.090 135.140  1.00  0.00            
ATOM   4691  HD1 PRO A 322      17.770  60.020 134.790  1.00  0.00            
ATOM   4692  HD2 PRO A 322      19.330  60.890 134.620  1.00  0.00            
ATOM   4693  C   PRO A 322      20.020  56.920 136.410  1.00  0.00            
ATOM   4694  O1  PRO A 322      18.780  56.610 136.300  1.00  0.00            
ATOM   4695  O2  PRO A 322      20.890  56.130 136.870  1.00  0.00            
ATOM   4696  C1   TB B 323      85.140  34.010  62.880  1.00  0.00            
ATOM   4697  C2   TB B 323      84.350  35.240  62.580  1.00  0.00            
ATOM   4698  C3   TB B 323      84.790  35.750  61.220  1.00  0.00            
ATOM   4699  C4   TB B 323      83.900  36.900  60.810  1.00  0.00            
ATOM   4700  O5   TB B 323      84.420  38.000  60.510  1.00  0.00            
ATOM   4701  O6   TB B 323      82.490  36.550  60.770  1.00  0.00            
ATOM   4702  C7   TB B 323      81.780  37.770  60.540  1.00  0.00            
ATOM   4703  C8   TB B 323      80.240  37.650  60.340  1.00  0.00
I would like to get the min and max values of column 9, but only from lines that have TB in column 4. I was able to extract the min and max values from the whole file, but couldn't restrict it to the TB lines. Any advice?

Thanks in advance!
# 2  
02-02-2012
Hi grincz,

One way using awk:
Code:
$ awk 'BEGIN { max = -1; min = -1 }
  $4 == "TB" {
    max = max < $9 ? $9 : max;
    min = (min == -1 || min > $9) ? $9 : min
  }
  END { printf "max = %.3f\nmin = %.3f\n", max, min }
' infile
max = 62.880
min = 60.340

Regards,
Birei
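The -1 starting values are fine for this sample, but they rely on the data: if every value were below -1, max would stay at -1, and a genuine value of exactly -1 would make the min test think nothing had been seen yet. A minimal sketch that seeds min and max from the first TB line instead; `tb_minmax` is a hypothetical function name, and the `+ 0` is there to make sure the comparisons are numeric:

```shell
# tb_minmax: min and max of field 9 over lines whose 4th field is "TB".
# Hypothetical helper; reads the files named as arguments, or stdin.
tb_minmax() {
  awk '
    $4 == "TB" {
      v = $9 + 0                          # force a numeric value
      if (n++ == 0) {                     # first TB line seeds both
        min = v; max = v
      } else {
        if (v > max) max = v
        if (v < min) min = v
      }
    }
    END {
      if (n) {
        printf "max = %.3f\nmin = %.3f\n", max, min
      } else {
        print "no TB lines found" > "/dev/stderr"
        exit 1
      }
    }
  ' "$@"
}
```

With the sample above saved as infile, `tb_minmax infile` should print the same two lines as the version above; the difference only shows up with negative values.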
# 3  
02-02-2012
Birei, thank you very much!
I just noticed that some of the values have to come from the 8th column, because some lines lack the 5th column that the earlier lines have. Maybe it is possible to count columns backwards, from the end of the line? Or add a condition: if the 5th column is a letter, take the value from the 9th column, otherwise from the 8th?
Code:
ATOM   4688  HG1 PRO A 322      18.080  59.680 137.020  1.00  0.00
ATOM   4689  HG2 PRO A 322      18.850  61.220 137.010  1.00  0.00            
ATOM   4690  CD  PRO A 322      18.800  60.090 135.140  1.00  0.00            
ATOM   5178  C21  TB X 345      78.520  55.030  66.630  1.00  0.00            
ATOM   5179  C1   TB Y 346      54.110  41.980  81.650  1.00  0.00            
ATOM   5180  C2   TB Y 346      55.480  42.240  82.250  1.00  0.00            
ATOM   5181  C3   TB Y 346      56.470  41.760  81.170  1.00  0.00            
ATOM   5182  C4   TB Y 346      57.930  41.990  81.460  1.00  0.00            
ATOM   5183  O5   TB Y 346      58.450  41.810  82.590  1.00  0.00            
ATOM   5184  O6   TB Y 346      58.660  42.140  80.220  1.00  0.00            
ATOM   5185  C7   TB Y 346      60.070  42.430  80.520  1.00  0.00            
ATOM   5186  C8   TB Y 346      61.130  42.800  79.430  1.00  0.00            
ATOM   5187  O9   TB Y 346      62.430  43.290  79.860  1.00  0.00            
ATOM   5188  C10  TB Y 346      63.400  42.500  80.470  1.00  0.00                      
ATOM   5198  C20  TB Y 346      58.830  43.040  74.180  1.00  0.00            
ATOM   5199  C21  TB Y 346      59.260  42.850  72.710  1.00  0.00            
ATOM   5200  C1   TB Z 347      66.200  64.420  74.140  1.00  0.00            
ATOM   5201  C2   TB Z 347      65.770  63.120  73.420  1.00  0.00            
ATOM   5202  C3   TB Z 347      65.520  62.060  74.480  1.00  0.00            
ATOM   5203  C4   TB Z 347      65.220  60.710  73.880  1.00  0.00            
ATOM   5204  O5   TB Z 347      65.740  60.380  72.810  1.00  0.00            
ATOM   5205  O6   TB Z 347      64.790  59.800  74.890  1.00  0.00                      
ATOM   5221  C1   TB   348      82.400  42.410  76.490  1.00  0.00            
ATOM   5222  C2   TB   348      81.300  43.360  76.020  1.00  0.00            
ATOM   5223  C3   TB   348      81.800  44.780  75.790  1.00  0.00            
ATOM   5224  C4   TB   348      80.550  45.700  75.480  1.00  0.00            
ATOM   5225  O5   TB   348      80.390  46.740  76.150  1.00  0.00            
ATOM   5226  O6   TB   348      79.690  45.360  74.310  1.00  0.00            
ATOM   5227  C7   TB   348      78.480  46.220  74.270  1.00  0.00            
ATOM   5228  C8   TB   348      77.460  46.020  73.150  1.00  0.00            
ATOM   5229  O9   TB   348      76.250  46.810  73.220  1.00  0.00            
ATOM   5230  C10  TB   348      76.160  47.920  72.370  1.00  0.00            
ATOM   5231  O11  TB   348      77.230  48.280  71.940  1.00  0.00            
ATOM   5232  C12  TB   348      74.880  48.650  72.320  1.00  0.00            
ATOM   5233  C13  TB   348      74.880  49.800  71.380  1.00  0.00            
ATOM   5234  C14  TB   348      73.520  50.550  71.610  1.00  0.00            
ATOM   5235  C15  TB   348      77.190  44.510  73.170  1.00  0.00            
ATOM   5236  O16  TB   348      76.640  44.120  71.890  1.00  0.00            
ATOM   5237  C17  TB   348      75.870  42.970  72.030  1.00  0.00            
ATOM   5238  O18  TB   348      75.580  42.440  73.130  1.00  0.00            
ATOM   5239  C19  TB   348      75.490  42.400  70.680  1.00  0.00            
ATOM   5240  C20  TB   348      76.150  41.020  70.640  1.00  0.00            
ATOM   5241  C21  TB   348      75.980  40.350  69.280  1.00  0.00            
ATOM   5242  C1   TB   349      74.410  58.030  87.730  1.00  0.00
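Counting from the end does work for lines shaped like the ones above: each one finishes with the three coordinates followed by two more numeric fields (occupancy and temperature factor), so the wanted value is always the third field from the end, `$(NF-2)`, whether or not the chain letter in column 5 is present. A sketch under that assumption, using the hypothetical name `tb_minmax_nf`; if your file carries extra trailing columns (some PDB writers add an element symbol at the end of each line), the offset has to change:

```shell
# tb_minmax_nf: min and max of the third-from-last field on TB lines.
# Assumes every line ends with: x y z occupancy tempFactor.
tb_minmax_nf() {
  awk '
    $4 == "TB" {
      v = $(NF - 2) + 0                   # z, counted from the end of the line
      if (n++ == 0) {
        min = v; max = v
      } else {
        if (v > max) max = v
        if (v < min) min = v
      }
    }
    END { if (n) printf "max = %.3f\nmin = %.3f\n", max, min }
  ' "$@"
}
```

Run it as `tb_minmax_nf infile`; lines with and without the chain letter are handled by the same expression.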

# 4  
02-02-2012
Try this. What changed: if the fifth field is a single alphabetic character, the ninth field is assigned to the variable value; otherwise the eighth one is. The rest of the script then works with value.
Code:
$ awk 'BEGIN { max = -1; min = -1 }
  $4 == "TB" {
    # if $5 is a single letter (chain present), the value is in $9; otherwise in $8
    value = $5 ~ /^[[:alpha:]]$/ ? $9 : $8;
    max = max < value ? value : max;
    min = (min == -1 || min > value) ? value : min
  }
  END { printf "max = %.3f\nmin = %.3f\n", max, min }
' infile
max = 87.730
min = 66.630

Regards,
Birei
# 5  
02-02-2012
Unfortunately it doesn't work. My real file is much larger, around 100k lines, and it gives me these results:
Code:
max = 99,000
min = 100,000

Any ideas? I checked, and there are no other variants of the TB lines.
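One thing worth checking is the comma in `99,000`. POSIX awk takes its decimal separator from the LC_NUMERIC locale category when it reads numeric input and formats output, so under a locale that uses a comma, a field such as `62.880` may not be recognised as a number. Comparisons between such fields then fall back to string comparison, where "99.740" sorts above "160.120" and "100.500" below "45.000", which would fit a max of 99,000 and a min of 100,000. That is a hypothesis, not a confirmed diagnosis; a quick test is to prefix the same awk command with `LC_ALL=C`. The small demonstration below (`compare_demo` is a hypothetical name) shows the two kinds of comparison side by side:

```shell
# compare_demo: the same two values compared as strings and as numbers.
# Runs in the C locale so that "." is the decimal separator.
compare_demo() {
  LC_ALL=C awk 'BEGIN {
    a = "99.740"; b = "160.120"          # string constants: string comparison
    s = (a > b) ? "string: 99.740 > 160.120" : "string: 99.740 < 160.120"
    print s
    s = (a + 0 > b + 0) ? "number: 99.740 > 160.120" : "number: 99.740 < 160.120"
    print s
  }'
}
```

The first line shows the character-by-character result ("9" sorts after "1"), the second the numeric one.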
# 6  
02-02-2012
Try this...
Code:
awk '/TB/{f=$(NF-2);!min?min=f:0;max=f>max?f:max;min=f>min?min:f}END{print max":"min}' infile

--ahamed
# 7  
02-03-2012
Hi ahamed,
The results look more plausible, but they're still not right:
Code:
max = 99,000
min = -1,000

I know that the max should be over 160 and the min around 40-50.
@Birei: your script works on smaller files, but not on the bigger ones.

Maybe the problem is that some lines have 4-digit values in the 5th column, like these?
Code:
ATOM  25935  O9   TB  1334      -0.810  62.300  67.500  1.00  0.00            
ATOM  25936  C10  TB  1334       0.460  61.650  67.370  1.00  0.00            
ATOM  25937  O11  TB  1334       1.350  61.990  66.560  1.00  0.00            
ATOM  25938  C12  TB  1334       0.690  60.320  68.140  1.00  0.00            
ATOM  25939  C13  TB  1334       1.260  60.630  69.520  1.00  0.00            
ATOM  25940  C14  TB  1334       1.220  59.500  70.510  1.00  0.00            
ATOM  25941  C15  TB  1334      98.740  64.820  67.620  1.00  0.00            
ATOM  25942  O16  TB  1334      98.500  66.090  67.050  1.00  0.00            
ATOM  25943  C17  TB  1334      98.350  67.110  68.070  1.00  0.00            
ATOM  25944  O18  TB  1334      98.930  67.020  69.120  1.00  0.00            
ATOM  25945  C19  TB  1334      97.930  68.370  67.370  1.00  0.00            
ATOM  25946  C20  TB  1334      96.460  68.870  67.670  1.00  0.00            
ATOM  25947  C21  TB  1334      96.170  70.140  66.910  1.00  0.00
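In this excerpt whitespace splitting still finds the right value: with the chain column blank, `$5` is `1334`, so Birei's condition picks `$8`, which is the z coordinate. The weak spot is that PDB records are fixed-width, so neighbouring fields can run together with no space between them; for example, a y or z coordinate that fills all 8 characters of its column (such as -100.000) runs straight into the coordinate before it, and every later field number shifts down by one. A sketch that reads columns by position instead, assuming the standard PDB ATOM/HETATM layout (residue name in columns 18-20, z coordinate in columns 47-54), which the lines in this thread appear to follow. `tb_zrange_fixed` is a hypothetical name, and `LC_ALL=C` is set so that a comma-decimal locale cannot interfere:

```shell
# tb_zrange_fixed: min and max z over TB residues, read by column position.
# Assumes the standard PDB layout: resName = cols 18-20, z = cols 47-54.
tb_zrange_fixed() {
  LC_ALL=C awk '
    /^(ATOM  |HETATM)/ {
      res = substr($0, 18, 3)
      gsub(/ /, "", res)                  # strip padding: " TB" -> "TB"
      if (res != "TB") next
      z = substr($0, 47, 8) + 0           # fixed-width z coordinate
      if (n++ == 0) {
        min = z; max = z
      } else {
        if (z > max) max = z
        if (z < min) min = z
      }
    }
    END {
      if (n) {
        printf "max = %.3f\nmin = %.3f\n", max, min
      } else {
        print "no TB lines found" > "/dev/stderr"
        exit 1
      }
    }
  ' "$@"
}
```

Run it as `tb_zrange_fixed infile`. Because it reads fixed positions, a line such as `... 84.350-100.000  61.000 ...`, where x and y have merged into one whitespace field, still yields z = 61.000.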

