Issue using awk to print min values for unique ids


 
# 8  
Old 04-23-2019
I see. To be more specific, I need the script to look at each field separately for each unique ID, i.e. the min value for just Field 3 or just Field 4.
# 9  
Old 04-24-2019
Quote:
Originally Posted by vgersh99
a bit verbose, but... something along these lines.
awk -f ncw.awk myFile, where ncw.awk is:
Code:
{
  for(i=3;i<=NF;i++) {
     key[$1]
     if ( $i >0 && (!(($1, i) in f1) || f1[key[$1],i] > $i)) {
       f1[$1,i] = $i
       if (!($1,i) in f2) f2[$1,i] = $2
     }
  }
  nf=i
}
END {
  for( k in key)
    for(i=3; i<= nf; i++) {
      printf("%s%s%s", (i==3)?k OFS f2[k,i]:"", OFS, (i==nf)?f1[k,i] ORS:f1[k,i])
    }
}

results in (given your sample input file):
Code:
01001 012017 10.64 3.96 4.45 3.48 9.58 10.76 4.44 5.03 2.78 1.45 1.26 3.24
01003 011895 6.46 3.53 6.69 4.63 3.54 11.71 7.64 8.02 2.45 3.21 1.35 4.05

Thanks for the help, but this prints the first line for each unique ID, not necessarily the min value.

Input
Code:
01001   011895  7.03    2.96    8.36    3.53    3.96    5.40    3.92    3.36    0.73    2.03    1.44    3.66
01001   011896  5.86    5.42    5.54    3.98    3.77    6.24    4.38    2.57    0.82    1.66    2.89    1.94
01001   011897  3.27    6.63    10.94   4.35    0.81    1.57    3.96    5.02    0.87    0.75    1.84    4.38
01001   011898  2.33    2.07    2.60    4.56    0.54    3.13    5.80    6.02    1.51    3.21    6.66    3.91
01001   011899  5.80    6.94    3.35    2.22    2.93    2.31    6.80    2.90    0.63    3.02    1.98    5.25
.....
01003   011895  6.46    3.53    6.69    4.63    3.54    11.71   7.64    8.02    2.45    3.21    1.35    4.05
01003   011896  3.80    5.88    5.63    2.45    2.83    6.98    5.24    5.60    2.90    5.38    4.09    2.45
01003   011897  3.95    6.62    8.19    4.43    1.09    2.44    7.06    14.69   2.08    1.29    2.45    3.91
01003   011898  2.82    4.25    2.04    3.64    0.73    5.87    6.72    11.96   9.90    1.46    7.88    5.09
01003   011899  5.64    4.47    3.37    0.97    1.16    5.90    7.81    6.53    0.31    2.23    3.75    4.79
.....

Output
Code:
01001 011895 7.03 2.96 8.36 3.53 3.96 5.40 3.92 3.36 0.73 2.03 1.44 3.66
01003 011895 6.46 3.53 6.69 4.63 3.54 11.71 7.64 8.02 2.45 3.21 1.35 4.05

# 10  
Old 04-24-2019
Sorry, my bad. Try this version:
Code:
{
  for(i=3;i<=NF;i++) {
     key[$1]
     if ( $i >0 && (!(($1, i) in f1) || f1[$1,i] > $i)) {
       f1[$1,i] = $i
       if (!($1,i) in f2) f2[$1,i] = $2
     }
  }
  nf=i
}
END {
  for( k in key)
    for(i=3; i<= nf; i++) {
      printf("%s%s%s", (i==3)?k OFS f2[k,i]:"", OFS, (i==nf)?f1[k,i] ORS:f1[k,i])
    }
}

# 11  
Old 04-24-2019
These appear to be the same. Did I miss something?
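The two listings differ in a single index inside the condition: the version quoted in post #9 compares against f1[key[$1],i], while post #10 compares against f1[$1,i]. Since key[$1] is only referenced and never assigned, it holds an empty string, so the first form looks up an entry that was never stored and the minimum is never replaced. A minimal sketch of the effect, using a made-up two-row input (the ID "A" and its values are hypothetical, not from the thread):

```shell
rows='A x 7
A y 2'

# Lookup as in post #9: key[$1] is empty, so f1[key[$1],3] is never a stored entry.
buggy=$(printf '%s\n' "$rows" | awk '
  { key[$1]
    if (!(($1,3) in f1) || f1[key[$1],3] > $3) f1[$1,3] = $3 }
  END { print f1["A",3] }')

# Lookup as in post #10: compare against the value stored under the real ID.
fixed=$(printf '%s\n' "$rows" | awk '
  { if (!(($1,3) in f1) || f1[$1,3] > $3) f1[$1,3] = $3 }
  END { print f1["A",3] }')

printf 'buggy=%s fixed=%s\n' "$buggy" "$fixed"
```

The first variant keeps the value from the first row for the ID, which matches the "first line for each unique ID" symptom reported in post #9.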
# 12  
Old 04-24-2019
Adaptation of post #7 for the new requirements:

Code:
awk -v c=4 '
  !($1 in M) {
    M[$1]=$c
  }
  {
    if($c>-99.99 && $c<=M[$1]) {
      M[$1]=$c
      V[$1]=$2
    }
  }
  END {
    for(i in M) print i, M[i], V[i]
  }
' file

c is the column number to take the minimum of (column 4 in the command above).
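A usage sketch of the same idea with c=3, run on a few rows cut down from the sample input. It assumes -99.99 marks a missing value (inferred from the threshold in the script above), and it uses a strict < so that ties keep the first matching row, whereas the script above keeps the last. The variable name missing is introduced here for illustration only.

```shell
# Rows reduced from the sample input: ID, date, then two value columns.
data='01001 011895 7.03 2.96
01001 011896 5.86 5.42
01001 011898 2.33 2.07
01003 011895 6.46 3.53
01003 011898 2.82 4.25'

# c selects the column to minimise; missing is the assumed sentinel value.
result=$(printf '%s\n' "$data" | awk -v c=3 -v missing=-99.99 '
  $c + 0 <= missing { next }                  # skip rows flagged as missing
  !($1 in M) || $c + 0 < M[$1] + 0 {          # first valid row, or a new minimum
    M[$1] = $c; V[$1] = $2
  }
  END { for (id in M) print id, M[id], V[id] }
' | sort)
printf '%s\n' "$result"
```

The output is piped through sort because the order of a for (id in M) loop is unspecified in awk. Skipping missing rows before the first assignment also avoids seeding M with the sentinel when an ID's first row is missing.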
# 13  
Old 04-24-2019
Thanks Scrutinizer. The output is as desired.
# 14  
Old 04-24-2019
Quote:
Originally Posted by ncwxpanther
These appear to be the same. Did I miss something?
It's not the same: the comparison now reads f1[$1,i] instead of f1[key[$1],i].
Code:
{
  for(i=3;i<=NF;i++) {
     key[$1]
     if ( $i >0 && (!(($1, i) in f1) || f1[$1,i] > $i)) {
       f1[$1,i] = $i
       if (!($1,i) in f2) f2[$1,i] = $2
     }
  }
  nf=i
}
END {
  for( k in key)
    for(i=3; i<= nf; i++) {
      printf("%s%s%s", (i==3)?k OFS f2[k,i]:"", OFS, (i==nf)?f1[k,i] ORS:f1[k,i])
    }
}

given your sample file:
Code:
01001   011895  7.03    2.96    8.36    3.53    3.96    5.40    3.92    3.36    0.73    2.03    1.44    3.66
01001   011896  5.86    5.42    5.54    3.98    3.77    6.24    4.38    2.57    0.82    1.66    2.89    1.94
01001   011897  3.27    6.63    10.94   4.35    0.81    1.57    3.96    5.02    0.87    0.75    1.84    4.38
01001   011898  2.33    2.07    2.60    4.56    0.54    3.13    5.80    6.02    1.51    3.21    6.66    3.91
01001   011899  5.80    6.94    3.35    2.22    2.93    2.31    6.80    2.90    0.63    3.02    1.98    5.25
01003   011895  6.46    3.53    6.69    4.63    3.54    11.71   7.64    8.02    2.45    3.21    1.35    4.05
01003   011896  3.80    5.88    5.63    2.45    2.83    6.98    5.24    5.60    2.90    5.38    4.09    2.45
01003   011897  3.95    6.62    8.19    4.43    1.09    2.44    7.06    14.69   2.08    1.29    2.45    3.91
01003   011898  2.82    4.25    2.04    3.64    0.73    5.87    6.72    11.96   9.90    1.46    7.88    5.09
01003   011899  5.64    4.47    3.37    0.97    1.16    5.90    7.81    6.53    0.31    2.23    3.75    4.79

produces:
Code:
01001 011895 2.33 2.07 2.60 2.22 0.54 1.57 3.92 2.57 0.63 0.75 1.44 1.94
01003 011895 2.82 3.53 2.04 0.97 0.73 2.44 5.24 5.60 0.31 1.29 1.35 2.45
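For reference, a more compact sketch of the same per-field minimum, run on the first three value columns of a few sample rows. It keeps the thread's $i > 0 filter and, like the script above, labels each ID with the $2 of its first row. The array names first and min are chosen here for illustration. It records the widest NF seen rather than the loop counter's final value (nf=i leaves nf at NF+1, which makes the END loop print one extra empty field).

```shell
# Rows reduced from the sample input: ID, date, then three value columns.
data='01001 011895 7.03 2.96 8.36
01001 011897 3.27 6.63 10.94
01001 011898 2.33 2.07 2.60
01003 011895 6.46 3.53 6.69
01003 011898 2.82 4.25 2.04'

result=$(printf '%s\n' "$data" | awk '
  {
    if (!($1 in first)) first[$1] = $2        # label each ID with its first $2
    if (NF > nf) nf = NF                      # widest row seen so far
    for (i = 3; i <= NF; i++)                 # keep the smallest positive value per (ID, field)
      if ($i + 0 > 0 && (!(($1, i) in min) || $i + 0 < min[$1, i] + 0))
        min[$1, i] = $i
  }
  END {
    for (id in first) {
      line = id OFS first[id]
      for (i = 3; i <= nf; i++) line = line OFS min[id, i]
      print line
    }
  }
' | sort)
printf '%s\n' "$result"
```

On these rows the minima agree with the first three value columns of the output shown above.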
