Integrate MIN and MAX in a string


 
UNIX for Dummies Questions & Answers
# 1  
02-13-2013

I need to use awk for this task!

Input (fields are separated by ";"):
Code:
1%2%3%4%;AA
5%6%7%8%9;AA
1%2%3%4%5%6;BB
7%8%9%10%11%12;BB

In the 1st field there are patterns composed of numbers separated by "%".
The 2nd field defines groups (here two different groups called "AA" and "BB").
Records are not necessarily sorted by group as they are here.

For each group independently, I need the min and max values at every position where there is a number, returned in a third field like this:
Code:
1%2%3%4%;AA;1-5%2-6%3-7%4-8%0-9   # note: if there is no number after a "%", it counts as 0
5%6%7%8%9;AA;1-5%2-6%3-7%4-8%0-9
1%2%3%4%5%6;BB;1-7%2-8%3-9%4-10%5-11%6-12
7%8%9%10%11%12;BB;1-7%2-8%3-9%4-10%5-11%6-12
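A minimal sketch of the "empty position counts as 0" rule, assuming any POSIX awk (the function name `show_positions` is hypothetical): `split` on "%" turns "1%2%3%4%" into five positions, the last one empty because of the trailing "%", and adding 0 turns that empty position into 0.

```shell
# show_positions: hypothetical helper. Prints group:position=value for each
# %-separated position of the 1st field; an empty position prints as 0.
show_positions() {
    awk 'BEGIN { FS = ";" }
    {
        n = split($1, v, "%")        # "1%2%3%4%" gives n = 5 and v[5] = ""
        for (i = 1; i <= n; i++)
            print $2 ":" i "=" (v[i] + 0)
    }'
}

printf '%s\n' '1%2%3%4%;AA' '5%6%7%8%9;AA' | show_positions
```

Position 5 of the AA group is therefore 0 on the first line and 9 on the second, which is where the expected "0-9" comes from.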

Here is what I have done so far, but I cannot manage to get the entire result into the 3rd field for each group:
Code:
awk 'BEGIN{FS=OFS=";"}

{
    a = split($1,b,"%")

    for (i=1; i<=a; i++){
        if(b[i] ~ ""){b[i] = "0"}
        else{b[i] = b[i]}

        if((!MIN[b[i]]) || (MIN[b[i]] > b[i])) MIN[b[i]] = b[i]
        if((!MAX[b[i]]) || (MAX[b[i]] < b[i])) MAX[b[i]] = b[i]
    }
}

END{for (i=1; length(MIN[b[i]]); i++)
            printf("%s",$1 ";" $2 ";" MIN[b[i]]"-"MAX[b[i]]"%")
      }' file
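The main logic problems in this attempt: `b[i] ~ ""` matches every string (an empty regex matches anywhere), so every value is replaced by "0"; and MIN/MAX are indexed by the value `b[i]` rather than by group and position. A minimal sketch that keeps the same single-pass idea, assuming the whole input fits in memory (the function name `minmax_by_group` is hypothetical), keys the arrays by `$2, i`, buffers each line, and prints everything in END:

```shell
# minmax_by_group: hypothetical helper name. For every input line it appends a
# 3rd field with min-max per %-separated position, computed per group ($2).
# Assumes the whole input fits in memory (lines are buffered until END).
minmax_by_group() {
    awk 'BEGIN { FS = OFS = ";" }
    {
        line[NR] = $0; grp[NR] = $2
        n = split($1, b, "%")
        for (i = 1; i <= n; i++) {
            v = b[i] + 0                      # "" + 0 is 0: empty position counts as 0
            if (!(($2, i) in MIN) || v < MIN[$2, i]) MIN[$2, i] = v
            if (!(($2, i) in MAX) || v > MAX[$2, i]) MAX[$2, i] = v
        }
        if (n > width[$2] + 0) width[$2] = n  # widest pattern seen in the group
    }
    END {
        for (r = 1; r <= NR; r++) {
            g = grp[r]; out = ""
            for (i = 1; i <= width[g]; i++)
                out = out (i > 1 ? "%" : "") MIN[g, i] "-" MAX[g, i]
            print line[r], out
        }
    }' "$@"
}

printf '%s\n' '1%2%3%4%;AA' '5%6%7%8%9;AA' '1%2%3%4%5%6;BB' '7%8%9%10%11%12;BB' | minmax_by_group
```

A position that is missing from a line altogether (not just empty) is simply skipped for that line; the output omits the trailing "%".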

Any help would be greatly appreciated!

Last edited by beca123456; 02-13-2013 at 08:18 AM. Reason: Typo in "BB" group output
# 2  
02-13-2013
Try:
Code:
awk -F\; 'FNR==NR{
for(i in a) delete a[i]
n=split($1,a,/%/)
if(!($2 in firstdone)) {firstdone[$2]; maxf[$2]=n; for(i=1;i<=n;i++) max[$2,i]=min[$2,i]=a[i]; next }
if(n>maxf[$2]) { for(i=maxf[$2]+1;i<=n;i++) max[$2,i]=min[$2,i]=a[i]; maxf[$2]=n }
for(i=1;i<=maxf[$2];i++) {
 if(a[i]>max[$2,i]) max[$2,i]=a[i]
 if(a[i]<min[$2,i]) min[$2,i]=a[i]
}
next
}
{str=""
for(i=1;i<=maxf[$2];i++)
 str = str ( str"" ? "%" : "" ) sprintf("%d-%d",min[$2,i],max[$2,i])
print $0,str}' OFS=\; file file
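The `FNR==NR` test is what makes this a two-pass solution: `NR` counts records across all input files, while `FNR` restarts at 1 for each file, so the condition is true only while awk reads the first copy of `file`. Pass one collects per-group statistics; pass two (the second `file` argument) prints each line with them. A minimal sketch of just that pattern, using a hypothetical per-group line count instead of min/max:

```shell
# count_per_group: hypothetical example of the two-pass FNR==NR idiom.
# The same file is named twice on the awk command line.
count_per_group() {
    awk 'BEGIN { FS = OFS = ";" }
    FNR == NR { count[$2]++; next }   # pass 1: only true while reading the 1st copy
    { print $0, count[$2] }           # pass 2: annotate every line
    ' "$1" "$1"
}

demo=$(mktemp)
printf '%s\n' '1%2%3%4%;AA' '5%6%7%8%9;AA' '1%2%3%4%5%6;BB' > "$demo"
count_per_group "$demo"    # 1%2%3%4%;AA;2, 5%6%7%8%9;AA;2, 1%2%3%4%5%6;BB;1
rm -f "$demo"
```

Reading the file twice avoids buffering every line in memory, at the cost of needing a real file rather than a pipe.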


Last edited by elixir_sinari; 02-13-2013 at 09:33 AM.
This User Gave Thanks to elixir_sinari For This Post:
# 3  
02-13-2013
@elixir_sinari: 2 fast 4 me.
Nevertheless, this might at least be worth a shot:
Code:
awk     'BEGIN{FS=OFS=";"}
         {line[NR] = $0
          a = split ($1, b, "%")
          for (i=1; i<=a; i++)
                {b[i]=b[i]?b[i]:"0";
                 if ((!MIN [$2,i]) || (MIN [$2,i] > b[i])) MIN [$2,i] = b[i]
                 if ((!MAX [$2,i]) || (MAX [$2,i] < b[i])) MAX [$2,i] = b[i]
                }
          if (a > maxfld[$2]) maxfld[$2] = a
         }
         END    {for (i=1; i<=NR; i++)
                  {printf "%s;", line[i]
                   split (line[i], tmp)
                   for (j=1; j<=maxfld[tmp[2]]; j++)
                      printf ("%s-%s%%", MIN[tmp[2],j], MAX[tmp[2],j])
                   printf "\n"
                  }
                }
        ' file
1%2%3%4%;AA;1-5%2-6%3-7%4-8%0-9%
5%6%7%8%9;AA;1-5%2-6%3-7%4-8%0-9%
1%2%3%4%5%6;BB;1-7%2-9%3-10%4-11%5-12%6-6%
7%9%10%11%12;BB;1-7%2-9%3-10%4-11%5-12%6-6%

There's an inconsistency in the 6th value of the BB records: the requested output has 6-12 (where does the 12 come from?), while elixir_sinari's output has 0-6. That might sound logical, but in the second BB record the 6th value is missing entirely, unlike the 5th value in AA's first line, which is present but empty (after the trailing "%"). So, what should happen?
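awk can tell these two cases apart: after `n = split($1, v, "%")`, a position `i <= n` with `v[i] == ""` is present but empty, while a position `i > n` is missing from the record. A small sketch (the function name `classify_pos` is hypothetical) that classifies one position in each record:

```shell
# classify_pos: hypothetical helper. Reports whether position $1 of the
# %-separated 1st field is missing, empty, or holds a value.
classify_pos() {
    awk -v pos="$1" 'BEGIN { FS = ";" }
    {
        n = split($1, v, "%")
        if (pos + 0 > n)       state = "missing"   # record has fewer positions
        else if (v[pos] == "") state = "empty"     # e.g. after a trailing "%"
        else                   state = "value " v[pos]
        print $2 ": position " pos " is " state
    }'
}

printf '%s\n' '1%2%3%4%5%;BB' '7%9%10%11%12;BB' '1%2%3%4%5%6;BB' | classify_pos 6
```

Whether a missing position should count as 0, be skipped, or be treated as an error is a requirements decision; the code only makes the distinction visible.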

Last edited by RudiC; 02-13-2013 at 03:32 PM. Reason: Typo
This User Gave Thanks to RudiC For This Post:
# 4  
02-13-2013
@RudiC: You are right, my mistake; I have edited my first post. It works perfectly now!
@elixir: Yours almost worked, except that it didn't output the "%" separators.

Thanks, RudiC and elixir, for your help!

Last edited by beca123456; 02-13-2013 at 08:22 AM.
# 5  
02-13-2013
Oops... I put in a semicolon instead of a percent sign. Corrected now.
# 6  
02-13-2013
@RudiC:
I don't really understand the END section of your code, especially the "tmp[2]"! And what is the separator in "split(line[i],tmp)"? Is it "\n" by default?
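For reference: when `split` is called with only two arguments, `split(s, a)` splits on the current value of `FS`, not on "\n". In RudiC's script `FS` is set to ";" in BEGIN, so `split(line[i], tmp)` puts the %-pattern in `tmp[1]` and the group name in `tmp[2]`; `tmp[2]` is then used to look up `maxfld`, `MIN` and `MAX` for that line's group. A quick check (the function name `split_demo` is hypothetical):

```shell
# split_demo: hypothetical helper; splits a record the way RudiC's END block does.
split_demo() {
    awk 'BEGIN { FS = ";" }
    {
        n = split($0, tmp)    # no 3rd argument, so the separator is FS (";")
        print n, tmp[1], tmp[2]
    }'
}

printf '%s\n' '5%6%7%8%9;AA' | split_demo    # prints: 2 5%6%7%8%9 AA
```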
I tried to modify your code for the case where a third field in the input file specifies a subgroup:
Code:
1%2%3%4%;AA;1
5%6%7%8%9;AA;1
1%2%3%4%5%6;BB;2
7%8%9%10%11%12;BB;2
13%14%15%16%17%18;BB;3      # note: the third field differs from the previous "BB" records

and obtain this at the end:
Code:
1%2%3%4%;AA;1;1-5%2-6%3-7%4-8%0-9%     # unchanged
5%6%7%8%9;AA;1;1-5%2-6%3-7%4-8%0-9%     # unchanged
1%2%3%4%5%6;BB;2;1-7%2-8%3-9%4-10%5-11%6-12%     # unchanged
7%8%9%10%11%12;BB;2;1-7%2-8%3-9%4-10%5-11%6-12%     # unchanged
13%14%15%16%17%18;BB;3;13%14%15%16%17%18  # $4 is the same as $1, as this record is the only one in group "BB;3"

I changed your previous code as follows:
Code:
BEGIN{FS=OFS=";"}
         {line[NR] = $0
          a = split ($1, b, "%")
          for (i=1; i<=a; i++)
                {b[i]=b[i]?b[i]:"0";
                 if ((!MIN [$2$3,i]) || (MIN [$2$3,i] > b[i])) MIN [$2$3,i] = b[i]
                 if ((!MAX [$2$3,i]) || (MAX [$2$3,i] < b[i])) MAX [$2$3,i] = b[i]
                }
          if (a > maxfld[$2$3]) maxfld[$2$3] = a
         }
         END    {for (i=1; i<=NR; i++)
                  {printf "%s;", line[i]
                   split (line[i], tmp)
                   for (j=1; j<=maxfld[tmp[3]]; j++)
                      printf ("%s-%s%%", MIN[tmp[3],j], MAX[tmp[3],j])
                   printf "\n"
                  }
                }
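As written, the main block stores everything under the key `$2$3` (for example "AA1"), but the END block looks values up with `tmp[3]` alone ("1"), so nothing is found and only the original line plus ";" is printed. The lookup key has to be rebuilt from the same fields, e.g. `tmp[2] tmp[3]`. Joining the fields with a separator also avoids collisions such as "A" + "B1" versus "AB" + "1". A minimal sketch under those assumptions (the function name `minmax_by_subgroup` is hypothetical) that joins with `FS`, prints a single value when min and max are equal, and omits the trailing "%":

```shell
# minmax_by_subgroup: hypothetical helper name. The key joins $2 and $3 with
# FS ("AA;1"), and END rebuilds exactly the same key from the buffered lines.
minmax_by_subgroup() {
    awk 'BEGIN { FS = OFS = ";" }
    {
        key = $2 FS $3
        line[NR] = $0; keys[NR] = key
        n = split($1, b, "%")
        for (i = 1; i <= n; i++) {
            v = b[i] + 0                      # empty position counts as 0
            if (!((key, i) in MIN) || v < MIN[key, i]) MIN[key, i] = v
            if (!((key, i) in MAX) || v > MAX[key, i]) MAX[key, i] = v
        }
        if (n > maxfld[key] + 0) maxfld[key] = n
    }
    END {
        for (r = 1; r <= NR; r++) {
            key = keys[r]; out = ""
            for (j = 1; j <= maxfld[key]; j++) {
                mm = MIN[key, j]
                if (MAX[key, j] != mm) mm = mm "-" MAX[key, j]   # one value if min == max
                out = out (j > 1 ? "%" : "") mm
            }
            print line[r], out
        }
    }' "$@"
}

printf '%s\n' '1%2%3%4%;AA;1' '5%6%7%8%9;AA;1' '1%2%3%4%5%6;BB;2' \
    '7%8%9%10%11%12;BB;2' '13%14%15%16%17%18;BB;3' | minmax_by_subgroup
```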


Last edited by beca123456; 02-13-2013 at 10:50 PM.
# 7  
02-14-2013
Here is another way to do what I think you're trying to do. If the min and max values for a subfield are the same for a given key ($2 and $3), which is always the case when only one line has that key, only that single value is printed instead of min-max. Unlike your sample output, this code doesn't print a trailing % when multiple lines are found for a given key:
Code:
awk 'BEGIN {FS = OFS = ";"}
FNR == NR {
        # Save the 1st and 2nd fields from each input line...
        l[NR] = $1      # Save Leftmost field for output.
        k[NR] = $2 OFS $3 # Save middle output field and Key for matching.
        # Split the 1st field on percent characters.
        n = split($1, a, /%/)
        if(!(kc[k[NR]]++)) {
                # This is the first time we have seen this key.
                # Set the Count of fields for this key and set the min & Max
                # values for each subfield to the values on this line.
                c[k[NR]] = n
                for(i = 1; i <= n; i++) {
                        # Adding 0 makes the comparisons numeric (so they work
                        # even if some of the values are negative) and converts
                        # empty subfields to zeros, so we can just print the
                        # values at the end without needing sprintf %d formats.
                        m[k[NR], i] = M[k[NR], i] = a[i] + 0
                }
        } else {# We have seen this key before, update the min and max values
                # for each currently saved subfield.
                for(i = 1; i <= c[k[NR]]; i++) {
                        # Adding 0 again: empty or missing subfields compare as 0.
                        if(a[i] + 0 < m[k[NR], i]) m[k[NR], i] = a[i] + 0
                        if(a[i] + 0 > M[k[NR], i]) M[k[NR], i] = a[i] + 0
                }
        }
        if(n > c[k[NR]]) {
                # We have more subfields than we have seen before for this key.
                # Set min and Max values for the additional subfields.
                for(i = c[k[NR]] + 1; i <= n; i++)
                        m[k[NR], i] = M[k[NR], i] = a[i] + 0
                c[k[NR]] = n
        }
}
END {   # Set the min/Max output field for each key we have seen...
        for(i in kc) 
                for(j = 1; j <= c[i]; j++) {
                        mM[i] = (j > 1 ? mM[i] "%" : "") m[i, j]
                        if(m[i, j] < M[i, j]) mM[i] = mM[i] "-" M[i, j]
                }
        # Print each input line with the added min/Max values for each subfield.
        for(i = 1; i <= NR; i++) print l[i], k[i], mM[k[i]]
}' file
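The advice to add 0 points at a real pitfall: when either operand of a comparison is a plain string (a string constant or a concatenation, for instance), awk compares the two character by character, so "10" sorts before "9". Adding 0 forces a numeric comparison and also turns an empty subfield into 0. A minimal demonstration (the function name `compare_demo` is hypothetical):

```shell
# compare_demo: hypothetical name; shows why the answers add 0 before comparing.
compare_demo() {
    awk 'BEGIN {
        a = "10"; b = "9"       # string constants, so a < b compares as strings
        print (a < b)           # 1: "10" sorts before "9" character by character
        print (a + 0 < b + 0)   # 0: adding 0 forces a numeric comparison
        print ("" + 0)          # 0: an empty subfield becomes zero
    }'
}

compare_demo
```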

With your latest input file, the output produced is:
Code:
1%2%3%4%;AA;1;1-5%2-6%3-7%4-8%0-9
5%6%7%8%9;AA;1;1-5%2-6%3-7%4-8%0-9
1%2%3%4%5%6;BB;2;1-7%2-8%3-9%4-10%5-11%6-12
7%8%9%10%11%12;BB;2;1-7%2-8%3-9%4-10%5-11%6-12
13%14%15%16%17%18;BB;3;13%14%15%16%17%18

This User Gave Thanks to Don Cragun For This Post:
 