Finding maximum occurrence value using awk


 
# 1  
Old 04-01-2013
Finding maximum occurrence value using awk

Hi everyone, I'm a new member of the forum. I mistakenly posted this elsewhere too.
I have a file, test4, in which the field 2 values are all either 0 or negative:
Code:
100815    -20
118125    0
143616    0
154488    0
154488    0
154488    -6
196492    -5
196492    -9
196492    -7
27332    0
29397    0

I would like to print a line containing the maximum value in field 2 of all occurrences for each value in field 1. So the desired output should be:
Code:
29397    0
27332    0
143616    0
154488    0
118125    0
100815    -20
196492    -5

I am using awk to do this, but there are two problems. First, awk prints nothing when I try to get the maximum of negative values, although it has no problem when I try the same with positive numbers. This made me take the absolute values, which I can later turn back into the original values. Second, I get the following output with my code:

Code:
29397    0
27332    0
143616    0
154488    6
118125    0
100815    20
196492    7

Although with absolute values I should get:

Code:
29397    0
27332    0
143616    0
154488    0
118125    0
100815    20
196492    5

My code is:
Code:
awk '{$2>0?$2=$2:$2=-$2} $2==0 {$2=0} {print}' test4 | 
   awk 'NR==1 {a[$1]=$2} {a[$1]=$2 ; if ($2<a[$1]) a[$1]=$2; else a[$1]=a[$1];} 
  END {for(i in a) print i"\t"a[i];}'

I am sure I'm missing something basic, and this could probably be done in a much simpler way. Any help is appreciated.
Best regards to all

Last edited by jim mcnamara; 04-01-2013 at 11:09 AM.. Reason: to get tab delimited separation
# 2  
Old 04-01-2013
Your original post

This is your original post, showing each dataset on its own line.

Hi everyone, I'm a new member of the forum.
I have a file, test4, in which the field 2 values are all either 0 or negative:


Code:
100815    -20
118125    0
143616    0
154488    0
154488    0
154488    -6
196492    -5
196492    -9
196492    -7
27332    0
29397    0

I would like to print a line containing the maximum value in field 2 of all occurrences for each value in field 1. So the desired output should be:


Code:
100815    -20
118125    0
143616    0
154488    0
196492    -5
27332    0
29397    0

I am using awk to do this, but there are two problems. First, awk prints nothing when I try to get the maximum of negative values, although it has no problem when I try the same with positive numbers. This made me take the absolute values, which I can later turn back into the original values. Second, I get the following output with my code:



Code:
100815    20
118125    0
143616    0
154488    6
196492    7
27332    0
29397    0


Although with absolute values I should get:


Code:
100815    20
118125    0
143616    0
154488    0
196492    5
27332    0
29397    0


My code is:
Code:
awk '{$2>0?$2=$2:$2=-$2} $2==0 {$2=0} {print}' test4 | awk 'NR==1 {a[$1]=$2} {a[$1]=$2 ; if ($2<a[$1]) a[$1]=$2; else a[$1]=a[$1];} END {for(i in a) print i"\t"a[i];}'

I am sure I'm missing something basic, and this could probably be done in a much simpler way. Any help is appreciated.
Best regards to all
# 3  
Old 04-01-2013
Code:
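awk '
{
        if ( $1 in A )
        {
                # Key already seen: keep the larger of the two values.
                if ( $2 > A[$1] )
                        A[$1] = $2
        }
        else
                # First occurrence: store the value as-is, even if negative.
                A[$1] = $2
} END {
        # One line per key; the traversal order of for-in is unspecified.
        for ( i in A )
                print i, A[i]
} ' file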
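
The `$1 in A` test is what makes this work with negative numbers. Here is a minimal sketch of the difference, assuming whitespace-separated input on standard input. The function names `naive_max` and `keyed_max` are made up for illustration and are not part of the original code. An unset array element compares as 0, so a plain `$2 > A[$1]` never stores a negative value, which is probably why the first attempt appeared to print nothing.

```shell
# Hypothetical helper: compares against an element that may be unset.
# Referencing A[$1] creates it empty (numerically 0), so negatives never win.
naive_max() {
    awk '{ if ($2 > A[$1]) A[$1] = $2 }
         END { for (k in A) print k, A[k] }'
}

# Hypothetical helper: seeds each key with its first value, then compares.
keyed_max() {
    awk '!($1 in A) || $2 > A[$1] { A[$1] = $2 }
         END { for (k in A) print k, A[k] }'
}

printf '196492 -5\n196492 -9\n196492 -7\n' | naive_max   # value column stays empty
printf '196492 -5\n196492 -9\n196492 -7\n' | keyed_max   # prints: 196492 -5
```

Because `in` only tests for membership and does not create the element, the first value for each key is stored unconditionally and later values only replace it when they are larger.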

# 4  
Old 04-01-2013
I understand that it gives the required output.

But why is the order changed when printing the output?
# 5  
Old 04-01-2013
Thank you so much.
# 6  
Old 04-01-2013
That is because by default, the order in which a for (i in array) loop scans an array is not defined; it is generally based upon the internal implementation of arrays inside awk.
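
If the input order matters, one option is to record each key the first time it appears and loop over that record in END instead of using for (i in array). The following is only a sketch under that assumption. The function name `max_in_input_order` and the array names `order` and `max` are invented for illustration.

```shell
# Hypothetical helper: per-key maximum, keys printed in first-seen order.
max_in_input_order() {
    awk '
    !($1 in max) {              # first time this key appears
        order[++n] = $1         # remember its position in the input
        max[$1] = $2            # seed with the first value, even if negative
        next
    }
    $2 > max[$1] { max[$1] = $2 }
    END {
        # Walk the keys by numeric index, so the order is well defined.
        for (j = 1; j <= n; j++)
            print order[j] "\t" max[order[j]]
    }' "$@"
}
```

Called as `max_in_input_order test4`, this should list the keys in the order they first occur in the file, tab-separated. If any consistent order is acceptable, piping the output of the earlier command through `sort -n` is probably simpler.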
 