Visit Our UNIX and Linux User Community


HELP! Group by in shell script (awk/sed?)


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting HELP! Group by in shell script (awk/sed?)
# 1  
Old 07-14-2009
HELP! Group by in shell script (awk/sed?)

Hello,

Could some expert soul please help me with this? I have following file format -

Code:
task             time
abc                5
xyz                4
abc                5
xyz                3
ddd                10
ddd                2

I need to generate output as -

Code:
task           min              max           avg
abc             5                  5              5
xyz              3                  4             3.5
ddd             2                  10             6


I am new to awk/sed and need help. I wrote followiing but that totals up instead of breaking down in min, max and average.

awk -F"|" '{arr[$1]+=$2} END {for (i in arr) {print i, arr[i]}}' datafile


Please help!!

Big Thanks in advance..

Last edited by vgersh99; 07-14-2009 at 02:40 PM.. Reason: code tags, PLEASE!
# 2  
Old 07-14-2009
nawk -f snc.awk myFile

snc.awk:
Code:
FNR>1 {
  min[$1]=(!($1 in min) || min[$1]> $2 )? $2 : min[$1]
  max[$1]=(max[$1]> $2)? max[$1] : $2
  cnt[$1]++
  sum[$1]+=$2
}
END {
  print "task\tmin\tmax\tavg"
  for (i in cnt)
    printf("%s\t%d\t%d\t%.1f\n", i, min[i], max[i], sum[i]/cnt[i])

}


Last edited by vgersh99; 07-14-2009 at 02:42 PM.. Reason: pretty print
# 3  
Old 07-14-2009
Thanks a lot. For my own learning, if I add one more column as following

Code:
id    task      time
1     abc       11
2     abc       14
3     xyz        10


How would the script variable look like?

Last edited by vgersh99; 07-14-2009 at 03:02 PM.. Reason: code tags, PLEASE!
# 4  
Old 07-14-2009
Quote:
Originally Posted by sncoupons
Thanks a lot. For my own learning, if I add one more column as following

id task time
1 abc 11
2 abc 14
3 xyz 10


How would the script variable look like?
Depends on your desired output.
What is the reporting 'key'?

If it's still 'task' (regardless of the id), just change '$1' to '$2'

---------- Post updated at 12:57 PM ---------- Previous update was at 12:56 PM ----------

To keep the forums high quality for all users, please take the time to format your posts correctly.

First of all, use Code Tags when you post any code or data samples so others can easily read your code. You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags [code] and [/code] by hand.)

Second, avoid adding color or different fonts and font size to your posts. Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.

Third, be careful when you cut-and-paste, edit any odd characters and make sure all links are working property.

Thank You.

The UNIX and Linux Forums

---------- Post updated at 01:01 PM ---------- Previous update was at 12:57 PM ----------

If it's still 'task' (regardless of the id):

Code:
FNR>1 {
  min[$2]=(!($2 in min) || min[$2]> $3 )? $3 : min[$2]
  max[$2]=(max[$2]> $3)? max[$2] : $3
  cnt[$2]++
  sum[$2]+=$3
}
END {
  print "task\tmin\tmax\tavg"
  for (i in cnt)
    printf("%s\t%d\t%d\t%.1f\n", i, min[i], max[i], sum[i]/cnt[i])

}

# 5  
Old 07-14-2009
I need to generate the output same as earlier described -

Code:
Task        Min           Max           Avg

However, the input file is going to have task id as well and separated by a "|" that looks like -

Code:
1111     |     abc      |   10
1111     |     xyz      |   7
1112     |     abc      |   5
1112     |     xyz      |   9


Thanks you for your help! Smilie

---------- Post updated at 12:07 PM ---------- Previous update was at 12:03 PM ----------

Did not see updated look at the end of your post..works now. Thanks! Smilie ..awarded you bits!!
# 6  
Old 07-14-2009
Quote:
Originally Posted by sncoupons
I need to generate the output same as earlier described -

Code:
Task        Min           Max           Avg

However, the input file is going to have task id as well and separated by a "|" that looks like -

Code:
1111     |     abc      |   10
1111     |     xyz      |   7
1112     |     abc      |   5
1112     |     xyz      |   9

Thanks you for your help! Smilie

---------- Post updated at 12:07 PM ---------- Previous update was at 12:03 PM ----------

Did not see updated look at the end of your post..works now. Thanks! Smilie ..awarded you bits!!
Code:
BEGIN {
  FS=" *\\| *"
}
FNR>1 {
  min[$2]=(!($2 in min) || min[$2]> $3 )? $3 : min[$2]
  max[$2]=(max[$2]> $3)? max[$2] : $3
  cnt[$2]++
  sum[$2]+=$3
}
END {
  print "task\tmin\tmax\tavg"
  for (i in cnt)
    printf("%s\t%d\t%d\t%.1f\n", i, min[i], max[i], sum[i]/cnt[i])

}


Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Emulate group-by in shell script

Hello All, I saw this problem on one of the forum and solved it using group-by in oracle sql, though I am a bit curious to implement it using shell script : There is a file having number of operations : Opeation,Time-Taken operation1,83621 operation2,72321 operation3,13288... (11 Replies)
Discussion started by: mukulverma2408
11 Replies

2. Shell Programming and Scripting

Shell Script to Group by Based on Multiple Fields in a file

Hi, I want to know if there is any simple approach to SUM a field based on group by of different fields for e.g. file1.txt contains below data 20160622|XXX1||50.00||50.00|MONEY|Plan1| 20160622|XXX1||100.00||100.00|MONEY|Plan1| 20160623|XXX1||25.00||25.00|MONEY|Plan1|... (3 Replies)
Discussion started by: cnu_theprince
3 Replies

3. Shell Programming and Scripting

Shell script - group by

Hi, I have text file as shown below. root 25 oracle 25 batch 30 griduser 32 admin 35 root 25 oracle 25 batch 30 griduser 32 oracle 25 batch 30 griduser 32 xuser 45 admin 35 I want to group by based on user name, and the output need to be as below. Not necessary the username to be... (10 Replies)
Discussion started by: baladelaware73
10 Replies

4. Shell Programming and Scripting

Unable to pass shell script variable to awk command in same shell script

I have a shell script (.sh) and I want to pass a parameter value to the awk command but I am getting exception, please assist. diff=$1$2.diff id=$2 new=new_$diff echo "My id is $1" echo "I want to sync for user account $id" ##awk command I am using is as below cat $diff | awk... (2 Replies)
Discussion started by: Ashunayak
2 Replies

5. Shell Programming and Scripting

Shell Script to ignore # and take corresponding user and group

Hi, I have a following file: role.IMPACT_USER.user=admin role.IMPACT_USER.user=dd12345 role.IMPACT_USER.user=ss76767 #role.IMPACT_USER.user=root #role.IMPACT_USER.group=System role.IMPACT_USER.group=ImpactUser #Description: Allow users to login in to Impact, start and stop service... (5 Replies)
Discussion started by: dbashyam
5 Replies

6. Programming

Shell script using sed or awk

Hi, I want to read a file from the command line and remove all the spaces, tabs in it, replacing it with comma(,), and write it to a new file. can you help me out with this scenario. (1 Reply)
Discussion started by: sudhanshu12788
1 Replies

7. Shell Programming and Scripting

Split line to multiple files Awk/Sed/Shell Script help

Hi, I need help to split lines from a file into multiple files. my input look like this: 13 23 45 45 6 7 33 44 55 66 7 13 34 5 6 7 87 45 7 8 8 9 13 44 55 66 77 8 44 66 88 99 6 I want to split every 3 lines from this file to be written to individual files. (3 Replies)
Discussion started by: saint2006
3 Replies

8. Shell Programming and Scripting

Shell script to rename a group of files

Hello, I am having 1800 files in a directory with a specified format, like amms_850o_prod.000003uNy amms_850o_prod.000003u8x amms_850o_prod.000003taP amms_850o_prod.000003tKy amms_850o_prod.000003si4 amms_850o_prod.000003sTP amms_850o_prod.000003sBg amms_850o_prod.000003rvx... (12 Replies)
Discussion started by: atlantis
12 Replies

9. Shell Programming and Scripting

shell script(Preferably awk or sed) to print selected number of columns from each row

Hi Experts, The question may look very silly by seeing the title, but please have a look at it clearly. I have a text file where the first 5 columns in each row were supposed to be attributes of a sample(like sample name, number, status etc) and the next 25 columns are parameters on which... (3 Replies)
Discussion started by: ks_reddy
3 Replies

10. Shell Programming and Scripting

awk/sed/ksh script to cleanup /etc/group file

Many of my servers' /etc/group file have many userid's that does not exist in /etc/passwd file and they need to be deleted. This happened due to manual manipulation of /etc/passwd files. I need to do this for 40 servers. Can anyone help me in achieving this? Even reducing a step or two will be... (6 Replies)
Discussion started by: pdtak
6 Replies

Featured Tech Videos