Sponsored Content
Top Forums Shell Programming and Scripting AWK: how to get average based on certain column Post 302507508 by shell123 on Thursday 24th of March 2011 04:55:49 AM
Old 03-24-2011
AWK: how to get average based on certain column

Hi,

I'm new to shell programming, can anyone help me on this? I want to do following operations -

1. Average salary for each country
2. Total salary for each city

and data that looks like -

salary country city
10000 zzz BN
25000 zzz BN
30000 zzz BN
10000 yyy ZN
15000 yyy ZN

My below code only gives average and sum of salaries in total for all records.

average) echo "Average salary for each country"
awk '{ s += $1; } END { print "sum: ", s, " average: ", s/NR, "no of records: ", NR }' $FILENAME
;;
totalsal) echo "Total salary for each city"
awk '{ s += $1; } END { print "sum of salaries : ", s }' $FILENAME
;;

output -

1. sum: 90000 average: 18000 no of records: 5
2. sum of salaries : 90000


i want this to be done based on column value i.e. -

for country as zzz -
sum: 65000 average: 21666.66 no of records: 3

for country as yyy -
sum: 25000 average: 12500 no of records: 2

Similarly for total salaries based on each City.

Is there any method in awk to make the search based on the value of certain column?
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Use awk to calculate average of column 3

Suppose I have 500 files in a directory and I need to Use awk to calculate average of column 3 for each of the file, how would I do that? (6 Replies)
Discussion started by: grossgermany
6 Replies

2. UNIX for Dummies Questions & Answers

Average in awk based on time

Hi I am looking for an awk script which can compute the average of the last column based on the date and time. The file looks: site1,"2000-01-01 00:00:00", "2000-01-01 00:59:00",0.013 site2,"2000-02-01 01:00:00", "2000-02-01 01:59:00",0.035 site1,"2000-02-01 02:00:00", "2000-02-01... (15 Replies)
Discussion started by: kathy wang
15 Replies

3. Shell Programming and Scripting

Partial average of a column with awk

Hello, Let's assume I have 100 files FILE_${m} (0<m<101). Each of them contains 100 lines and 10 columns. I'd like to get in a file called "result" the average value of column 3, ONLY between lines 11 and 17, in order to plot that average as a function of the parameter m. So far I can compute... (6 Replies)
Discussion started by: DMini
6 Replies

4. Shell Programming and Scripting

Average values in a column based on range

Hi i have data with two columns like below. I want to find average of column values like if the value in column 2 is between 0-250000 the average of column 1 is some xx and average of column2 is ww then if value is 250001-5000000 average of column 1 is yy and average of column 2 is zz. And my... (5 Replies)
Discussion started by: bhargavpbk88
5 Replies

5. Shell Programming and Scripting

awk based script to find the average of all the columns in a data file

Hi All, I need the modification for the below mentioned code (found in one more post https://www.unix.com/shell-programming-scripting/27161-script-generate-average-values.html) to find the average values for all the columns(but for a specific rows) and print the averages side by side. I have... (4 Replies)
Discussion started by: ks_reddy
4 Replies

6. UNIX for Dummies Questions & Answers

Find the average based on similar names in the first column

I have a table, say this: name1 num1 num2 num3 num4 name2 num5 num6 num7 num8 name3 num1 num3 num4 num9 name2 num8 num9 num1 num2 name2 num4 num5 num6 num4 name4 num4 num5 num7 num8 name5 num1 num3 num9 num7 name5 num6 num8 num3 num4 I want a code that will sort my data according... (4 Replies)
Discussion started by: FelipeAd
4 Replies

7. Shell Programming and Scripting

Calculate the average of a column based on the value of another column

Hi, I would like to calculate the average of column 'y' based on the value of column 'pos'. For example, here is file1 id pos y c 11 1 220 aa 11 4333 207 f 11 5333 112 ee 11 11116 305 e 11 11117 310 r 11 22228 781 gg 11 ... (2 Replies)
Discussion started by: jackken007
2 Replies

8. UNIX for Dummies Questions & Answers

Average by specific column value, awk

Hi, I am searching for an awk-script that computes the mean values for the $2 column, but addicted to the values in the $1 column. It also should delete the unnecessary lines after computing... An example (for some reason I cant use the code tag button): cat list.txt 1 10 1 30 1 20... (2 Replies)
Discussion started by: bjoern456
2 Replies

9. Shell Programming and Scripting

awk to sum a column based on duplicate strings in another column and show split totals

Hi, I have a similar input format- A_1 2 B_0 4 A_1 1 B_2 5 A_4 1 and looking to print in this output format with headers. can you suggest in awk?awk because i am doing some pattern matching from parent file to print column 1 of my input using awk already.Thanks! letter number_of_letters... (5 Replies)
Discussion started by: prashob123
5 Replies

10. Shell Programming and Scripting

Check first column - average second column based on a condition

Hi, My input file Gene1 1 Gene1 2 Gene1 3 Gene1 0 Gene2 0 Gene2 0 Gene2 4 Gene2 8 Gene3 9 Gene3 9 Gene4 0 Condition: If the first column matches, then look in the second column. If there is a value of zero in the second column, then don't consider that record while averaging. ... (5 Replies)
Discussion started by: jacobs.smith
5 Replies
acctcom(8)						      System Manager's Manual							acctcom(8)

NAME
acctcom - Displays selected process accounting record summaries SYNOPSIS
/usr/bin/acctcom [-abfhikmqrtv] [-C seconds] [-e time] [-E time] [-g group] [-H factor] [-I number] [-l line] [select_flag -o file] [-n pattern] [-O seconds] [-s time] [-S time] [-u username] [file ...] FLAGS
Displays average statistics about the selected processes. Statistics are displayed at the end of the output records in the format var=# (for example, CMDS=439), where the value (#) is given to the nearest hundredth. The var specifies the following: Value Total number of commands listed in the specified file Average real time per process Average CPU time per process Average user CPU time per process Average system CPU time per process Average number of characters transferred Average number of blocks transferred Average CPU factor (average user time divided by total CPU time) Average hog factor (average CPU time divided by average elapsed time) Displays information about the most recently executed commands first. This flag has no effect when the acctcom command reads from the default input device or if more than one process accounting file is specified. The column heading format is the same as the default column heading format. Lists processes whose total CPU time (system time + user time) is greater than the value specified by seconds. The column heading format is the same as the default column heading format. Displays information only about processes that start at or before the specified time, which is specified as hh[:mm[:ss]]. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy START BEFORE: day mon date hh:mm:ss yy COMMAND START END REAL CPU MEAN NAME USER TTYNAME TIME TIME (SECS) (SECS) SIZE(K) Displays information only about processes that end at or before the specified time, which is specified as hh[:mm[:ss]]. If you specify the same time for both the -E and -S flags, the acctcom command displays processes that existed at the specified time. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy END BEFORE : day mon date hh:mm:ss yy COMMAND START END REAL CPU MEAN NAME USER TTYNAME TIME TIME (SECS) (SECS) SIZE(K) Displays information about the fork/exec flag (used to execute another process) in the F column and the system exit value STAT, which can be zero (0) or an error code, in the STAT column in addition to the default column heading format. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy COMMAND START END REAL CPU MEAN NAME USER TTYNAME TIME TIME (SECS) (SECS) SIZE(K) F STAT Displays only the processes that belong to the specified group. You may specify either the group ID or the group name. The column heading format is the same as the default column heading format. Displays the hog factor instead of the mean memory size. The hog factor is the CPU time used by the process divided by the real time. The output is the same as the default column format output except the MEAN SIZE(K) column heading is replaced by the .HOG FACTOR heading. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy COMMAND START END REAL CPU HOG NAME USER TTYNAME TIME TIME (SECS) (SECS) FACTOR Displays information about processes that exceed the value specified by the hogfactor variable. The output format is the same as the default column heading format. Displays the number of characters and blocks transferred during read or write I/O operations. The output is similar to the default column heading format, except the CHARS TRANSFD column replaces the MEAN SIZE(K) column, and the BLOCKS READ column is added to the output. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy COMMAND START END REAL CPU CHARS BLOCKS NAME USER TTYNAME TIME TIME (SECS) (SECS) TRANSFD READ Displays information about the processes that transfer more than the number of characters specified by the number variable. The output format is the same as the default column heading format. Displays the total number of K-core min- utes, which is the number of kilobytes of memory used by the process multiplied by the buffer time used. The output format is the same as the default column heading format, except the KCORE MIN column heading replaces the MEAN SIZE(K) column heading. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy COMMAND START END REAL CPU KCORE NAME USER TTYNAME TIME TIME (SECS) (SECS) MIN Displays information about the processes that belong to the workstation whose tty line is specified by the line variable (for example, ttyp0). The heading format is the same as the default column heading format. Displays the median amount of process memory used. If you also specify the -h or -k flag, the -m flag is ignored. The output format is the same as the default column heading format. Displays information only about the processes whose names include the regular expression specified by the pattern variable. The output format is the same as the default column heading format. Copies selected process records to the specified filename. The select_flag variable specifies the following process selection flags: -C, -e, -E, -g, -H, -I, -l, -n, -O, -s, -S, and -u. If you do not specify a selection flag with the select_flag variable, all process records are copied to filename. The output format includes only the date and time: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy Displays information about the processes that have a CPU system time exceeding the time specified by the seconds variable. The output format is the same as the default column heading format. Displays only the average statistics, which are shown at the end of the command output when you use the -a flag. Displays the CPU factor, which is the user time divided by the total CPU time. The output format is the same as the default column heading format, except the CPU FACTOR column replaces the MEAN SIZE(K) column. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy COMMAND START END REAL CPU CPU NAME USER TTYNAME TIME TIME (SECS) (SECS) FACTOR Displays information about the processes that existed on or after time, which is specified as hh[:mm[:ss]]. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy END AFTER : day mon date hh:mm:ss yy COMMAND START END REAL CPU MEAN NAME USER TTYNAME TIME TIME (SECS) (SECS) SIZE(K) Displays information only about the processes that started at or after time, which is specified as hh[:mm[:ss]]. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy START AFTER : day mon date hh:mm:ss yy COMMAND START END REAL CPU MEAN NAME USER TTYNAME TIME TIME (SECS) (SECS) SIZE(K) Displays system and user CPU times under separate column headings. The CPU SYS column heading, which shows the system CPU time, replaces the CPU (SECS) default column heading. The column heading format is as follows: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yy COMMAND START END REAL CPU (SECS) NAME USER TTYNAME TIME TIME (SECS) SYS USER Displays information about the processes that are owned by user username. You can specify a user identification number, a login name converted to a user identification number, a number sign (#) to specify root or a question mark (?), which selects pro- cesses associated with unknown user identification numbers. The output format is the same as the default column heading format. Removes column headings from the output; otherwise the output is the same as the default column heading format. DESCRIPTION
The acctcom command displays process accounting records from files specified by the file parameter, from standard input, or from the /var/adm/pacct file. If you do not specify a file and if standard input is assigned to a workstation or to /dev/null (for example, if a process runs in the background), the acctcom command reads the /var/adm/pacct file. You do not have to be root to use the acctcom command, which is located in the /usr/bin directory. If you specify more than one filename, the acctcom process reads each file chronologically in time-descending order according to process completion time. Usually, the /var/adm/pacct file is used, but you can have several /var/adm/pacct/*Vn files, which are created by the ckpacct command. Each record specifies the execution times for a completed process. The default output format includes the command name, user name, tty name, process start time, process end time, real seconds, CPU seconds, and mean memory size (in kilobytes). The process summary output has the following default column heading format: ACCOUNTING RECORDS FROM: day mon date hh:mm:ss yyyy COMMAND START END REAL CPU MEAN NAME USER TTYNAME TIME TIME (SECS) (SECS) SIZE(K) If a specified time is later than the current time, it is interpreted as occurring on the previous day. You can use flags to display the state of the fork/exec flag, F column; the system exit value, STAT column; the ratio of total CPU time to elapsed time, HOG FACTOR column; the product of memory used and elapsed time, KCORE MIN column; the ratio of user time to total (system plus user) time, CPU FACTOR column; the number of characters transferred during I/O operations, CHARS TRNSFD column; and the total number of blocks read or written, BLOCKS READ column. If a process is run under root or su authority, the command name is prefixed with a number sign (#). If a process is not assigned to a known tty (for example, if the cron daemon runs the process), a question mark (?) is displayed in the TTYNAME column. The acctcom command reports only on processes that have completed. Use the ps command to examine the status of active processes. For any flag value that produces a timestamp in an output heading, the order of date and time information is locale dependent. The time- stamps shown in the examples use the default format for date and time values. EXAMPLES
The following command displays information about processes that exceed 2.0 seconds of CPU time: /usr/sbin/acct/acctcom -O 2 < /var/adm/pacct The following command displays information about processes belonging to the Finance group: /usr/sbin/acct/acctcom -g Finance < /var/adm/pacct The following command displays information about processes belonging to tty /dev/console that run after 5:00 p.m.: /usr/sbin/acct/acctcom -l /dev/console -s 17:00 FILES
Specifies the command path. The active process accounting database file. User and group database files. Accounting header files that define formats for writing accounting files. RELATED INFORMATION
Commands: ed(1), ps(1), su(1), acct(8), cron(8), runacct(8) Functions: acct(2) delim off acctcom(8)
All times are GMT -4. The time now is 07:55 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy