Sponsored Content
Top Forums Shell Programming and Scripting How to print median values of matrix -awk? Post 302999550 by drl on Thursday 22nd of June 2017 11:32:07 AM
Old 06-22-2017
Hi.

Utility datamash makes the median and other statistical calculations fairly easy. Aside from the scaffolding code, the operative line is datamash:
Code:
#!/usr/bin/env bash

# @(#) s1       Demonstrate statistical calculations, median, datamash.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C datamash

FILE=${1-data1}
E=expected-output.txt

pl " Input data file $FILE:"
cat $FILE

pl " Expected output:"
cat $E

pl " Results (adjusted for visual with code align):"
datamash -H -g1 median 2 median 3 median 4 median 5 < $FILE |
align | 
tee f1

pl " Verify results if possible:"
C=$HOME/bin/pass-fail
[ -f $C ] && $C || ( pe; pe " Results cannot be verified." ) >&2

pl " Some details for datamash:"
dixf datamash

exit $?

producing:
Code:
$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.7 (jessie) 
bash GNU bash 4.3.30
datamash (GNU datamash) 1.0.6

-----
 Input data file data1:
name    s1      s2      s3      s4
g1      2       8       6       5
g1      5       7       9       9
g1      6       7       8       9
g2      8       8       8       8
g2      7       7       7       7
g2      10      10      10      10
g3      3       12      1       24
g3      5       5       24      48
g3      12      3       12      12
g3      2       3       3       3

-----
 Expected output:
name    s1      s2      s3      s4
g1      5       7       8       9
g2      7       7       7       7
g3      4       4       7.5     18

-----
 Results:
GroupBy(name)   median(s1)      median(s2)      median(s3)      median(s4)
g1              5               7               8               9
g2              8               8               8               8
g3              4               4               7.5             18

-----
 Verify results if possible:

-----
 Comparison of 4 created lines with 4 lines of desired results:
f1 expected-output.txt differ: char 1, line 1
 Failed -- files f1 and expected-output.txt not identical -- detailed comparison follows.
1c1
< name  s1      s2      s3      s4
---
> GroupBy(name) median(s1)      median(s2)      median(s3)      median(s4)
3c3
< g2    7       7       7       7
---
> g2            8               8               8               8

 Results cannot be verified.

-----
 Some details for datamash:
datamash        command-line calculations (man)
Path    : /usr/bin/datamash
Version : 1.0.6
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Help    : probably available with -h,--help
Repo    : Debian 8.7 (jessie) 
Home    : https://savannah.gnu.org/projects/datamash/ (pm)

There is a disagreement about the headers and group g2. I would tend to trust datamash, but you can do the calculations again to verify your answer. I tried it again sorting the file, as well as interchanging lines for g2 and got the same result.

Best wishes ... cheers, drl
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Awk to print distinct col values

Hi Guys... I am newbie to awk and would like a solution to probably one of the simple practical questions. I have a test file that goes as: 1,2,3,4,5,6 7,2,3,8,7,6 9,3,5,6,7,3 8,3,1,1,1,1 4,4,2,2,2,2 I would like to know how AWK can get me the distinct values say for eg: on col2... (22 Replies)
Discussion started by: anduzzi
22 Replies

2. Shell Programming and Scripting

awk to median

hi! i have a file like the attachement. you can see on the last column, there is a marker from 1 to 64 for each time. I'd like to have the median for each marker: i want to get a median every 128 values the result is : for an hour and marker x, i have the median value thank you for... (5 Replies)
Discussion started by: riderman
5 Replies

3. Shell Programming and Scripting

Print a key with its all values using awk/others

input COL1 a1 b1 c1 d1 e1 f1 C1 10 10 10 100 100 1000 C2 20 20 200 200 200 2000 output C1 a1 10 1 C1 b1 10 1 C1 c1 10 1 C1 d1 100 2 C1 e1 100 2 C1 f1 1000 3 C2 ... (12 Replies)
Discussion started by: ruby_sgp
12 Replies

4. Shell Programming and Scripting

Help fixing awk code to print values from 2 files

Hi everyone, Please help on this: I have file1: <file title="Title 1 and 2"> <report> <title>Title 1</title> <number>No. 1234</number> <address>Address 1</address> <date>October 07, 2009</date> <description>Some text</description> </report> ... (6 Replies)
Discussion started by: Ophiuchus
6 Replies

5. UNIX for Advanced & Expert Users

Awk to print values of second file

Hello, I have a data file with 300,000 records in it, and another file which contains only the line numbers of roughly 13,000 records in the data file which have data integrity issues. I'm trying to find a way to print the original data by line number identified in the second file. How can I do... (2 Replies)
Discussion started by: peteroc
2 Replies

6. Shell Programming and Scripting

Print minimum and maximum values using awk

Can I print the minimum and maximum values of values in first 4 columns ? input 3038669 3038743 3037800 3038400 m101c 3218627 3218709 3217600 3219800 m290 ............. output 3037800 3038743 m101c 3217600 3219800 m290 (2 Replies)
Discussion started by: quincyjones
2 Replies

7. Shell Programming and Scripting

How to print in awk matching $1 values ,to $1,$4 example given.?

Hi Experts, I am trying to get the output from a matching pattern but unable to construct the awk command: file : aa bb cc 11 dd aa cc 33 cc 22 45 68 aa 33 44 44 dd aa cc 37 aa 33 44 67 I want the output to be : ( if $1 match to "aa" start of the line,then print $4 of that line, and... (3 Replies)
Discussion started by: rveri
3 Replies

8. Shell Programming and Scripting

awk print values between consecutive lines

I have a file in below format: file01.txt TERM TERM TERM ABC 12315 68.53 12042013 165144 ABC 12315 62.12 12042013 165145 ABC 12315 122.36 12052013 165146 ABC 12315 582.18 12052013 165147 ABC 12316 2.36 12052013 165141 ABC 12316 ... (8 Replies)
Discussion started by: alex2005
8 Replies

9. Shell Programming and Scripting

awk print odd values

value=$(some command) for all in `echo $value` do awk checks each value (all) to see if it is a odd number. if so, prints the value done sounds easy enough but i've been unable to find anything on google. (2 Replies)
Discussion started by: SkySmart
2 Replies

10. Shell Programming and Scripting

Print values within groups of lines with awk

Hello to all, I'm trying to print the value corresponding to the words A, B, C, D, E. These words could appear sometimes and sometimes not inside each group of lines. Each group of lines begins with "ZYX". My issue with current code is that should print values for 3 groups and only is... (6 Replies)
Discussion started by: Ophiuchus
6 Replies
All times are GMT -4. The time now is 03:48 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy