Calculate average of top n% of values - UNIX Post 302902557 by @man on Wednesday 21st of May 2014 06:18:49 AM
Quote:
What's your longest number?

Here is an update, assuming 7 or fewer digits (adjust the 7 in %07d as you need):


Code:
gawk -v p=50 '{a[sprintf("%07d",$3)]}
END{asorti(a,as);l=length(a);for(i=l;(i/l)*100>p;i--)
t+=as[i]; printf "%.2f\n", t/l/2}' infile

For multi files:


Code:
for file in *.txt
do
    gawk -v p=50 '
        {a[sprintf("%07d",$3)]}
        END{
            asorti(a,as)
            l=length(a)
            for(i=l;(i/l)*100>p;i--) t+=as[i]
            printf "%s\t%.2f\n", FILENAME, t/l/2
        }' "$file"
done
I don't think I have more than 7 digits. I tried it with the following two sample files:
file1.txt
Code:
chr2L	10	23
chr2L	20	20
chr2L	35	15
chr2L	36	10
chr3R	12	10
chrX	10	15

file2.txt
Code:
chr2L	10	230
chr2L	20	20
chr2L	35	1.5
chr2L	36	1000
chr3R	12	100
chr3R	20	300
chrX	10	15
chrX	26	1500

and this is what I got as output when p=50:
Code:
file1.txt 21.50
file2.txt 1515.00

I think something is wrong, but I don't know what! Surely the average of the top 50% (1515 for the second file) cannot be more than the maximum (1500)!!
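
A quick check of the arithmetic, as a sketch only. It assumes the reported figures came from dividing the sum of the selected values by 2 (the quoted code as shown divides by l and then by 2, so the exact command that was run is an assumption here). Under that assumption, the four largest distinct values in file2.txt sum to 3030, and 3030/2 = 1515; for file1.txt the two largest distinct values give 43/2 = 21.5, which also matches. Dividing by the number of values actually summed keeps the mean below the maximum. The function name check_divisor is made up for illustration.

```shell
# Hypothetical helper: compares "sum / 2" with "sum / count" for the
# four largest distinct third-column values of file2.txt.
check_divisor() {
    printf '%s\n' 1500 1000 300 230 |
    awk '{ sum += $1; n++ }
         END { printf "sum=%d half=%.2f mean=%.2f\n", sum, sum / 2, sum / n }'
}

check_divisor
```

This prints sum=3030 half=1515.00 mean=757.50, so a fixed divisor of 2 only gives a true mean when exactly two values are selected.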

I also tried it with different percentages (p=12.5, p=10, ...), but it does not seem to work properly. My desired percentage is p=0.1, if that is important to know.
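
Reading the loop condition in the quoted code, (i/l)*100>p keeps every sorted index above the p-th percentile, which is the top (100-p)% rather than the top p%; the two coincide only at p=50. The sketch below counts how many of l distinct values that condition keeps and sets it next to p% of l rounded up. Rounding up and keeping at least one value are assumptions about what is wanted, and count_selected is a hypothetical name.

```shell
# Hypothetical helper: count_selected P L
# Prints how many of L sorted values the quoted loop condition keeps,
# next to ceil(L * P / 100) for comparison.
count_selected() {
    awk -v p="$1" -v l="$2" 'BEGIN {
        kept = 0
        # Same condition as the quoted loop, with an explicit lower bound.
        for (i = l; i >= 1 && (i / l) * 100 > p; i--) kept++
        want = l * p / 100
        k = int(want)
        if (k < want) k++      # round up
        if (k < 1) k = 1       # assumption: keep at least one value
        printf "p=%s l=%d loop_keeps=%d top_p_percent=%d\n", p, l, kept, k
    }'
}

count_selected 50 8
count_selected 12.5 8
count_selected 10 8
```

For the eight distinct values of file2.txt, the loop keeps 4 values at p=50, 7 at p=12.5, and all 8 at p=10, whereas the top 12.5% or 10% of eight values would be a single value.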

Thanks once again for helping me.

---------- Post updated at 12:18 PM ---------- Previous update was at 12:10 PM ----------

Quote:
It would probably be more beneficial if it is in the tools that you are most comfortable with rather than a bespoke one-off that you dare not adjust.
Dear rbatte1,

You are absolutely right, no doubt!
I am a student in "bioinformatics" and my knowledge of programming is sub-zero, which is a shame. You asked what I did. What I could think of was to extract the third column first, pipe it to a numeric sort, pipe again to keep only unique values, sort them again, count the number of values, take 0.1% of them, and then pipe that into an average function!!
You see how stupid one can be!!
But even for each step of this dumb way I have to Google, and such a simple script takes me half a day or more to complete! And this is part of a huge analysis, of course...
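
The pipeline described above (take column three, sort numerically, keep unique values, take the top fraction, average) can be sketched with standard tools. This is only one possible arrangement: the function name top_percent_avg, rounding the count up, and keeping at least one value are assumptions, and it expects whitespace-separated input with the value in column three.

```shell
# Hypothetical helper: top_percent_avg PERCENT FILE
# Averages the largest PERCENT% of the distinct values in column 3.
top_percent_avg() {
    awk '{ print $3 }' "$2" |      # extract the third column
    LC_ALL=C sort -rn |            # numeric sort, largest first
    uniq |                         # drop repeated values (adjacent after sorting)
    awk -v p="$1" '
        { v[NR] = $1 }             # store the sorted distinct values
        END {
            if (NR == 0) exit      # empty input: print nothing
            want = NR * p / 100
            k = int(want)
            if (k < want) k++      # round up
            if (k < 1) k = 1       # assumption: keep at least one value
            for (i = 1; i <= k; i++) sum += v[i]
            printf "%.2f\n", sum / k
        }'
}
```

With the sample files above, top_percent_avg 50 file2.txt prints 757.50 and top_percent_avg 0.1 file2.txt prints 1500.00. Because the values are sorted numerically rather than zero-padded into integers, 1.5 stays 1.5 instead of being truncated.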

Now you see why I decided to ask for help! :|

Kindest regards,
aman
 
