Calculate average of top n% of values - UNIX


 
# 8  
05-21-2014
Strange. With your sample files and the gawk shell wrapper mentioned above, I get:
Code:
file1.txt        5.38
file2.txt        189.38

Could you post the output of cat -vet file2.txt using code tags, please?
# 9  
05-21-2014
Code:
cat -vet file2.txt
chr2L^I10^I230^M$
chr2L^I20^I20^M$
chr2L^I35^I1.5^M$
chr2L^I36^I1000^M$
chr3R^I12^I100^M$
chr3R^I20^I300^M$
chrX^I10^I15^M$
chrX^I26^I1500

I am so thankful for your valuable help. The thing is that I have now realized I want something even simpler, and I was dumb enough not to know it!!

What I want is basically to sort the values in the third column and take the average of the top 0.1% of them! (No matter whether there are repetitions or not.)

With my subzero knowledge, I started like this:
Code:
awk '{print $3}' test.txt | sort -nk1

Now I don't know how to get the average of the top 0.1% of the values (the last 0.1% of the sorted lines).
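A minimal sketch for this simpler question (average of the top 0.1% of the column-3 values, repetitions included). The function name top_pct_avg is hypothetical, not something from the thread. It assumes whitespace-separated columns, plain decimal numbers, and reads "top 0.1%" as the largest int(N * 0.1 / 100) values, raised to at least one value so that small files still give a result.

```shell
#!/bin/sh
# top_pct_avg FILE PCT  (hypothetical helper)
# Prints the average of the largest PCT percent of the column-3 values in
# FILE. Duplicate values are kept; at least one value is always used.
top_pct_avg() {
    # Drop Windows carriage returns, keep column 3 of lines that have one,
    # and sort numerically, largest first (C locale so "." is the decimal point).
    tr -d '\r' < "$1" | awk 'NF >= 3 { print $3 }' | LC_ALL=C sort -rn |
    awk -v pct="$2" '
        { v[++n] = $1 }                     # v[1] is the largest value
        END {
            if (n == 0) exit 1              # empty input: nothing to average
            k = int(n * pct / 100)          # how many values form the top pct
            if (k < 1) k = 1                # small files: use the largest one
            for (i = 1; i <= k; i++) sum += v[i]
            printf "%.2f\n", sum / k
        }'
}
```

For example, top_pct_avg file2.txt 0.1 averages only the single largest value of the 8-line sample file, because 0.1% of 8 lines rounds down to zero. If you would rather round the count up instead of down, replace the int() line with a ceiling calculation.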

Many thanks and regards! :)
# 10  
05-21-2014
You have ^M characters (carriage returns) in your file, probably from FTP-ing it from a Windows box.
Get rid of the ^M characters in BOTH files and re-run the script, e.g. tr -d '\015' < file2 > file2_new
# 11  
05-21-2014
I still get the same...
Code:
tr -d '\015' <file1.txt > file1_new.txt
tr -d '\015' <file2.txt > file2_new.txt

Code:
file1_new.txt	21.50
file1.txt	21.50
file2_new.txt	1515.00
file2.txt	1515.00

I'm sure I'm doing something stupid!!
When I think about it, it seems very simple, but when I try to do it... you saw how far I got! :|
And when I try what you suggested, I can't solve even a tiny issue, because it's beyond my knowledge.

Thanks for your time anyway.
# 12  
05-21-2014
Apologies, there was a slight bug in there (I wasn't dividing by the number of values used).

I've tried to make it a little easier to understand, and it now also prints the number of unique values found and which ones are summed. You can take out the printf line starting "File has %d unique values" if you don't want this extra info:

Code:
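for file in *.txt
do
   gawk -v p=50 '
    # Use each 3rd-column value as an array key, zero-padded to a fixed
    # width so that the string sort done by asorti() matches numeric order
    # (non-negative values only). %015.4f keeps decimals such as 1.5,
    # which a %d format would truncate to 1.
    {a[sprintf("%015.4f",$3)]}
    END{
      asorti(a,as)                   # as[1..n] = unique keys, ascending
      end=length(a)                  # number of unique values
      start=end-int(end*p/100)+1     # first index of the top p percent
      if(start>end) start=end        # always use at least the largest value
      printf("File has %d unique values, average from %d to %d\n", end, start, end)
      for(i=start;i<=end;i++) total=total + as[i]
      printf "%s\t%.2f\n", FILENAME, total/(end-start+1)}' "$file"
done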
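asorti() is a gawk extension, so the script above will not run under an awk that lacks it, such as mawk. A gawk-free sketch of the same unique-value calculation follows, with the same start/end index arithmetic, letting sort -n -u do the ordering and de-duplication instead. The function name unique_top_pct_avg is hypothetical, and the sketch assumes plain decimal, whitespace-separated values.

```shell
#!/bin/sh
# unique_top_pct_avg FILE PCT  (hypothetical helper)
# Averages the largest PCT percent of the *unique* column-3 values of FILE
# and prints "FILE<TAB>average", like the gawk loop above.
unique_top_pct_avg() {
    # Drop carriage returns, keep column 3, then sort numerically and keep
    # one copy of each distinct value (C locale so "." is the decimal point).
    tr -d '\r' < "$1" | awk 'NF >= 3 { print $3 }' | LC_ALL=C sort -n -u |
    awk -v p="$2" -v name="$1" '
        { v[++n] = $1 }                   # v[1..n]: unique values, ascending
        END {
            if (n == 0) exit 1            # nothing to average
            start = n - int(n * p / 100) + 1
            if (start > n) start = n      # always keep the largest value
            for (i = start; i <= n; i++) total += v[i]
            printf "%s\t%.2f\n", name, total / (n - start + 1)
        }'
}
```

Usage: for file in *.txt; do unique_top_pct_avg "$file" 50; done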


Last edited by Chubler_XL; 05-21-2014 at 05:53 PM.


8 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Calculate average from a given set of keys and values

Hello, I am writing a script which expects as its input a hash with student names as the keys and marks as the values. The script then returns an array of average marks for students who scored 60-70, 70-80, and over 90. Expected output: 50-70 1 70-90 3 over 90 0 The test script so far... (4 Replies)
Discussion started by: nans

2. Shell Programming and Scripting

Calculate the average per block.

My old-school way is a one-liner that searches SAR for the average data receive rate. But I don't think it is practical or accurate, because it includes off-peak hours. I am planning to change it. My cron job runs every 30 minutes. When my cron job runs and the time is 14:47, it will... (1 Reply)
Discussion started by: invinzin21

3. Shell Programming and Scripting

Calculate average, azimut and distance

Gents, I would like to get the distance and azimuth from 2 coordinates. Using an Excel formula I get the correct values, but I would like to do it using awk. Example: A 35089.0 50345.016 9 75 1 2101774 77 70 79 483911.6 2380106.9 137.4 1 1 6 1 A 35089.0 50345.01620 75... (8 Replies)
Discussion started by: jiam912

4. Shell Programming and Scripting

Calculate Average AWK

I want to calculate the average, line by line, of some files with several lines in them. The files are identical; I just want to average the 3rd columns of those files. Example file: File 1 001 0.046 0.667267 001 0.047 0.672028 001 0.048 0.656025 001 0.049 ... (2 Replies)
Discussion started by: AriasFco

5. Shell Programming and Scripting

AWK novice - calculate the average

Hi, I have, for example, the following data in a file: P1 XXXXXXX.1 YYYYYYY.1 ZZZ.1 P1 XXXXXXX.2 YYYYYYY.2 ZZZ.2 P1 XXXXXXX.3 YYYYYYY.3 ZZZ.3 P1 XXXXXXX.4 YYYYYYY.4 ZZZ.4 P1 XXXXXXX.5 YYYYYYY.5 ZZZ.5 P1 XXXXXXX.6 YYYYYYY.6 ZZZ.6 P1 XXXXXXX.7 YYYYYYY.7 ZZZ.7 P1 XXXXXXX.8 YYYYYYY.8 ZZZ.8 P2... (6 Replies)
Discussion started by: alex2005

6. Shell Programming and Scripting

Calculate average time using a script

Hello, I'm hoping to get some help on calculating an average time from a list of times (hour:minute:second). Here's what my list looks like right now; it will grow (I can get the full date or change the formatting of this as well): 07:55:31 09:42:00 08:09:02 09:15:23 09:27:45 09:49:26... (4 Replies)
Discussion started by: jaredhanks

7. Programming

calculate average

I have a file containing 2 3 4 5 6 6, so I am writing a program in C to calculate the mean. #include<stdio.h> #include<string.h> #include <math.h> double CALL mean(int n , double x) main (int argc, char **argv) { char Buf,SEQ; int i; double result = 0; FILE *fp; (3 Replies)
Discussion started by: cdfd123

8. UNIX for Dummies Questions & Answers

calculate average of column 2

Hi, I have fakebook.csv as follows: F1(current date) F2(popularity) F3(name of book) F4(release date of book) 2006-06-21,6860,"Harry Potter",2006-12-31 2006-06-22,,"Harry Potter",2006-12-31 2006-06-23,7120,"Harry Potter",2006-12-31 2006-06-24,,"Harry Potter",2006-12-31... (0 Replies)
Discussion started by: onthetopo