The smallest number among the highest 90% of all numbers in a file
Hello All,
I am having a problem finding the smallest number among the highest 90% of all numbers in a file. The file has thousands of lines and hundreds of columns.
I am familiar mainly with bash, but I am open to whatever suggestion will lead to a solution.
To explain it differently: I have, for example, 1000 numbers between 0 and 10000. The results could be:
90% of numbers are bigger than 1000
80% of numbers are bigger than 2342
70% of numbers are bigger than 5674
etc.
I am looking for numbers like 1000, 2342, 5674 as in this example.
I am sure there is some statistical method for this, but I cannot remember or find what it is called. If I knew the name of the method, I could probably work out how to calculate it too.
The INPUT can look like this, but much bigger (in columns and rows); the numbers are not sorted in any way (though they may appear to be here)
OUTPUT
Here are 20 numbers; for example, I would like to have 5 bands. Each band will contain 20% of the numbers, meaning
I hope it is clearer now.
I am slowly finding my way around, but it is not very elegant and I am creating lots of rubbish along the way. I have to do this for tens of files with 50000 numbers in each file. That is the reason I am looking for an elegant and quick solution.
Thank you
Last edited by radoulov; 05-22-2011 at 05:49 AM..
Reason: Code tags.
For a real quick solution, I would
(1) Put the data one on a line.
(2) Sort.
(3) Pass it to 'awk' with the required percentile value as a parameter.
(4) Use pattern $1 < parameter.
(5) For each record make it the minimum if needed.
(6) On END, print the value.
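A minimal sketch of those steps (the file name, sample data, and the p=0.10 cutoff are assumptions for illustration):

```shell
# Hypothetical sample data: one number per line (steps 1-2: flatten and sort).
printf '%s\n' 5 1 9 3 7 2 8 4 6 10 > data.txt
sort -n data.txt > sorted.txt

# Steps 3-6: once the data is sorted, the value at position p*N is the
# cutoff; p=0.10 prints the number that 90% of the values are bigger than.
awk -v p=0.10 '
    { vals[NR] = $1 }                    # input is already sorted
    END {
        idx = int(p * NR); if (idx < 1) idx = 1
        print vals[idx]                  # prints 1 for this sample
    }' sorted.txt
```

Sorting first makes the explicit `$1 < parameter` minimum-tracking unnecessary: the cutoff is simply the element at the percentile's index.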
The OP does not know what the limits are... he or she needs to find them. Consider:
Now find the middle point. In the first list, 4 is the midpoint, but in the second list it's 6. You don't know 4 or 6 ahead of time. The midpoint is the 50% point. Now imagine a much longer list where you need to find the data element at the 10%, 20%, 30% ... 90% points in the list.
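A small shell sketch of that idea (the two lists are invented here, since the original examples did not survive):

```shell
# Two illustrative sorted lists of 7 elements each (assumed data).
list1="1 2 3 4 5 6 7"        # the middle element is 4
list2="2 4 6 6 6 8 9"        # the middle element is 6

# Print the element at the 50% point of a space-separated sorted list.
mid() {
    set -- $1                # re-split the list into positional parameters
    i=$(( ($# + 1) / 2 ))    # 1-based index of the middle element
    eval "echo \${$i}"
}

mid "$list1"                 # -> 4
mid "$list2"                 # -> 6
```

The same indexing generalizes: the 10%, 20%, ... 90% points of an n-element sorted list are just the elements at `int(k * n)` for k = 0.1 ... 0.9.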
Hi!
I found and then adapted the code for my pipeline...
awk -F"," -vOFS="," '{printf "%0.2f %0.f\n",$2,$4}' xxx > yyy
I added -F"," -vOFS="," (for CSV input and output) and changed the columns and the number of decimals...
It works, but I also have some problems... here are my columns
... (7 Replies)
Hi again. Sorry for all the questions — I've tried to do all this myself but I'm just not good enough yet, and the help I've received so far from bartus11 has been absolutely invaluable. Hopefully this will be the last bit of file manipulation I need to do.
I have a file which is formatted as... (4 Replies)
Hi, I have a list.txt file with number ranges and want to print/save a new all.txt file containing all the numbers, including those in between.
== list.txt ==
65936
65938
65942 && 65943
65945 ... (7 Replies)
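A guess at a solution, assuming each line of list.txt is either a single number or two numbers joined by " && " marking an inclusive range (the format is not fully shown above):

```shell
cat > list.txt <<'EOF'
65936
65938
65942 && 65943
65945
EOF

# Expand "a && b" lines into every number from a to b inclusive;
# single-number lines pass through unchanged.
awk -F' && ' '
    NF == 2 { for (i = $1; i <= $2; i++) print i; next }
    { print $1 }' list.txt > all.txt

cat all.txt
```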
Hi all,
I have a large column of numbers like
5.6789
2.4578
9.4678
13.5673
1.6589
.....
I am trying to write awk code that goes through the column and arranges the numbers from least to highest, like
1.6589
2.4578
5.6789
.......
Can anybody suggest how I can do... (5 Replies)
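For a single column, plain `sort -n` already does this; a gawk alternative is shown for completeness (`asort` is a GNU awk extension, and the sample file is invented):

```shell
printf '%s\n' 5.6789 2.4578 9.4678 13.5673 1.6589 > nums.txt

# Simplest: numeric sort, no awk needed.
sort -n nums.txt

# gawk-only alternative: collect the values, then sort them with asort.
gawk '{ a[NR] = $1 }
      END { n = asort(a); for (i = 1; i <= n; i++) print a[i] }' nums.txt
```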
Howdy experts,
We have some ranges of numbers which belong to a particular group, as below.
GroupNo StartRange EndRange
Group0125 935300 935399
Group2006 935400 935476
937430 937459
Group0324 935477 935549
... (6 Replies)
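A sketch of a range lookup in awk, assuming lines without a group name continue the previous group's ranges (the probe number 937445 and the sample file are invented):

```shell
cat > groups.txt <<'EOF'
Group0125 935300 935399
Group2006 935400 935476
937430 937459
Group0324 935477 935549
EOF

# Report which group the number n falls into; two-field lines are
# continuation ranges that inherit the last group name seen.
awk -v n=937445 '
    NF == 3 { grp = $1; lo = $2; hi = $3 }
    NF == 2 { lo = $1; hi = $2 }
    n >= lo && n <= hi { print grp; exit }' groups.txt   # -> Group2006
```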
I have two files: one (the numbers file) contains the numbers (approximately 30000), and the other (the record file) contains the records (approximately 40000), which may or may not contain the numbers from that file.
I want to separate the records which have field 1 = (any of the number from numbers... (15 Replies)
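The usual two-file awk idiom fits here; the file names and sample contents below are invented for illustration:

```shell
cat > numbers.txt <<'EOF'
1001
1003
EOF
cat > records.txt <<'EOF'
1001 alpha
1002 beta
1003 gamma
EOF

# First pass (NR==FNR) loads the numbers into an array; the second pass
# routes each record by whether its first field was seen.
awk 'NR == FNR { seen[$1]; next }
     $1 in seen { print > "matched.txt"; next }
     { print > "unmatched.txt" }' numbers.txt records.txt
```

This is a single pass over each file, so it stays fast even at 30000 numbers against 40000 records.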
How do I replace many numbers with one number in a file?
Many numbers like 444565, 454678, 443298, etc. I want to replace these with one number (300). Please help me out. (2 Replies)
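One way, using sed's extended-regex alternation (`-E` works in both GNU and BSD sed; the sample file is invented):

```shell
printf 'id=444565\nid=454678\nid=443298\n' > file.txt

# Replace each listed number with 300, everywhere it occurs.
sed -E 's/444565|454678|443298/300/g' file.txt
```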