The smallest number among the highest 90% of all numbers in a file
Hello All,
I am having a problem finding the smallest number among the highest 90% of all numbers in a file. The file has thousands of lines and hundreds of columns.
I am familiar mainly with bash, but I am open to whatever suggestion will lead to a solution.
To explain it differently: I have, for example, 1000 numbers between 0 and 10000. The results could be:
90% of numbers are bigger than 1000
80% of numbers are bigger than 2342
70% of numbers are bigger than 5674
etc.
I am looking for numbers like 1000, 2342, 5674 as in this example.
I am sure there is some statistical method for this, but I cannot remember or find what it is called. If I knew the name of the method, I could probably work out how to calculate it too.
The INPUT can look like this, but much bigger (in columns and rows); the numbers are not sorted in any way (though they may appear to be here)
OUTPUT
Here are 20 numbers; for example, I would like to have 5 bands. Each band will contain 20% of the numbers, meaning
I hope it is clearer now.
I am slowly finding my way around, but it is not very elegant and I am creating lots of rubbish along the way. I have to do this for tens of files with 50000 numbers in each file. That is the reason I am looking for an elegant and quick solution.
Thank you
Last edited by radoulov; 05-22-2011 at 05:49 AM..
Reason: Code tags.
For a real quick solution, I would
(1) Put the data one on a line.
(2) Sort.
(3) Pass it to 'awk' with the required percentile value as a parameter.
(4) Use pattern $1 < parameter.
(5) For each record make it the minimum if needed.
(6) On END, print the value.
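A minimal sketch of those steps (the file name, sample data, and the p=0.10 cutoff are assumptions for illustration):

```shell
# Hypothetical sample data: one number per line (steps 1-2: flatten and sort).
printf '%s\n' 5 1 9 3 7 2 8 4 6 10 > data.txt
sort -n data.txt > sorted.txt

# Steps 3-6: once the data is sorted, the value at position p*N is the
# cutoff; p=0.10 prints the number that 90% of the values are bigger than.
awk -v p=0.10 '
    { vals[NR] = $1 }                    # input is already sorted
    END {
        idx = int(p * NR); if (idx < 1) idx = 1
        print vals[idx]                  # prints 1 for this sample
    }' sorted.txt
```

Sorting first makes the explicit `$1 < parameter` minimum-tracking unnecessary: the cutoff is simply the element at the percentile's index.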
The OP does not know what the limits are... he or she needs to find them. Consider:
Now find the middle point. In the first list, 4 is the midpoint, but in the second list it's 6. You don't know 4 or 6 ahead of time. The midpoint is the 50% point. Now imagine a much longer list where you need to find the data element at the 10%, 20%, 30% ... 90% points in the list.
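A small shell sketch of that idea (the two lists are invented here, since the original examples did not survive):

```shell
# Two illustrative sorted lists of 7 elements each (assumed data).
list1="1 2 3 4 5 6 7"        # the middle element is 4
list2="2 4 6 6 6 8 9"        # the middle element is 6

# Print the element at the 50% point of a space-separated sorted list.
mid() {
    set -- $1                # re-split the list into positional parameters
    i=$(( ($# + 1) / 2 ))    # 1-based index of the middle element
    eval "echo \${$i}"
}

mid "$list1"                 # -> 4
mid "$list2"                 # -> 6
```

The same indexing generalizes: the 10%, 20%, ... 90% points of an n-element sorted list are just the elements at `int(k * n)` for k = 0.1 ... 0.9.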
Hi!
I found and then adapted the code for my pipeline...
awk -F"," -vOFS="," '{printf "%0.2f %0.f\n",$2,$4}' xxx > yyy
I added -F"," -vOFS="," (for CSV input and output) and changed the columns and the number of decimals...
It works, but I also have some problems... here are my columns
... (7 Replies)
Hi again. Sorry for all the questions — I've tried to do all this myself but I'm just not good enough yet, and the help I've received so far from bartus11 has been absolutely invaluable. Hopefully this will be the last bit of file manipulation I need to do.
I have a file which is formatted as... (4 Replies)
Hi, I have a list.txt file with number ranges and want to print/save a new all.txt file containing all the numbers, including those in between.
== list.txt ==
65936
65938
65942 && 65943
65945 ... (7 Replies)
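A guess at a solution, assuming each line of list.txt is either a single number or two numbers joined by " && " marking an inclusive range (the format is not fully shown above):

```shell
cat > list.txt <<'EOF'
65936
65938
65942 && 65943
65945
EOF

# Expand "a && b" lines into every number from a to b inclusive;
# single-number lines pass through unchanged.
awk -F' && ' '
    NF == 2 { for (i = $1; i <= $2; i++) print i; next }
    { print $1 }' list.txt > all.txt

cat all.txt
```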
Hi all,
I have a large column of numbers like
5.6789
2.4578
9.4678
13.5673
1.6589
.....
I am trying to write awk code that goes through the column and arranges the numbers from least to highest, like
1.6589
2.4578
5.6789
.......
Can anybody suggest how I can do... (5 Replies)
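For a single column, plain `sort -n` already does this; a gawk alternative is shown for completeness (`asort` is a GNU awk extension, and the sample file is invented):

```shell
printf '%s\n' 5.6789 2.4578 9.4678 13.5673 1.6589 > nums.txt

# Simplest: numeric sort, no awk needed.
sort -n nums.txt

# gawk-only alternative: collect the values, then sort them with asort.
gawk '{ a[NR] = $1 }
      END { n = asort(a); for (i = 1; i <= n; i++) print a[i] }' nums.txt
```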
Howdy experts,
We have some ranges of numbers which belong to a particular group, as below.
GroupNo StartRange EndRange
Group0125 935300 935399
Group2006 935400 935476
937430 937459
Group0324 935477 935549
... (6 Replies)
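A sketch of a range lookup in awk, assuming lines without a group name continue the previous group's ranges (the probe number 937445 and the sample file are invented):

```shell
cat > groups.txt <<'EOF'
Group0125 935300 935399
Group2006 935400 935476
937430 937459
Group0324 935477 935549
EOF

# Report which group the number n falls into; two-field lines are
# continuation ranges that inherit the last group name seen.
awk -v n=937445 '
    NF == 3 { grp = $1; lo = $2; hi = $3 }
    NF == 2 { lo = $1; hi = $2 }
    n >= lo && n <= hi { print grp; exit }' groups.txt   # -> Group2006
```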
I have two files: one (the numbers file) contains the numbers (approximately 30000), and the other (the record file) contains the records (approximately 40000), which may or may not contain the numbers from that file.
I want to separate the records which have field 1 = (any of the number from numbers... (15 Replies)
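The usual two-file awk idiom fits here; the file names and sample contents below are invented for illustration:

```shell
cat > numbers.txt <<'EOF'
1001
1003
EOF
cat > records.txt <<'EOF'
1001 alpha
1002 beta
1003 gamma
EOF

# First pass (NR==FNR) loads the numbers into an array; the second pass
# routes each record by whether its first field was seen.
awk 'NR == FNR { seen[$1]; next }
     $1 in seen { print > "matched.txt"; next }
     { print > "unmatched.txt" }' numbers.txt records.txt
```

This is a single pass over each file, so it stays fast even at 30000 numbers against 40000 records.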
How do I replace many numbers with one number in a file?
Many numbers like 444565, 454678, 443298, etc. I want to replace these with one number (300). Please help me out. (2 Replies)
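One way, using sed's extended-regex alternation (`-E` works in both GNU and BSD sed; the sample file is invented):

```shell
printf 'id=444565\nid=454678\nid=443298\n' > file.txt

# Replace each listed number with 300, everywhere it occurs.
sed -E 's/444565|454678|443298/300/g' file.txt
```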