The UNIX and Linux Forums Finding data value that contains x% of points

# Finding data value that contains x% of points

## UNIX for Dummies Questions & Answers

#1
12-13-2012
ida1215

Hi, I need help finding the value in my data that encompasses a certain percentage of my total data points (n). An example of my data is included below, n=30. For instance, I want to find the minimum threshold that still encompasses 60% (n=18), 70% (n=21) and 80% (n=24) of the points.

To find the data value that encompasses 60% of the data points manually, I tried something like:

Code:
```
awk '$1 >= 0.233 {print $0}' data > threshold_0.233.txt
awk '$1 >= 0.234 {print $0}' data > threshold_0.234.txt
awk '$1 >= 0.235 {print $0}' data > threshold_0.235.txt
```

Then I counted the lines in each output file to check whether the count corresponds to 60% of n, and repeated this by trial and error until I had the values I needed for each percentage.

Code:
```
0.222568470365
0.221756265888
0.219760388204
0.242798143771
0.238352821721
0.241443756619
0.223094316003
0.228262624788
0.216889793498
0.210031152159
0.21097303707
0.207019965666
0.217014341085
0.239244868006
0.240522828032
0.237227034969
0.257647932043
0.248749576572
0.246545881317
0.247231196664
0.234222785343
0.235188699739
0.254819829246
0.250148878221
0.275682631829
0.287082318457
0.252075020326
0.412756783786
0.402542710592
0.227780278349
```

Any suggestion on how to go through it? Thanks much.
#2
12-13-2012
Corona688
There's probably a way to do that statistically. Checking...
#3
12-13-2012
ida1215
Thanks for checking.
#4
12-13-2012
Corona688
Your data doesn't seem to have a normal distribution.

There's a much more obvious way anyway; I don't know why it didn't occur to me before. Sort it, then look past the percentage of lines you want to find the threshold.

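Code:
```
# Sort numerically, smallest first.
sort -n data > sorted

# Pass 1 (NR==FNR) counts the lines into N; pass 2 prints the first
# value past 80% of the lines, then stops.
awk 'NR==FNR { N++; next } FNR > (.8*N) { print $1 ; exit }' sorted sorted

rm -f sorted
```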
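
If "encompasses" is meant in the sense of the `$1 >= threshold` filter from post #1, the same sort-and-count idea can be turned around: sort largest first and take the k-th value, where k is the number of points to cover. The following is only a sketch under that assumption; `threshold_at` is a hypothetical helper name, and the input is assumed to hold one plain decimal number per line.

```shell
# threshold_at PCT FILE
# Print the largest threshold T such that at least PCT percent of the
# values in FILE satisfy value >= T.
threshold_at() {
    sort -rn "$2" | awk -v pct="$1" '
        { v[NR] = $1 }                        # values, largest first
        END {
            if (NR == 0) exit 1               # no data, nothing to report
            k = pct * NR / 100                # number of points to cover
            if (k > int(k)) k = int(k) + 1    # round up to a whole point
            if (k < 1)  k = 1
            if (k > NR) k = NR
            print v[k]                        # k-th largest value
        }'
}
```

With the 30 sample values from post #1 saved in `data`, `threshold_at 60 data` prints 0.235188699739, `threshold_at 70 data` prints 0.227780278349, and `threshold_at 80 data` prints 0.221756265888. The 60% result lands in the 0.233 to 0.235 range that the manual trial-and-error was probing.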
