Finding data value that contains x% of points | Unix Linux Forums | UNIX for Dummies Questions & Answers


Finding data value that contains x% of points

    #1  
12-13-2012
ida1215

Hi, I need help finding the data value that encompasses a certain percentage of my total data points (n). Below is an example of my data, n=30. What I want to do is, for instance, find the minimum threshold that still encompasses 60% (n=18), 70% (n=21) and 80% (n=24) of the points.

Code:
# Manually, to find the data value that encompasses 60% of the data points,
# I tried something like this (the input file name "data" is assumed here):

awk '$1 >= 0.233' data > threshold_0.233.txt
awk '$1 >= 0.234' data > threshold_0.234.txt
awk '$1 >= 0.235' data > threshold_0.235.txt

# Then I counted the lines in each output file to see whether they correspond to 60% of n,
# repeating by trial and error until I had the values I needed at each percentage.


Code:
0.222568470365
0.221756265888
0.219760388204
0.242798143771
0.238352821721
0.241443756619
0.223094316003
0.228262624788
0.216889793498
0.210031152159
0.21097303707
0.207019965666
0.217014341085
0.239244868006
0.240522828032
0.237227034969
0.257647932043
0.248749576572
0.246545881317
0.247231196664
0.234222785343
0.235188699739
0.254819829246
0.250148878221
0.275682631829
0.287082318457
0.252075020326
0.412756783786
0.402542710592
0.227780278349

Any suggestions on how to go about it? Thanks very much.
    #2  
12-13-2012
Corona688 (Forum Staff)
There's probably a way to do that statistically. Checking...
    #3  
12-13-2012
ida1215
Thanks for checking.
    #4  
12-13-2012
Corona688 (Forum Staff)
Your data doesn't seem to have a normal distribution.

There's a much more obvious way anyway; I don't know why it didn't occur to me before. Sort the data, then look past the percentage of lines you want; the value you land on is the threshold.


Code:
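# Sort numerically, ascending
sort -n data > sorted

# First pass counts the lines (N); second pass prints the first value past 80% of them
awk 'NR==FNR { N++; next } FNR > (.8*N) { print $1 ; exit }' sorted sorted

rm -f sorted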
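
Note that the snippet above counts from the bottom: it prints the first value lying past the lowest 80% of the sorted lines. The awk tests in the first post keep the values at or above a threshold instead. A minimal sketch of that second reading, handling several percentages in one go, might look like the following. The function name pct_threshold and its argument order are made up for illustration, and rounding the line count up is an assumption about what "encompasses" should mean when the percentage of n is not a whole number.

```shell
# pct_threshold FILE PCT...   (hypothetical helper, not from the thread)
# For each PCT, print the largest value t in FILE such that at least
# PCT percent of the values satisfy value >= t.
pct_threshold() {
    file=$1
    shift
    # Sort descending so that line k holds the k-th largest value.
    sort -rn "$file" | awk -v pcts="$*" '
        { v[NR] = $1 }                          # keep the sorted values in memory
        END {
            if (NR == 0) exit                   # nothing to report for empty input
            n = split(pcts, p, " ")
            for (i = 1; i <= n; i++) {
                k = p[i] * NR / 100             # number of points to keep
                if (k > int(k)) k = int(k) + 1  # assumption: round up to a whole line
                if (k < 1) k = 1
                printf "%s%% (n=%d): %s\n", p[i], k, v[k]
            }
        }'
}
```

With the 30 values posted above saved as data, running pct_threshold data 60 70 80 should report 0.235188699739, 0.227780278349 and 0.221756265888, which keep 18, 21 and 24 points respectively. A quick check is awk '$1 >= 0.235188699739' data | wc -l, which should count 18 lines.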

    #5  
12-13-2012
ida1215
Thank you very much, Corona688. The data I posted is just part of the whole data set; the values are predictions extracted at certain points, which might explain the non-normal distribution (?). Anyway, thanks a bunch.