Finding data value that contains x% of points


 
# 1  
Old 12-13-2012

Hi, I need help finding the value in my data that encompasses a certain percentage of my total data points (n). Below is an example of my data, n=30. What I want to do is, for instance, find the minimum threshold that still encompasses 60% (n=18), 70% (n=21) and 80% (n=24) of the points.
Code:
Manually, to find the data value that encompasses 60% of the data points, I tried something like:

awk '$1 >= 0.233 {print $0}' data > threshold_0.233.txt
awk '$1 >= 0.234 {print $0}' data > threshold_0.234.txt
awk '$1 >= 0.235 {print $0}' data > threshold_0.235.txt

Then I counted the lines in each output file to check whether they correspond to 60% of n,
and repeated by trial and error until I had the values I needed at each percentage.
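
The counting step above could be folded into awk itself instead of being done on the output files. A minimal sketch, assuming one numeric value per line in column 1; `count_at_least` is a hypothetical helper name, not something from the thread:

```shell
# count_at_least THRESHOLD [FILE]
# Print how many column-1 values are >= THRESHOLD, and what share of all lines that is.
# Reads standard input when FILE is omitted. (Hypothetical helper for illustration.)
count_at_least() {
    awk -v t="$1" '
        NF { n++; if ($1 + 0 >= t + 0) k++ }   # skip blank lines; force numeric comparison
        END { if (n) printf "%d of %d (%.1f%%)\n", k, n, 100 * k / n }
    ' "${2:--}"
}
```

For example, `count_at_least 0.235 data` (with `data` as the assumed input file name) should report `18 of 30 (60.0%)` for the 30 values posted here. This still needs one run per candidate threshold, so it only shortens the trial and error rather than removing it.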

Code:
0.222568470365
0.221756265888
0.219760388204
0.242798143771
0.238352821721
0.241443756619
0.223094316003
0.228262624788
0.216889793498
0.210031152159
0.21097303707
0.207019965666
0.217014341085
0.239244868006
0.240522828032
0.237227034969
0.257647932043
0.248749576572
0.246545881317
0.247231196664
0.234222785343
0.235188699739
0.254819829246
0.250148878221
0.275682631829
0.287082318457
0.252075020326
0.412756783786
0.402542710592
0.227780278349

Any suggestions on how to go about it? Thanks much.
# 2  
Old 12-13-2012
There's probably a way to do that statistically. Checking...
# 3  
Old 12-13-2012
Thanks for checking.
# 4  
Old 12-13-2012
Your data doesn't seem to have a normal distribution.

There's a much more obvious way anyway, don't know why it didn't occur to me before. Sort the data, then skip past the percentage of lines you want and read off the next value as the threshold.

Code:
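# Sort numerically (ascending) so that line number equals rank.
sort -n data > sorted

# Pass 1 (NR==FNR) counts the lines; pass 2 prints the first value past 80% of them.
awk 'NR==FNR { N++; next } FNR > (.8*N) { print $1 ; exit }' sorted sorted

rm -f sorted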
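
The sort-then-index idea above can be extended to several percentages at once without a temporary file. The sketch below makes one assumption that differs from the command above: it reads "threshold" in the sense of the `$1 >= t` filter from the first post, so it sorts in descending order and counts from the largest value down. `thresholds` and `data_file` are hypothetical names, and percentages are assumed to be whole numbers.

```shell
# thresholds FILE PCT...
# For each integer percentage PCT, print the largest value t such that
# at least PCT% of the column-1 values in FILE are >= t.
# (Hypothetical helper; assumes one numeric value per line.)
thresholds() {
    data_file=$1
    shift
    LC_ALL=C sort -rn "$data_file" | awk -v pcts="$*" '
        NF { v[++n] = $1 }                      # values, largest first
        END {
            m = split(pcts, p, " ")
            for (i = 1; i <= m; i++) {
                k = int((p[i] * n + 99) / 100)  # ceil(p*n/100) with integer arithmetic
                if (k < 1) k = 1
                print p[i] "%", v[k]
            }
        }'
}
```

Called as `thresholds data 60 70 80` on the 30 values posted in this thread, it should print 0.235188699739, 0.227780278349 and 0.221756265888 for 60%, 70% and 80%. The 60% result agrees with the manual trial at 0.235. If the threshold is instead meant as the value that a given share of points falls below, the ascending sort in the command above is the one to use.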

This User Gave Thanks to Corona688 For This Post:
# 5  
Old 12-13-2012
Thank you very much, Corona688. The data I posted is just part of the whole data set; the values are predictions extracted at certain points, which might explain the non-normal distribution (?). Anyway, thanks a bunch.
 
