Calculate 5th percentile based on another column

# 1  
Old 04-20-2016

I would like some help calculating the 5th percentile value of column 2 for each site. The input looks like this:
Code:
site val1 val2
002 10 25.3
002 20 25.3
002 30 25.3
002 40 20
002 50 20
002 60 20
002 70 20
002 80 30
002 90 30
002 100 30
002 120 30
003 20 30.3
003 20 30.3
003 30 20
003 40 40

Based on what I found, I could write something like awk '{s[NR]=$2} END{print s[int(NR*0.05+0.5)]}', but this only works when every row belongs to the same site (i.e., column 1 is identical). How can I do this for multiple sites? The desired output is:
site val
002 10
003 20

Thank you.
# 2  
Old 04-20-2016
Try this, and please feel free to correct any errors in my calculations:
Code:
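awk '
        # Skip the header line; process data rows only.
        NR > 1 {
                ++T[$1]                  # T[site] = number of rows seen for this site
                A[$1 FS T[$1]] = $2      # store column 2 under the key "site n"
        }
        END {
                print "Site", "Val"
                for ( k in T )           # sites come out in unspecified order
                {
                        # Rank of the 5th percentile: 5% of the row count, rounded.
                        idx = sprintf( "%.0f", T[k] * 0.05 )
                        # Small groups round to 0, so fall back to the first row.
                        idx = ( idx == 0 ? 1 : idx )
                        # Assumes each site is already sorted ascending by column 2.
                        print k, A[k FS idx]
                }
        }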
' OFS='\t' file
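
The script above picks the row by position, so it only gives a percentile if the rows of each site are already in ascending order of column 2, as they are in the sample. If that cannot be guaranteed, one possible variation is to sort first. This is only a sketch: the function name p5_by_site is made up for illustration, it assumes a one-line header and whitespace-separated columns, and it uses the rounding from the original question (int(n*0.05+0.5)) rather than sprintf.

```shell
# Hypothetical helper: nearest-rank 5th percentile of column 2 per site.
# Assumes a one-line header and whitespace-separated columns.
p5_by_site() {
    # Drop the header, then sort by site (text) and by column 2 (numeric)
    # so that the values of each site arrive in ascending order.
    tail -n +2 "$1" | sort -k1,1 -k2,2n | awk '
        { n[$1]++; v[$1, n[$1]] = $2 }       # count rows and store values per site
        END {
            for (s in n) {
                idx = int(n[s] * 0.05 + 0.5) # same rounding as in the question
                if (idx < 1) idx = 1         # small groups: use the first row
                print s, v[s, idx]
            }
        }
    ' | sort -k1,1                           # for-in order is unspecified, so sort the result
}
```

Call it as p5_by_site file. The result has no header line and is ordered by site.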

# 3  
Old 04-20-2016
@Yoda, it seems to be working. Thank you.
Could you please add an explanation of each line? Thank you.

Last edited by wuhuai; 04-22-2016 at 11:27 AM..
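
On the explanation request: the step that usually needs spelling out is how the row number is chosen. Below is a small sketch of that arithmetic on its own, using the int(n*0.05+0.5) rounding from the first post. The name p5_rank is hypothetical. The sprintf("%.0f", ...) form in the answer rounds in a similar way, but it may treat exact halves differently depending on the awk implementation.

```shell
# Hypothetical helper: 1-based row number of the 5th percentile
# for a group of n rows, clamped so that small groups pick row 1.
p5_rank() {
    awk -v n="$1" 'BEGIN {
        idx = int(n * 0.05 + 0.5)   # round n * 0.05 to the nearest integer
        if (idx < 1) idx = 1        # 0 is not a valid row number
        print idx
    }'
}
```

For the sample data, site 002 has 11 rows (11 * 0.05 = 0.55, which rounds to 1) and site 003 has 4 rows (4 * 0.05 = 0.2, which rounds to 0 and is clamped to 1), so both sites report their first row.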