How to print values that are greater than 0.1 in at least 80% of the samples?


 
# 29  
Old 04-07-2015
Sorry for the headers. For the group counts, I didn't consider that one line can have several groups fulfilling the requirements. Try
Code:
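awk     'NR==1  {for (i=1; i<=NF; i++)  {GRCNT[$i]++                    # number of columns in each group
                                         if (!GRMIN[$i]) GRMIN[$i]=i+1  # first data column of the group (+1 for the row label)
                                         GRMAX[$i]=i+1                  # last data column of the group
                                        }
                 if (debug) for (i in GRCNT) print "Distribution: ", $1, i, GRCNT[i], GRMIN[i], GRMAX[i]
                 print
                }
         NR==2  {print}                                                 # sample names pass through unchanged
         NR>2   {for (gc in GRCNT)      {TOT[gc]=0                      # per group: count the values > 0.1 in this row
                                         for (i=GRMIN[gc];i<=GRMAX[gc];i++)
                                                {TOT[gc]+=($i>0.1)
                                                 if (debug) print $1, gc, i, $i, ($i>0.1)
                                                }
                                        }
                }
                {for (gc in TOT)        {if (debug) print NR, $1, gc, GRCNT[gc], TOT[gc]
                                         if (TOT[gc] >= GRCNT[gc] * 0.8) {Pr=1; CNT[gc]++}      # at least 80% of the group
                                        }
                 if (Pr) {print; Pr=0}                                  # print the row if any group qualified
                }
         END    {for (c in CNT) print c, CNT[c]}                        # qualifying rows per group
        ' file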
.
.
.
g4_x 2
g1_lpx 1
gn_m 2
g2_edfj 1

# 30  
Old 04-08-2015
This will be my final request. Is it possible to print the number of groups and the number of rows in common from the same input as above? Many thanks.


output

Code:
no.of groups    no.of rows in common
0       0
1       3
2       2
3       0
4       0
5       0

# 31  
Old 04-08-2015
Not clear. Please elaborate.
# 32  
Old 04-08-2015
I hope this is clear; please let me know if not. Thanks.

input
Code:
        g1_lpx  g1_lpx  g1_lpx  g1_lpx  g2_edfj g2_edfj g2_edfj g3_pp   g3_pp   g3_pp   g3_pp   g4_x    g4_x    gn_m    gn_m    gn_m    gn_m
        qwe100  qwe101  qwe133  qwe44   qweq33  qweq44  qwe77   qwexc2  qwe34   qwe55   qwe77   qwe99   qwe88   qwer5   qwer6   qwer8   qwer9
l1      1       1       1       0       0       0       0       0       0       0       0       1       1       0       0       0       0
l2      0       0       0       1       1       1       1       0       0       0       0       1       1       0       0       0       0
l3      0       0       0       0       0       0       0       0       0       0       0       0       0       1       1       1       1
l4      0       0       0       0       0       0       0       0       0       0       0       0       0       0.3     0.3     0.3     0.3
l5      0.4     0.4     0.4     0.4     0       0       0       0       0       0       0       0       0       0       0       0       0

output description
Quote:
0 (none of the groups)
1 (1 of the 5 groups g1_lpx or g2_edfj or g3_pp or g4_x or gn_m)
2 (2 of the 5 groups)
3 (3 of the 5 groups)
4 (4 of the 5 groups)
5 (5 of the 5 groups)

Calculate how many of the keys (l1, l2, l3, l4, l5) satisfy the 80% condition in 0, 1, 2, 3, 4 or 5 groups.

Therefore:

no.of groups    no.of rows in common
0       0
1       3 (l3/l4/l5)
2       2 (l1/l2)
3       0
4       0
5       0
# 33  
Old 04-09-2015
Code:
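awk     'NR==1  {for (i=1; i<=NF; i++)  {if (!($i in GRCNT)) GR++       # GR: number of distinct groups
                                         GRCNT[$i]++                    # number of columns in each group
                                         if (!GRMIN[$i]) GRMIN[$i]=i+1  # first data column of the group (+1 for the row label)
                                         GRMAX[$i]=i+1                  # last data column of the group
                                        }
                 if (debug) for (i in GRCNT) print "Distribution: ", $1, i, GRCNT[i], GRMIN[i], GRMAX[i]
                 print
                }
         NR==2  {print}                                                 # sample names pass through unchanged
         NR>2   {LCNT[$1]+=0                                            # register the row so that rows without any qualifying group are tallied under 0
                 for (gc in GRCNT)      {TOT[gc]=0                      # per group: count the values > 0.1 in this row
                                         for (i=GRMIN[gc];i<=GRMAX[gc];i++)
                                                {TOT[gc]+=($i>0.1)
                                                 if (debug) print $1, gc, i, $i, ($i>0.1)
                                                }
                                        }
                }
                {for (gc in TOT)        {if (debug) print NR, $1, gc, GRCNT[gc], TOT[gc]
                                         if (TOT[gc] >= GRCNT[gc] * 0.8) {Pr=1; GCNT[gc]++; LCNT[$1]++}
                                        }
                 if (Pr) {print; Pr=0}                                  # print the row if any group qualified
                }
         END    {for (g in GCNT) print g, GCNT[g]                       # qualifying rows per group
                 for (l in LCNT) {print l, LCNT[l]                      # qualifying groups per row
                                  NCNT[LCNT[l]]++                       # tally rows by their number of qualifying groups
                                 }
                 print "no. grp\tno.of rows in common"
                 for (i=0; i<=GR; i++) print i "\t" NCNT[i]+0
                }
        ' file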
        g1_lpx  g1_lpx  g1_lpx  g1_lpx  g2_edfj g2_edfj g2_edfj g3_pp   g3_pp   g3_pp   g3_pp   g4_x    g4_x    gn_m    gn_m    gn_m    gn_m
        qwe100  qwe101  qwe133  qwe44   qweq33  qweq44  qwe77   qwexc2  qwe34   qwe55   qwe77   qwe99   qwe88   qwer5   qwer6   qwer8   qwer9
l1      1       1       1       0       0       0       0       0       0       0       0       1       1       0       0       0       0
l2      0       0       0       1       1       1       1       0       0       0       0       1       1       0       0       0       0
l3      0       0       0       0       0       0       0       0       0       0       0       0       0       1       1       1       1
l4      0       0       0       0       0       0       0       0       0       0       0       0       0       0.3     0.3     0.3     0.3
l5      0.4     0.4     0.4     0.4     0       0       0       0       0       0       0       0       0       0       0       0       0
g4_x 2
g1_lpx 1
gn_m 2
g2_edfj 1
l1 1
l2 2
l3 1
l4 1
l5 1
no. grp no.of rows in common
0       0
1       4
2       1
3       0
4       0
5       0

Please be aware that l1 only has one group satisfying the condition!
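
For anyone who wants to check that claim, here is a minimal sketch that lists, for every row and group, how many values exceed 0.1 and whether the 80% threshold is met. It is not part of the script above: the function name group_report and the temporary sample file are made up for illustration, and only rows l1 and l2 of the input are included.

```shell
#!/bin/sh
# Sketch only: group_report and the temporary sample file are hypothetical
# names used for illustration, not part of the script posted above.
group_report() {
    awk 'NR==1 {for (i=1; i<=NF; i++) {GRCNT[$i]++                   # columns per group
                                       if (!GRMIN[$i]) GRMIN[$i]=i+1 # +1 skips the row label
                                       GRMAX[$i]=i+1
                                      }
                next
               }
         NR==2 {next}                                                # sample names are not needed
               {for (gc in GRCNT) {hits=0
                                   for (i=GRMIN[gc]; i<=GRMAX[gc]; i++) hits+=($i>0.1)
                                   print $1, gc, hits "/" GRCNT[gc], (hits >= GRCNT[gc]*0.8 ? "yes" : "no")
                                  }
               }' "$1"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
g1_lpx g1_lpx g1_lpx g1_lpx g2_edfj g2_edfj g2_edfj g3_pp g3_pp g3_pp g3_pp g4_x g4_x gn_m gn_m gn_m gn_m
qwe100 qwe101 qwe133 qwe44 qweq33 qweq44 qwe77 qwexc2 qwe34 qwe55 qwe77 qwe99 qwe88 qwer5 qwer6 qwer8 qwer9
l1 1 1 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0
l2 0 0 0 1 1 1 1 0 0 0 0 1 1 0 0 0 0
EOF

# Show only the lines for row l1, sorted because "for (x in array)" order is unspecified.
group_report "$sample" | grep '^l1 ' | sort
```

For l1 this should report 3/4 for g1_lpx (marked "no") and 2/2 for g4_x (marked "yes"), so g4_x is the only group in which l1 reaches the threshold.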
# 34  
Old 04-09-2015
Quote:
Please be aware that l1 only has one group satisfying the condition!
But l1 satisfies the condition in 2 groups, i.e. g1_lpx and g4_x.
0, 1, 2, 3, 4, 5 don't represent l1, l2, l3, l4, l5. They represent groups (g1_lpx ... gn_m).
# 35  
Old 04-09-2015
For l1, g1_lpx has three out of four values above 0.1, which is 75% and so falls short of the required 80%. What do you mean by
Quote:
0, 1, 2, 3, 4, 5 don't represent l1, l2, l3, l4, l5. They represent groups (g1_lpx ... gn_m).
and how does it relate to my last proposal?
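
To make the threshold arithmetic explicit, the following sketch prints, for each group in the header line, its number of columns and the smallest number of values above 0.1 that reaches 80%. The helper name min_hits and the temporary header file are assumptions for illustration only; the comparison mirrors the `TOT[gc] >= GRCNT[gc] * 0.8` test used in the script.

```shell
#!/bin/sh
# Sketch only: min_hits is a hypothetical helper, not part of the thread script.
min_hits() {
    awk 'NR==1 {for (i=1; i<=NF; i++) GRCNT[$i]++          # columns per group
                for (g in GRCNT) {k=0
                                  while (k < GRCNT[g]*0.8) k++   # smallest k with k >= 80% of the columns
                                  print g, GRCNT[g], k
                                 }
                exit
               }' "$1"
}

hdr=$(mktemp)
cat > "$hdr" <<'EOF'
g1_lpx g1_lpx g1_lpx g1_lpx g2_edfj g2_edfj g2_edfj g3_pp g3_pp g3_pp g3_pp g4_x g4_x gn_m gn_m gn_m gn_m
EOF

# Columns printed: group name, group size, minimum number of values > 0.1.
min_hits "$hdr" | sort
```

With group sizes of 2, 3 and 4, as in this input, the minimum equals the group size (80% of 4 is 3.2, so 4 values are needed). In other words, every column of a group has to exceed 0.1 for the group to count.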