How to print values that are greater than 0.1 in at least 80% of the samples?


 
# 15  
Old 02-23-2015
Quote:
Why do lines v2, v3, v4 show up in your sample output?
It should be only v4 and v5. I have corrected it now.

Are there always two groups? Of identical length?
No. Some of the groups could have a different number of samples.

What would be the exact condition for when to print and when not?
A row should be printed if its values are greater than 0.1 in at least 80% of the samples in at least one of the groups. Rows that do not satisfy this condition should be ignored.
Sorry for not being so clear. Thanks.
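
To make the rule concrete for a single group, here is a minimal sketch. It is not one of the scripts posted in this thread, and the helper name `passes` is made up for illustration. It counts the values strictly greater than 0.1 and compares the count with 80% of the group size, using integer arithmetic (count * 10 >= size * 8) so that no floating-point product is involved:

```shell
# passes: print "yes" if at least 80% of the arguments are > 0.1, else "no"
passes() {
    echo "$@" | awk '{
        n = 0
        for (i = 1; i <= NF; i++) n += ($i > 0.1)   # strict comparison: 0.1 itself does not count
        print ((n * 10 >= NF * 8) ? "yes" : "no")
    }'
}

passes 0.2 0.2 0.2 0.2 0      # 4 of 5 samples qualify
passes 0.1 0.1 0.1 0.1 0      # no sample is strictly greater than 0.1
```

With group sizes of 5 and 10, the thresholds work out to 4 and 8 qualifying samples respectively.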

Last edited by quincyjones; 02-23-2015 at 09:55 AM.. Reason: grammatical error
# 16  
Old 02-23-2015
This is for exactly the sample you posted - two groups of 10 members each:
Code:
awk     '       {G1=G2=0
                 for (i=2;i<=11;i++) {G1+=($i>0.1); G2+=($(i+10)>0.1)}
                }
         G1 >= 8 || G2 >= 8
        ' file
        group1  group1  group1  group1  group1  group1  group1  group1  group1  group1  group2  group2  group2  group2  group2  group2  group2  group2  group2  group2
        sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10        sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10
v4    0.2     0       0       0       0       0       0       0       0       0 0.1     0.1     0.2     0.2     10      2       3       5       6       7
v5    0.1     0.1     0.2     0.2     10      2       3       5       6       7 0.2     0       0       0       0       0       0       0       0       0

There is NO flexibility at all for changing group sizes or group count; there must be exactly two groups of 10 samples each.
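
A side note on why the two header lines show up in the output even though the script has no explicit header handling: as far as I can tell, awk compares a field that does not look like a number as a string, and text such as "group1" or "sample1" sorts after "0.1", so the header fields count as "greater than 0.1". A tiny demonstration with a made-up input line:

```shell
# field 1 is text, fields 2 and 3 are numbers; show the result of each "> 0.1" test
echo 'sample1 0.2 0.1' |
awk '{ a = ($1 > 0.1); b = ($2 > 0.1); c = ($3 > 0.1); print a, b, c }'
# → 1 1 0
```

If the headers should not depend on this side effect, a first rule such as NR <= 2 {print; next} would presumably pass them through explicitly.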
# 17  
Old 02-23-2015
Thank you RudiC, I misunderstood the requirement. I thought we needed to compare groups (which is correct), but I did not get the 80% concept; I thought the user was asking to print the line if any group was above 80%.

Thanks,
R. Singh
# 18  
Old 02-23-2015
So I think it doesn't work with multiple groups that have different sample sizes?

Example:
Code:
        g1      g1      g1      g1      g1      g2      g2      g2      g2      g2      g3      g3      g3      g3      g3      g3      g3      g3      g3      g3
        s1      s2      s3      s4      s5      s1      s2      s3      s4      s5      s1      s2      s3      s4      s5      s6      s7      s8      s9      s10
v1      0       0.1     0.1     0.1     0.1     0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
v2      0.1     0.1     0.1     0.1     0       0       0       0       0       0       0       0       1       2       3       4       5       6       6       6
v3      0       0       0       0       0       0       0       0       0       0       0       0       0       1       0       1       0       0       0       0
v4      1       0       0       0       0       0       0       0       0       1       1       1       1       1       0       0       0       0       0       0
v5      0.2     0.2     0.2     0.2     0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0

Desired output:
Code:
        g1      g1      g1      g1      g1      g2      g2      g2      g2      g2      g3      g3      g3      g3      g3      g3      g3      g3      g3      g3
        s1      s2      s3      s4      s5      s1      s2      s3      s4      s5      s1      s2      s3      s4      s5      s6      s7      s8      s9      s10
v2      0.1     0.1     0.1     0.1     0       0       0       0       0       0       0       0       1       2       3       4       5       6       6       6
v5      0.2     0.2     0.2     0.2     0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0

PS:
Quote:
v2 is greater than 0.1 in at least 80% of the samples in group3 (8 of 10).
v5 satisfies the same condition in group1 (4 of 5).

Last edited by quincyjones; 02-23-2015 at 10:08 AM.. Reason: more explanation
# 19  
Old 02-23-2015
Well, try this. It was developed for your former sample, but it seems to work with the current one as well:
Code:
awk     'NR==1  {for (i=1; i<=NF; i++) GRCNT[$i]++
#                                                               for (i in GRCNT) print i, GRCNT[i] 
                }

                {COL=2
                 for (gc in GRCNT)      {TOT[gc]=0
                                         STP=COL+GRCNT[gc]
                                         for (;COL<STP;COL++) TOT[gc]+=($COL>0.1)
                                        }

                 for (gc in TOT)        {#                      print gc, GRCNT[gc], TOT[gc]
                                         if (TOT[gc] >= GRCNT[gc] * 0.8) {print; break}
                                        }
                }
        ' file
        g1      g1      g1      g1      g1      g2      g2      g2      g2      g2      g3      g3      g3      g3      g3      g3      g3      g3      g3      g3
        s1      s2      s3      s4      s5      s1      s2      s3      s4      s5      s1      s2      s3      s4      s5      s6      s7      s8      s9      s10
v2      0.1     0.1     0.1     0.1     0       0       0       0       0       0       0       0       1       2       3       4       5       6       6       6
v5      0.2     0.2     0.2     0.2     0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0

The two commented-out print statements are for debugging, in case you need some insight into the script's internal operation.
It still needs the columns of each group to be adjacent, and the first group to start in column 2.
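
The "start in column 2" part follows from how awk splits the lines: with the default field separator, leading whitespace is ignored, so the header lines have one field fewer than the data lines, and header field i labels data column i+1. A small sketch with made-up data (the helper name `show_fields` is only for illustration):

```shell
# show_fields: report the field count and the first field of every input line
show_fields() {
    awk '{ printf "line %d: NF=%d, first field=%s\n", NR, NF, $1 }'
}

# a header line with a leading tab, followed by one data row
printf '\tg1\tg1\tg2\nv1\t0.2\t0.3\t0\n' | show_fields
# → line 1: NF=3, first field=g1
# → line 2: NF=4, first field=v1
```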
# 20  
Old 02-24-2015
There seems to be a bug in the script. For example, it does not print v4 (which satisfies the condition in group g2) or v5 (which satisfies the condition in group gn).

input
Code:
        g1      g1      g1      g1      g1      g1      g1      g1      g1      g1      g2      g2      g2      g2      g2      g2      g2      g2      g2      g2      g2      g2      g2      g2  g2       g2      g2      g2      g2      g2      gn      gn      gn      gn      gn
        t1      t2      t3      t4      t5      t6      t7      t8      t9      t10     t1      t2      t3      t4      t5      t6      t7      t8      t9      t10     t11     t12     t13     t14     t15      t16     t17     t18     t19     t20     t1      t2      t3      t4      t5
v1    0       0       0       0       0       0       0       0       0       0.1     0.1     0.1     0.1     0.1     0       0       0       0       0       0       0       0       0       0   0     0       0       0       0       0       0       0       0       0       0
v2    0.2     0.1     0.2     0.2     0.2     2       2       2       2       2       0       0       0       0       0       0       0       0       0       0       0       0       0       0   0     0       0       0       0       0       0       0       0       0       0
v3    0       0       0       0       0       0       0       0       0       0       1       2       3       2       2       2       2       2       2       2       2       2       2       2   2     0       0       0       0       0       0       0       0       0       0
v4    0       0       0       0       0       0       0       0       0       0       0.2     0.2     2       2       2       2       2       2       2       2       2       2       2       2   2     2       2       2       2       2       0       0       0       0       0
v5    0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0   0     0       0       0       0       0       1       1       1       1       1

Output
Code:
        g1      g1      g1      g1      g1      g1      g1      g1      g1      g1      g2      g2      g2      g2      g2      g2      g2      g2      g2      g2      g2      g2      g2      g2  g2       g2      g2      g2      g2      g2      gn      gn      gn      gn      gn
        t1      t2      t3      t4      t5      t6      t7      t8      t9      t10     t1      t2      t3      t4      t5      t6      t7      t8      t9      t10     t11     t12     t13     t14     t15      t16     t17     t18     t19     t20     t1      t2      t3      t4      t5
v2      0.2     0.1     0.2     0.2     0.2     2       2       2       2       2       0       0       0       0       0       0       0       0       0       0       0       0       0       0   0     0       0       0       0       0       0       0       0       0       0

Output should be

Code:
        g1      g1      g1      g1      g1      g1      g1       g1      g1      g1      g2      g2      g2      g2      g2      g2       g2      g2      g2      g2      g2      g2      g2      g2  g2        g2      g2      g2      g2      g2      gn      gn      gn      gn       gn
        t1      t2      t3      t4      t5      t6      t7       t8      t9      t10     t1      t2      t3      t4      t5      t6       t7      t8      t9      t10     t11     t12     t13     t14     t15       t16     t17     t18     t19     t20     t1      t2      t3      t4       t5
v2     0.2     0.1     0.2     0.2     0.2     2       2       2       2        2       0       0       0       0       0       0       0       0        0       0       0       0       0       0   0     0       0       0        0       0       0       0       0       0       0
v4    0       0        0       0       0       0       0       0       0       0       0.2      0.2     2       2       2       2       2       2       2       2        2       2       2       2   2     2       2       2       2       2        0       0       0       0       0
v5    0       0       0        0       0       0       0       0       0       0       0       0        0       0       0       0       0       0       0       0       0        0       0       0   0     0       0       0       0       0       1        1       1       1       1


Last edited by quincyjones; 02-24-2015 at 06:28 AM..
# 21  
Old 02-24-2015
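Please don't edit posts to modify the samples; that pulls the rug out from under me.

However, the reason was that awk's array traversal returned the group "gn" before g1 and g2 rather than after them (the order of for (x in array) is unspecified), so the column ranges were assigned to the wrong groups. I should have added that to the limitations (had I known this...). Try this: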
Code:
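awk     'NR==1  {# group header: field i labels data column i+1, as data rows carry a row name in $1
                 for (i=1; i<=NF; i++)  {GRCNT[$i]++                    # number of samples in the group
                                         if (!GRMIN[$i]) GRMIN[$i]=i+1  # first data column of the group
                                         GRMAX[$i]=i+1                  # last data column of the group
                                        }
                 if (debug) for (i in GRCNT) print "Distribution: ", $1, i, GRCNT[i], GRMIN[i], GRMAX[i]
                }

                {# count the values > 0.1 per group over the column range recorded for that group
                 for (gc in GRCNT)      {TOT[gc]=0
                                         for (i=GRMIN[gc];i<=GRMAX[gc];i++)
                                                {TOT[gc]+=($i>0.1)
                                                 if (debug) print $1, gc, i, $i, ($i>0.1)
                                                }
                                        }

                 # print the line as soon as one group reaches 80% of its samples
                 for (gc in TOT)        {if (debug) print NR, $1, gc, GRCNT[gc], TOT[gc]
                                         if (TOT[gc] >= GRCNT[gc] * 0.8) {print; break}
                                        }
                }
        ' file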
        g1      g1      g1      g1      g1      g1      g1      g1      g1      g1      g2      g2      g2      g2      g2      g2      g2      g2      g2      g2      g2      g2 
        t1      t2      t3      t4      t5      t6      t7      t8      t9      t10     t1      t2      t3      t4      t5      t6      t7      t8      t9      t10     t11     t12
lnc2    0.2     0.1     0.2     0.2     0.2     2       2       2       2       2       0       0       0       0       0       0       0       0       0       0       0       0  
lnc4    0       0       0       0       0       0       0       0       0       0       0.2     0.2     2       2       2       2       2       2       2       2       2       2  
lnc5    0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
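
For what it is worth, here is a sketch of a variant that depends neither on the order of for (x in array) nor on the columns of a group being adjacent. It is not the script posted above. It assumes that the first line names the group of each column, that the second line is a sample header, that both are to be printed unchanged, and that fields are whitespace separated. The names `filter_groups`, `grp`, `size` and `hits` are made up for this example:

```shell
# filter_groups FILE: print both header lines, then every row in which at
# least one group has >= 80% of its values strictly greater than 0.1
filter_groups() {
    awk '
    NR == 1 {                          # group header: field i labels data column i+1
        for (i = 1; i <= NF; i++) {
            grp[i + 1] = $i            # data column -> group name
            size[$i]++                 # group name  -> number of samples
        }
        print; next
    }
    NR == 2 { print; next }            # sample header: pass through unchanged
    {
        split("", hits)                # clear the per-row counters
        for (c = 2; c <= NF; c++)
            if ($c > 0.1) hits[grp[c]]++
        for (g in size)                # iteration order is irrelevant here
            if (hits[g] * 10 >= size[g] * 8) { print; break }
    }' "$1"
}

# made-up sample with interleaved group columns
sample=$(mktemp)
printf '\tg1\tg2\tg1\tg2\tg1\n'        >  "$sample"
printf '\ts1\ts1\ts2\ts2\ts3\n'        >> "$sample"
printf 'v1\t0.2\t0\t0.3\t0\t0.5\n'     >> "$sample"
printf 'v2\t0.1\t0.1\t0.1\t0.1\t0.1\n' >> "$sample"
printf 'v3\t0\t2\t0\t3\t0\n'           >> "$sample"
filter_groups "$sample"                # prints the two header lines, then v1 and v3
rm -f "$sample"
```

Because every column is mapped to its group individually, the per-group column range (GRMIN/GRMAX) is not needed, at the cost of one array lookup per field.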
