How to print values that are greater than 0.1 in at least 80% of the samples?


 
# 22  
Old 02-24-2015
Sorry about that. I was correcting the names to avoid confusion.
# 23  
Old 03-06-2015
When I run the script with a 100% threshold, it gives the wrong output, but it works fine with 90% or 80%. Any idea why?
# 24  
Old 03-06-2015
Can't believe that. lnc4 had 100% in g2...
# 25  
Old 03-06-2015
Yes, it is working. You are right, sorry about that.
# 26  
Old 04-06-2015
The code is working great, but is it also possible to print, per group, the number of rows that satisfy the given condition? Thanks.

ex:

Code:
g1  1(v2)
g2  1(v4)
gn  1(v5)



Here is the code, input and output

Input


Code:
        g1      g1      g1      g1      g1      g1      g1      g1      g1      g1      g2      g2      g2      g2      g2      g2      g2      g2      g2      g2      g2      g2      g2      g2  g2       g2      g2      g2      g2      g2      gn      gn      gn      gn      gn
        t1      t2      t3      t4      t5      t6      t7      t8      t9      t10     t1      t2      t3      t4      t5      t6      t7      t8      t9      t10     t11     t12     t13     t14     t15      t16     t17     t18     t19     t20     t1      t2      t3      t4      t5
v1    0       0       0       0       0       0       0       0       0       0.1     0.1     0.1     0.1     0.1     0       0       0       0       0       0       0       0       0       0   0     0       0       0       0       0       0       0       0       0       0
v2    0.2     0.1     0.2     0.2     0.2     2       2       2       2       2       0       0       0       0       0       0       0       0       0       0       0       0       0       0   0     0       0       0       0       0       0       0       0       0       0
v3    0       0       0       0       0       0       0       0       0       0       1       2       3       2       2       2       2       2       2       2       2       2       2       2   2     0       0       0       0       0       0       0       0       0       0
v4    0       0       0       0       0       0       0       0       0       0       0.2     0.2     2       2       2       2       2       2       2       2       2       2       2       2   2     2       2       2       2       2       0       0       0       0       0
v5    0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0

Script

Code:
awk     'NR==1  {for (i=1; i<=NF; i++)  {GRCNT[$i]++
                                         if (!GRMIN[$i]) GRMIN[$i]=i+1
                                         GRMAX[$i]=i+1
                                        }
                 if (debug) for (i in GRCNT) print "Verteilung: ", $1, i, GRCNT[i], GRMIN[i], GRMAX[i]
                }

                {for (gc in GRCNT)      {TOT[gc]=0
                                         for (i=GRMIN[gc];i<=GRMAX[gc];i++)
                                                {TOT[gc]+=($i>0.1)
                                                 if (debug) print $1, gc, i, $i, ($i>0.1)
                                                }
                                        }

                 for (gc in TOT)        {if (debug) print NR, $1, gc, GRCNT[gc], TOT[gc]
                                         if (TOT[gc] >= GRCNT[gc] * 0.8) {print; break}
                                        }
                }
        ' input

Output

Code:
        g1      g1      g1      g1      g1      g1      g1       g1      g1      g1      g2      g2      g2      g2      g2      g2       g2      g2      g2      g2      g2      g2      g2      g2  g2        g2      g2      g2      g2      g2      gn      gn      gn      gn       gn
        t1      t2      t3      t4      t5      t6      t7       t8      t9      t10     t1      t2      t3      t4      t5      t6       t7      t8      t9      t10     t11     t12     t13     t14     t15       t16     t17     t18     t19     t20     t1      t2      t3      t4       t5
v2     0.2     0.1     0.2     0.2     0.2     2       2       2       2        2       0       0       0       0       0       0       0       0        0       0       0       0       0       0   0     0       0       0        0       0       0       0       0       0       0
v4    0       0        0       0       0       0       0       0       0       0       0.2      0.2     2       2       2       2       2       2       2       2        2       2       2       2   2     2       2       2       2       2        0       0       0       0       0
v5    0       0       0        0       0       0       0       0       0       0       0       0        0       0       0       0       0       0       0       0       0        0       0       0   0     0       0       0       0       0       1        1       1       1       1
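
A note on why the two header lines appear in this output even though the script has no rule for them: the main block runs on every line, including the headers. As far as awk's comparison rules go, a field that does not look like a number (such as g1 or t1) is compared with 0.1 as a string, and a letter sorts after the digit 0, so the test counts as true for every header field. The snippet below is only a small illustration of that behaviour; the sample line is made up for the demo, and LC_ALL=C is set so the string ordering is predictable.

```shell
# Hypothetical one-line input: the first field is a label, the others are data.
# "g1" is not numeric, so ($1 > 0.1) is a string comparison and yields 1.
result=$(printf 'g1 0.2 0\n' | LC_ALL=C awk '{ print ($1 > 0.1), ($2 > 0.1), ($3 > 0.1) }')
echo "$result"   # prints: 1 1 0
```

So the headers pass the 80% test by accident rather than by design, which is worth keeping in mind if a line filter such as NR>2 is added later.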

# 27  
Old 04-06-2015
Try
Code:
awk     'NR==1  {for (i=1; i<=NF; i++)  {GRCNT[$i]++
                                         if (!GRMIN[$i]) GRMIN[$i]=i+1
                                         GRMAX[$i]=i+1
                                        }
                 if (debug) for (i in GRCNT) print "Verteilung: ", $1, i, GRCNT[i], GRMIN[i], GRMAX[i]
                }

         NR>2   {for (gc in GRCNT)      {TOT[gc]=0
                                         for (i=GRMIN[gc];i<=GRMAX[gc];i++)
                                                {TOT[gc]+=($i>0.1)
                                                 if (debug) print $1, gc, i, $i, ($i>0.1)
                                                }
                                        }

                 for (gc in TOT)        {if (debug) print NR, $1, gc, GRCNT[gc], TOT[gc]
                                         if (TOT[gc] >= GRCNT[gc] * 0.8) {print; CNT[gc]++; break}
                                        }
                }
         END    {for (c in CNT) print c, CNT[c]}
        ' file

This User Gave Thanks to RudiC For This Post:
# 28  
Old 04-07-2015
I am getting the following errors. Any help would be greatly appreciated. Thanks.

1. g1_lpx is reported as 1. It should be 2, as l1 and l5 satisfy the 80% condition.
2. g2_edfj should be 1, as l2 satisfies the 80% condition.
3. The header is missing from the output.

Input
Code:
        g1_lpx  g1_lpx  g1_lpx  g1_lpx  g2_edfj g2_edfj g2_edfj g3_pp   g3_pp   g3_pp   g3_pp   g4_x    g4_x    gn_m    gn_m    gn_m    gn_m
        qwe100  qwe101  qwe133  qwe44   qweq33  qweq44  qwe77   qwexc2  qwe34   qwe55   qwe77   qwe99   qwe88   qwer5   qwer6   qwer8   qwer9
l1      1       1       1       0       0       0       0       0       0       0       0       1       1       0       0       0       0
l2      0       0       0       1       1       1       1       0       0       0       0       1       1       0       0       0       0
l3      0       0       0       0       0       0       0       0       0       0       0       0       0       1       1       1       1
l4      0       0       0       0       0       0       0       0       0       0       0       0       0       0.3     0.3     0.3     0.3
l5      0.4     0.4     0.4     0.4     0       0       0       0       0       0       0       0       0       0       0       0       0


Script

Code:
awk     'NR==1  {for (i=1; i<=NF; i++)  {GRCNT[$i]++
                                         if (!GRMIN[$i]) GRMIN[$i]=i+1
                                         GRMAX[$i]=i+1
                                        }
                 if (debug) for (i in GRCNT) print "Verteilung: ", $1, i, GRCNT[i], GRMIN[i], GRMAX[i]
                }

         NR>2   {for (gc in GRCNT)      {TOT[gc]=0
                                         for (i=GRMIN[gc];i<=GRMAX[gc];i++)
                                                {TOT[gc]+=($i>0.1)
                                                 if (debug) print $1, gc, i, $i, ($i>0.1)
                                                }
                                        }

                 for (gc in TOT)        {if (debug) print NR, $1, gc, GRCNT[gc], TOT[gc]
                                         if (TOT[gc] >= GRCNT[gc] * 0.8) {print; CNT[gc]++; break}
                                        }
                }
         END    {for (c in CNT) print c, CNT[c]}
        ' $1

Output
Code:
l1      1       1       1       0       0       0       0       0       0       0       0       1       1       0       0       0       0
l2      0       0       0       1       1       1       1       0       0       0       0       1       1       0       0       0       0
l3      0       0       0       0       0       0       0       0       0       0       0       0       0       1       1       1       1
l4      0       0       0       0       0       0       0       0       0       0       0       0       0       0.3     0.3     0.3     0.3
l5      0.4     0.4     0.4     0.4     0       0       0       0       0       0       0       0       0       0       0       0       0
gn_m 2
g4_x 2
g1_lpx 1
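
A possible explanation for the points reported in this post, offered as a sketch rather than a confirmed fix: the `break` after `CNT[gc]++` stops the loop at the first qualifying group, so a row such as l2 (3 of 3 in g2_edfj and 2 of 2 in g4_x) is counted for only one of them, and the `NR>2` pattern skips both header lines. Note also that l1 has only 3 of 4 g1_lpx values above 0.1 (75%), so under the 80% rule g1_lpx would stay at 1. The version below counts every qualifying group per row and prints the header lines explicitly. Like the script above, it assumes each group's columns are contiguous. The names `pct`, `min`, `size`, `first`, `last`, `order`, `count` and the temporary file are illustrative choices, not part of the original script.

```shell
# Sample data copied from this post into a temporary file (illustrative).
input=$(mktemp)
cat > "$input" <<'EOF'
       g1_lpx g1_lpx g1_lpx g1_lpx g2_edfj g2_edfj g2_edfj g3_pp g3_pp g3_pp g3_pp g4_x g4_x gn_m gn_m gn_m gn_m
       qwe100 qwe101 qwe133 qwe44 qweq33 qweq44 qwe77 qwexc2 qwe34 qwe55 qwe77 qwe99 qwe88 qwer5 qwer6 qwer8 qwer9
l1 1 1 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0
l2 0 0 0 1 1 1 1 0 0 0 0 1 1 0 0 0 0
l3 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
l4 0 0 0 0 0 0 0 0 0 0 0 0 0 0.3 0.3 0.3 0.3
l5 0.4 0.4 0.4 0.4 0 0 0 0 0 0 0 0 0 0 0 0 0
EOF

out=$(awk -v pct=0.8 -v min=0.1 '
NR == 1 {                        # group-name header: field i labels data column i+1
    for (i = 1; i <= NF; i++) {
        if (!($i in size)) { first[$i] = i + 1; order[++ngroups] = $i }
        size[$i]++
        last[$i] = i + 1
    }
}
NR <= 2 { print; next }          # pass both header lines through unchanged
{
    hit = 0
    for (g = 1; g <= ngroups; g++) {
        name = order[g]
        n = 0
        for (i = first[name]; i <= last[name]; i++)
            n += ($i + 0 > min)  # force a numeric comparison
        # no break here: every qualifying group is counted for this row
        if (n >= size[name] * pct) { count[name]++; hit = 1 }
    }
    if (hit) print               # print the row once, however many groups matched
}
END {
    # report counts in header order, only for groups with at least one row
    for (g = 1; g <= ngroups; g++)
        if (order[g] in count) print order[g], count[order[g]]
}
' "$input")

printf '%s\n' "$out"
rm -f "$input"
```

With the sample input from this post, this should print both header lines, all five data rows, and then the counts g1_lpx 1, g2_edfj 1, g4_x 2 and gn_m 2. The threshold and cutoff are passed with `-v`, so running it with `pct=1` or `pct=0.9` needs no change to the script itself.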
