How to print values that are greater than 0.1 in at least 80% of the samples?


 
# 8  
Old 02-17-2015
NF is the number of fields.

If T is greater than 80% of NF, print.
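A minimal sketch of that test, assuming the layout used elsewhere in this thread: one header line, then rows whose first field is a label, so the number of samples is NF - 1 rather than NF. The helper name filter_rows is only for illustration. T is reset on every row, and the comparison is done in integers (10 * T >= 8 * n) so that a row at exactly 80% is not lost to floating-point rounding.

```shell
# filter_rows: hypothetical helper; reads a whitespace-separated table on stdin.
filter_rows() {
  awk '
    NR == 1 { print; next }            # keep the header line
    {
      T = 0                            # samples above 0.1 in this row
      n = NF - 1                       # assumption: field 1 is the row label
      for (i = 2; i <= NF; i++)
        if ($i + 0 > 0.1) T++          # "+ 0" forces a numeric comparison
      if (n > 0 && 10 * T >= 8 * n)    # at least 80% of the samples
        print
    }
  '
}

# Example with made-up data: row "a" has 4 of 5 values above 0.1, row "b" has 2 of 5.
printf '%s\n' 'id s1 s2 s3 s4 s5' 'a 0.2 0.3 0.4 0.5 0' 'b 0.2 0 0 0.5 0' | filter_rows
```

On a real file this would be called as filter_rows < Input_file.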
This User Gave Thanks to Corona688 For This Post:
# 9  
Old 02-23-2015
Quote:
Originally Posted by RavinderSingh13
Hello quincyjones,

Could you please try the following and let me know if this helps? (A small addition to Corona's code.)
Code:
awk 'NR==1{print; next} {T=0; for(i=2;i<=NF;i++) if($i>0.1) T++; if(10*T >= 8*(NF-1)) print}'   Input_file

Output will be as follows.
Code:
        sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10
v5    0.1     0.1     0.2     0.2     10      2       3       5       6       7

Thanks,
R. Singh

Is it possible to extend the same code so that the 80% is calculated in each group separately, like the following?

Input

Code:
        group1  group1  group1  group1  group1  group1  group1  group1  group1  group1  group2  group2  group2  group2  group2  group2  group2  group2  group2  group2
        sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10        sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10
v1    0.2     0.1     0.1    0       1       2       3       4       9       10 0.2     0.1     0.1    0       1       2       3       4       9       10
v2    0       0       0.01    0       0       0       0       0       0       0 0       0       0.01    0       0       0       0       0       0       0
v3    0       0       0       0       0       0       0       0       0       0 0       0       0       0       0       0       0       0       0       0
v4    0.2     0       0       0       0       0       0       0       0       0 0.1     0.1     0.2     0.2     10      2       3       5       6       7
v5    0.1     0.1     0.2     0.2     10      2       3       5       6       7 0.2     0       0       0       0       0       0       0       0       0

Output
Code:
         group1  group1  group1  group1  group1  group1  group1  group1  group1  group1  group2  group2  group2  group2  group2  group2  group2  group2  group2  group2
        sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10        sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10
v4    0.2     0       0       0       0       0       0       0       0       0 0.1     0.1     0.2     0.2     10      2       3       5       6       7
v5    0.1     0.1     0.2     0.2     10      2       3       5       6       7 0.2     0       0       0       0       0       0       0       0       0


Last edited by quincyjones; 02-23-2015 at 09:26 AM..
# 10  
Old 02-23-2015
That certainly is possible.
Why do lines v2, v3, v4 show up in your sample output?
Are there always two groups? Of identical length?
What is the exact condition for when to print and when not?
# 11  
Old 02-23-2015
Oops, I have corrected it now. A row should be printed if its values are greater than 0.1 in at least 80% of the samples in at least one group. For example, v4 satisfies this condition in group2 and v5 in group1.
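One way this condition could be sketched in awk, under assumptions the thread does not confirm: the first header line carries one group name per sample column and no row label (so header field i lines up with data field i + 1), and the second line carries the sample names. The number of groups and their sizes are read from that first line and are not hard-coded. The helper name filter_by_group is hypothetical.

```shell
# filter_by_group: hypothetical helper; reads the two-header-line table on stdin.
filter_by_group() {
  awk '
    NR == 1 {                          # group names, one per sample column
      for (i = 1; i <= NF; i++) { grp[i + 1] = $i; size[$i]++ }
      print; next
    }
    NR == 2 { print; next }            # sample names
    {
      for (g in size) hits[g] = 0      # reset the per-group counters
      for (i = 2; i <= NF; i++)
        if ((i in grp) && $i + 0 > 0.1) hits[grp[i]]++
      keep = 0
      for (g in size)                  # keep the row if any group reaches 80%
        if (10 * hits[g] >= 8 * size[g]) keep = 1
      if (keep) print
    }
  '
}

# Example with made-up data: r1 passes in g1 (4 of 5), r2 in g2 (2 of 2), r3 in neither.
printf '%s\n' 'g1 g1 g1 g1 g1 g2 g2' 's1 s2 s3 s4 s5 s1 s2' \
  'r1 1 1 1 1 0 0 0' 'r2 0 0 0 0 0 1 1' 'r3 1 1 1 0 0 1 0' | filter_by_group
```

Called as filter_by_group < Input_file on the input posted above, it should print the two header lines followed by v4 and v5.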
# 12  
Old 02-23-2015
Hello quincyjones,

I think the output should be v1. The following may help you with this; please let me know if it helps.
Code:
 awk '{for(i=2;i<=11;i++){if($i > .1 && $(i+10) > .1){T=1}};if(T){print $0;T=""}}'  Input_file

Output will be as follows.
Code:
        group1  group1  group1  group1  group1  group1  group1  group1  group1  group1  group2  group2  group2  group2  group2  group2  group2  group2  group2  group2
        sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10        sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10
v1    0.2     0.1     0.1    0       1       2       3       4       9       10 0.2     0.1     0.1    0       1       2       3       4       9       10

EDIT: Sorry, there was a typo here; I have changed the output now.


Thanks,
R. Singh

Last edited by RavinderSingh13; 02-23-2015 at 09:43 AM.. Reason: Cleared typo now
# 13  
Old 02-23-2015
It is v4 and v5. In v1, three samples in each of group1 and group2 have values <= 0.1, so it doesn't satisfy the condition "greater than 0.1 in at least 80% of the samples in a specific group". Hope that is clear.
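The counts behind that reasoning can be checked with a small diagnostic. This is only a sketch, and it assumes that the first header line holds one group name per sample column (no row label) and the second holds the sample names. The helper name group_counts is hypothetical; it prints, for each data row, how many values in each group exceed 0.1.

```shell
# group_counts: hypothetical helper; prints "label group=hits/size ..." per data row.
group_counts() {
  awk '
    NR == 1 {                                   # group names, one per sample column
      for (i = 1; i <= NF; i++) {
        grp[i + 1] = $i
        if (!($i in size)) order[++ng] = $i     # remember first-seen group order
        size[$i]++
      }
      next
    }
    NR == 2 { next }                            # skip the sample names
    {
      for (k = 1; k <= ng; k++) hits[order[k]] = 0
      for (i = 2; i <= NF; i++)
        if ((i in grp) && $i + 0 > 0.1) hits[grp[i]]++
      line = $1
      for (k = 1; k <= ng; k++)
        line = line " " order[k] "=" hits[order[k]] "/" size[order[k]]
      print line
    }
  '
}

# Usage: group_counts < Input_file
```

For the v1 row of the input in this thread it reports group1=7/10 group2=7/10, which is below the 8 of 10 needed in either group.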
# 14  
Old 02-23-2015
You didn't answer my second and third questions.