Filtering my major and minor values


 
# 1  
01-03-2013

I want to remove from my table every row in which the count of the minor value is less than 30% of the count of the major value. From column 2 onward, each column can hold A, T, G, C or N. Each row has at least 2 and at most 4 distinct values (out of A, T, G, C).
N is a missing value and should not be counted.

These are the rules for filtering.

Consider a row with four Ts and one A (counting from column 2):
S10_14113025 T T T A T
If the count of the minor value is less than 30% of the count of the major value, delete the row.

Here count(A)/count(T) = 1/4 = 25% < 30%, so this row should be removed.
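The 30% test can be done without floating point by comparing 10 * minor with 3 * major. A minimal POSIX sh sketch; the helper names `ratio_pct` and `passes_30` are hypothetical and not from the thread:

```shell
#!/bin/sh
# ratio_pct MINOR MAJOR (hypothetical helper): print MINOR/MAJOR as a
# whole percentage, truncated, using awk for the division.
ratio_pct() {
  awk -v minor="$1" -v major="$2" 'BEGIN { printf "%d\n", 100 * minor / major }'
}

# passes_30 MINOR MAJOR (hypothetical helper): succeed when
# MINOR/MAJOR >= 30%, i.e. when 10*MINOR >= 3*MAJOR (integers only).
passes_30() {
  [ $(( $1 * 10 )) -ge $(( $2 * 3 )) ]
}

# The S10_14113025 example: one A against four Ts.
ratio_pct 1 4                                            # prints 25
if passes_30 1 4; then echo keep; else echo remove; fi   # prints remove
```

Because the comparison is "greater than or equal", a row at exactly 30% is kept, which matches "less than 30%" being the removal condition.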

Consider a row with two Ts and one A:
S10_14113025 T N N A T
Ignoring the Ns, the minor/major ratio is
count(A)/count(T) = 1/2 = 50% > 30%, so this row should NOT be removed.
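Counting the values while skipping N is a short awk loop. A sketch, assuming a POSIX awk; `count_alleles` is a hypothetical helper name, and its output is piped through `sort` because the order of awk's `for (v in cnt)` is unspecified:

```shell
#!/bin/sh
# count_alleles (hypothetical helper): read rows on stdin and print
# "<id> <value> <count>" for each value from column 2 on, skipping N.
count_alleles() {
  awk '{
    split("", cnt)                       # clear the counts for this row
    for (i = 2; i <= NF; i++)
      if ($i != "N") cnt[$i]++           # N is missing data: ignore it
    for (v in cnt) print $1, v, cnt[v]   # for-in order is unspecified
  }' | sort                              # so sort for stable output
}

printf '%s\n' 'S10_14113025 T N N A T' | count_alleles
# prints:
# S10_14113025 A 1
# S10_14113025 T 2
```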

Consider a row with more than two distinct values (three in this case: G, C and A):
S10_14113072 G C A G N
This row should NOT be removed; nothing needs to be calculated.


Input:

Code:
S10_14113025        T    T    T    A    T    T
S10_14113072        A    C    C    A    A    A
S10_14113073        G    C    G    G    C    N
S10_14113079        G    C    C    C    N    N
S10_14113080        G    C    C    C    N    A
S10_14113027        T    T    N    A    N    N

Desired output:

Code:
S10_14113072        A    C    C    A    A    A
S10_14113073        G    C    G    G    C    N
S10_14113080        G    C    C    C    N    A
S10_14113027        T    T    N    A    N    N
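Putting the rules together, here is a minimal awk sketch wrapped in a hypothetical shell function `filter_rows` (an illustration, not the solution posted in this thread): rows whose number of distinct non-N values is not exactly two are printed unchanged, and a two-value row is printed only if 10 * minor count >= 3 * major count. Note that under these rules S10_14113079 (one G against three Cs, about 33%) is kept, so the output has one more row than the desired output listed above.

```shell
#!/bin/sh
# filter_rows (hypothetical helper): apply the 30% minor/major rule to
# biallelic rows read on stdin; other rows pass through unchanged.
filter_rows() {
  awk '{
    split("", cnt); nv = 0               # per-row state
    for (i = 2; i <= NF; i++)
      if ($i != "N") {                   # N is missing data
        if (!($i in cnt)) val[++nv] = $i # remember each distinct value
        cnt[$i]++
      }
    if (nv != 2) { print; next }         # not biallelic: nothing to test
    x = cnt[val[1]]; y = cnt[val[2]]
    minor = (x < y) ? x : y
    major = (x < y) ? y : x
    if (10 * minor >= 3 * major) print   # minor/major >= 30%: keep
  }'
}

filter_rows <<'EOF'
S10_14113025        T    T    T    A    T    T
S10_14113072        A    C    C    A    A    A
S10_14113073        G    C    G    G    C    N
S10_14113079        G    C    C    C    N    N
S10_14113080        G    C    C    C    N    A
S10_14113027        T    T    N    A    N    N
EOF
```

Rows are printed as-is (`print` with no arguments), so the original spacing survives.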

# 2  
vgersh99, 01-03-2013
Given your explanation, I don't understand the LAST example. Why is it that "nothing needs to be calculated"?
Are you only considering As and Ts?
# 3  
01-03-2013
We only need to filter rows that are biallelic (exactly two distinct values, excluding N).
Rows with more than two values are biologically significant and can't be filtered out.
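The biallelic test amounts to counting distinct values other than N. A sketch, assuming a POSIX awk; `allele_class` is a hypothetical helper that labels each row:

```shell
#!/bin/sh
# allele_class (hypothetical helper): print each row ID with the number of
# distinct non-N values and whether the row is biallelic (exactly two).
allele_class() {
  awk '{
    split("", seen); c = 0
    for (i = 2; i <= NF; i++)
      if ($i != "N" && !($i in seen)) { seen[$i] = 1; c++ }
    print $1, c, (c == 2 ? "biallelic" : "not-biallelic")
  }'
}

printf '%s\n' 'S10_14113072 G C A G N' 'S10_14113025 T N N A T' | allele_class
# prints:
# S10_14113072 3 not-biallelic
# S10_14113025 2 biallelic
```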
# 4  
vgersh99, 01-03-2013
OK, I almost got it, but...
why exactly did this line NOT make it into the output?
Code:
S10_14113079        G    C    C    C    N    N

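To see why a row is kept or dropped, it can help to print the minor/major percentage next to each biallelic row instead of filtering. A sketch; `show_ratio` is a hypothetical name. For S10_14113079 there is one G against three Cs, so the ratio is 1/3, about 33%, which is not below 30%:

```shell
#!/bin/sh
# show_ratio (hypothetical helper): for biallelic rows print the row ID and
# the minor/major count ratio as a truncated whole percentage; skip others.
show_ratio() {
  awk '{
    split("", cnt); nv = 0
    for (i = 2; i <= NF; i++)
      if ($i != "N") {
        if (!($i in cnt)) val[++nv] = $i
        cnt[$i]++
      }
    if (nv != 2) next                    # only biallelic rows have a ratio
    x = cnt[val[1]]; y = cnt[val[2]]
    printf "%s %d%%\n", $1, 100 * (x < y ? x / y : y / x)
  }'
}

printf '%s\n' 'S10_14113079 G C C C N N' | show_ratio   # prints S10_14113079 33%
```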
# 5  
01-03-2013
It's a bit verbose, but it can be used as a start:
awk -f newbie.awk myInputFile
newbie.awk:
Code:
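function initVars()
{
  # reset per-row state: n = distinct values in order of first
  # appearance, a = count of each value, c = number of distinct values
  split("",n)
  split("",a)
  c=0
}

{
  # count every non-N value from column 2 onwards
  for(i=2;i<=NF;i++)
    if ($i != "N") {
     if (!($i in a))
       n[++c]=$i
     a[$i]++
    }

  # only biallelic rows (exactly 2 distinct values) are filtered; all
  # other rows pass through (this also avoids dividing by zero when c<2)
  if (c!=2) { initVars(); print; next }

  # ratio of the minor count to the major count (always <= 1)
  div=a[n[1]]/a[n[2]]
  div=(div>1)?1/div:div
  # remove only rows whose minor value is below 30% of the major one
  if ( div*100 >= 30)
     print
  initVars()
}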

# 6  
01-03-2013
I'm sorry, that line should be included... my bad.
# 7  
vgersh99, 01-03-2013
Quote:
Originally Posted by newbie83
I'm sorry, that line should be included... my bad.
Then try the suggestion.