Read Two Columns - Apply Condition on Six other columns


 
# 1  
Old 12-05-2014
Hello All,

Here is my input

Code:
univ1    chr1    100    200    -    GeneA    500    1    0    0.1    0.2    0.3    0.4    0.5
univ1    chr1    100    200    -    GeneA    600    1    0    0.0    0.0    0.0    0.0    0.1
univ1    chr1    100    200    -    GeneA    700    1    0    0.4    0.4    0.4    0.4    0.5
univ1    chr1    150    250    -    GeneB    100    1    0    0.5    0.1    0.2    0.3    0.3
univ2    chr2    300    400    +    GeneC    500    1    0    0.0    0.0    0.0    0.0    0.0
univ2    chr2    300    400    +    GeneC    700    1    0    0.1    0.1    0.1    0.1    0.1
univ2    chr2    300    400    +    GeneC    600    1    0    0.5    0.3    0.2    0.1    0.0
univ2    chr2    350    450    +    GeneD    900    1    0    0.0    0.0    0.0    0.0    0.0
univ3    chr3    500    600    -    GeneE    500    1    0    0.5    0.0    0.0    0.0    0.0
univ3    chr3    500    600    -    GeneE    800    1    0    0.1    0.2    0.3    0.4    0.4
univ3    chr3    500    600    -    GeneE    900    1    0    0.4    0.4    0.4    0.4    0.4
univ3    chr3    550    650    -    GeneF    900    1    0    1.1    2.2    3.3    4.4    5.5
univ4    chr4    500    600    +    GeneG    100    1    0    0.1    0.1    0.1    0.1    0.1
univ4    chr4    500    600    +    GeneG    200    1    0    0.2    0.2    0.2    0.2    0.2
univ4    chr4    500    600    +    GeneG    600    1    0    0.3    0.3    0.3    0.3    0.3
univ4    chr4    500    600    +    GeneG    800    1    0    0.4    0.4    0.4    0.4    0.4
univ4    chr4    500    600    +    GeneG    999    1    0    0.4    0.4    0.3    0.2    0.4

Here are the conditions

1. Group the rows by gene (column 6).
2. Within each gene, start with the row that has the highest value in column 7.
3. If any of columns 10 through 14 in that row is greater than or equal to 0.5, print only that row and exclude all other rows for the same gene.
4. If the row with the highest column 7 value has no value greater than or equal to 0.5 in columns 10 through 14, move to the row with the next highest column 7 value and check again.
5. If no row for a gene has a value greater than or equal to 0.5 in any of columns 10 through 14, print nothing for that gene.

So my expected output is

Code:
univ1    chr1    100    200    -    GeneA    700    1    0    0.4    0.4    0.4    0.4    0.5
univ1    chr1    150    250    -    GeneB    100    1    0    0.5    0.1    0.2    0.3    0.3
univ2    chr2    300    400    +    GeneC    600    1    0    0.5    0.3    0.2    0.1    0.0
univ3    chr3    500    600    -    GeneE    500    1    0    0.5    0.0    0.0    0.0    0.0
univ3    chr3    550    650    -    GeneF    900    1    0    1.1    2.2    3.3    4.4    5.5

What did I do so far?

I sorted on column 6 and used awk '!x[$6]++' to keep one row per gene. From there, I piped the output through repeated if tests in awk on columns 10 through 14. I just realized that this throws out most of the useful data. After looking at the data carefully, I came up with this question.
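To illustrate how that approach can lose data, here is a minimal sketch. The three rows for GeneX are made up, and it assumes the sort also ordered column 7 in descending order, which the description above does not confirm.

```shell
# Made-up rows for a hypothetical GeneX; only the 800 row reaches 0.5.
rows='u1 chr1 1 2 - GeneX 900 1 0 0.1 0.1 0.1 0.1 0.1
u1 chr1 1 2 - GeneX 800 1 0 0.5 0.1 0.1 0.1 0.1
u1 chr1 1 2 - GeneX 700 1 0 0.2 0.2 0.2 0.2 0.2'

# Keep only the first row seen per gene after sorting.
first_per_gene=$(printf '%s\n' "$rows" | sort -k6,6 -k7,7rn | awk '!x[$6]++')
printf '%s\n' "$first_per_gene"

# Apply the 0.5 threshold afterwards. The surviving 900 row fails it,
# so GeneX vanishes even though its 800 row would have qualified.
after_threshold=$(printf '%s\n' "$first_per_gene" |
    awk '{ for (i = 10; i <= 14; i++) if ($i + 0 >= 0.5) { print; next } }')
[ -z "$after_threshold" ] && echo "GeneX was dropped"
```

The threshold has to be checked before a row is chosen for the gene, not after.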

Thanks in advance. Please comment if you have any questions.

Last edited by jim mcnamara; 12-05-2014 at 04:30 PM..
# 2  
Old 01-02-2015
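You can do this in bash using 'while read -r -a your_array; do ... done < file_in'. For each line/row, read -a splits the columns into the array; bash creates the array itself, so it does not need to be declared in advance. You can then test the columns and decide whether to echo out (reproduce) the row. Note that bash arithmetic is integer-only, so the comparison against 0.5 needs an external tool such as awk or bc.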
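A rough sketch of that bash approach, under some assumptions: the input is whitespace-separated with 14 columns, the rows are pre-sorted with sort so the highest column 7 value comes first for each gene, and filter_genes is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# filter_genes is a hypothetical helper name, not something from the thread.
filter_genes() {
    local line gene v printed=""
    local -a col
    # Group by gene (column 6), highest column 7 first.
    sort -k6,6 -k7,7rn "$1" |
    while IFS= read -r line; do
        read -r -a col <<< "$line"          # split the row into an array
        gene=${col[5]}                      # column 6 (arrays are 0-based)
        [ "$gene" = "$printed" ] && continue
        for v in "${col[@]:9:5}"; do        # columns 10 through 14
            # bash arithmetic is integer-only, so awk does the comparison
            if awk -v v="$v" 'BEGIN { exit !(v + 0 >= 0.5) }'; then
                printf '%s\n' "$line"       # print the row unchanged
                printed=$gene
                break
            fi
        done
    done
}
```

Usage would be something like 'filter_genes file_in'. Starting awk once per value is slow on large files, so this is mainly useful for seeing the logic step by step.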
# 3  
Old 01-02-2015
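Try
Code:
# Group by gene (column 6), highest column 7 value first within each gene.
sort -k6,6 -k7,7rn file |
awk '
    $6 == G { next }                # a row was already printed for this gene
    {
        P = 0
        for (i = 10; i <= NF; i++)  # columns 10 through 14 (NF is 14 here)
            if ($i + 0 >= 0.5) { P = 1; break }
    }
    P { print; G = $6 }             # first qualifying row per gene wins
'

Output: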
univ1    chr1    100    200    -    GeneA    700    1    0    0.4    0.4    0.4    0.4    0.5
univ1    chr1    150    250    -    GeneB    100    1    0    0.5    0.1    0.2    0.3    0.3
univ2    chr2    300    400    +    GeneC    600    1    0    0.5    0.3    0.2    0.1    0.0
univ3    chr3    500    600    -    GeneE    500    1    0    0.5    0.0    0.0    0.0    0.0
univ3    chr3    550    650    -    GeneF    900    1    0    1.1    2.2    3.3    4.4    5.5
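
As a possible variation on the solution above, the same selection can be sketched as a single awk pass with no sort. It assumes 14 whitespace-separated columns; best_rows and the array names are hypothetical. Genes are printed in the order their first qualifying row appears in the input.

```shell
# best_rows is a hypothetical helper name, not something from the thread.
best_rows() {
    awk '
    {
        ok = 0
        for (i = 10; i <= 14; i++)              # columns 10 through 14 only
            if ($i + 0 >= 0.5) { ok = 1; break }
        if (!ok) next                           # row fails the threshold
        if (!($6 in best)) order[++n] = $6      # remember gene order
        if (!($6 in best) || $7 + 0 > max[$6]) {
            max[$6] = $7 + 0                    # highest qualifying column 7
            best[$6] = $0
        }
    }
    END { for (k = 1; k <= n; k++) print best[order[k]] }
    ' "$@"
}
```

Usage would be something like 'best_rows file'. When two qualifying rows of a gene tie on column 7, this sketch keeps the one that appears first in the input, which may differ from the row the sort-based pipeline picks.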
