I'm a experimental biochemist, but I'm few time ago, I started to do bioinformatics stuffs.
I have to filer the output results of some programs that I've been used.
I've learned bash and a little bit of AWK. But, now I have to do filters more complicated, and I don't know how!
HTML Code:
NAMES Match %Score Signals Path
--------------------------------------------------------------------------------------------------
AK154051 1612 83.5666148263 GTAG,GTAG AK154051chrX155,398,170,105,317,40,1,34,38,71,36,95,136,95,27,4,14,22,42,71,0,158,556,726,1327,1644,1788,1789,1823,1861,1933,1969,2064,2203,2302,2330,2336,2352,2375,2432
AK154051 1918 99.4297563504 GTAG,GTAG,GTAG,GTAG AK154051chr1284,128,208,192,619,696,0,11097,13260,19852,21314,21933
AK154154 378 12.3368146214 ACAC,CTGC,CTAC AK154154chr10211,17,2,36,1,440,133,0,211,228,230,30134,119467,147210
AK154154 575 18.7663185379 AATG AK154154chr10416,195,6,320,412,55,247,0,416,612,619,50578,50990,51045
AK154356 1082 35.3710362864 GCAG,GTAG,GTAG,ATAC,GTAG,ATCC,GTAG AK154356chr617,2,99,4,102,33,3,23,75,56,176,154,17,112,67,108,186,30,40,46,17,1,105,99,4,2,36,0,3528,3530,3629,16150,16253,84877,84882,84908,95995,128988,129165,129320,129338,129451,129518,151565,151758,151788,151828,151874,151891,210498,210603,210708,210721,210726
AK154356 1110 36.2863680941 GTAG,GTAG,GTAG AK154356chrX189,151,25,44,102,15,4,11,8,28,12,11,54,3,21,40,114,173,19,154,129,3,181,53,112,99,0,191,342,367,411,513,647,653,666,674,702,715,120143,120199,120207,120229,120269,120384,120558,120578,120733,131163,131170,131354,131408,131521
AK154356 2238 73.161163779 ATAC,CTTT,GTAG,GTAG,GTAG,GTAG,GTAG,ATCA,GTAG AK154356chr1722,77,88,18,53,299,14,149,24,148,51,6,81,68,57,59,30,103,123,96,7,261,234,277,17,41,106,105,233,174,0,6819,6897,6985,7004,23372,23676,23691,27402,27427,64510,64561,93606,93687,106380,106439,106505,106535,106638,106761,107259,107267,107528,107763,108043,108061,108102,165246,165352,295436
AK154365 1932 85.0352112676 GTAG,GTAG,ATAC AK154365chr12152,148,11,79,158,145,337,2,8,24,35,35,246,254,157,179,137,134,0,136504,136653,136665,136746,136904,137049,137386,137388,144711,144735,144771,144807,145054,145308,145641,145820,145964
AK154365 1970 86.7077464789 ATAC,GTAG,GTAG AK154365chr12134,137,178,417,246,35,35,24,8,2,337,145,158,79,11,112,187,0,136,273,628,1046,1293,1329,1364,8533,8541,8543,8880,9025,9184,9264,9276,94145
AK154365 2093 92.1214788732 ACCC,CTAC AK154365chr1224,276,11,332,64,341,34,38,30,213,32,602,137,134,0,25,302,314,646,719,60885,175711,175750,175781,175994,176027,176630,176777
AK154365 2109 92.8257042254 CTGC AK154365chr12135,136,340,249,246,35,33,376,396,11,300,0,162,298,638,888,1135,1171,9541,9929,10326,10338
AK154365 2109 92.8257042254 CTGC AK154365chr12300,11,396,376,33,35,246,249,340,136,135,0,301,313,721,9518,9552,9588,9835,10084,10424,10587
AK154650 2570 99.3428681871 GTAG,GTAG,GTAG,GTAG,GTAG,GTAG,GTAG,GTAG,GTAG,GTAG,GTAG,GTAG,GTAG AK154650chr15797,104,58,91,160,137,158,136,169,158,188,127,237,65,0,1397,2129,2390,3305,4242,6973,7253,7878,8125,9759,12393,13559,20995
At column 1, there are groups of lines which have the same name. Inside of this groups, select the name with the highest perceptual score (column 3) and the names that have 10 or less percent of difference with the highest score.
Then, if has been selected more than one name and there are names which only have GTAG, ATAC or GCAG signals, select the one of this which have the best score.
HTML Code:
NAMES Match %Score Signals Path
--------------------------------------------------------------------------------------------------
AK154051 1918 99.4297563504 GTAG,GTAG,GTAG,GTAG AK154051chr1284,128,208,192,619,696,0,11097,13260,19852,21314,21933
AK154154 378 12.3368146214 ACAC,CTGC,CTAC AK154154chr10211,17,2,36,1,440,133,0,211,228,230,30134,119467,147210
AK154154 575 18.7663185379 AATG AK154154chr10416,195,6,320,412,55,247,0,416,612,619,50578,50990,51045
AK154356 2238 73.161163779 ATAC,CTTT,GTAG,GTAG,GTAG,GTAG,GTAG,ATCA,GTAG AK154356chr1722,77,88,18,53,299,14,149,24,148,51,6,81,68,57,59,30,103,123,96,7,261,234,277,17,41,106,105,233,174,0,6819,6897,6985,7004,23372,23676,23691,27402,27427,64510,64561,93606,93687,106380,106439,106505,106535,106638,106761,107259,107267,107528,107763,108043,108061,108102,165246,165352,295436
AK154365 1970 86.7077464789 ATAC,GTAG,GTAG AK154365chr12134,137,178,417,246,35,35,24,8,2,337,145,158,79,11,112,187,0,136,273,628,1046,1293,1329,1364,8533,8541,8543,8880,9025,9184,9264,9276,94145
AK154650 2570 99.3428681871 GTAG,GTAG,GTAG,GTAG,GTAG,GTAG,GTAG,GTAG,GTAG,GTAG,GTAG,GTAG,GTAG AK154650chr15797,104,58,91,160,137,158,136,169,158,188,127,237,65,0,1397,2129,2390,3305,4242,6973,7253,7878,8125,9759,12393,13559,20995
But, mainly I need to know the easier way to do this, AWK or python or another? Because I have to do many filters and operations like that.