Matrix parsing help !


 
# 8  
Old 01-03-2012
Thanks for taking the time to reply, I am very grateful.
It looks better now! But I have to ignore the third line (D E): my original file is very big, so otherwise I will get repeated information in my output.
# 9  
Old 01-03-2012
-- deleted --

Last edited by ctsgnb; 01-03-2012 at 10:45 AM..
# 10  
Old 01-03-2012
This is your code :
Code:
awk 'NR>1&&$3>=80{A[$1]=$1;B[A[$1]]=(B[A[$1]]?B[A[$1]]:$1)" "$2}END{for(i in A) print B[A[i]]}' test.tttt

And this is the output :
Code:
A D E
B C
D E

--> D E is not a group of its own; it is normally part of group 1 (A D E). I don't know if I'm being clear.
What I want to do afterwards is take every group ID and use Bioperl to check the corresponding FASTA files in a database, so I need an output with just two lines (for this example).
Thanks
Moderator's Comments:
Please use code tags when posting data and code samples!

Last edited by vgersh99; 01-03-2012 at 10:53 AM..
# 11  
Old 01-03-2012
You can list each pair once, provided at least one of its rows has an identity of 80 or more, with the following:

Code:
$ cat f2
ID1 ID2 Identity
A B 70
A C 50
A D 90
A E 80
B C 95
B D 66
B E 47
C D 35
C E 25
D E 98
A B 70
A C 50
A D 90
A E 40
$ awk 'NR>1&&$3>=80{i=$1" "$2;j=$2" "$1;t=i<j?i:j;C[t]}END{for(k in C) print k}' f2
A D
A E
B C
D E
$

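NOTE that this code assumes an A D association is the same as a D A association; each pair is simply printed with the lower ID first. Also note that awk does not guarantee the order of a for (k in C) loop, so the lines may come out in a different order on your system.
Consider the following example (f3 has no header line, so NR>1 skips its first row, which is harmless here because that row is below 80):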
Code:
$ cat f3
A B 10
B A 80
C D 70
E D 90
D B 80
A D 10
D A 93
$ awk 'NR>1&&$3>=80{i=$1" "$2;j=$2" "$1;t=i<j?i:j;C[t]}END{for(k in C) print k}' f3
A B
A D
B D
D E
$

---------- Post updated at 04:48 PM ---------- Previous update was at 04:24 PM ----------

You can also try the following code, which skips a pair when it already appears inside one of the groups built so far:

Code:
$ cat f2
ID1 ID2 Identity
A B 70
A C 50
A D 90
A E 80
B C 95
B D 66
B E 47
C D 35
C E 25
D E 98
A B 70
A C 50
A D 90
A E 40
$ awk 'NR>1&&$3>=80{x=$1" "$2;for(i in A) {if (A[i]~x) next};A[$1]=(A[$1]?A[$1]:$1)" "$2}END{for(i in A) print A[i]}' f2
A D E
B C

---------- Post updated at 05:00 PM ---------- Previous update was at 04:48 PM ----------

To avoid the same $2 appearing more than once within a group, you can also try:

Code:
awk 'NR>1&&$3>=80{A[$1]=(A[$1]?A[$1]:$1)(A[$1]~$2?z:" "$2)}END{for(i in A) print A[i]}' yourfile
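
One caveat with the one-liner above: A[$1]~$2 is a regular-expression match, so an ID that happens to be a substring of another ID already in the group (for example chromosome01_29385 inside chromosome01_293853) would be treated as already present. A minimal sketch of the same idea with exact matching follows; the function name group_by_first, the variable min, and the array names are only illustrative, and it assumes a header line followed by "ID1 ID2 Identity" rows:

```shell
# Hypothetical helper: group_by_first FILE [THRESHOLD]
# Same grouping by first column as the one-liner, but duplicates are
# detected with an exact (ID1, ID2) key instead of a regex match.
group_by_first() {
  awk -v min="${2:-80}" '
    # keep rows at or above the threshold, each (ID1, ID2) pair only once
    NR > 1 && $3 + 0 >= min && !seen[$1, $2]++ {
      if (!($1 in grp)) { grp[$1] = $1; order[++n] = $1 }  # remember first-seen order
      grp[$1] = grp[$1] " " $2
    }
    END { for (i = 1; i <= n; i++) print grp[order[i]] }
  ' "$1"
}
```

Like the one-liner, this still prints D E as its own line for f2; it only changes how repeats are detected and prints the groups in the order they first appear.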

I'm not sure I understand what final result you expect.

Last edited by ctsgnb; 01-03-2012 at 12:07 PM..
# 12  
Old 01-03-2012
Thanks a lot, it works!! But when I use the code on the initial file that I posted in the first message, it doesn't work ): I have never used awk before; I must learn it. Is it possible to just replace the A in your code with the name of my first column? Also, can this code work with very big data, or is it only adapted to this specific case?
# 13  
Old 01-03-2012
Did you make sure you've used the right threshold in your code (depending on your input file) ?
0.8 vs 80
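
One way to keep the threshold from being hard-coded is to pass it to awk with -v, so the same command works whether the identity column is a percentage or a fraction. A small sketch (pairs_above and min are just illustrative names, and a header line is assumed):

```shell
# Hypothetical helper: pairs_above FILE THRESHOLD
# Prints ID1 and ID2 for every row whose third column is >= THRESHOLD.
pairs_above() {
  awk -v min="$2" 'NR > 1 && $3 + 0 >= min { print $1, $2 }' "$1"
}
```

So pairs_above f2 80 fits a file with values like 90 or 98, while pairs_above yourfile 0.8 fits a file with values like 0.9. Using 80 on a 0 to 1 scale selects nothing, and using 0.8 on a percentage scale selects nearly every row.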
# 14  
Old 01-03-2012
You are right, sir, I didn't!! Unfortunately the output is still not good:
Code:
chromosome01_100293 chromosome01_168057 chromosome07_194379
chromosome01_29385 chromosome01_168057 chromosome07_194379 chromosome01_100293
chromosome08_116839 chromosome01_293853

---------- Post updated at 04:30 PM ---------- Previous update was at 04:26 PM ----------

chromosome01_100293, for example, is present in both line 1 and line 2.

Last edited by radoulov; 01-04-2012 at 05:18 AM..