Matrix parsing help !


 
# 1  
Old 01-03-2012

Hello everybody! I'm new to this forum and a beginner in Perl scripting, and I have a problem. I have a big file like this:
Code:
ID1                   ID2                       Identity 
chromosome07_194379   chromosome01_168057       0.975
chromosome01_100293   chromosome01_168057       0.969
chromosome01_100293   chromosome07_194379       0.969
chromosome01_29385    chromosome01_168057       0.856
chromosome01_29385    chromosome07_194379       0.856
chromosome01_29385    chromosome01_100293       0.861
chromosome08_116839   chromosome01_168057       0.78
chromosome08_116839   chromosome01_100293       0.786
chromosome08_116839   chromosome01_293853       0.946

The three columns are separated by tabs (\t).

I want to cluster the IDs that share an identity greater than 0.8, using Perl. Can someone help me?
Thanks a lot in advance for your help.

Moderator's Comments:
Please use code tags!

Last edited by zaxxon; 01-03-2012 at 06:53 AM.. Reason: code tags
# 2  
Old 01-03-2012
What does "cluster" mean here? Sort? Print only those rows? Please post an example of the expected output, using code tags.
This User Gave Thanks to zaxxon For This Post:
# 3  
Old 01-03-2012
Something like:

Code:
nawk 'NR==1||$3>0.8' yourfile

?

If not, please give more details, as zaxxon requested.
This User Gave Thanks to ctsgnb For This Post:
# 4  
Old 01-03-2012
Hi ctsgnb and zaxxon, thanks a lot for replying. What I need is to group the IDs based on their sequence identity. For example:
Code:
ID1 ID2  Identity
A    B      70
A    C      50
A    D      90
A    E      80
B    C      95
B    D      66
B    E      47
C    D      35
C    E      25
D    E      98

The output would look like this:
Code:
A D E
B C

Note: this means that sequences A, D and E are grouped together because they share more than 80 identity. In the same way, B and C are close because of their identity.
Sorry for my bad English!

Last edited by radoulov; 01-04-2012 at 05:17 AM.. Reason: Code tags!
# 5  
Old 01-03-2012
If the line order of the output doesn't matter, you can try something like:

(You may need to change 0.8 to 80, depending on the format of your input file.)

Code:
awk 'NR>1&&$3>=0.8{A[$1]=(A[$1]?A[$1]:$1)" "$2}END{for(i in A) print A[i]}' yourfile

Code:
$ cat f2
ID1 ID2 Identity
A B 70
A C 50
A D 90
A E 80
B C 95
B D 66
B E 47
C D 35
C E 25
D E 98
$ awk 'NR>1&&$3>=80{A[$1]=(A[$1]?A[$1]:$1)" "$2}END{for(i in A) print A[i]}' f2
A D E
B C
D E
$


Last edited by ctsgnb; 01-03-2012 at 10:11 AM..
# 6  
Old 01-03-2012
Thanks, ctsgnb. I tried your code and the output looks like this:
Code:
A C D E
B C D E
C D E
D E


Last edited by radoulov; 01-04-2012 at 05:17 AM.. Reason: Code tags!
# 7  
Old 01-03-2012
I updated my previous post: you should change the threshold from 0.8 to 80 and try again.
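
A quick way to see why the scale matters is to count how many rows pass each threshold. The following is just a sketch: `count_pairs` is a hypothetical helper name, and it assumes a header line followed by whitespace-separated columns, as in the samples in this thread.

```shell
# count_pairs FILE THRESHOLD   (hypothetical helper, not from the thread)
# Prints how many data rows have a third column >= THRESHOLD.
count_pairs() {
    # NR > 1 skips the header; "+ 0" forces numeric comparison and
    # makes the END block print 0 when no row matched.
    awk -v thr="$2" 'NR > 1 && $3 + 0 >= thr + 0 { n++ } END { print n + 0 }' "$1"
}
```

On the A to E sample from post #4, `count_pairs f2 0.8` reports 10 (every row passes), while `count_pairs f2 80` reports 4. That would explain why a 0.8 threshold gives much larger groups on percentage-scale data.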