diagonal matrix to square matrix


 
# 8  
Old 09-11-2009
Thank you so much! Sometimes I found the ID printed right before the 1.000. Do you have any clue about that? Thanks a lot again!
# 9  
Old 09-11-2009
Could you give a little more detail on how you intend to use the data once you've chosen the ID? I'd like to understand why you even need the square matrix.
# 10  
Old 09-11-2009
Thanks a lot!

Thanks a lot, Tyler! It works perfectly with a small matrix.

Yes, memory is an issue for me with the 25000x25000 matrix. I will try to use another PC with more RAM. Thank you very much again!

---------- Post updated at 09:08 PM ---------- Previous update was at 08:58 PM ----------

The matrix holds the correlation coefficients of the expression levels of ~25000 genes in the genome. Some genes are expressed similarly, which is indicated by a high correlation coefficient, but most are not. The ID is used to track the gene name in the genome and to find the pattern of expression.
Say I want to see which genes are expressed similarly to 244901_AT: grepping for it would give the single row of correlation coefficients. Theoretically, the half matrix contains all the information of the square one, but it is hard for me to retrieve any specific gene(s) of interest from it. By the way, I am a geneticist and have just started trying programming.
# 11  
Old 09-12-2009
I can't imagine that it would be an easy thing to just grep some gene ID and look at a line with 25000 values. But it seems like that's what you want to see. ???

How high of a correlation coefficient is enough to be considered as expressing similarly? And would it be fair to say that, given a specific gene name, you just want to know which other genes, if any, express similarly? Or, given a gene name and a specified coefficient, you want to know which other genes have a coefficient greater than or equal to the specified value?
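
If it is the second case, a rough sketch might look like the one below. It assumes the half-matrix file has one line per gene: the quoted ID, then the diagonal 1.000, then the remaining values of that row, all separated by whitespace. The function name similar_genes and its arguments are made up for illustration, and it uses awk in two passes so that the file never has to be held in memory.

```shell
#!/bin/sh
# similar_genes FILE ID MIN   (hypothetical helper, not from the thread)
# List every other gene whose coefficient with ID is >= MIN.
# Assumed layout: field 1 = quoted ID, field 2 = diagonal, fields 3.. = rest of the row.
similar_genes() {
    file=$1 id=$2 min=$3

    # Pass 1: find the line number n of the requested gene.
    n=$(awk -v id="$id" '$1 == "\"" id "\"" { print NR; exit }' "$file")
    [ -n "$n" ] || { echo "ID $id not found" >&2; return 1; }

    # Pass 2: rows above n hold the wanted value in field n-NR+2;
    # row n itself holds the values for all the rows below it.
    awk -v n="$n" -v min="$min" '
        NR <  n { r = $(n - NR + 2); if (r + 0 >= min + 0) print $1, r }
        NR == n { for (i = 3; i <= NF; i++) v[n + i - 2] = $i }
        NR >  n && (NR in v) && v[NR] + 0 >= min + 0 { print $1, v[NR] }
    ' "$file"
}
```

You would call it as something like `similar_genes halfmatrix.txt 244908_AT 0.4` (the file name is only a placeholder), and it prints one matching gene ID and coefficient per line.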
# 12  
Old 09-12-2009
Quote:
Originally Posted by yifangt
...
Yes, memory is an issue for me with the 25000x25000 matrix. I will try to use another PC with more RAM.
...
A little bit of calculation is in order. Your actual text file is probably a huge one.

Code:
$ cat diagmtx.txt
"244901_AT" 1.000 0.234 0.435 0.123 0.012 0.102 0.325 0.412 0.087 0.098
"243903_AT" 1.000 0.111 0.412 0.115 0.058 0.091 0.190 0.045 0.058
"244501_AT" 1.000 0.205 0.542 0.335 0.054 0.117 0.203 0.125
"254902_AT" 1.000 0.587 0.159 0.357 0.258 0.654 0.341
"247906_AT" 1.000 0.269 0.369 0.687 0.145 0.125
"242901_AT" 1.000 0.222 0.451 0.134 0.333
"243906_AT" 1.000 0.112 0.217 0.095
"244908_AT" 1.000 0.508 0.701
"294902_AT" 1.000 0.663
"245902_AT" 1.000
$

Each line has an ID of 11 characters followed by tokens of 6 characters each (a space plus a value such as 0.234): 25000 tokens in line 1, 24999 in line 2, and so on. So we are looking at a file of approximately 1.88 GB.

Code:
11*25000 + (6*25000 + 6*24999 + 6*24998 + ... + 6*2 + 6*1)
= 1.88 GB
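
If you want to check the arithmetic yourself, here is a small sketch in plain shell. It assumes the same numbers as above (25000 rows, 11 bytes per ID, 6 bytes per value), ignores the newline characters, and needs a shell with 64-bit arithmetic, which any current bash or dash on a 64-bit system should have. The variable names are just illustrative.

```shell
#!/bin/sh
# Back-of-the-envelope size estimate for the half and the square matrix files.
# Assumptions: 11-byte quoted ID per row, 6 bytes per value (space + "0.123"),
# newlines ignored.
n=25000
id_bytes=$((11 * n))                    # one ID per row
tri_bytes=$((6 * (n * (n + 1) / 2)))    # 6 * (25000 + 24999 + ... + 2 + 1)
sq_bytes=$((6 * n * n))                 # every row carries all 25000 values

half_bytes=$((id_bytes + tri_bytes))
square_bytes=$((id_bytes + sq_bytes))

echo "half matrix:   $half_bytes bytes"
echo "square matrix: $square_bytes bytes"
```

The two totals come to 1875350000 and 3750275000 bytes, i.e. roughly 1.88 GB and 3.75 GB.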

The Perl scripts are going to use approximately this much memory, plus some more for temporary operations. And once done, your final file would be about twice that size, roughly 3.75 GB.
That's a huge size for a text file, and it's going to be a challenge to process such a file.

If you are trying this out on your home desktop/laptop, then I'd imagine you need one of those computers with 3-4 GB of RAM (a gaming laptop, perhaps, the kind optimized for computer games). Otherwise you may want to try it out on a server at your workplace.

Finally, you may want to consider not storing the other half of your matrix. A simple script can generate the entire line of the square matrix, given a line number.

Code:
$
$ # display diagmtx.txt with the line numbers
$ cat -n diagmtx.txt
     1    "244901_AT" 1.000 0.234 0.435 0.123 0.012 0.102 0.325 0.412 0.087 0.098
     2    "243903_AT" 1.000 0.111 0.412 0.115 0.058 0.091 0.190 0.045 0.058
     3    "244501_AT" 1.000 0.205 0.542 0.335 0.054 0.117 0.203 0.125
     4    "254902_AT" 1.000 0.587 0.159 0.357 0.258 0.654 0.341
     5    "247906_AT" 1.000 0.269 0.369 0.687 0.145 0.125
     6    "242901_AT" 1.000 0.222 0.451 0.134 0.333
     7    "243906_AT" 1.000 0.112 0.217 0.095
     8    "244908_AT" 1.000 0.508 0.701
     9    "294902_AT" 1.000 0.663
    10    "245902_AT" 1.000
$
$ # now show me what the line no. 8 of the square matrix looks like
$
$ ##
$ perl -lne 'BEGIN {$n=8}
>           if ($. < $n) {@x=split; $s .= $x[$n-$.+1]." "}
>           elsif($. == $n) {($id, $therest)=unpack("A11xA*");
>                            print $id," ",$s,$therest}' diagmtx.txt
"244908_AT" 0.412 0.190 0.117 0.258 0.687 0.451 0.112 1.000 0.508 0.701
$
$
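
Since you would normally start from a gene ID rather than a line number, the same idea can be wrapped in a small shell function. This is only a sketch: the name square_row is made up, it uses awk instead of Perl, and it assumes the layout of diagmtx.txt shown above (quoted ID in field 1, the diagonal 1.000 in field 2). It reads the file twice, once to find the line number and once to collect the values, and stops reading as soon as the row has been printed.

```shell
#!/bin/sh
# square_row FILE ID   (hypothetical helper)
# Print the complete square-matrix row for gene ID from the half-matrix FILE.
square_row() {
    file=$1 id=$2

    # Pass 1: line number of the requested gene (IDs are quoted in the file).
    n=$(awk -v id="$id" '$1 == "\"" id "\"" { print NR; exit }' "$file")
    [ -n "$n" ] || { echo "ID $id not found" >&2; return 1; }

    # Pass 2: each line above n contributes the value in its column n,
    # which is field n-NR+2; line n then supplies the diagonal and the rest.
    awk -v n="$n" '
        NR <  n { left = left " " $(n - NR + 2) }
        NR == n { out = $1 left
                  for (i = 2; i <= NF; i++) out = out " " $i
                  print out
                  exit }
    ' "$file"
}
```

With the sample file, `square_row diagmtx.txt 244908_AT` should print the same line as the Perl one-liner above does for line 8.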

HTH
tyler_durden