## diagonal matrix to square matrix

# 1
09-08-2009

Hello, all!

I am struggling with a short script to read a diagonal matrix for later retrieval.
1.000 0.234 0.435 0.123 0.012 0.102 0.325 0.412 0.087 0.098
1.000 0.111 0.412 0.115 0.058 0.091 0.190 0.045 0.058
1.000 0.205 0.542 0.335 0.054 0.117 0.203 0.125
1.000 0.587 0.159 0.357 0.258 0.654 0.341
1.000 0.269 0.369 0.687 0.145 0.125
1.000 0.222 0.451 0.134 0.333
1.000 0.112 0.217 0.095
1.000 0.508 0.701
1.000 0.663
1.000

This matrix holds the correlation coefficients of gene expression from a microarray, so the half matrix carries the same information as the full square matrix.

First, the matrix should be aligned so that every 1.000 sits on the diagonal, i.e.
1.000 0.234 0.435 0.123 0.012 0.102 0.325 0.412 0.087 0.098
      1.000 0.111 0.412 0.115 0.058 0.091 0.190 0.045 0.058
            1.000 0.205 0.542 0.335 0.054 0.117 0.203 0.125
                  1.000 0.587 0.159 0.357 0.258 0.654 0.341
                        1.000 0.269 0.369 0.687 0.145 0.125
                              1.000 0.222 0.451 0.134 0.333
                                    1.000 0.112 0.217 0.095
                                          1.000 0.508 0.701
                                                1.000 0.663
                                                      1.000
since each gene has a correlation coefficient of 1.000 with itself.
Then I want to fill in the missing half to get a square matrix, using Matrix[i][j] = Matrix[j][i], e.g. Matrix[2][1] = Matrix[1][2]. I have posted only 10 of the 25,000 genes; the real matrix is 25,000 x 25,000.
With the square matrix I can easily look up any row or column to get the coefficients of an individual gene against all the other genes of the genome.
Thanks a lot!

Yifang

Last edited by yifangt; 09-08-2009 at 10:34 PM.
Posted by yifangt
# 2
09-09-2009
Technically, unless I've forgotten my definitions, that is not actually a diagonal matrix. Please share your script and I'm sure someone can easily point out where you went wrong.
Posted by Vi-Curious
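For reference, the two definitions being contrasted here can be written out for an $n \times n$ matrix $A = (a_{ij})$. This is just the standard textbook wording, not something taken from the posts:

```latex
\[
\text{diagonal: } a_{ij} = 0 \ \text{ for all } i \neq j,
\qquad
\text{symmetric: } a_{ij} = a_{ji} \ \text{ for all } i, j .
\]
```

A correlation matrix is symmetric with ones on the diagonal, which is why a file holding only the entries with $j \ge i$ (the upper triangle) is enough to rebuild the whole square.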
# 3
09-09-2009
Something like this would probably be better done with perl, as you can put the whole matrix in memory.

If you still want to use a shell script, here is one. By the way, I think you meant a symmetric matrix, and not a diagonal matrix.

#!/bin/bash
# Reads the half matrix on standard input, writes the square matrix to standard output.

PATH=/usr/bin:/bin
export PATH

# length of each number in the matrix
rlen=5

# Right-justify the matrix: indent row NR by (NR - 1) columns of width rlen + 1
awk -v rlen="$rlen" '{ indent = (NR - 1) * (rlen + 1); printf("%" indent "s", ""); print }' > temp.$$

# Fill in the missing parts
cat -n temp.$$ | while read line; do
    set -- $line

    # Get the row number
    n="$1"

    # Discard the row number and the 1.000 value
    shift 2

    # Calculate the start and end positions of the column
    # If you're using Bourne shell, you'll have to use expr or similar.
    s=$(( (n - 1) * (rlen + 1) + 1 ))
    e=$(( s + rlen - 1 ))

    # Get the values of the column for the preceding rows in the matrix
    head -n "$n" temp.$$ | cut -c"$s-$e" | tr '\n' ' '

    # Output the rest of the row from the input
    echo "$*"
done

# Clean up
rm -f temp.$$
Posted by rwu
# 4
09-09-2009
Here's one way to do it in Perl:

Given below is a perl program to generate a diagonal matrix of random numbers:

HTH,
tyler_durden
Posted by durden_tyler
# 5
09-09-2009
Yes, you are right. The original matrix is NOT a diagonal one. It is the upper half of a symmetric matrix, with 1.000 in all the "diagonal" positions.

---------- Post updated at 01:32 PM ---------- Previous update was at 01:30 PM ----------

Thanks, I need to digest your script first. I am just a newbie in shell scripting and Perl programming.

---------- Post updated at 01:50 PM ---------- Previous update was at 01:32 PM ----------

That's a great solution! Thank you, Tyler!

When I tried to convert my 25000x25000 matrix, I got the "Out of memory!" message and the program stopped. Another problem: when I checked the original data, I noticed that each row starts with an ID, i.e.:
"244901_AT" 1.000 0.234 0.435 0.123 0.012 0.102 0.325 0.412 0.087 0.098
"243903_AT" 1.000 0.111 0.412 0.115 0.058 0.091 0.190 0.045 0.058
"244501_AT" 1.000 0.205 0.542 0.335 0.054 0.117 0.203 0.125
"254902_AT" 1.000 0.587 0.159 0.357 0.258 0.654 0.341
"247906_AT" 1.000 0.269 0.369 0.687 0.145 0.125
"242901_AT" 1.000 0.222 0.451 0.134 0.333
"243906_AT" 1.000 0.112 0.217 0.095
"244908_AT" 1.000 0.508 0.701
"294902_AT" 1.000 0.663
"245902_AT" 1.000

and the output square matrix should be like this:
"244901_AT" 1.000 0.234 0.435 0.123 0.012 0.102 0.325 0.412 0.087 0.098
"243903_AT" 0.234 1.000 0.111 0.412 0.115 0.058 0.091 0.190 0.045 0.058
"244501_AT" 0.435 0.111 1.000 0.205 0.542 0.335 0.054 0.117 0.203 0.125
"254902_AT" 0.123 0.412 0.205 1.000 0.587 0.159 0.357 0.258 0.654 0.341
"247906_AT" 0.012 0.115 0.542 0.587 1.000 0.269 0.369 0.687 0.145 0.125
"242901_AT" 0.102 0.058 0.335 0.159 0.269 1.000 0.222 0.451 0.134 0.333
"243906_AT" 0.325 0.091 0.054 0.357 0.369 0.222 1.000 0.112 0.217 0.095
"244908_AT" 0.412 0.190 0.117 0.258 0.687 0.451 0.112 1.000 0.508 0.701
"294902_AT" 0.087 0.045 0.203 0.654 0.145 0.134 0.217 0.508 1.000 0.663
"245902_AT" 0.098 0.058 0.125 0.341 0.125 0.333 0.095 0.701 0.663 1.000

Then I can retrieve each gene by grepping for its ID in the first column of each row. I should have posted this information first. Sorry about this. Thanks again, Tyler!
Posted by yifangt
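Since the end goal is to pull out one gene at a time, one possible way around the "Out of memory!" problem is to skip building the square matrix and read a single gene's row straight from the half matrix. The following is only a minimal awk sketch under stated assumptions: `gene_row` is a made-up function name, fields are separated by whitespace, the ID is the first field, and row i holds the values for columns i..N. It reads the file twice (once to find the row number k of the ID, once to collect column k from the rows above it) and keeps just one output row in memory.

```shell
# gene_row ID FILE  (hypothetical helper, not from the thread)
# Print the full correlation row for one gene from the half matrix.
gene_row() {
    awk -v id="$1" '
        # Pass 1 (NR == FNR): find the row number k of the requested ID,
        # ignoring the double quotes around the IDs in the file.
        NR == FNR { t = $1; gsub(/"/, "", t); if (t == id) k = FNR; next }
        # Pass 2, row i above k: column k is stored in field (k - i) + 2.
        FNR < k   { row = row " " $(k - FNR + 2); next }
        # Pass 2, row k itself: its stored values are columns k..N.
        FNR == k  { for (f = 2; f <= NF; f++) row = row " " $f
                    print $1 row; exit }
    ' "$2" "$2"
}
```

Called as `gene_row 243903_AT half_matrix.txt` (the file name is a placeholder) on the ten sample rows, this should reproduce the `"243903_AT"` line of the square matrix shown above. An unknown ID prints nothing.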
# 6
09-10-2009

Posted by summer_cherry
# 7
09-11-2009
Here's a Perl solution for the type of data you posted:

I think summer_cherry's program is a much more optimized version. It:
(i) does not store the entire N x N square matrix in any data structure.
(ii) stores only the information present in the file, since that is sufficient to generate the other half of the square matrix.
(iii) uses hashes for fast access.

My second version uses a multi-dimensional array to store only the necessary information, i.e. everything after the "id" and "1.000" in each row. It also removes each array element right after it is printed, using the "shift" operator. So while the run time will be higher, the memory consumption should be lower.

HTH,
tyler_durden
Posted by durden_tyler
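The idea in points (i) to (iii) can be sketched in awk as well. This is not summer_cherry's or tyler_durden's program, only an illustration under assumed input: whitespace-separated fields, the ID first, and row i holding the values for columns i..N. `fill_square` is a made-up name. It stores just the values present in the file in an associative array keyed by (row, column) and looks up `m[j, i]` whenever an entry from the missing lower half is needed.

```shell
# fill_square FILE  (hypothetical helper, not from the thread)
# Expand an ID-prefixed upper-triangular matrix into the full square matrix.
fill_square() {
    awk '
        {
            id[NR] = $1
            # Field f of row NR holds column j = NR + f - 2.
            for (f = 2; f <= NF; f++) m[NR, NR + f - 2] = $f
        }
        END {
            for (i = 1; i <= NR; i++) {
                line = id[i]
                # Below the diagonal, read the mirrored entry m[j, i].
                for (j = 1; j <= NR; j++)
                    line = line " " (j < i ? m[j, i] : m[i, j])
                print line
            }
        }
    ' "$@"
}
```

Note that this still keeps N(N+1)/2 values in memory, roughly 312 million for 25,000 genes, so it may well hit the same memory limit on the full data set. For small and medium inputs it is a single pass with no temporary file.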
