diagonal matrix to square matrix

09-08-2009

Registered User

564, 13

Join Date: Sep 2009

Last Activity: 26 May 2021, 8:59 AM EDT

Location: Saskatchewan, Canada

Posts: 564

Thanks Given: 376

Thanked 13 Times in 12 Posts

Hello, all!

I am struggling to write a short script that reads a diagonal matrix like the one below and converts it for later retrieval:
1.000 0.234 0.435 0.123 0.012 0.102 0.325 0.412 0.087 0.098
1.000 0.111 0.412 0.115 0.058 0.091 0.190 0.045 0.058
1.000 0.205 0.542 0.335 0.054 0.117 0.203 0.125
1.000 0.587 0.159 0.357 0.258 0.654 0.341
1.000 0.269 0.369 0.687 0.145 0.125
1.000 0.222 0.451 0.134 0.333
1.000 0.112 0.217 0.095
1.000 0.508 0.701
1.000 0.663
1.000

This matrix actually holds the correlation coefficients of gene expression measured by microarray, so the half matrix contains the same information as the full square matrix.
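The symmetry this relies on can be written out explicitly. As an illustration only, assuming Pearson correlation (the post does not say which coefficient was computed), with x_i the expression profile of gene i and N the number of genes:

```latex
r_{ij} = \frac{\operatorname{cov}(x_i, x_j)}{\sigma_i \, \sigma_j} = r_{ji},
\qquad
r_{ii} = \frac{\operatorname{var}(x_i)}{\sigma_i^{2}} = 1,
\qquad
\#\{(i, j) : 1 \le i \le j \le N\} = \frac{N(N+1)}{2}
```

For N = 25,000 the upper triangle including the diagonal therefore holds 312,512,500 values, against 625,000,000 for the full square matrix.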

First, the matrix should be aligned so that all the 1.000 values lie on the diagonal, i.e.:

1.000 0.234 0.435 0.123 0.012 0.102 0.325 0.412 0.087 0.098
      1.000 0.111 0.412 0.115 0.058 0.091 0.190 0.045 0.058
            1.000 0.205 0.542 0.335 0.054 0.117 0.203 0.125
                  1.000 0.587 0.159 0.357 0.258 0.654 0.341
                        1.000 0.269 0.369 0.687 0.145 0.125
                              1.000 0.222 0.451 0.134 0.333
                                    1.000 0.112 0.217 0.095
                                          1.000 0.508 0.701
                                                1.000 0.663
                                                      1.000

since each gene has a correlation coefficient of 1.000 with itself.
Then I want to build the square matrix by filling in the missing half with Matrix[i][j] = Matrix[j][i], e.g. Matrix[2][1] = Matrix[1][2], and so on. I have only posted 10 of the 25,000 genes; the full matrix is 25,000 x 25,000.
With the square matrix I can easily access any row or column to get the coefficients of each individual gene against the rest of the genome.
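A minimal sketch of that fill rule in awk, assuming the layout of the sample above (no row IDs, values separated by whitespace). The function name symmetrize_upper is hypothetical. It keeps the whole triangle in memory, roughly N(N+1)/2 values, so it suits test files rather than the full 25,000-gene matrix.

```bash
#!/bin/bash
# Hypothetical helper: rebuild the full symmetric matrix from its upper triangle.
# Assumes row i holds 1.000 followed by the values for columns i+1..N,
# separated by whitespace, with no row IDs. The whole triangle is kept in memory.
symmetrize_upper() {
  awk '
    {
      # field f of row NR is column NR + f - 1 of the full matrix
      for (f = 1; f <= NF; f++) m[NR, NR + f - 1] = $f
    }
    END {
      n = NR
      for (i = 1; i <= n; i++) {
        line = ""
        for (j = 1; j <= n; j++) {
          # Matrix[i][j] = Matrix[j][i] supplies the missing lower half
          v = (j >= i) ? m[i, j] : m[j, i]
          line = line (j > 1 ? " " : "") v
        }
        print line
      }
    }' "$@"
}
```

Usage: `symmetrize_upper matrix.txt > square.txt` (file names are placeholders). Values are printed exactly as written in the input, so no rounding is introduced.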
Thanks a lot!

Yifang

Last edited by yifangt; 09-08-2009 at 10:34 PM.

yifangt

09-09-2009

Registered User

151, 2

Join Date: Jul 2008

Last Activity: 13 May 2014, 6:14 PM EDT

Location: Texas

Posts: 151

Thanks Given: 1

Thanked 2 Times in 2 Posts

Technically, unless I've forgotten my definitions, that is not actually a diagonal matrix (a diagonal matrix has nonzero entries only on its main diagonal). Please share your script and I'm sure someone can easily point out where you went wrong.

Vi-Curious

09-09-2009

Registered User

5, 0

Join Date: Sep 2009

Last Activity: 25 September 2009, 3:57 AM EDT

Posts: 5

Thanks Given: 0

Thanked 0 Times in 0 Posts

Something like this would probably be better done in Perl, since you can hold the whole matrix in memory.

If you still want to use a shell script, here is one. By the way, I think you meant a symmetric matrix, not a diagonal matrix.

#!/bin/bash
# Reads the upper triangle on standard input, e.g.  ./fill.sh < matrix.txt
# (the script name is arbitrary)

PATH=/usr/bin:/bin
export PATH

# length of each number in the matrix (assumes every value looks like 0.234)
rlen=5

# Temporary file, removed automatically when the script exits
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT

# Right-justify the matrix: indent row NR by (NR - 1) fields of rlen + 1 characters
awk -v rlen="$rlen" '{ indent = (NR - 1) * (rlen + 1); printf("%" indent "s", ""); print }' > "$tmp"

# Fill in the missing parts
cat -n "$tmp" | while read -r line; do
set -- $line

# Get the row number
n="$1"

# Discard the row number and the 1.000 value
shift 2

# Calculate the start and end positions of the column
# If you're using Bourne shell, you'll have to use expr or similar.
s=$(( (n - 1) * (rlen + 1) + 1 ))
e=$(( s + rlen - 1 ))

# Get the values of column n for rows 1..n (row n itself supplies the 1.000)
head -n "$n" "$tmp" | cut -c"$s-$e" | tr '\n' ' '

# Output the rest of the row from the input
echo "$@"
done

rwu

09-09-2009

Registered User

2,100, 402

Join Date: Apr 2009

Last Activity: 11 February 2020, 10:24 AM EST

Posts: 2,100

Thanks Given: 26

Thanked 402 Times in 360 Posts

Here's one way to do it in Perl:

Code:

$
$ # display the diagonal matrix
$ cat diagmtx.txt
1.000 0.234 0.435 0.123 0.012 0.102 0.325 0.412 0.087 0.098
1.000 0.111 0.412 0.115 0.058 0.091 0.190 0.045 0.058
1.000 0.205 0.542 0.335 0.054 0.117 0.203 0.125
1.000 0.587 0.159 0.357 0.258 0.654 0.341
1.000 0.269 0.369 0.687 0.145 0.125
1.000 0.222 0.451 0.134 0.333
1.000 0.112 0.217 0.095
1.000 0.508 0.701
1.000 0.663
1.000
$
$ # display the contents of the perl program
$ cat convert.pl
#!/usr/bin/perl
use strict;
use warnings;

my @mtx = ();
my $num;
my $file = $ARGV[0];
open (my $fh, '<', $file) or die "Can't open $file: $!";
while (<$fh>) {
  chomp;
  my @x = split ' ';      # split on any run of whitespace
  if ($. == 1) {
    push @mtx, [ @x ];
    $num = $#x;            # index of the last column
  } else {
    # the missing left part of row $. is column ($. - 1) of every earlier row
    my @y = map { $mtx[$_][$. - 1] } 0 .. $. - 2;
    push @mtx, [ @y, @x ];
  }
}
close($fh) or die "Can't close $file: $!";
#
for (my $row = 0; $row <= $num; $row++) {
  for (my $col = 0; $col <= $num; $col++) {
    printf("%5.3f ", $mtx[$row][$col]);
  }
  print "\n";
}
$
$ # run the perl program
$ perl convert.pl diagmtx.txt
1.000 0.234 0.435 0.123 0.012 0.102 0.325 0.412 0.087 0.098
0.234 1.000 0.111 0.412 0.115 0.058 0.091 0.190 0.045 0.058
0.435 0.111 1.000 0.205 0.542 0.335 0.054 0.117 0.203 0.125
0.123 0.412 0.205 1.000 0.587 0.159 0.357 0.258 0.654 0.341
0.012 0.115 0.542 0.587 1.000 0.269 0.369 0.687 0.145 0.125
0.102 0.058 0.335 0.159 0.269 1.000 0.222 0.451 0.134 0.333
0.325 0.091 0.054 0.357 0.369 0.222 1.000 0.112 0.217 0.095
0.412 0.190 0.117 0.258 0.687 0.451 0.112 1.000 0.508 0.701
0.087 0.045 0.203 0.654 0.145 0.134 0.217 0.508 1.000 0.663
0.098 0.058 0.125 0.341 0.125 0.333 0.095 0.701 0.663 1.000
$
$

Below is a Perl program that generates a test "diagonal matrix" (an upper triangle with 1.000 at the start of each row) of random numbers:

Code:

$
$
$ # display the program to generate diagonal matrix
$ cat gendiagmtx.pl
#!/usr/bin/perl -w
# Short script to generate a "diagonal matrix".
# Usage: perl gendiagmtx.pl N
# where integer N > 1
# An example: for N = 5, the output is as follows:
# 1.000 x01 x02 x03 x04
# 1.000 x11 x12 x13
# 1.000 x21 x22
# 1.000 x31
# 1.000
# where xMN = some random decimal number.
#
$num = $ARGV[0];
$iter = $num;
foreach (0..$num-1){
  printf("%5.3f ",1);
  foreach (0..$iter-2){
    printf("%5.3f ",rand(1));
  }
  print "\n";
  $iter--;
}
$
$ # generate a diagonal matrix of order 15
$ perl gendiagmtx.pl 15 > diagmtx_15.txt
$ cat diagmtx_15.txt
1.000 0.364 0.566 0.624 0.643 0.340 0.399 0.074 0.560 0.140 0.056 0.393 0.281 0.374 0.300
1.000 0.656 0.449 0.795 0.504 0.688 0.025 0.934 0.126 0.863 0.320 0.754 0.728 0.237
1.000 0.328 0.132 0.362 0.867 0.290 0.129 0.881 0.595 0.714 0.274 0.072 0.921
1.000 0.748 0.031 0.858 0.991 0.774 0.378 0.028 0.546 0.817 0.990 0.810
1.000 0.887 0.331 0.841 0.626 0.830 0.523 0.555 0.727 0.023 0.713
1.000 0.514 0.327 0.304 0.764 0.192 0.805 0.386 0.794 0.494
1.000 0.615 0.712 0.942 0.013 0.296 0.844 0.701 0.973
1.000 0.488 0.004 0.064 0.690 0.876 0.151 0.872
1.000 0.403 0.305 0.048 0.997 0.181 0.113
1.000 0.658 0.419 0.032 0.445 0.902
1.000 0.521 0.492 0.506 0.331
1.000 0.641 0.599 0.546
1.000 0.928 0.820
1.000 0.802
1.000
$
$ # test the "convert.pl" program
$ perl convert.pl diagmtx_15.txt
1.000 0.364 0.566 0.624 0.643 0.340 0.399 0.074 0.560 0.140 0.056 0.393 0.281 0.374 0.300
0.364 1.000 0.656 0.449 0.795 0.504 0.688 0.025 0.934 0.126 0.863 0.320 0.754 0.728 0.237
0.566 0.656 1.000 0.328 0.132 0.362 0.867 0.290 0.129 0.881 0.595 0.714 0.274 0.072 0.921
0.624 0.449 0.328 1.000 0.748 0.031 0.858 0.991 0.774 0.378 0.028 0.546 0.817 0.990 0.810
0.643 0.795 0.132 0.748 1.000 0.887 0.331 0.841 0.626 0.830 0.523 0.555 0.727 0.023 0.713
0.340 0.504 0.362 0.031 0.887 1.000 0.514 0.327 0.304 0.764 0.192 0.805 0.386 0.794 0.494
0.399 0.688 0.867 0.858 0.331 0.514 1.000 0.615 0.712 0.942 0.013 0.296 0.844 0.701 0.973
0.074 0.025 0.290 0.991 0.841 0.327 0.615 1.000 0.488 0.004 0.064 0.690 0.876 0.151 0.872
0.560 0.934 0.129 0.774 0.626 0.304 0.712 0.488 1.000 0.403 0.305 0.048 0.997 0.181 0.113
0.140 0.126 0.881 0.378 0.830 0.764 0.942 0.004 0.403 1.000 0.658 0.419 0.032 0.445 0.902
0.056 0.863 0.595 0.028 0.523 0.192 0.013 0.064 0.305 0.658 1.000 0.521 0.492 0.506 0.331
0.393 0.320 0.714 0.546 0.555 0.805 0.296 0.690 0.048 0.419 0.521 1.000 0.641 0.599 0.546
0.281 0.754 0.274 0.817 0.727 0.386 0.844 0.876 0.997 0.032 0.492 0.641 1.000 0.928 0.820
0.374 0.728 0.072 0.990 0.023 0.794 0.701 0.151 0.181 0.445 0.506 0.599 0.928 1.000 0.802
0.300 0.237 0.921 0.810 0.713 0.494 0.973 0.872 0.113 0.902 0.331 0.546 0.820 0.802 1.000
$
$

HTH,
tyler_durden

durden_tyler

09-09-2009

Yes, you are right. The original matrix is NOT a diagonal one. It is the upper half of a symmetric matrix, with 1.000 in all the diagonal positions.
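Whichever script is used, a rebuilt matrix can be checked for exactly this property. A minimal awk sketch, assuming one row per line with no ID column; the function name check_symmetric is hypothetical. It loads the whole matrix, so it is meant for test-sized files such as the 10 x 10 sample.

```bash
#!/bin/bash
# Hypothetical helper: is a square matrix (no ID column) symmetric with
# 1.000 on the diagonal? Prints the verdict; exit status 1 if not.
check_symmetric() {
  awk '
    NR == 1 { n = NF }
    NF != n { ragged = 1 }                      # every row needs N values
    { for (j = 1; j <= NF; j++) m[NR, j] = $j }
    END {
      bad = ragged || NR != n                   # N rows of N values each
      for (i = 1; i <= NR; i++) {
        if (m[i, i] != "1.000") bad = 1         # string compare: diagonal as printed
        for (j = i + 1; j <= n; j++)
          if (m[i, j] != m[j, i]) bad = 1       # numeric compare of the two halves
      }
      print (bad ? "not symmetric" : "symmetric")
      exit bad
    }' "$@"
}
```

Usage: `check_symmetric square.txt` (placeholder file name). Because it sets the exit status, it can also be used in `if` tests.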

---------- Post updated at 01:32 PM ---------- Previous update was at 01:30 PM ----------

Thanks, I need to digest your script first. I am just a newbie in shell scripting and Perl programming.

---------- Post updated at 01:50 PM ---------- Previous update was at 01:32 PM ----------

That's a great solution! Thank you, Tyler!

When I tried to convert my 25,000 x 25,000 matrix, I got an "Out of memory!" message and the program stopped. Another problem I noticed after checking the original data is that each row starts with an ID, i.e.:
"244901_AT" 1.000 0.234 0.435 0.123 0.012 0.102 0.325 0.412 0.087 0.098
"243903_AT" 1.000 0.111 0.412 0.115 0.058 0.091 0.190 0.045 0.058
"244501_AT" 1.000 0.205 0.542 0.335 0.054 0.117 0.203 0.125
"254902_AT" 1.000 0.587 0.159 0.357 0.258 0.654 0.341
"247906_AT" 1.000 0.269 0.369 0.687 0.145 0.125
"242901_AT" 1.000 0.222 0.451 0.134 0.333
"243906_AT" 1.000 0.112 0.217 0.095
"244908_AT" 1.000 0.508 0.701
"294902_AT" 1.000 0.663
"245902_AT" 1.000

and the output square matrix should be like this:
"244901_AT" 1.000 0.234 0.435 0.123 0.012 0.102 0.325 0.412 0.087 0.098
"243903_AT" 0.234 1.000 0.111 0.412 0.115 0.058 0.091 0.190 0.045 0.058
"244501_AT" 0.435 0.111 1.000 0.205 0.542 0.335 0.054 0.117 0.203 0.125
"254902_AT" 0.123 0.412 0.205 1.000 0.587 0.159 0.357 0.258 0.654 0.341
"247906_AT" 0.012 0.115 0.542 0.587 1.000 0.269 0.369 0.687 0.145 0.125
"242901_AT" 0.102 0.058 0.335 0.159 0.269 1.000 0.222 0.451 0.134 0.333
"243906_AT" 0.325 0.091 0.054 0.357 0.369 0.222 1.000 0.112 0.217 0.095
"244908_AT" 0.412 0.190 0.117 0.258 0.687 0.451 0.112 1.000 0.508 0.701
"294902_AT" 0.087 0.045 0.203 0.654 0.145 0.134 0.217 0.508 1.000 0.663
"245902_AT" 0.098 0.058 0.125 0.341 0.125 0.333 0.095 0.701 0.663 1.000

Then I can retrieve each gene by grepping for its ID in the first column. I should have posted this information at the start; sorry about that. Thanks again, Tyler!
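Row lookup by ID is a one-line filter, and because row k and column k belong to the same gene, a single coefficient can be found by first locating the row number of the second ID. A minimal awk sketch, assuming the output format shown above (quoted ID, then N values); the function names gene_row and gene_pair are hypothetical, and IDs are passed without their quotes. Comparing the whole first field avoids accidental matches elsewhere on a line.

```bash
#!/bin/bash
# Hypothetical helpers for the square file: "ID" v1 v2 ... vN per line,
# where row k and column k belong to the same gene.

# Print the whole row for one gene (its column holds the same values).
gene_row() {    # usage: gene_row ID FILE
  awk -v id="\"$1\"" '$1 == id' "$2"
}

# Print the coefficient for a pair of genes: row of ID1, column of ID2.
gene_pair() {   # usage: gene_pair ID1 ID2 FILE
  awk -v a="\"$1\"" -v b="\"$2\"" '
    NR == FNR { if ($1 == b) k = FNR; next }   # pass 1: row number of ID2
    k && $1 == a { print $(k + 1); exit }      # pass 2: field k+1 is column k
  ' "$3" "$3"
}
```

For example, `gene_pair 244901_AT 244501_AT square.txt` would print the coefficient in row "244901_AT", column "244501_AT" (0.435 in the sample output above). Since the matrix is symmetric, swapping the two IDs gives the same value.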

yifangt

09-10-2009

Registered User

1,305, 26

Join Date: Jun 2007

Last Activity: 11 November 2016, 3:44 AM EST

Location: Beijing China

Posts: 1,305

Thanks Given: 0

Thanked 26 Times in 26 Posts

Another Perl version for your reference (for input without the ID column):

Code:

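use strict;
use warnings;

# Input: the upper triangle without row IDs, one row per line.
my %hash;
open my $fh, '<', 'a.txt' or die "Can't open a.txt: $!";
while (<$fh>) {
  my @tmp = split;
  $hash{$.} = [@tmp];            # keep row $. as read (it starts with 1.000)
  if ($. == 1) {
    print;
  } else {
    # column $. of an earlier row $i is element ($. - $i) of that row
    for (my $i = 1; $i <= $. - 1; $i++) {
      print $hash{$i}->[$. - $i], " ";
    }
    print $_;
  }
}
close $fh;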
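The same idea (keep only what the file contains and read earlier rows by position) can be adapted to the ID column. This is a sketch in awk rather than Perl, not summer_cherry's code; the function name symmetrize_with_ids is hypothetical. With the ID in field 1 and the 1.000 in field 2, column k of an earlier row i sits in field k - i + 2, and each stored value is deleted once it has been printed because no later row needs it. Counting stored values, after row k there are k rows of N - k values each, so at most about N²/4 values are held at once.

```bash
#!/bin/bash
# Hypothetical helper: rebuild the square matrix from an upper triangle whose
# rows look like: "ID" 1.000 v(i,i+1) ... v(i,N)
symmetrize_with_ids() {
  awk '
    {
      out = $1                                   # the row ID
      for (i = 1; i < NR; i++) {
        out = out " " m[i, NR - i + 2]           # column NR of earlier row i
        delete m[i, NR - i + 2]                  # no later row needs this value
      }
      for (f = 2; f <= NF; f++) out = out " " $f # 1.000 and the rest of the row
      print out
      for (f = 3; f <= NF; f++) m[NR, f] = $f    # keep off-diagonal values for later rows
    }' "$@"
}
```

Usage: `symmetrize_with_ids diagmtx.txt > square.txt` (placeholder file names). Each output row is printed as soon as its input row is read.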

summer_cherry

09-11-2009

Here's a Perl solution for the type of data you posted:

Code:

$
$ cat diagmtx.txt
"244901_AT" 1.000 0.234 0.435 0.123 0.012 0.102 0.325 0.412 0.087 0.098
"243903_AT" 1.000 0.111 0.412 0.115 0.058 0.091 0.190 0.045 0.058
"244501_AT" 1.000 0.205 0.542 0.335 0.054 0.117 0.203 0.125
"254902_AT" 1.000 0.587 0.159 0.357 0.258 0.654 0.341
"247906_AT" 1.000 0.269 0.369 0.687 0.145 0.125
"242901_AT" 1.000 0.222 0.451 0.134 0.333
"243906_AT" 1.000 0.112 0.217 0.095
"244908_AT" 1.000 0.508 0.701
"294902_AT" 1.000 0.663
"245902_AT" 1.000
$
$ # show the Perl program
$ cat convert_2.pl
#!/usr/bin/perl -w
@mtx = ();
$file = $ARGV[0];
open (F,$file) or die "Can't open $file: $!";
while(<F>){
  @x = split;
  $id = shift @x;
  $therest = "@x\n";
  shift @x;
  push @mtx, [@x];
  if ($.==1) {
   print;
  } else {
    print $id," ";
    for($i=0;$i<=$.-2;$i++) {
      print $mtx[$i][0]," ";
      shift @{$mtx[$i]};
    }
    print $therest;
  }
}
close (F) or die "Can't close $file: $!";
$
$ perl convert_2.pl diagmtx.txt
"244901_AT" 1.000 0.234 0.435 0.123 0.012 0.102 0.325 0.412 0.087 0.098
"243903_AT" 0.234 1.000 0.111 0.412 0.115 0.058 0.091 0.190 0.045 0.058
"244501_AT" 0.435 0.111 1.000 0.205 0.542 0.335 0.054 0.117 0.203 0.125
"254902_AT" 0.123 0.412 0.205 1.000 0.587 0.159 0.357 0.258 0.654 0.341
"247906_AT" 0.012 0.115 0.542 0.587 1.000 0.269 0.369 0.687 0.145 0.125
"242901_AT" 0.102 0.058 0.335 0.159 0.269 1.000 0.222 0.451 0.134 0.333
"243906_AT" 0.325 0.091 0.054 0.357 0.369 0.222 1.000 0.112 0.217 0.095
"244908_AT" 0.412 0.190 0.117 0.258 0.687 0.451 0.112 1.000 0.508 0.701
"294902_AT" 0.087 0.045 0.203 0.654 0.145 0.134 0.217 0.508 1.000 0.663
"245902_AT" 0.098 0.058 0.125 0.341 0.125 0.333 0.095 0.701 0.663 1.000
$
$

I think summer_cherry's program is much better optimized than my first one (convert.pl). It:
(i) does not store the entire N x N square matrix in any data structure;
(ii) stores only the information present in the file, since that is sufficient to generate the other half of the square matrix;
(iii) uses a hash keyed by line number for fast access to earlier rows.

My second version uses an array of arrays to store only the necessary information, i.e. everything after the "id" and the "1.000" on each line. It also removes each value with the "shift" operator right after printing it. So although it may run somewhat slower, its memory consumption should be lower.
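The memory comparison can be made concrete by counting stored values (a sketch that ignores Perl's per-scalar and per-array overhead, which is considerable). With N genes, convert.pl keeps the full matrix; summer_cherry's program keeps every row as read, and row i has N - i + 1 fields; convert_2.pl, after reading row k, holds N - k values in each of the k rows read so far, because row i started with N - i off-diagonal values and has lost one to shift for each of the k - i later rows:

```latex
\text{convert.pl: } N^{2},
\qquad
\text{summer\_cherry: } \sum_{i=1}^{N} (N - i + 1) = \frac{N(N+1)}{2},
\qquad
\text{convert\_2.pl: } \max_{k} \; k\,(N - k) = \frac{N^{2}}{4} \ \text{ at } k = N/2 .
```

For N = 25,000 these come to 625,000,000, 312,512,500 and 156,250,000 stored values respectively.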

HTH,
tyler_durden

durden_tyler
