Randomization a matrix - perl / Shell


 
12-05-2012

Hello all,

I have a tricky question (at least for me it is). I'll try to explain it carefully. I hope you can help me solve all or even part of it. Here it is:

I have a large 0/1 input table; a very simplified version is shown below.
(The last row and column are sums, shown only for clarity. They do not exist in the actual input data.)

table1(input):
*   C1  C2  C3  c4  c5
V1  1   1   1   1   0   =4
V2  1   1   1   0   1   =4
V3  1   1   1   1   1   =5
V4  0   1   0   0   1   =2

    =3  =4  =3  =2  =3  =15


As you can see, the number of 1s varies among the rows and columns.
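The sums shown next to table1(input) can be recomputed from the raw file with a short awk sketch. It assumes the layout shown above (one header line, row names in the first column, whitespace-separated 0/1 values). The function name `margins` and the temporary sample file are only illustrative.

```shell
#!/bin/sh
# margins FILE
# Print every row name with its sum, then every column name with its sum.
# Assumes one header line and row names in column 1 (illustrative helper).
margins() {
  awk 'NR == 1 { for (i = 2; i <= NF; i++) col[i] = $i; nc = NF; next }
       {
         s = 0
         for (i = 2; i <= NF; i++) { s += $i; csum[i] += $i }
         print $1, s                      # row name and its number of 1s
       }
       END { for (i = 2; i <= nc; i++) print col[i], csum[i] + 0 }' "$1"
}

# Sample data: table1(input) without the sum annotations.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
* C1 C2 C3 c4 c5
V1 1 1 1 1 0
V2 1 1 1 0 1
V3 1 1 1 1 1
V4 0 1 0 0 1
EOF
margins "$tmp"
rm -f "$tmp"
```

Running the same helper on a randomized table and comparing the two outputs (for example with `diff`) is a quick way to confirm that a randomization kept all sums intact.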

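I used the following script to calculate the co-presence of the variables in the table above (thanks to elixir_sinari). For each pair of rows i and j, it reports the number of columns in which both rows contain 1, divided by the number of 1s in row i.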

Code:
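awk '
# Data rows (the header line is skipped): remember the row name, mark every
# column that holds a 1, and count the 1s in the row.
NR>1{name[NR-1]=$1;for(i=2;i<=NF;i++) if($i==1) { oneset[NR-1,i]=1;count[NR-1]++; q++} val[NR-1]=q; q=0}
END{
# NR-1 is the number of data rows; NF is taken from the last line read,
# so the file must not end with a blank line.
for(i=1;i<=(NR-1);i++)
{
 if(i==1)
 {
  # Header line of the output: "*" followed by the row names.
  print "*"
  for(j=1;j<=(NR-1);j++)
   print name[j]
  printf "\n"
 }
 # Row label with its number of 1s, e.g. V1(4).
 print name[i]"("val[i]")"
 for(j=1;j<=(NR-1);j++)
 {
  # n = number of columns in which rows i and j both hold a 1.
  n=0
  for(k=2;k<=NF;k++)
   if(oneset[i,k] && oneset[j,k])
    n++
  # Co-presence of row i with row j: shared 1s divided by the 1s in row i.
  print ((count[i]==0)?"NA":(n/count[i]))
 }
 printf "\n"
}
}' ORS='\t'  OFMT='%.2f' input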

Here is what it gives for the input file above:

table2(reference table):
* V1 V2 V3 V4
V1(4) 1 0.75 1 0.25
V2(4) 0.75 1 1 0.50
V3(5) 0.80 0.80 1 0.40
V4(2) 0.50 1 1 1


I am wondering whether some of the high co-presence values arose by chance. To answer this question, I would like to randomize my input data a few thousand times (or even more). The randomization must keep the sum of each row and each column the same as in the input.

One example of randomization could be this one:

table3(Random1):
*   C1  C2  C3  c4  c5
V1  1   1   1   0   1   =4
V2  0   1   1   1   1   =4
V3  1   1   1   1   1   =5
V4  1   1   0   0   0   =2

    =3  =4  =3  =2  =3  =15

As you can see, the row and column sums are the same as in table1(input).
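One common way to produce such tables is repeated "checkerboard" swapping; this is a swapped-in technique for illustration, not something prescribed in this thread. Pick two rows and two columns at random; if the four cells form the pattern 1 0 / 0 1 or 0 1 / 1 0, flip all four. Every flip leaves each row sum and column sum unchanged. The sketch below assumes the input layout of table1(input) without the sum annotations; the function name `shuffle_matrix`, the default of 1000 swap attempts, and the optional seed argument are assumptions made for the example. How many attempts are needed for a well-mixed table depends on the data and is not settled here.

```shell
#!/bin/sh
# shuffle_matrix FILE [ATTEMPTS] [SEED]
# Randomize a 0/1 table by checkerboard swaps, keeping all row and column sums.
shuffle_matrix() {
  awk -v attempts="${2:-1000}" -v seed="${3:-}" '
    BEGIN { if (seed != "") srand(seed); else srand() }
    NR == 1 { header = $0; next }
    {
      nr++; name[nr] = $1; nc = NF - 1
      for (j = 1; j <= nc; j++) m[nr, j] = $(j + 1)
    }
    END {
      for (t = 0; t < attempts; t++) {
        r1 = 1 + int(rand() * nr); r2 = 1 + int(rand() * nr)
        c1 = 1 + int(rand() * nc); c2 = 1 + int(rand() * nc)
        # Swap only on a checkerboard: equal diagonals, different neighbours.
        if (m[r1, c1] == m[r2, c2] && m[r1, c2] == m[r2, c1] && m[r1, c1] != m[r1, c2]) {
          m[r1, c1] = 1 - m[r1, c1]; m[r2, c2] = 1 - m[r2, c2]
          m[r1, c2] = 1 - m[r1, c2]; m[r2, c1] = 1 - m[r2, c1]
        }
      }
      print header
      for (i = 1; i <= nr; i++) {
        line = name[i]
        for (j = 1; j <= nc; j++) line = line " " m[i, j]
        print line
      }
    }' "$1"
}

# Sample run on table1(input); the seed 42 is arbitrary.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
* C1 C2 C3 c4 c5
V1 1 1 1 1 0
V2 1 1 1 0 1
V3 1 1 1 1 1
V4 0 1 0 0 1
EOF
shuffle_matrix "$tmp" 1000 42
rm -f "$tmp"
```

Without a seed, awk's `srand()` is seeded from the clock, which typically has one-second resolution, so calling the function many times within the same second can return the same table. For thousands of randomizations it is safer to do all rounds inside a single awk process.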

After each random table is created, the script above has to be applied to it, which gives a table in the format of table2(reference). For table3(Random1) it would look like this:

table4(Random1_co-presence)
* V1 V2 V3 V4
V1(4) 1 0.75 1 0.50
V2(4) 0.75 1 1 0.25
V3(5) 0.80 0.80 1 0.40
V4(2) 1 0.50 1 1

And here is the tricky part. The co-presence table from each randomization step has to be compared with table2(reference). Wherever the value in a cell is equal to or greater than the corresponding value in table2(reference), 1 has to be added to that cell in table5(output).

With one randomization, table5(output) would look like this:

table5(output):
* V1 V2 V3 V4
V1 1 1 1 1
V2 1 1 1 0
V3 1 1 1 1
V4 1 0 1 1

You can see that after one randomization and calculation of the co-presence table, all values except two are equal to or greater than the values in the corresponding cells of table2(reference). That is why table5(output) contains 1 in every cell except those two.
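The cell-by-cell comparison described above can be sketched in awk by reading the reference table first and the randomized table second. The sketch assumes both files use the whitespace-separated layout of table2(reference), with the same rows and columns in the same order; the function name `compare_tables` and the temporary files are hypothetical, and "NA" cells are not treated specially.

```shell
#!/bin/sh
# compare_tables REFERENCE RANDOMIZED
# Print 1 where the randomized value is >= the reference value, else 0.
compare_tables() {
  awk 'NR == FNR { for (i = 2; i <= NF; i++) ref[FNR, i] = $i; next }  # reference file
       FNR == 1 { print; next }                # copy the header line through
       {
         name = $1; sub(/\(.*/, "", name)      # drop the "(4)" style suffix
         line = name
         for (i = 2; i <= NF; i++)
           line = line " " (($i + 0 >= ref[FNR, i] + 0) ? 1 : 0)
         print line
       }' "$1" "$2"
}

# Sample run with table2(reference) and table4(Random1_co-presence).
ref=$(mktemp); rnd=$(mktemp)
cat > "$ref" <<'EOF'
* V1 V2 V3 V4
V1(4) 1 0.75 1 0.25
V2(4) 0.75 1 1 0.50
V3(5) 0.80 0.80 1 0.40
V4(2) 0.50 1 1 1
EOF
cat > "$rnd" <<'EOF'
* V1 V2 V3 V4
V1(4) 1 0.75 1 0.50
V2(4) 0.75 1 1 0.25
V3(5) 0.80 0.80 1 0.40
V4(2) 1 0.50 1 1
EOF
compare_tables "$ref" "$rnd"
rm -f "$ref" "$rnd"
```

For these two sample files the output matches table5(output) as shown above, with 0 only in the V2/V4 and V4/V2 cells.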

For each subsequent randomization the same calculation has to be repeated, and table5(output) has to be updated in every round for the cells whose values are equal or greater.

Let's assume that we do the randomization 10 times. table5(output) would then look something like this:

* V1 V2 V3 V4
V1 10 8 10 6
V2 1 10 10 2
V3 10 10 10 10
V4 7 3 10 10
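The whole procedure (reference table, repeated randomization, comparison, running counts) can be sketched in a single awk process. This is only one possible approach under stated assumptions: randomization uses checkerboard swaps (flip a 2x2 sub-matrix of the form 1 0 / 0 1 or 0 1 / 1 0), each round continues from the previous randomized table, and the function name `copresence_counts` and its arguments are invented for the example. Because row sums never change, comparing co-presence ratios is the same as comparing the raw numbers of shared 1s, so the sketch compares integers and avoids rounding issues.

```shell
#!/bin/sh
# copresence_counts FILE [ROUNDS] [SWAP_ATTEMPTS] [SEED]
# For each pair of rows, count in how many randomized tables the co-presence
# is equal to or greater than in the original table.
copresence_counts() {
  awk -v rounds="${2:-10}" -v attempts="${3:-1000}" -v seed="${4:-}" '
    # Number of columns in which rows a and b both hold a 1.
    function shared(a, b,    j, n) {
      n = 0
      for (j = 1; j <= nc; j++) if (m[a, j] == 1 && m[b, j] == 1) n++
      return n
    }
    # Try a number of checkerboard swaps; each swap keeps all row and column sums.
    function shuffle(    t, r1, r2, c1, c2) {
      for (t = 0; t < attempts; t++) {
        r1 = 1 + int(rand() * nr); r2 = 1 + int(rand() * nr)
        c1 = 1 + int(rand() * nc); c2 = 1 + int(rand() * nc)
        if (m[r1, c1] == m[r2, c2] && m[r1, c2] == m[r2, c1] && m[r1, c1] != m[r1, c2]) {
          m[r1, c1] = 1 - m[r1, c1]; m[r2, c2] = 1 - m[r2, c2]
          m[r1, c2] = 1 - m[r1, c2]; m[r2, c1] = 1 - m[r2, c1]
        }
      }
    }
    BEGIN { if (seed != "") srand(seed); else srand() }
    NR == 1 { next }                           # skip the header line
    { nr++; name[nr] = $1; nc = NF - 1; for (j = 1; j <= nc; j++) m[nr, j] = $(j + 1) }
    END {
      # Reference: shared 1s in the original table.
      for (a = 1; a <= nr; a++) for (b = 1; b <= nr; b++) ref[a, b] = shared(a, b)
      for (k = 1; k <= rounds; k++) {
        shuffle()
        for (a = 1; a <= nr; a++)
          for (b = 1; b <= nr; b++)
            if (shared(a, b) >= ref[a, b]) hits[a, b]++
      }
      line = "*"; for (b = 1; b <= nr; b++) line = line " " name[b]
      print line
      for (a = 1; a <= nr; a++) {
        line = name[a]
        for (b = 1; b <= nr; b++) line = line " " (hits[a, b] + 0)
        print line
      }
    }' "$1"
}

# Sample run: 10 rounds on table1(input); the seed 42 is arbitrary.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
* C1 C2 C3 c4 c5
V1 1 1 1 1 0
V2 1 1 1 0 1
V3 1 1 1 1 1
V4 0 1 0 0 1
EOF
copresence_counts "$tmp" 10 1000 42
rm -f "$tmp"
```

With the sample data, the diagonal and the whole V3 row and column always equal the number of rounds, because V3 contains only 1s and therefore cannot change under the fixed sums. The other cells depend on the random draws.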

(I found a Stack Overflow discussion about generating a random matrix under conditions similar to my problem. It might be useful:
"random - Randomize matrix in perl, keeping row and column totals the same" - Stack Overflow)

Thanks for your patience in reading this thread! Any ideas in Perl or Shell would be very helpful!

Last edited by @man; 12-05-2012 at 08:20 AM..