awk? adjacency matrix to adjacency list / correlation matrix to list


 
# 1  
Old 10-01-2011

Hi everyone
I am very new to awk but think it might be the best strategy for this. I have a matrix very similar to a correlation matrix, and in practical terms I need to convert it into a list containing the values from the matrix (one value per line), each preceded by the first field of its line (the row annotation), the header of its field (the column annotation), and some spaces etc.
In more abstract terms, I wish to convert an adjacency matrix into an adjacency list that includes the edge weights. In Excel it works fine with =IF(B2>0;$A2&" (gg) "&B$1&" = "&B2;"") and some grep, but the datasets became just too big. P.S. 0 values should be omitted.
Input
      AA    AB    AC    ...   ZZ
AA    1     0     0.5   0     0.7
AB    0     1     0.3   0     0
AC    0     0     1     0     0.1
...   0     0     0     1     0
ZZ    0     0     0     0     1

Output
AA (gg) AA = 1
AA (gg) AC = 0.5
AA (gg) ZZ = 0.7
AB (gg) AB = 1
AB (gg) AC = 0.3
AC (gg) AC = 1
AC (gg) ZZ = 0.1
... (gg) ... = 1
ZZ (gg) ZZ = 1


Thanks a million for your efforts☺.
# 2  
Old 10-01-2011
I've assumed that your input file is a plain ASCII file that looks like this:

Code:
         AA     AB      AC      BB      ZZ
AA      1       0       0.5     0       0.7
AB      0       1       0.3     0       0
AC      0       0       1       0       0.1
BB      0       0       0       1       0
ZZ      0       0       0       0       1

Columns are tab- or space-separated. Any other separator character would work too; you'd just need to invoke awk with the -F option to supply it.

Then this small awk programme will generate output like you've indicated:
Code:
#!/usr/bin/env ksh

awk '
    NR == 1 {       # read col headers
        for( i = 1; i <= NF; i++ )
            col_lab[i+1] = $(i);
        next;
    }

    {
        for( i = 2; i <= NF; i++ )
            if( $(i)+0 > 0 )
                printf( "%s (gg) %s  = %s\n", $1, col_lab[i], $(i) );
    }
' input-file-name

I replaced the ... with BB just to have a col/row label. The programme will work with any number of rows/columns. Running it on the sample file above yields this output:

Code:
AA (gg) AA  = 1
AA (gg) AC  = 0.5
AA (gg) ZZ  = 0.7
AB (gg) AB  = 1
AB (gg) AC  = 0.3
AC (gg) AC  = 1
AC (gg) ZZ  = 0.1
BB (gg) BB  = 1
ZZ (gg) ZZ  = 1
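
On the -F remark above: with an explicit single-character separator such as a comma, awk no longer strips the leading separator on the header line, so the empty corner cell becomes $1 and the column labels need no +1 shift. The following is only a sketch under that assumption; the file name matrix.csv, its contents, and the helper name csv_matrix_to_list are made up for illustration. It also prints a single space before the "=", as in the requested output.

```shell
# Hypothetical comma-separated sample; note the empty corner cell in the header.
cat > matrix.csv <<'EOF'
,AA,AB,AC
AA,1,0,0.5
AB,0,1,0.3
AC,0,0,1
EOF

# csv_matrix_to_list is a hypothetical helper name, not part of the thread.
csv_matrix_to_list() {
    awk -F',' '
        NR == 1 {                    # header line: $1 is the empty corner cell
            for (i = 2; i <= NF; i++)
                col_lab[i] = $i      # labels already sit above their own columns
            next
        }
        {
            for (i = 2; i <= NF; i++)
                if ($i + 0 > 0)      # omit zero weights
                    printf("%s (gg) %s = %s\n", $1, col_lab[i], $i)
        }
    ' "$@"
}

csv_matrix_to_list matrix.csv
```

The same shift would presumably apply to a tab passed explicitly via -F, since only the default separator trims leading blanks.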

This User Gave Thanks to agama For This Post:
# 3  
Old 10-01-2011
awk '{if(NR==1){while(i<NF){i++;a[i]=$i;}}else{for(i in a){print a[i]" (gg) "$1" = "$(i+1)}}}' input
This User Gave Thanks to ltomuno For This Post:
# 4  
Old 10-01-2011
Quote:
Originally Posted by ltomuno
awk '{if(NR==1){while(i<NF){i++;a[i]=$i;}}else{for(i in a){print a[i]" (gg) "$1" = "$(i+1)}}}' input
Note that for(i in a) doesn't preserve order, so if column order is important, this is flawed. It also prints columns when the edge value is zero, which I believe is undesired, as a value of 0 in a correlation matrix indicates no edge in the graph. Finally, it prints as col (gg) row; going by the original post, this should be reversed.
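
The three points above could be addressed along the following lines. This is a sketch rather than a definitive fix: the file name sample.txt and the helper name ordered_list are invented here, and the `> 0` test is an assumption taken from the B2>0 condition in the original Excel formula.

```shell
# Hypothetical whitespace-separated sample, smaller than the one in the thread.
cat > sample.txt <<'EOF'
        AA      AB      AC
AA      1       0       0.5
AB      0       1       0.3
AC      0       0       1
EOF

# ordered_list is a hypothetical helper name.
ordered_list() {
    awk '
        NR == 1 {                              # remember the column labels
            n = NF
            for (i = 1; i <= n; i++) a[i] = $i
            next
        }
        {
            for (i = 1; i <= n; i++)           # counted loop keeps column order
                if ($(i + 1) + 0 > 0)          # skip zero edges
                    print $1 " (gg) " a[i] " = " $(i + 1)   # row label first
        }
    ' "$@"
}

ordered_list sample.txt
```

The counted loop replaces for(i in a), whose traversal order awk leaves unspecified.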
This User Gave Thanks to agama For This Post:
# 5  
Old 10-02-2011
Using Perl on a space-delimited file:

Code:
$
$
$ cat f11
        AA      AB      AC      BB      ZZ
AA      1       0       0.5     0       0.7
AB      0       1       0.3     0       0
AC      0       0       1       0       0.1
BB      0       0       0       1       0
ZZ      0       0       0       0       1
$
$
$ perl -lane 'if ($.==1){@x=@F} else {for($i=1; $i<=$#F; $i++){print "$F[0] (gg) $x[$i-1] = $F[$i]" if $F[$i] != 0}}' f11
AA (gg) AA = 1
AA (gg) AC = 0.5
AA (gg) ZZ = 0.7
AB (gg) AB = 1
AB (gg) AC = 0.3
AC (gg) AC = 1
AC (gg) ZZ = 0.1
BB (gg) BB = 1
ZZ (gg) ZZ = 1
$
$

tyler_durden
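
One difference between the two working answers is worth noting: the awk programme keeps values that are > 0, while the Perl one-liner keeps values that are != 0. They agree on the sample data, but a correlation-like matrix may hold negative entries, which only the second test keeps. Below is a minimal awk sketch of the != 0 behaviour; the file signed.txt, its values, and the helper name nonzero_list are assumptions for illustration only.

```shell
# Hypothetical sample containing one negative weight.
cat > signed.txt <<'EOF'
        AA      AB
AA      1       -0.4
AB      0       1
EOF

# nonzero_list is a hypothetical helper name.
nonzero_list() {
    awk '
        NR == 1 {                        # header: leading blanks are stripped,
            for (i = 1; i <= NF; i++)    # so shift labels right by one column
                col_lab[i + 1] = $i
            next
        }
        {
            for (i = 2; i <= NF; i++)
                if ($i + 0 != 0)         # keep negative as well as positive weights
                    printf("%s (gg) %s = %s\n", $1, col_lab[i], $i)
        }
    ' "$@"
}

nonzero_list signed.txt
```

Which test is right depends on whether negative weights count as edges in the data at hand.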
This User Gave Thanks to durden_tyler For This Post:
# 6  
Old 10-02-2011

Dear all
Thanks so much for the great solutions. Life is just wonderful when things work this smoothly.
Good day to all of you.