Weighted adjacency list to adjacency matrix


 
# 1  
09-09-2013

Dear awk gurus,

I need a fast (hence awk) solution for turning an incomplete weighted adjacency list into a complete, sorted adjacency matrix.

Example (FS=OFS=,):
Code:
 
a,d,0.33
a,b,0.25
b,c,0.11

should give:
Code:
 
,a,b,c,d
a,1,0.25,0,0.33
b,0.25,1,0.11,0
c,0,0.11,1,0
d,0.33,0,0,1

I found this thread, but its solution does not work here because (I think) some combinations are missing from my list: missing pairs should be 0, and self-combinations (a-a, b-b, ...) should be 1.

My idea would be to read the data into a 2-dimensional array with the first two columns as indices (storing the transposed pair as well), build the union of both index columns (which should not be necessary if I already store the transposed pairs?!?), sort the indices, loop twice over all indices, and write the matrix line by line, with 0 for every missing index pair and 1 for every self pair.

I can provide a few code snippets, but some parts are beyond my awk capabilities:

Code:
 
awk 'BEGIN { FS = ","; OFS = "," }
{ w[$1,$2] = $3; w[$2,$1] = $3 }
# if I am right, both dimensions of the index should already be complete?
END {
  # first I have to write the column headers
  # (sorted(w) is pseudocode: this is the part I do not know how to write)
  printf "%s", ""; for (i in sorted(w)) printf "%s%s", OFS, i; print ""
  # now I want to loop over the alphabetically sorted indices
  for (i in sorted(w)) {
    printf "%s", i
    for (j in sorted(w)) {
      if (i == j) printf "%s1", OFS                  # self-pair
      else if (!((i,j) in w)) printf "%s0", OFS      # missing pair
      else printf "%s%s", OFS, w[i,j]
    }
    print ""
  }
}' ADJlist.txt > ADJmatrix.txt

Could you please complete my script, or (as I always learn here) provide a much more elegant solution?

dietmar
# 2  
09-09-2013
Your approach seems essentially right to me. Here is one way it could be moulded into awk:

Code:
awk '
  BEGIN{
    FS=OFS=","
    h="a,b,c,d"
    n=split(h,F)
    print x,h
  }
  {
    A[$1,$2]=A[$2,$1]=$3
  }
  END{
    for(i=1; i<=n; i++) {
      s=F[i]
      for(j=1; j<=n; j++) s=s OFS (i==j ? 1 : A[F[i],F[j]]+0)
      print s
    }
  }
' file
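One detail in the script above: A[F[i],F[j]]+0 turns each weight into a number, and awk converts numbers back to text with CONVFMT (default "%.6g"), so a weight such as 0.1234567891 would be printed as 0.123457. A sketch of a variant, wrapped in a hypothetical shell function adj_matrix_h that takes the already-sorted header as its first argument, tests membership with `in` instead; this keeps every weight exactly as it appears in the input and does not create empty array elements for missing pairs.

```shell
# adj_matrix_h is a hypothetical helper name.
# $1 = sorted, comma-separated header; $2 = input file (stdin if omitted).
adj_matrix_h() {
  awk -v h="$1" '
    BEGIN {
      FS = OFS = ","
      n = split(h, F)          # F[1..n] = node names in header order
      print "", h              # empty first cell, then the header
    }
    { A[$1,$2] = A[$2,$1] = $3 }   # store both directions
    END {
      for (i = 1; i <= n; i++) {
        s = F[i]
        for (j = 1; j <= n; j++) {
          if (i == j)                v = 1              # self-pair
          else if ((F[i],F[j]) in A) v = A[F[i],F[j]]   # weight text as written
          else                       v = 0              # missing pair
          s = s OFS v
        }
        print s
      }
    }' "${2:--}"
}
```

For example: adj_matrix_h a,b,c,d ADJlist.txt > ADJmatrix.txt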

# 3  
09-09-2013
Thanks, Scrutinizer.

I always forget that you take the input too literally. ;-)

Of course my real table is much, much larger and contains more than just a, b, c, d.

Some things I do not understand:
what does print x,h mean (x is not defined before), and
in s=F[i], what is F? A new array, used for what?

What I need is a loop over the sorted indices (which are long names like ENSG000006234, ENSG000001345), and I do not know the number of indices in advance; it is only known after I have read the complete file.

So I need two things: how do I get the length of the first (and therefore also the second) dimension of the 2D index, and how can I sort these first-dimension indices and then loop over them in sorted order?
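On the "length of the first dimension" question: awk has no true 2-D arrays. A[i,j] is stored under one string key, i SUBSEP j, so there is no built-in size per dimension. A minimal sketch (list_nodes is a hypothetical function name) that recovers the distinct first components by splitting each key on SUBSEP after the whole file has been read. Because every pair is stored in both directions, the first components alone already cover every node, which matches the guess in the first post that no separate union is needed.

```shell
# list_nodes is a hypothetical helper name.
# Prints the number of distinct nodes, then the node names (in no particular order).
list_nodes() {
  awk -F, '
    { A[$1,$2] = A[$2,$1] = $3 }        # both directions, as in the first post
    END {
      for (k in A) {
        split(k, part, SUBSEP)          # part[1] = row name, part[2] = column name
        if (!(part[1] in seen)) { seen[part[1]] = 1; n++ }
      }
      print n
      for (name in seen) print name     # for-in order is unspecified
    }' "$@"
}
```

The names still come out unsorted; counting is cheap, but ordering them needs one of the sorting approaches discussed in the thread.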
# 4  
09-09-2013
x is never assigned, so it is the empty string "", and print x,h prints an empty first cell, then OFS, then the header.
F contains the names from the split of the header, and n is their number.

You could build the header from the data with something like this, but that still leaves the sorting part...

Code:
awk '
  {
    A[$1,$2]=A[$2,$1]=$3
    if(!P[$1]++) F[++n]=$1
    if(!P[$2]++) F[++n]=$2
  }
  END{
    for(i=1; i<=n; i++) h=h OFS F[i]
    print h
    for(i=1; i<=n; i++) {
      s=F[i]
      for(j=1; j<=n; j++) s=s OFS (i==j?1:A[F[i],F[j]]+0)
      print s
    }
  }
' FS=, OFS=, file

So you would need to add a bubble sort in the script.
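The sort can indeed be done in plain POSIX awk. A sketch that uses insertion sort (a swapped-in choice, about as short as a bubble sort) on F[1..n] at the start of the END block; adj_matrix_sorted is a hypothetical function name. Comparisons follow awk's usual rules, so names like ENSG000006234 are compared as strings.

```shell
# adj_matrix_sorted is a hypothetical helper name.
# Portable awk: the script above plus an insertion sort of F[1..n].
adj_matrix_sorted() {
  awk -F, -v OFS=, '
    {
      A[$1,$2] = A[$2,$1] = $3
      if (!P[$1]++) F[++n] = $1    # remember each new name once
      if (!P[$2]++) F[++n] = $2
    }
    END {
      # insertion sort of F[1..n] in ascending order
      for (i = 2; i <= n; i++) {
        key = F[i]
        for (j = i - 1; j >= 1 && F[j] > key; j--) F[j+1] = F[j]
        F[j+1] = key
      }
      h = ""
      for (i = 1; i <= n; i++) h = h OFS F[i]
      print h
      for (i = 1; i <= n; i++) {
        s = F[i]
        for (j = 1; j <= n; j++) s = s OFS (i == j ? 1 : A[F[i],F[j]] + 0)
        print s
      }
    }' "$@"
}
```

Insertion sort needs O(n^2) comparisons in the worst case, but writing the n x n matrix is O(n^2) anyway, so the sort does not change the overall order of the cost.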

Or you could first extract the names, sort them, and feed them to the previous suggestion as the variable h.
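A sketch of that second route, as a hypothetical shell function adj_matrix_ext: the names are pulled out with cut, tr, sort and paste, then passed to the fixed-header script as h. The file is read twice, so it has to be a regular file rather than a pipe. LC_ALL=C makes sort use plain byte order; without it the order follows the current locale.

```shell
# adj_matrix_ext is a hypothetical helper name; $1 must be a regular file.
adj_matrix_ext() {
  # columns 1-2 -> one name per line -> unique, byte-order sorted -> one comma-joined line
  h=$(cut -d, -f1,2 "$1" | tr ',' '\n' | LC_ALL=C sort -u | paste -s -d, -)
  awk -v h="$h" '
    BEGIN { FS = OFS = ","; n = split(h, F); print "", h }
    { A[$1,$2] = A[$2,$1] = $3 }
    END {
      for (i = 1; i <= n; i++) {
        s = F[i]
        for (j = 1; j <= n; j++) s = s OFS (i == j ? 1 : A[F[i],F[j]] + 0)
        print s
      }
    }' "$1"
}
```

For example: adj_matrix_ext ADJlist.txt > ADJmatrix.txt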
# 5  
09-10-2013
asorti from gawk

Thanks again,

I searched a little and found gawk's asorti function (and its sibling asort)...

Since F holds the names as values, I only have to sort F into a new array G with asort and loop over G:

n = asort(F, G)

Code:
 
gawk '
  BEGIN { FS=OFS="," }
  {
    A[$1,$2]=A[$2,$1]=$3
    if(!P[$1]++) F[++n]=$1
    if(!P[$2]++) F[++n]=$2
  }
  END{
    n = asort(F, G)
    for(i=1; i<=n; i++) h=h OFS G[i]
    print h
    for(i=1; i<=n; i++) {
      s=G[i]
      for(j=1; j<=n; j++) s=s OFS (i==j?1:A[G[i],G[j]]+0)
      print s
    }
  }
' file

It seems to work!

dietmar
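Since asorti sorts an array's indices rather than its values, it could also be applied directly to P, whose indices are already the node names; F is then not needed. A gawk-only sketch under that idea (adj_matrix_gawk is a hypothetical function name):

```shell
# adj_matrix_gawk is a hypothetical helper name; requires gawk.
adj_matrix_gawk() {
  gawk '
    BEGIN { FS = OFS = "," }
    {
      A[$1,$2] = A[$2,$1] = $3
      P[$1]; P[$2]                 # referencing an element creates it: records the names
    }
    END {
      n = asorti(P, G)             # G[1..n] = indices of P in ascending string order
      h = ""
      for (i = 1; i <= n; i++) h = h OFS G[i]
      print h
      for (i = 1; i <= n; i++) {
        s = G[i]
        for (j = 1; j <= n; j++) s = s OFS (i == j ? 1 : A[G[i],G[j]] + 0)
        print s
      }
    }' "$@"
}
```

The output is the same as with asort(F, G); the difference is only that the names are collected once, as array indices, instead of in both P and F.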