
how to calculate all pairwise distances in two dimensions and transform them into a matrix

# 1  

Hello to all,

I am very new to shell scripting and I need help. I have data for several individuals, one individual per row. Each row has the name of the individual in the first column, followed by a tag and then 5 values, e.g.:
  IND1 H1 12 13 12 15 14
  IND2 H2 12 12 15 14 14
  IND3 H1 12 15 12 14 11

I would like to calculate the sum of the absolute values of the pairwise differences between individuals. For instance:
  Distance between IND1 and IND2=|12-12|+|13-12|+|12-15|+|15-14|+|14-14|= 5
  Distance between IND1 and IND3=|12-12|+|13-15|+|12-12|+|15-14|+|14-11|= 6
  Distance between IND2 and IND3=|12-12|+|12-15|+|15-12|+|14-14|+|14-11|= 9

Additionally, if the tags of two individuals are different, I would like to add 100 to the final number. So the final distances would be:
IND1 and IND2 would be = 5 + 100 = 105
IND1 and IND3 would be = 6 + 0 = 6
IND2 and IND3 would be = 9 + 100 = 109

After this, I would like to transform this list of distances into a matrix:
  IND1   IND2   IND3
  IND2   105
  IND3   6        109

Could someone help me with this? Thanks a lot in advance! Best!
Moderator's Comments:
Mod Comment Please use code tags!
# 2  
Here is a solution using awk:

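awk '
# Store every field of every row: D[row,1]=name, D[row,2]=tag, D[row,3..7]=values
{ for(i=1;i<=NF;i++) D[NR,i]=$i }
END {
  # Header row: one column per individual
  for(i=1;i<=NR;i++) printf "\t%s", D[i,1]
  print ""
  for(i=1;i<=NR;i++) {
      printf "%s", D[i,1]
      for(j=1;j<=NR;j++) {
           # Start at 100 when the tags differ, 0 otherwise
           TOT=(D[i,2]==D[j,2])?0:100
           # Add the absolute differences of the 5 values
           for(k=3;k<8;k++) TOT+=(D[i,k]>D[j,k])?D[i,k]-D[j,k]:D[j,k]-D[i,k]
           printf "\t%s", TOT
      }
      print ""
  }
}' infile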

Output for example infile is:
        IND1    IND2    IND3
IND1    0       105     6
IND2    105     0       109
IND3    6       109     0
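
The output above is the full symmetric matrix, while the original request showed only the lower triangle. A minimal sketch of that variant follows, assuming the same row layout (name, tag, then 5 values in fields 3 to 7). The function name lower_triangle is hypothetical and only used for illustration.

```shell
# Hypothetical helper: print only the lower triangle of the distance matrix.
# Assumes each input row is: name, tag, then 5 numeric values (fields 3-7).
lower_triangle() {
    awk '
    { for (i = 1; i <= NF; i++) D[NR, i] = $i }   # store every field by row
    END {
        # header: one column for every individual except the last
        for (j = 1; j < NR; j++) printf "\t%s", D[j, 1]
        print ""
        # one row for every individual except the first
        for (i = 2; i <= NR; i++) {
            printf "%s", D[i, 1]
            for (j = 1; j < i; j++) {
                tot = (D[i, 2] == D[j, 2]) ? 0 : 100   # penalty when tags differ
                for (k = 3; k <= 7; k++) {
                    d = D[i, k] - D[j, k]
                    tot += (d < 0) ? -d : d            # absolute difference
                }
                printf "\t%s", tot
            }
            print ""
        }
    }' "$@"
}

# Example with the sample data from post #1, read from standard input
printf '%s\n' 'IND1 H1 12 13 12 15 14' \
              'IND2 H2 12 12 15 14 14' \
              'IND3 H1 12 15 12 14 11' | lower_triangle
```

With the sample data this should print a header with IND1 and IND2, then a row "IND2 105" and a row "IND3 6 109", tab-separated. Passing a file name instead (lower_triangle infile) should work the same way, since the arguments are handed straight to awk.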

# 3  

It works great, thanks a lot!


