How to make a distance matrix

05-02-2010


Hi,

I'm trying to generate a distance matrix between sample pairs for use in a tree-drawing program. The example below shows what I'd like to get out of the data: essentially, the proportion of positions at which two samples differ.
Any help is much appreciated! Notes on how the functions work would also be great!

Thanks!

Example input (commas separate the columns; a:d are sample names):

a,1,2,4,4
b,2,1,4,4
c,1,2,3,4
d,1,0,4,0

Identify the positions that differ in each pairwise comparison of samples a:d (score 1 for differ, 0 for shared in the example below).
Some comparisons are duplicates, e.g. ab and ba, and self-comparisons such as aa or bb are obviously all "0", but these are necessary to make the matrix.

aa,0,0,0,0
ab,1,1,0,0
ac,0,0,1,0
ad,0,1,0,1
ba,1,1,0,0
bb,0,0,0,0
bc,1,1,1,0
etc... to dd

Calculate proportion of differing positions between pairwise comparisons
aa,0
ab,0.5
ac,0.25
ad,0.5
ba,0.5
bb,0
bc,0.75
etc...to dd
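
For the step above, a minimal Perl sketch of the per-pair calculation might look like the following. The helper name `diff_proportion` is hypothetical (it is not from this thread), and the sketch assumes both samples have the same number of positions and compares values as strings.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): proportion of positions
# at which two equal-length lists of values differ.
sub diff_proportion {
    my ($x, $y) = @_;                      # two array references
    die "lists differ in length\n" unless @$x == @$y;
    return 0 unless @$x;                   # avoid dividing by zero
    # grep in scalar context counts the positions that do not match.
    my $differing = grep { $x->[$_] ne $y->[$_] } 0 .. $#$x;
    return $differing / @$x;
}

# Samples a, b and c from the example input.
my @a = (1, 2, 4, 4);
my @b = (2, 1, 4, 4);
my @c = (1, 2, 3, 4);

printf "ab,%s\n", diff_proportion(\@a, \@b);   # ab,0.5
printf "ac,%s\n", diff_proportion(\@a, \@c);   # ac,0.25
printf "bc,%s\n", diff_proportion(\@b, \@c);   # bc,0.75
```

String comparison (`ne`) is used here so that non-numeric codes would also work; if values such as 1 and 1.0 should count as equal, a numeric comparison (`!=`) would be the better choice.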

Prepare the matrix (e.g. the ab value goes in [a,b], the ba value in [b,a], etc.):

,a,b,c,d
a,0,0.5,0.25,0.5
b,0.5,0,0.75 etc...
c
d
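
Since ab and ba always give the same value, one possible approach is to compute each pair once and store the result in both cells. The sketch below assumes the four samples from the example input and uses a hypothetical helper named `distance_matrix`; values are compared as strings, and all samples are assumed to have the same number of positions.

```perl
use strict;
use warnings;

# Sample data from the example input.
my %samples = (
    a => [1, 2, 4, 4],
    b => [2, 1, 4, 4],
    c => [1, 2, 3, 4],
    d => [1, 0, 4, 0],
);
my @names = sort keys %samples;

# Hypothetical helper: fill a symmetric matrix, comparing each pair once.
sub distance_matrix {
    my ($data, $names) = @_;
    my %dist;
    for my $i (0 .. $#$names) {
        my $p = $names->[$i];
        $dist{$p}{$p} = 0;                       # self-comparison
        for my $j ($i + 1 .. $#$names) {
            my $q = $names->[$j];
            my ($x, $y) = ($data->{$p}, $data->{$q});
            my $differing = grep { $x->[$_] ne $y->[$_] } 0 .. $#$x;
            # Store the same value in both [p,q] and [q,p].
            $dist{$p}{$q} = $dist{$q}{$p} = $differing / @$x;
        }
    }
    return \%dist;
}

my $dist = distance_matrix(\%samples, \@names);

# Print comma-separated: a header row, then one row per sample.
print join(",", "", @names), "\n";
for my $p (@names) {
    print join(",", $p, map { $dist->{$p}{$_} } @names), "\n";
}
```

A nested hash keyed by both sample names avoids joining the names into one string, which could become ambiguous if one sample name were a prefix of another.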

auburn


05-05-2010


Code:

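#!/usr/bin/perl
use strict;
use warnings;

# Read each line as "name,v1,v2,..." and store the values under the sample name.
my %hash;
while(<>){
        chomp;
        next unless length;             # skip blank lines
        my ($var,@arr) = split(",");
        push @{$hash{$var}} , @arr;
}

my @var_arr = sort keys %hash;          # sample names in sorted order
my %prop_hash;

# Header row: a leading tab, then one column per sample.
print "\t", join("\t", @var_arr), "\n";

foreach my $var1(@var_arr){
        foreach my $var2(@var_arr){
                # Count the positions where the two samples differ
                # (numeric comparison, so the values must be numbers).
                my $sum =0;
                for(my $i=0;$i<@{$hash{$var2}};$i++){
                        $sum += ($hash{$var1}[$i] == $hash{$var2}[$i]) ? 0 : 1;
                }
                # Proportion = differing positions / total positions.
                $prop_hash{"$var1$var2"} = $sum/@{$hash{$var2}};
        }
        # One matrix row: the sample name, then its distance to every sample.
        print join("\t", $var1, map { $prop_hash{"$var1$_"} } @var_arr), "\n";
}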

Code:

cat filename | perl scriptname 

        a       b       c       d
a       0       0.5     0.25    0.5
b       0.5     0       0.75    0.75
c       0.25    0.75    0       0.75
d       0.5     0.75    0.75    0
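
On the request for notes on how the functions work: the script relies on `split`, `push`, and a hash whose values are arrays. A small sketch of that reading step, using two hard-coded lines from the example input in place of a file (an assumption for illustration), might be:

```perl
use strict;
use warnings;

# Illustrative input lines, taken from the example in the question.
my @lines = ("a,1,2,4,4", "b,2,1,4,4");

my %hash;
foreach my $line (@lines) {
    # split returns a list: the first field is the sample name,
    # the remaining fields are collected into @arr.
    my ($var, @arr) = split /,/, $line;
    # @{ $hash{$var} } treats the hash value as an array; Perl creates
    # that array automatically the first time the key is used.
    push @{ $hash{$var} }, @arr;
}

# The number of values stored for sample "b", and its first value.
print scalar @{ $hash{b} }, "\n";   # 4
print $hash{b}[0], "\n";            # 2
```

In the script itself, `while(<>)` plays the role of the loop over `@lines`: it reads one line at a time from standard input or from any files named on the command line, and `chomp` removes the trailing newline before the line is split.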

HTH,
PL

daptal
