Hi,
I'm trying to generate a distance matrix between sample pairs for use in a tree-drawing program. The example below shows what I'd like to get out of the data: essentially, the proportion of positions at which two samples differ.
Any help much appreciated! Also, any notes on how the functions work would be great!
Thanks!
Example input (note: comma indicates column separators, a:d are sample names):
a,1,2,4,4
b,2,1,4,4
c,1,2,3,4
d,1,0,4,0
Identify positions which differ between pairwise comparisons of samples a:d (score 1 for differ, 0 for shared in example below)
Some comparisons are duplicates, e.g. ab and ba, and self-comparisons such as aa or bb are obviously all "0" (a sample never differs from itself), but these are necessary to make the matrix
aa,0,0,0,0
ab,1,1,0,0
ac,0,0,1,0
ad,0,1,0,1
ba,1,1,0,0
bb,0,0,0,0
bc,1,1,1,0
etc... to dd
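A minimal sketch of this step in Python, assuming the example input has already been read into a dictionary keyed by sample name. The names `samples`, `position_differences` and `pairwise` are made up for illustration and do not come from any particular library.

```python
from itertools import product

# Example input from above: sample name -> value at each position
samples = {
    "a": [1, 2, 4, 4],
    "b": [2, 1, 4, 4],
    "c": [1, 2, 3, 4],
    "d": [1, 0, 4, 0],
}


def position_differences(x, y):
    """Return 1 for each position where x and y differ, 0 where they match."""
    return [int(p != q) for p, q in zip(x, y)]


# Every ordered pair of samples, so duplicates (ab and ba) and
# self-comparisons (aa, bb, ...) are all included.
pairwise = {
    s + t: position_differences(samples[s], samples[t])
    for s, t in product(samples, repeat=2)
}

print(pairwise["ab"])  # [1, 1, 0, 0]
print(pairwise["ac"])  # [0, 0, 1, 0]
print(pairwise["bc"])  # [1, 1, 1, 0]
```

`zip` walks the two samples position by position, and `product(samples, repeat=2)` yields every ordered pair of sample names, which is why both ab and ba appear.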
Calculate proportion of differing positions between pairwise comparisons
aa,0
ab,0.5
ac,0.25
ad,0.5
ba,0.5
bb,0
bc,0.75
etc...to dd
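The proportion for a pair is just the number of differing positions divided by the total number of positions. A small Python sketch of that calculation, again assuming the data sit in a dictionary (the names `samples`, `proportion_differing` and `proportions` are hypothetical):

```python
# Example input from above: sample name -> value at each position
samples = {
    "a": [1, 2, 4, 4],
    "b": [2, 1, 4, 4],
    "c": [1, 2, 3, 4],
    "d": [1, 0, 4, 0],
}


def proportion_differing(x, y):
    """Fraction of positions at which two equal-length samples differ."""
    if len(x) != len(y):
        raise ValueError("samples must have the same number of positions")
    # True counts as 1 when summed, so this counts the mismatches.
    return sum(p != q for p, q in zip(x, y)) / len(x)


names = list(samples)
proportions = {
    s + t: proportion_differing(samples[s], samples[t])
    for s in names
    for t in names
}

for pair in ("aa", "ab", "ac", "ad", "ba", "bb", "bc"):
    print(pair, proportions[pair])
```

For the example data this prints 0.0 for aa and bb, 0.5 for ab, ba and ad, 0.25 for ac, and 0.75 for bc, matching the values listed above.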
Prepare matrix (e.g. ab value placed in [a,b]; ba value placed in [b,a], etc.)
,a,b,c,d
a,0,0.5,0.25,0.5
b,0.5,0,0.75 etc...
c
d
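Putting the steps together, here is a hedged sketch that builds the square matrix and writes it out as comma-separated text using only the Python standard library. The function names (`proportion_differing`, `distance_matrix`, `write_matrix`) are illustrative, and the output layout with an empty top-left cell is an assumption about what the tree-drawing program expects.

```python
import csv
import sys

# Example input from above: sample name -> value at each position
samples = {
    "a": [1, 2, 4, 4],
    "b": [2, 1, 4, 4],
    "c": [1, 2, 3, 4],
    "d": [1, 0, 4, 0],
}


def proportion_differing(x, y):
    """Fraction of positions at which two equal-length samples differ."""
    return sum(p != q for p, q in zip(x, y)) / len(x)


def distance_matrix(samples):
    """Return (names, matrix) where matrix[i][j] compares names[i] with names[j]."""
    names = list(samples)
    matrix = [
        [proportion_differing(samples[s], samples[t]) for t in names]
        for s in names
    ]
    return names, matrix


def write_matrix(names, matrix, out):
    """Write the matrix as CSV with sample names as row and column labels."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([""] + names)  # empty top-left cell above the row labels
    for name, row in zip(names, matrix):
        writer.writerow([name] + row)


names, matrix = distance_matrix(samples)
write_matrix(names, matrix, sys.stdout)
```

The result is symmetric with zeros on the diagonal, so only one triangle really needs computing; the full double loop is kept here because it mirrors the steps above. If NumPy and SciPy are available, `scipy.spatial.distance.pdist` with `metric="hamming"` followed by `squareform` should give the same proportions, since SciPy defines the Hamming distance as the fraction of positions that disagree.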