Sponsored Content
Top Forums UNIX for Dummies Questions & Answers How to make a distance matrix Post 302417989 by auburn on Sunday 2nd of May 2010 06:54:15 AM
Old 05-02-2010
How to make a distance matrix

Hi,

I'm trying to generate a distance matrix between sample pairs for use in a tree-drawing program (example below). The example below demonstrates what I'd like to get out of the data - essentially, to calculate the proportion of positions where two samples differ.
Any help much appreciated! Also, any notes on how the functions work would be great!

Thanks! Image


Example input (note: comma indicates column separators, a:d are sample names):

a,1,2,4,4
b,2,1,4,4
c,1,2,3,4
d,1,0,4,0

Identify positions which differ between pairwise comparisons of samples a:d (score 1 for differ, 0 for shared in example below)
some comparisons are duplicates, e.g. ab and ba, and self-comparisons such as aa or bb are obviously all "1", but these are neccessary to make the matrix

aa,1,1,1,1
ab,1,1,0,0
ac,0,0,1,0
ad,0,1,0,1
ba,1,1,0,0
bb,1,1,1,1
bc,1,1,1,0
etc... to dd

Calculate proportion of differing positions between pairwise comparisons
aa,0
ab,0.5
ac,0.25
ad,0.5
ba,0.5
bb,0
bc,0.75
etc...to dd

prepare matrix (e.g. ab value plotted in [a,b]; ba value plotted in [b,a] etc...)

a,b,c,d
a,0,0.5,0.25,0.5
b,0.5,0,0.75 etc...
c
d
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Lat/Long Distance Calculation

I amtrying to write a script that would compute the distance between an "x" number of points. This is what I have come up with so far and it is not working. Can anyone modify it to make it work? A=34.16597 B=-84.33244 C=34.2344 D=-84.29189 test "$A" -eq "$C" -o "$B" -eq "$D" then echo... (3 Replies)
Discussion started by: Ernst
3 Replies

2. Shell Programming and Scripting

program to calculate distance between 5 atoms

Hello, I am a beginner with perl. I have a perl program to calculate the distance between 5 atoms or more. i have an array which looks like this: 6.324 32.707 50.379 5.197 32.618 46.826 4.020 36.132 46.259 7.131 38.210 45.919 6.719 38.935 42.270 2.986 39.221 ... (1 Reply)
Discussion started by: annie_singh
1 Replies

3. Programming

Converting distance list to distance matrix in R

Hi power user, I have this type of data (distance list): file1 A B 10 B C 20 C D 50I want output like this # A B C D A 0 10 30 80 B 10 0 20 70 C 30 20 0 50 D 80 70 50 0 Which is a distance matrix I have tried... (0 Replies)
Discussion started by: anjas
0 Replies

4. Shell Programming and Scripting

diagonal matrix to square matrix

Hello, all! I am struggling with a short script to read a diagonal matrix for later retrieval. 1.000 0.234 0.435 0.123 0.012 0.102 0.325 0.412 0.087 0.098 1.000 0.111 0.412 0.115 0.058 0.091 0.190 0.045 0.058 1.000 0.205 0.542 0.335 0.054 0.117 0.203 0.125 1.000 0.587 0.159 0.357... (11 Replies)
Discussion started by: yifangt
11 Replies

5. Shell Programming and Scripting

Calculate distance and azimuth

Hi all, I have a data file like this lat lon lat lon 12.000 25.125 14.235 25.012 14.200 81.000 25.584 25.014 45.023 25.365 25.152 35.222 I want to calculate distance and azimuth between this points eg:- 12.000,25.125 and 14.235,25.012 I want to use awk programming... (3 Replies)
Discussion started by: chamara
3 Replies

6. Ubuntu

How to convert full data matrix to linearised left data matrix?

Hi all, Is there a way to convert full data matrix to linearised left data matrix? e.g full data matrix Bh1 Bh2 Bh3 Bh4 Bh5 Bh6 Bh7 Bh1 0 0.241058 0.236129 0.244397 0.237479 0.240767 0.245245 Bh2 0.241058 0 0.240594 0.241931 0.241975 ... (8 Replies)
Discussion started by: evoll
8 Replies

7. Shell Programming and Scripting

finding distance between numbers

Hi, I have a file as ABC 1634230,1634284,1634349,1634468 1634272,1634301,1634356,1634534 What I want is to find distance between the numbers.. column 1 is the gene name and column 2 are starts and column 3 are their respective stops for the starts. So what I want is column 3 which has +1... (2 Replies)
Discussion started by: Diya123
2 Replies

8. Shell Programming and Scripting

Make Separated files from a single matrix - Perl

Hey Masters, Here is my input: fragmentID chromosome start end HEL25E TRIP1 r5GATC2L00037 chr2L 5301 6026 0.03 0.036 r5GATC2L00038 chr2L 6023 6882 -0.025 -0.041 r5GATC2L00040 chr2R 6921 7695 -0.031 0.005 r5GATC2L00042 chr2R 7715 8554 -0.006 -0.024 r5GATC2L00043 chr3L 8551 8798 0.042 0... (4 Replies)
Discussion started by: @man
4 Replies

9. Shell Programming and Scripting

Edit distance using perl or awk

Dear all, I am working on a large Sindhi lexicon which I hope to complete by 2017 and place in open source. The database is in Arabic script in two columns delimited by an equal to sign. Column 1 contains a word or words without the short vowel and also some extraneous information which is... (0 Replies)
Discussion started by: gimley
0 Replies

10. Shell Programming and Scripting

Calculate average, azimut and distance

Gents, Please i will to get the distance and azimut from 2 coordinates: Usig excel formula i get the correct values, but i will like to do it using awk. Example A 35089.0 50345.016 9 75 1 2101774 77 70 79 483911.6 2380106.9 137.4 1 1 6 1 A 35089.0 50345.01620 75... (8 Replies)
Discussion started by: jiam912
8 Replies
FITCIRCLE(l)                                                                                                                          FITCIRCLE(l)

NAME
fitcircle - find mean position and pole of best-fit great [or small] circle to points on a sphere. SYNOPSIS
fitcircle [ xyfile ] -Lnorm [ -H[nrec] ] [ -S ] [ -V ] [ -: ] [ -bi[s][n] ] DESCRIPTION
fitcircle reads lon,lat [or lat,lon] values from the first two columns on standard input [or xyfile]. These are converted to cartesian three-vectors on the unit sphere. Then two locations are found: the mean of the input positions, and the pole to the great circle which best fits the input positions. The user may choose one or both of two possible solutions to this problem. The first is called -L1 and the second is called -L2. When the data are closely grouped along a great circle both solutions are similar. If the data have large dispersion, the pole to the great circle will be less well determined than the mean. Compare both solutions as a qualitative check. The -L1 solution is so called because it approximates the minimization of the sum of absolute values of cosines of angular distances. This solution finds the mean position as the Fisher average of the data, and the pole position as the Fisher average of the cross-products between the mean and the data. Averaging cross-products gives weight to points in proportion to their distance from the mean, analogous to the "leverage" of distant points in linear regression in the plane. The -L2 solution is so called because it approximates the minimization of the sum of squares of cosines of angular distances. It creates a 3 by 3 matrix of sums of squares of components of the data vectors. The eigenvectors of this matrix give the mean and pole locations. This method may be more subject to roundoff errors when there are thousands of data. The pole is given by the eigenvector corresponding to the smallest eigenvalue; it is the least-well represented factor in the data and is not easily estimated by either method. -L Specify the desired norm as 1 or 2, or use -L or -L3 to see both solutions. OPTIONS
xyfile ASCII [or binary, see -b] file containing lon,lat [lat,lon] values in the first 2 columns. If no file is specified, fitcircle will read from standard input. -H Input file(s) has Header record(s). Number of header records can be changed by editing your .gmtdefaults file. If used, GMT default is 1 header record. -S Attempt to fit a small circle instead of a great circle. The pole will be constrained to lie on the great circle connecting the pole of the best-fit great circle and the mean location of the data. -V Selects verbose mode, which will send progress reports to stderr [Default runs "silently"]. -: Toggles between (longitude,latitude) and (latitude,longitude) input/output. [Default is (longitude,latitude)]. Applies to geo- graphic coordinates only. -bi Selects binary input. Append s for single precision [Default is double]. Append n for the number of columns in the binary file(s). [Default is 2 input columns]. EXAMPLES
Suppose you have lon,lat,grav data along a twisty ship track in the file ship.xyg. You want to project this data onto a great circle and resample it in distance, in order to filter it or check its spectrum. Try: fitcircle ship.xyg -L2 project ship.xyg -Cox/oy -Tpx/py -S -pz | sample1d -S-100 -I1 > output.pg Here, ox/oy is the lon/lat of the mean from fitcircle, and px/py is the lon/lat of the pole. The file output.pg has distance, gravity data sampled every 1 km along the great circle which best fits ship.xyg SEE ALSO
gmt(1gmt), project(1gmt), sample1d(1gmt) 1 Jan 2004 FITCIRCLE(l)
All times are GMT -4. The time now is 12:25 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy