09-20-2010
Thank you.
Jim,
{$1$2} was a typo.
rdcwayx,
My files are not Latitude Longitude Degrees, but State Plane Feet.
They typically contain 3 million records.
Each record has the form PointNumber Easting Northing Elevation, for example:
PointNumber_0000001 1000000.123456 1000000.123456 10000.123456
PointNumber_0000010 1000001.234567 1000002.234567 10345.234567
PointNumber_0000100 1000010.345678 1000020.456789 10030.987654
PointNumber_0001000 1000050.345678 1000050.456789 10030.987654
PointNumber_0010000 1000123.123456 1000456.123456 10789.123456
PointNumber_0100000 1000123.123456 1000456.123456 10789.123456
PointNumber_1000000 1000000.123456 1000000.123456 10000.123456
PointNumber_2000000 1000011.345678 1000021.456789 10030.987654
PointNumber_3000000 1000051.000678 1000049.999000 10030.987654
Where, relative to fields 2 and 3:
PointNumber_1000000 is an "exact duplicate" of PointNumber_0000001
PointNumber_0100000 is an "exact duplicate" of PointNumber_0010000
Where, relative to fields 2 and 3, and within a user defined range of + or - 2.0:
PointNumber_2000000 is a "near duplicate" of PointNumber_0000100
PointNumber_3000000 is a "near duplicate" of PointNumber_0001000
So a point/record is a "near duplicate" when its easting and northing are both within a user-defined range of another record's. For example, if I use a value of 2.75 feet for the range, then any record whose easting and northing are within 2.75 feet of any other record is to be considered a "near duplicate" and deleted.
If possible, it would be great if I could get two files from the input file:
1. An output file with the near duplicates removed.
2. An output file with the near duplicates that were removed.
Thank you again,
Kenny.
---------- Post updated at 01:38 PM ---------- Previous update was at 09:25 AM ----------
Jim,
When I use your code on the sample data set in my previous post, it prints the whole file.
Kenny.
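For what it's worth, the near-duplicate rule described above could be sketched roughly as follows. This is only a minimal illustration in Python, not the code Jim posted. The function names (split_near_duplicates, write_outputs) are made up for this example, and it assumes that the range is greater than zero, that the first record seen is the one kept, and that later records are compared only against records already kept. It buckets points into a grid of range-sized cells so that each record is checked against its own and the 8 neighbouring cells instead of against all 3 million records.

```python
import math
from collections import defaultdict


def split_near_duplicates(lines, tol):
    """Return (kept, removed) lists of input lines.

    Assumptions (not confirmed by the thread): tol > 0, the first record
    seen wins, and a record is a near duplicate when BOTH its easting
    (field 2) and northing (field 3) are within +/- tol of a kept record.
    """
    # (cell_x, cell_y) -> list of (easting, northing) of kept records
    grid = defaultdict(list)
    kept, removed = [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        e, n = float(fields[1]), float(fields[2])
        cx, cy = math.floor(e / tol), math.floor(n / tol)
        # With cells of size tol, any point within tol on both axes
        # lies in this cell or in one of its 8 neighbours.
        is_dup = any(
            abs(e - ke) <= tol and abs(n - kn) <= tol
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            for ke, kn in grid.get((cx + dx, cy + dy), ())
        )
        if is_dup:
            removed.append(line)
        else:
            grid[(cx, cy)].append((e, n))
            kept.append(line)
    return kept, removed


def write_outputs(in_path, kept_path, removed_path, tol):
    """Write the two requested files: records kept and records removed."""
    with open(in_path) as src:
        kept, removed = split_near_duplicates(src, tol)
    with open(kept_path, "w") as out:
        out.writelines(kept)
    with open(removed_path, "w") as out:
        out.writelines(removed)
```

Calling write_outputs("points.txt", "kept.txt", "removed.txt", 2.0) (file names are placeholders) would produce both output files in one pass. Exact duplicates fall out as a special case, since a difference of zero is always within the range.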