UNIX for Dummies Questions & Answers: How to filter out almost duplicate X Y (Easting Northing) coordinates? Post 302454972 by kenneth.mcbride on Monday 20th of September 2010 04:38:55 PM
Thank you.

Jim,

{$1$2} was a typo.

rdcwayx,

My files are not in latitude/longitude degrees but in State Plane coordinates, in feet.
They typically contain 3 million records.

Each record has four fields, PointNumber Easting Northing Elevation, as follows:

PointNumber_0000001 1000000.123456 1000000.123456 10000.123456
PointNumber_0000010 1000001.234567 1000002.234567 10345.234567
PointNumber_0000100 1000010.345678 1000020.456789 10030.987654
PointNumber_0001000 1000050.345678 1000050.456789 10030.987654
PointNumber_0010000 1000123.123456 1000456.123456 10789.123456
PointNumber_0100000 1000123.123456 1000456.123456 10789.123456
PointNumber_1000000 1000000.123456 1000000.123456 10000.123456
PointNumber_2000000 1000011.345678 1000021.456789 10030.987654
PointNumber_3000000 1000051.000678 1000049.999000 10030.987654
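For reference, here is a minimal Python sketch of reading one record of this layout. It assumes every line has exactly four whitespace-separated fields; `Point` and `parse_line` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Point(NamedTuple):
    """One record: PointNumber Easting Northing Elevation (State Plane feet)."""
    name: str
    easting: float
    northing: float
    elevation: float


def parse_line(line: str) -> Point:
    """Split a record on whitespace; assumes exactly four fields per line."""
    name, easting, northing, elevation = line.split()
    return Point(name, float(easting), float(northing), float(elevation))


p = parse_line("PointNumber_0000100 1000010.345678 1000020.456789 10030.987654")
print(p.name, p.easting, p.northing)
```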

Where, relative to fields 2 and 3:
PointNumber_1000000 is an "exact duplicate" of PointNumber_0000001
PointNumber_0100000 is an "exact duplicate" of PointNumber_0010000
Where, relative to fields 2 and 3, and within a user-defined range of + or - 2.0:
PointNumber_2000000 is a "near duplicate" of PointNumber_0000100
PointNumber_3000000 is a "near duplicate" of PointNumber_0001000
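Exact duplicates can be caught by keying on the text of fields 2 and 3 and keeping the first occurrence, which is what the common awk idiom `awk '!seen[$2,$3]++' file` does. Below is a minimal Python sketch of the same idea; `split_exact_duplicates` is a hypothetical helper that also collects the removed lines.

```python
def split_exact_duplicates(lines):
    """Keep the first record for each (easting, northing) text pair.

    Returns (kept, removed). Keys are the raw field strings, so
    '1000000.123456' and '1000000.1234560' would count as different.
    """
    seen = set()
    kept, removed = [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        key = (fields[1], fields[2])
        if key in seen:
            removed.append(line)
        else:
            seen.add(key)
            kept.append(line)
    return kept, removed


sample = [
    "PointNumber_0000001 1000000.123456 1000000.123456 10000.123456",
    "PointNumber_0010000 1000123.123456 1000456.123456 10789.123456",
    "PointNumber_0100000 1000123.123456 1000456.123456 10789.123456",
    "PointNumber_1000000 1000000.123456 1000000.123456 10000.123456",
]
kept, removed = split_exact_duplicates(sample)
for line in removed:
    print(line.split()[0])  # prints PointNumber_0100000, then PointNumber_1000000
```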

So a point/record is a "near duplicate" when its easting and northing are both within a user-defined range of another record's. For example, if I use a range of 2.75 feet, then a record whose easting and northing are both within 2.75 feet of any other record is considered a "near duplicate" and deleted.
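One way to read "within + or - 2.0" is that the easting difference and the northing difference must each be at most the range (a square window), rather than the straight-line distance. Here is a minimal sketch of that test, using a hypothetical `is_near_duplicate` helper and points taken from the sample above.

```python
def is_near_duplicate(a, b, tol):
    """True when both coordinates differ by at most tol (inclusive).

    a and b are (easting, northing) pairs in feet. This reads "+ or - tol"
    as a square window; a circular test would compare
    math.hypot(dE, dN) <= tol instead.
    """
    return abs(a[0] - b[0]) <= tol and abs(a[1] - b[1]) <= tol


p1 = (1000000.123456, 1000000.123456)    # PointNumber_0000001
p10 = (1000001.234567, 1000002.234567)   # PointNumber_0000010
p100 = (1000010.345678, 1000020.456789)  # PointNumber_0000100
p2m = (1000011.345678, 1000021.456789)   # PointNumber_2000000

print(is_near_duplicate(p2m, p100, 2.0))  # True: both differ by about 1.0
print(is_near_duplicate(p10, p1, 2.0))    # False: northing differs by ~2.11
print(is_near_duplicate(p10, p1, 2.75))   # True
```

Under this reading, PointNumber_0000010 is not a near duplicate of PointNumber_0000001 at 2.0 feet, because the northings differ by about 2.11, but it would be flagged at 2.75 feet.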

If possible, it would be great if I could get two files from the input file:
1. An output file with the near duplicates removed.
2. An output file with the near duplicates that were removed.
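Comparing every pair of 3 million records means about n(n-1)/2 ≈ 4.5 × 10^12 comparisons, so a brute-force double loop is impractical. A common alternative is to bucket points into a grid whose cell size equals the range; any near duplicate then lies in the same cell or one of its eight neighbours. The sketch below assumes the first record of a group is kept and each later record is compared only against records already kept. `split_near_duplicates` and the file names in the comments are hypothetical.

```python
import io
import math
from collections import defaultdict


def split_near_duplicates(lines, kept_out, removed_out, tol):
    """Copy each record to kept_out or removed_out.

    Assumptions (not from the original post): the first record of a group
    is kept, and a later record is removed when both |dE| <= tol and
    |dN| <= tol against any record already kept. Blank lines are skipped.
    """
    if tol <= 0:
        raise ValueError("tol must be positive")
    grid = defaultdict(list)  # (col, row) cell -> kept (easting, northing)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        e, n = float(fields[1]), float(fields[2])
        col, row = math.floor(e / tol), math.floor(n / tol)
        # Cell size equals tol, so any match is in this cell or one of its
        # eight neighbours; grid.get() does not create empty cells.
        near = any(
            abs(e - ke) <= tol and abs(n - kn) <= tol
            for dc in (-1, 0, 1)
            for dr in (-1, 0, 1)
            for ke, kn in grid.get((col + dc, row + dr), ())
        )
        if near:
            removed_out.write(line.rstrip("\n") + "\n")
        else:
            grid[(col, row)].append((e, n))
            kept_out.write(line.rstrip("\n") + "\n")


SAMPLE = """\
PointNumber_0000001 1000000.123456 1000000.123456 10000.123456
PointNumber_0000010 1000001.234567 1000002.234567 10345.234567
PointNumber_0000100 1000010.345678 1000020.456789 10030.987654
PointNumber_0001000 1000050.345678 1000050.456789 10030.987654
PointNumber_0010000 1000123.123456 1000456.123456 10789.123456
PointNumber_0100000 1000123.123456 1000456.123456 10789.123456
PointNumber_1000000 1000000.123456 1000000.123456 10000.123456
PointNumber_2000000 1000011.345678 1000021.456789 10030.987654
PointNumber_3000000 1000051.000678 1000049.999000 10030.987654
"""

kept, removed = io.StringIO(), io.StringIO()
split_near_duplicates(SAMPLE.splitlines(), kept, removed, tol=2.0)
print(removed.getvalue(), end="")

# With real files (hypothetical names):
# with open("points.txt") as src, open("kept.txt", "w") as k, \
#         open("removed.txt", "w") as r:
#     split_near_duplicates(src, k, r, tol=2.75)
```

Because each record is checked only against kept points, the result depends on file order: in a chain A, B, C where B is near both A and C but A and C are farther apart than the range, A and C are kept and B is removed.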

Thank you again,
Kenny.

---------- Post updated at 01:38 PM ---------- Previous update was at 09:25 AM ----------

Jim,

When I use your code on the sample data set in my previous post, it prints the whole file.

Kenny.
 

All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.