Finding Overlap between two sets of data


 
# 1  
09-05-2008

Hi everyone,
I posted this earlier, but the problem has changed since then, so I am reposting with a clearer description in the hope that someone can help me out.

I have two lists of data, in file1 and file2.

file1 (tab separated: column 1, column 2, column 3)
1 91625106 91626002
1 121185385 121187221
1 141477378 141481748
1 143786446 143787519
1 154452584 154453547
2 91639309 91640060
2 91644584 91645592
2 91653916 91655039
2 91660184 91661295
2 91669205 91670333

file2 (tab separated: column 1, column 2, column 3)
1 91625115 91626003
1 121185385 121187221
1 143785958 143787823
1 154452584 154453545
1 204875171 204876073
2 91639390 91640185
2 91653640 91654559
2 91660256 91660912
2 91669209 91669849
2 132727258 132728754

Columns 2 and 3 of each file are coordinates on a line (think of each row as an ordered pair along a straight line). Column 1 is the reference point, and two pairs should only be compared when their column 1 values are the same. My objective is to find pairs in file1 which DO NOT overlap with the ones in file2.

In this example, the output should be the pair from file2 (this is the only one that doesn't overlap):

1 204875171 204876073


Someone posted this code. It doesn't do exactly what I had hoped (please see for yourself), but I think it's a good start:
Code:
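awk '
        # First file: remember the three columns of each row by line number
        NR==FNR { col1[FNR]=$1; col2[FNR]=$2; col3[FNR]=$3; next }
        # Second file: compare only against the same-numbered row of the first file
        $1 == col1[FNR] && ( col2[FNR] > $2 || col3[FNR] > $3 )
' "$1" "$2"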
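
For comparison, here is a minimal sketch of one way the check could be written so that every pair in the second file is tested against all pairs in the first file that share the same column 1, rather than only the pair on the same line number. It rests on several assumptions that are not stated in the post: the function name `non_overlapping` is hypothetical, column 2 is never greater than column 3, a shared endpoint counts as an overlap, and the rows to report are those of the second file (which matches the expected output shown above).

```shell
# Hypothetical helper: print rows of the second file that overlap no row
# of the first file. Assumes whitespace-separated columns
# (reference, start, end) with start <= end, and closed intervals,
# so a shared endpoint counts as an overlap.
non_overlapping() {
    awk '
        # First file: store every interval, grouped by reference (column 1)
        NR == FNR {
            n[$1]++
            lo[$1, n[$1]] = $2 + 0
            hi[$1, n[$1]] = $3 + 0
            next
        }
        # Second file: search the stored intervals with the same reference
        # for one that satisfies lo1 <= hi2 and lo2 <= hi1
        {
            hit = 0
            for (i = 1; i <= n[$1]; i++) {
                if (lo[$1, i] <= $3 + 0 && $2 + 0 <= hi[$1, i]) {
                    hit = 1
                    break
                }
            }
            if (!hit) print
        }
    ' "$1" "$2"
}
```

It would be called as `non_overlapping file1 file2`. On the sample data as posted, this interpretation prints `1 204875171 204876073` and also `2 132727258 132728754`, since neither has an overlapping pair in file1. Swapping the two arguments reports the pairs of file1 that have no overlap in file2 instead. Each row of the second file is compared with every stored row for its reference, which is simple but may be slow on very large files.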

# 2  
09-05-2008
Duplicate posting is not allowed; please read the rules.

Proceed here if you want to add new information to clarify your question:

https://www.unix.com/shell-programmin...comparing.html

Thread closed.
