Common records after matching on different columns


 
# 1  
Old 02-09-2012

Hi,

I have the following files.

cat 1.txt

Quote:
chr1 100 200
chr1 200 300
chr1 1000 1200
chr2 300 400
chr2 400 500
chr2 600 900
chr2 1200 1800
chrz 100 200
chrz 300 400
chrz 400 500

cat 2.txt

Quote:
chr1 100 200
chr1 130 220
chr1 498 600
chr1 700 820
chr1 1499 1600
chr1 1800 1920
chr2 301 330
chr2 600 700
chrz 1000 1350
chrz 420 465

output.txt

Quote:
chr1 100 200 12.txt (because this record comes from both 1.txt and 2.txt)
chr1 200 300 1.txt
chr1 1000 1200 1.txt
chr1 130 220 2.txt
chr1 498 600 2.txt
chr1 700 820 2.txt
chr1 1499 1600 2.txt
chr2 300 400 1.txt
chr2 400 500 1.txt
chr2 600 900 1.txt
chr2 301 330 2.txt
chr2 600 700 2.txt
chrz 100 200 1.txt
chrz 300 400 1.txt
chrz 400 500 1.txt
chrz 420 465 2.txt



The logic is as follows.

Column 1 must match: chr1 in column 1 of file1 is compared only with chr1 in column 1 of file2.

If column 2 of a file2 record is equal to, or within 300 of, column 2 of a file1 record (i.e., if column 2 of file1 is 500, then column 2 of file2 can be anything from 200 to 800), both records should be printed.

If column 3 of a file2 record is equal to, or within 300 of, column 3 of a file1 record (i.e., if column 3 of file1 is 800, then column 3 of file2 can be anything from 500 to 1100), both records should be printed.

Also, any record that falls within the column 2 to column 3 range of a record in the other file should be printed.
Ex: If file1 has the record chr2 300 400 and file2 has the record chr2 301 383, both of them should be printed.

Every record in one file is compared with every record in the other file.

I am looking for something that can also be used across more than two files.

Thanks a ton in advance. I know it is a pain. But, please help me.

---------- Post updated 02-09-12 at 09:59 AM ---------- Previous update was 02-08-12 at 02:18 PM ----------

Please guys. Someone help me out.

---------- Post updated at 04:19 PM ---------- Previous update was at 09:59 AM ----------

Any thoughts by anyone?
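
For reference, the rules described in post #1 can be read as a single yes/no test on a pair of records. What follows is only a minimal sketch of that reading in POSIX sh and awk, not a tested solution. The function name `records_match` and its argument order are made up for illustration, the cutoff (300 in the post) is passed as the first argument, and "in the range of column2 and column3" is assumed to mean that the two ranges overlap.

```shell
#!/bin/sh
# records_match CUTOFF "tagA startA endA" "tagB startB endB"
# Hypothetical helper: exit status 0 if the two records match, 1 otherwise.
records_match() {
  awk -v cutoff="$1" -v a="$2" -v b="$3" '
    function abs(x) { return x < 0 ? -x : x }
    BEGIN {
      split(a, A, " ")
      split(b, B, " ")
      # Column 1 must be the same in both records (compared as strings).
      if ((A[1] "") != (B[1] "")) exit 1
      # Column 2 values equal or within the cutoff of each other.
      if (abs(A[2] - B[2]) <= cutoff) exit 0
      # Column 3 values equal or within the cutoff of each other.
      if (abs(A[3] - B[3]) <= cutoff) exit 0
      # Assumption: "in the range" means the two ranges overlap.
      if (A[2] + 0 <= B[3] + 0 && B[2] + 0 <= A[3] + 0) exit 0
      exit 1
    }'
}
```

Used as `records_match 300 "chr2 300 400" "chr2 301 330" && echo match`, it reports a match because the two start values differ by 1. A complete solution would still have to apply this test to every pair of records from the files being compared.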

# 2  
Old 02-09-2012
You had a very similar post, for which I provided a solution using a shell script:
https://www.unix.com/shell-programmin...condition.html

Change it according to your new specifications.
# 3  
Old 02-09-2012
The shell script from that earlier thread didn't output some of the records.

Is there another alternative?
# 4  
Old 02-09-2012
If you go back and explain exactly how it didn't work, and show the data which doesn't work, it can probably be fixed.

Asking for a whole new solution might leave you with the same problem as before.
# 5  
Old 02-09-2012
Thanks to both of you for leading me in some way or the other.

The shell script produces the following output.

opfromshell.txt
Quote:
chr1 300 400 1.txt
chr1 350 467 1.txt
But, I was looking for

Originaloutput.txt
Quote:
chr1 300 400 1.txt
chr1 350 467 1.txt
chr1 201 299 2.txt
chr2 800 1000 2.txt
chr2 100 200 2.txt
chr3 500 600 2.txt
Please, no offense. Since the earlier replies in that post worked, I didn't want to bother Shell_life.

Any help is highly appreciated.

Thanks in advance.
# 6  
Old 02-10-2012
Quote:
Originally Posted by jacobs.smith
The shell script produces the following output.

opfromshell.txt
Code:
chr1 300 400 1.txt
chr1 350 467 1.txt

But, I was looking for

Originaloutput.txt
Code:
chr1 300 400 1.txt
chr1 350 467 1.txt
chr1 201 299 2.txt
chr2 800 1000 2.txt
chr2 100 200 2.txt
chr3 500 600 2.txt

Please, no offense. Since the earlier replies in that post worked, I didn't want to bother Shell_life.

As you may have noticed, I made the two file names variables in the script.

To get your desired output, just run it twice, swapping the names for the second run:
1)
Code:
mF1='1.txt'
mF2='2.txt'

2)
Code:
mF1='2.txt'
mF2='1.txt'

Here is the same solution again:

Code:
#!/bin/ksh
# Print every record of mF1 that overlaps no record of mF2 with the same tag.
typeset -i mFromA mToA mFromB mToB
mF1='1.txt'      ### <======= Change file name here (I)
mF2='2.txt'      ### <======= Change file name here (II)
mPrevTag=''
#### sort is used to reduce the number of "grep"
sort "${mF1}" | while read mTagA mFromA mToA; do
  if [[ "${mTagA}" != "${mPrevTag}" ]]; then
    # Anchor the pattern so that chr1 does not also select chr10, chr11, ...
    grep "^${mTagA}[[:space:]]" "${mF2}" > "${mF2}.tmp"
  fi
  mFound="N"
  while read mTagB mFromB mToB; do
    # Two ranges overlap when each one starts before the other one ends.
    if [[ ${mToA} -ge ${mFromB} && ${mFromA} -le ${mToB} ]]; then
      mFound="Y"
      break
    fi
  done < "${mF2}.tmp"
  if [[ "${mFound}" = "N" ]]; then
    echo "${mTagA} ${mFromA} ${mToA} ${mF1}"
  fi
  mPrevTag=${mTagA}
done
rm -f "${mF2}.tmp"
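
The "run it twice" step can also be scripted. Below is a rough sketch, assuming POSIX sh and awk rather than ksh. The function names `unmatched` and `both_ways` are hypothetical and not part of the script above. It applies the same overlap test as the script, but keeps the second file in memory instead of using grep and a temporary file, and it prints records in input order rather than sorted.

```shell
#!/bin/sh
# unmatched FILE_A FILE_B
# Print every record of FILE_A that overlaps no record of FILE_B with the
# same tag, followed by the name of FILE_A.
unmatched() {
  awk -v name="$1" '
    # First operand (FILE_B): remember its ranges, grouped by tag.
    FILENAME == ARGV[1] {
      n[$1]++
      lo[$1, n[$1]] = $2 + 0
      hi[$1, n[$1]] = $3 + 0
      next
    }
    # Second operand (FILE_A): look for any overlapping range in FILE_B.
    {
      found = 0
      for (i = 1; i <= n[$1]; i++)
        if ($3 + 0 >= lo[$1, i] && $2 + 0 <= hi[$1, i]) { found = 1; break }
      if (!found) print $1, $2, $3, name
    }' "$2" "$1"
}

# Run the comparison in both directions, as in steps 1) and 2) above.
both_ways() {
  unmatched "$1" "$2"
  unmatched "$2" "$1"
}
```

For example, `both_ways 1.txt 2.txt > op.txt` would collect the records from both directions in one file. The two file names are assumed to be different paths.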

# 7  
Old 02-15-2012
Hi guys,

I managed to get a Perl script from a friend. It does what I need, but only for 2 files.

Could any of you edit the following code so that it works with any number of files and with different cut-offs?

To be clear, I would like to specify the number of files and the cut-offs on STDIN or on the command line when running the code.

Thanks to all of you in advance.


Quote:
#!/usr/bin/perl
use strict;
use warnings;

my $file1 = "1.txt";
my $file2 = "2.txt";

open(my $in1, '<', $file1)   || die "can't open $file1: $!";
open(my $in2, '<', $file2)   || die "can't open $file2: $!";
open(my $out, '>', "op.txt") || die "can't open op.txt: $!";

# Read file2 once instead of reopening it for every line of file1.
chomp(my @lines2 = <$in2>);
close($in2);

# Label for records present in both files, e.g. "12.txt".
my ($x) = split(/\./, $file1);
my ($y) = split(/\./, $file2);

my %seen;    # lines already written to op.txt

while (my $line = <$in1>) {
    chomp($line);
    my @a = split(/\s+/, $line);
    foreach my $line1 (@lines2) {
        my @b = split(/\s+/, $line1);
        next if $a[0] ne $b[0];    # column 1 must be the same

        # Identical record in both files: print it once with the combined label.
        if ($a[1] == $b[1] && $a[2] == $b[2] && !exists $seen{$line}) {
            $seen{$line} = 1;
            print $out "$a[0]\t$a[1]\t$a[2]\t$x$y.txt\n";
        }

        # The ranges overlap, or their boundaries are within 300 of each other.
        if (   ($a[1] < $b[2] && $a[2] > $b[1])
            || abs($a[1] - $b[2]) <= 300
            || abs($a[2] - $b[1]) <= 300)
        {
            if (!exists $seen{$line}) {
                $seen{$line} = 1;
                print $out "$a[0]\t$a[1]\t$a[2]\t$file1\n";
            }
            if (!exists $seen{$line1}) {
                $seen{$line1} = 1;
                print $out "$b[0]\t$b[1]\t$b[2]\t$file2\n";
            }
        }
    }
}
close($in1);
close($out);
---------- Post updated 02-15-12 at 10:36 AM ---------- Previous update was 02-14-12 at 11:28 AM ----------

Please help me. It is a very important task.
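
One possible direction for the request above, sketched in POSIX sh and awk rather than Perl: take the cutoff as the first argument and any number of files after it. The function name `match_files` is made up for illustration. It applies the Perl script's overlap and boundary tests with 300 replaced by the cutoff, and it compares each record with the records of every other file. As a simplification, a record that appears in several files is printed once per file instead of once with a combined label such as 12.txt, and only one cutoff is used per run.

```shell
#!/bin/sh
# match_files CUTOFF FILE...
# Print every record that matches at least one record from another file,
# followed by the name of the file it came from.
match_files() {
  cutoff=$1
  shift
  awk -v cutoff="$cutoff" '
    function abs(x) { return x < 0 ? -x : x }
    # Same tests as the Perl script, with 300 replaced by cutoff.
    function is_match(i, j) {
      if (tag[i] != tag[j]) return 0
      if (lo[i] < hi[j] && hi[i] > lo[j]) return 1
      return abs(lo[i] - hi[j]) <= cutoff || abs(hi[i] - lo[j]) <= cutoff
    }
    # Load all records from all files, remembering where each came from.
    NF >= 3 {
      n++
      tag[n] = $1 ""
      lo[n] = $2 + 0
      hi[n] = $3 + 0
      src[n] = FILENAME
    }
    END {
      for (i = 1; i <= n; i++)
        for (j = 1; j <= n; j++)
          if (src[i] != src[j] && is_match(i, j)) {
            print tag[i], lo[i], hi[i], src[i]
            break
          }
    }' "$@"
}
```

A call would look like `match_files 300 1.txt 2.txt 3.txt > op.txt`. Every record is compared with every other record, so the running time grows with the square of the total number of records. For several cutoffs, the function could simply be run once per value.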