    #1  
Old 02-09-2012
Registered User
 

Common records after matching on different columns

Hi,

I have the following files.

cat 1.txt

Quote:
chr1 100 200
chr1 200 300
chr1 1000 1200
chr2 300 400
chr2 400 500
chr2 600 900
chr2 1200 1800
chrz 100 200
chrz 300 400
chrz 400 500
cat 2.txt

Quote:
chr1 100 200
chr1 130 220
chr1 498 600
chr1 700 820
chr1 1499 1600
chr1 1800 1920
chr2 301 330
chr2 600 700
chrz 1000 1350
chrz 420 465
output.txt

Quote:
chr1 100 200 12.txt (because this record comes from both 1.txt and 2.txt)
chr1 200 300 1.txt
chr1 1000 1200 1.txt
chr1 130 220 2.txt
chr1 498 600 2.txt
chr1 700 820 2.txt
chr1 1499 1600 2.txt
chr2 300 400 1.txt
chr2 400 500 1.txt
chr2 600 900 1.txt
chr2 301 330 2.txt
chr2 600 700 2.txt
chrz 100 200 1.txt
chrz 300 400 1.txt
chrz 400 500 1.txt
chrz 420 465 2.txt



The logic is as follows:

Column1 must be identical: a chr1 record in file1 is compared only with chr1 records in file2.

Records should be printed when column2 of file2 is equal to, or within 300 above or below, column2 of file1 (i.e., if column2 of file1 is 500, then column2 of file2 can be anywhere from 200 to 800).

Records should be printed when column3 of file2 is equal to, or within 300 above or below, column3 of file1 (i.e., if column3 of file1 is 800, then column3 of file2 can be anywhere from 500 to 1100).

Also, any record that falls within the range from column2 to column3 should be printed.
Ex: If file 1 has this record chr2 300 400, and file2 has this record chr1 301 383, both of them should be printed.

Each record is matched to each record in both these files.
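
One possible reading of these rules, sketched as a shell function. This is an assumption, not a confirmed specification: the name `records_match` and the optional cutoff argument are made up for illustration, and the three range conditions are combined with "or". Under this literal reading, chr1 1499 1600 from 2.txt would not match any chr1 record of 1.txt even though it appears in the sample output, so the intended rule may be somewhat different.

```shell
#!/bin/sh
# records_match NAME1 START1 END1 NAME2 START2 END2 [CUTOFF]
# Hypothetical helper: exits 0 when the two records match under a literal
# reading of the rules above, 1 otherwise. CUTOFF defaults to 300.
records_match() {
    awk -v n1="$1" -v s1="$2" -v e1="$3" \
        -v n2="$4" -v s2="$5" -v e2="$6" -v cutoff="${7:-300}" '
        function abs(x) { return x < 0 ? -x : x }
        BEGIN {
            if (n1 != n2) exit 1                  # column 1 must be identical
            if (abs(s1 - s2) <= cutoff) exit 0    # column 2 values within the cutoff
            if (abs(e1 - e2) <= cutoff) exit 0    # column 3 values within the cutoff
            if (s1 <= e2 && s2 <= e1) exit 0      # the two ranges overlap
            exit 1
        }'
}

records_match chr2 300 400 chr2 301 330 && echo "match"
records_match chr1 1000 1200 chr1 1800 1920 || echo "no match"
```

A seventh argument changes the cutoff, e.g. `records_match chr1 1000 1200 chr1 1499 1600 500`.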

I am looking for something that also works with more than two files.

Thanks a ton in advance. I know it is a pain. But, please help me.

---------- Post updated 02-09-12 at 09:59 AM ---------- Previous update was 02-08-12 at 02:18 PM ----------

Please guys. Someone help me out.

---------- Post updated at 04:19 PM ---------- Previous update was at 09:59 AM ----------

Any thoughts by anyone?

    #2  
Old 02-09-2012
Shell_Life
You had a very similar post, for which I provided a solution using a shell script:
http://www.unix.com/shell-programmin...condition.html

Change it according to your new specifications.
    #3  
Old 02-09-2012
Registered User
 

The shell script from my earlier post didn't output some of the records.

Is there another alternative?
    #4  
Old 02-09-2012
Moderator
 

If you go back and explain exactly how it didn't work, and show the data which doesn't work, it can probably be fixed.

Asking for a whole new solution might leave you with the same problem as before.
    #5  
Old 02-09-2012
Registered User
 

Thanks to both of you for leading me in some way or the other.

The shell script produces the following output.

opfromshell.txt
Quote:
chr1 300 400 1.txt
chr1 350 467 1.txt
But, I was looking for

Originaloutput.txt
Quote:
chr1 300 400 1.txt
chr1 350 467 1.txt
chr1 201 299 2.txt
chr2 800 1000 2.txt
chr2 100 200 2.txt
chr3 500 600 2.txt
Please, no offense. Since the earlier replies in that post worked, I didn't want to bother Shell_life.

Any help is highly appreciated.

Thanks in advance.
    #6  
Old 02-10-2012
Shell_Life
Quote:
Originally Posted by jacobs.smith
The shell script produces the following output.

opfromshell.txt

Code:
chr1 300 400 1.txt
chr1 350 467 1.txt

But, I was looking for

Originaloutput.txt

Code:
chr1 300 400 1.txt
chr1 350 467 1.txt
chr1 201 299 2.txt
chr2 800 1000 2.txt
chr2 100 200 2.txt
chr3 500 600 2.txt

Please, no offense. Since the earlier replies in that post worked, I didn't want to bother Shell_life.
As you may have noticed, I made the two file names variables in the script.

To get your desired output, just run it twice, swapping the file names the second time:
1)

Code:
mF1='1.txt'
mF2='2.txt'

2)

Code:
mF1='2.txt'
mF2='1.txt'

Here is the same solution again:


Code:
#!/bin/ksh
# Prints every record of mF1 that overlaps no record of mF2 with the same tag.
typeset -i mFromA mToA mFromB mToB
mF1='1.txt'      ### <======= Change file name here (I)
mF2='2.txt'      ### <======= Change file name here (II)
mPrevTag=''
#### sort is used to reduce the number of "grep"
sort "${mF1}" | while read mTagA mFromA mToA; do
  if [[ "${mTagA}" != "${mPrevTag}" ]]; then
    # Anchor the pattern so that e.g. chr1 does not also select chr10
    grep "^${mTagA}[[:space:]]" "${mF2}" > "${mF2}.tmp"
  fi
  mFound="N"
  while read mTagB mFromB mToB; do
    # Two ranges overlap when each one starts before the other ends
    if [[ ${mToA} -ge ${mFromB} && ${mFromA} -le ${mToB} ]]; then
      mFound="Y"
      break
    fi
  done < "${mF2}.tmp"
  if [[ "${mFound}" = "N" ]]; then
    echo "${mTagA} ${mFromA} ${mToA} ${mF1}"
  fi
  mPrevTag=${mTagA}
done
rm -f "${mF2}.tmp"
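
Running the check in both directions can also be done without editing the variables by hand. The sketch below is a stand-in written with awk, not the script above: the function name `no_overlap` is hypothetical, and it assumes three whitespace-separated columns per line and a non-empty second file. It applies the same overlap test and prints the records of the first file that overlap nothing in the second.

```shell
#!/bin/sh
# no_overlap FILE_A FILE_B   (hypothetical helper)
# Prints each record of FILE_A that overlaps no record of FILE_B with the
# same tag in column 1, followed by the name of FILE_A.
no_overlap() {
    awk -v src="$1" '
        NF < 3 { next }                    # skip blank or short lines
        NR == FNR {                        # first file read: remember FILE_B
            n[$1]++
            from[$1, n[$1]] = $2 + 0
            to[$1, n[$1]] = $3 + 0
            next
        }
        {                                  # second file read: test FILE_A
            found = 0
            for (i = 1; i <= n[$1]; i++)
                if ($3 + 0 >= from[$1, i] && $2 + 0 <= to[$1, i]) {
                    found = 1
                    break
                }
            if (!found) print $1, $2, $3, src
        }' "$2" "$1"
}

# Intended use, once in each direction:
#   no_overlap 1.txt 2.txt
#   no_overlap 2.txt 1.txt
```

Redirecting both calls into one file, e.g. `{ no_overlap 1.txt 2.txt; no_overlap 2.txt 1.txt; } > out.txt`, collects the results of the two runs.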

    #7  
Old 02-15-2012
Registered User
 

Hi guys,

I managed to get a Perl script from a friend. It does my task, but only for 2 files.

Could any of you edit the following code so that it handles any number of files and different cut-offs?

To be clear, I would like to specify the number of files and the cut-off, either on STDIN or when running the code.

Thanks to all of you in advance.


Quote:
#!/usr/bin/perl
use strict;
use warnings;

my $file1 = "1.txt";
my $file2 = "2.txt";
open(my $in1, '<', $file1) or die "can't open $file1: $!";
open(my $out, '>', "op.txt") or die "can't open op.txt: $!";
my %hash;    # records that have already been printed

while (my $line = <$in1>)
{
    chomp($line);
    my @a = split(/\s+/, $line);
    next if @a < 3;    # skip blank or short lines
    # file2 is re-read for every record of file1
    open(my $in2, '<', $file2) or die "can't open $file2: $!";
    while (my $line1 = <$in2>)
    {
        chomp($line1);
        my @b = split(/\s+/, $line1);
        next if @b < 3;
        next if $a[0] ne $b[0];    # column 1 must be identical

        # Same start and end in both files: label with both file names
        if ($a[1] == $b[1] && $a[2] == $b[2])
        {
            my @x = split(/\./, $file1);
            my @y = split(/\./, $file2);
            if (!exists $hash{$line})
            {
                $hash{$line} = 1;
                print $out "$a[0]\t$a[1]\t$a[2]\t$x[0]$y[0].txt\n";
            }
        }

        # Ranges overlap, or one ends within 300 of where the other starts
        if (($a[1] < $b[2] && $a[2] > $b[1])
            || abs($a[1] - $b[2]) <= 300 || abs($a[2] - $b[1]) <= 300)
        {
            if (!exists $hash{$line})
            {
                $hash{$line} = 1;
                print $out "$a[0]\t$a[1]\t$a[2]\t$file1\n";
            }
            if (!exists $hash{$line1})
            {
                $hash{$line1} = 1;
                print $out "$b[0]\t$b[1]\t$b[2]\t$file2\n";
            }
        }
    }
    close($in2);
}
close($in1);
close($out);
---------- Post updated 02-15-12 at 10:36 AM ---------- Previous update was 02-14-12 at 11:28 AM ----------

Please help me. It is a very important task
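
One way the approach of the Perl script might be generalised to any number of files, with the cut-off given on the command line, is sketched below in shell with awk. The function name `common_records` and its argument order are hypothetical. It uses the same two tests as the Perl script (the ranges overlap, or one range ends within the cut-off of where the other starts), but it does not build combined labels such as 12.txt: a record present in several files is printed once per file. It keeps all records in memory and compares every pair, so it is only meant for modest inputs.

```shell
#!/bin/sh
# common_records CUTOFF FILE [FILE...]   (hypothetical helper)
# Prints every record that matches at least one record from a different
# file, followed by the name of the file it came from.
common_records() {
    cutoff=$1
    shift
    awk -v cutoff="$cutoff" '
        function abs(x) { return x < 0 ? -x : x }
        NF >= 3 {                          # load every record of every file
            n++
            file[n] = FILENAME
            tag[n] = $1
            s[n] = $2 + 0
            e[n] = $3 + 0
            rec[n] = $1 " " $2 " " $3
        }
        END {
            for (i = 1; i <= n; i++)
                for (j = 1; j <= n; j++) {
                    # only compare records from different files, same tag
                    if (file[i] == file[j] || tag[i] != tag[j]) continue
                    if ((s[i] < e[j] && e[i] > s[j]) ||
                        abs(s[i] - e[j]) <= cutoff ||
                        abs(e[i] - s[j]) <= cutoff) {
                        print rec[i], file[i]
                        break              # one match is enough
                    }
                }
        }' "$@"
}

# Intended use:
#   common_records 300 1.txt 2.txt 3.txt > op.txt
```

Records are printed in input order, file by file, so the output can be piped through `sort` if a different grouping is wanted.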