Determining number of overlaps between two files using Hashes? Post 302236482 by avronius on Monday 15th of September 2008 03:47:33 PM
OK, here's what I have so far.

(It seems to work, but it still needs some error checking for cases where no value is found.)

Good luck, and let me know if it comes close to doing what you need!

Code:
#!/usr/bin/perl -w

################################################################################
################################################################################
# What this script does:                                                       #
# This script can be used to compare data overlap                              #
#                                                                              #
# How this script works:                                                       #
# It parses two tab-delimited files and logs every start/end pair in file B    #
# that lies inside a start/end range listed in file A.                         #
#                                                                              #
# Where this script is run (and by whom):                                      #
# This script should be run from a host where the files can be accessed for    #
# comparison                                                                   #
#                                                                              #
# Revision history:                                                            #
# September 15, 2008                                                           #
#    AKG - File creation                                                       #
################################################################################
################################################################################


################################################################################
################################################################################
# Define Pragma                                                                #
################################################################################
################################################################################
use strict;
use Getopt::Std;
use vars qw/ %opt /;
use Time::HiRes qw(gettimeofday);

################################################################################
# Define Variables                                                             #
################################################################################
#my $dir        = "/usr/local/overlap"; # base directory
my $dir        = "/opt/home/agray/scripting/overlap"; # base directory
my $DEBUG;
my @tm;
my $logfile;
my $timeStamp;
my @fileA;
my @fileB;
my $count;
my @lineFile1Array;
my @lineFile2Array;
my @tempStart;
my @tempEnd;
my $line;

################################################################################
# Define Prerequisites  (Require / Include statements go here)                 #
################################################################################

################################################################################
# Forward declaration of subroutines                                           #
################################################################################
sub do_init();              # This manages the command line options
sub do_usage();             # This is the usage message for do_init()

################################################################################
################################################################################
# MAIN                                                                         #
################################################################################
################################################################################

# Parse command line variables                                                 #
do_init();

# set DEBUG flag according to command line options                             #
$DEBUG = $opt{d} ? 1 : 0;

# Next, create a timestamp and logfile to store information                    #
# getTimestamp
@tm = (localtime($^T))[0..5];
++$tm[4];
$tm[5] += 1900;
$timeStamp = sprintf("%04d%02d%02d.%02d%02d%02d", reverse @tm);
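(my $scriptName = $0) =~ s{.*/}{};   # strip any leading path so the log file name stays valid
$logfile = "$dir/log/$scriptName.$timeStamp.log";

# Three-argument open keeps the mode separate from the file name
open(LOG, '>>', $logfile) or die "Can't open $logfile for append: $!";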
print LOG "$timeStamp: running $0\n";


################################################################################
# Open files into array                                                        #
################################################################################

# Three-argument open, so odd characters in a file name are never read as a mode
open(FILEA, '<', $opt{a}) or die "Cannot open $opt{a} for read: $!";
@fileA = <FILEA>;
close( FILEA );

open(FILEB, '<', $opt{b}) or die "Cannot open $opt{b} for read: $!";
@fileB = <FILEB>;
close( FILEB );


# Walk each line of file A
foreach (@fileA)
{
   chomp;
   $count = 0;
   @lineFile1Array = split /\t/,$_;   # split the line into temporary array elements
   next unless @lineFile1Array >= 7;  # skip lines without the count/start/end fields
   my $numberOfElements = $lineFile1Array[4];
   @tempStart = split /,/,$lineFile1Array[5];
   @tempEnd = split /,/,$lineFile1Array[6];
   # Use less-than here: the count in the file might be 6, but the array elements are 0-5
   while ($count < $numberOfElements)
   {
      # Take element $count from field [5] (start) and field [6] (end),
      # then compare that pair against every line of file B
      foreach my $lineB (@fileB)
      {
         chomp $lineB;
         @lineFile2Array = split /\t/,$lineB;
         next unless @lineFile2Array >= 3;   # skip lines without start/end fields
         # Match only when the file B range lies entirely inside the file A range
         if (($lineFile2Array[1] >= $tempStart[$count]) && ($lineFile2Array[2] <= $tempEnd[$count]))
         {
            print LOG "Match!   $lineFile2Array[1] ~ $tempStart[$count]\n";
            print LOG "Match!   $lineFile2Array[2] ~ $tempEnd[$count]\n";
         }
      }
      # Done with this start/end pair; incrementing here means the loop stops
      # once every pair on this line of file A has been evaluated
      $count++;
   }
   # Done with this line; move on to the next line in file A
}

close( LOG );




################################################################################
################################################################################
# Subroutines                                                                  #
################################################################################
################################################################################
sub do_init()
{
   my $opt_string = 'hda:b:';
   getopts( "$opt_string", \%opt ) or do_usage();
   do_usage() if $opt{h};
   do_usage() unless ($opt{a} && $opt{b});
}


sub do_usage()
{
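   # -a and -b are both required (see do_init), so they are not shown as optional
   print "\nusage: $0 [-h] [-d] -a file1 -b file2\n\n";
   print "   -h        show this usage message\n";
   print "   -d        set the DEBUG flag\n";
   print "   -a file1  first file to compare (required)\n";
   print "   -b file2  second file to compare (required)\n\n";
   exit;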
}
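
On the error checking I mentioned at the top: here is a rough sketch of one way to validate a line from the first file before looping over it. The helper name parse_blocks and the sample line are made up for illustration and are not part of the script above. It assumes the same layout the script relies on: a count in field [4], comma-separated starts in [5] and ends in [6].

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): turn one line of the
# first file into a list of [start, end] pairs. Returns an empty list when
# the line is too short, the count is not a whole number, or there are
# fewer starts/ends than the count claims.
sub parse_blocks {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line;
    return () unless @fields >= 7;

    my ($n, $starts, $ends) = @fields[4, 5, 6];
    return () unless $n =~ /^\d+$/;

    my @start = split /,/, $starts;
    my @end   = split /,/, $ends;
    return () if @start < $n || @end < $n;

    return map { [ $start[$_], $end[$_] ] } 0 .. $n - 1;
}

# Made-up sample line: two blocks, 100-200 and 300-400
my @pairs = parse_blocks("chr1\tx\ty\tz\t2\t100,300,\t200,400,\n");
print scalar(@pairs), " pairs; first is $pairs[0][0]-$pairs[0][1]\n";
```

A malformed line simply yields no pairs, so the caller can skip it (or log it) instead of comparing against undefined values.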

Here are the results of running overlap.pl against the two files that you provided (you'll likely want to log more information; this is just a sample):
Code:
20080915.134301: running ./overlap.pl
Match!   100208130 ~ 100208127
Match!   100208166 ~ 100208306
Match!   100231689 ~ 100231680
Match!   100231725 ~ 100231885
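
One thing to be aware of: the comparison in the script only reports a match when the range from the second file sits entirely inside the range from the first file. If partial overlaps should count too, the test needs to change. The sketch below shows both tests side by side and counts overlaps per range in a hash. The function names are my own, the first-file ranges and the first second-file range are taken from the sample log above, and the last second-file range is made up to show a partial overlap.

```perl
use strict;
use warnings;

# True when the B range lies entirely inside the A range.
# This mirrors the comparison used in the script.
sub is_contained {
    my ($a_start, $a_end, $b_start, $b_end) = @_;
    return ($b_start >= $a_start && $b_end <= $a_end) ? 1 : 0;
}

# True when the two closed ranges share at least one position.
sub is_overlapping {
    my ($a_start, $a_end, $b_start, $b_end) = @_;
    return ($b_start <= $a_end && $b_end >= $a_start) ? 1 : 0;
}

my @ranges_a = ([100208127, 100208306], [100231680, 100231885]);
my @ranges_b = ([100208130, 100208166],    # from the sample log
                [100208300, 100208400]);   # made up: runs past the end of the first A range

# Key: "start-end" of an A range; value: number of B ranges overlapping it
my %overlap_count;
for my $ra (@ranges_a) {
    my $key = "$ra->[0]-$ra->[1]";
    $overlap_count{$key} = 0;
    for my $rb (@ranges_b) {
        $overlap_count{$key}++ if is_overlapping(@$ra, @$rb);
    }
}

print "$_: $overlap_count{$_}\n" for sort keys %overlap_count;
```

With these values the first range gets a count of 2 and the second a count of 0. With is_contained in place of is_overlapping, the made-up range would not be counted.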

 
