Shell Programming and Scripting forum, post 302236095 by labrazil on Sunday 14th of September 2008 08:03:43 PM
Determining number of overlaps between two files using Hashes?

Hi there,

I have a question about how to set this up. Here is the situation.

I have two files. The first is ~31,000 lines long and has 7 fields per line:
file1
Code:
1    +       100208127       100261594       6       100208127,100231680,100237404,100245177,100249508,100260529,    100208306,100231885,100237559,100245300,100249677,100261594,
1    +       100217082       100217185       1       100217082,      100217185,
1    +       100276376       100321515       12      100276376,100288052,100296809,100298021,100299978,100306120,100306616,100307757,100315308,100316594,100318639,100320146,        100276460,100288148,100296872,100298149,100300093,100306339,100306730,100307829,100315421,100316692,100318803,100321515,

The 5th field is important: it gives the number of segments represented in fields 6 and 7. For example, the first line shows 6, so the first number in field 6 is the start of the first segment and the first number in field 7 is the end of the first segment, and so on for all 6 segments. The second line shows only 1 in field 5, so it has a single segment starting at 100217082 and ending at 100217185.
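The pairing of fields 6 and 7 described above can be sketched in Python. This is only an illustration: the function name `parse_segments` and the names `chrom` and `strand` for fields 1 and 2 are assumptions, not something stated in the post.

```python
def parse_segments(line):
    """Split one file1 line into (chrom, strand, [(start, end), ...]).

    'chrom' and 'strand' are assumed names for fields 1 and 2.
    """
    fields = line.split()
    chrom, strand = fields[0], fields[1]
    count = int(fields[4])  # field 5: number of segments
    # Fields 6 and 7 are comma-separated lists that end with a trailing comma,
    # so empty pieces are skipped.
    starts = [int(x) for x in fields[5].split(",") if x]
    ends = [int(x) for x in fields[6].split(",") if x]
    if len(starts) != count or len(ends) != count:
        raise ValueError("field 5 does not match the number of segments")
    # The i-th start pairs with the i-th end.
    return chrom, strand, list(zip(starts, ends))


# Second sample line of file1 (one segment).
sample = "1    +       100217082       100217185       1       100217082,      100217185,"
print(parse_segments(sample))
# ('1', '+', [(100217082, 100217185)])
```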

The second file is variable in length and can be anywhere from 3,000,000 to 10,000,000 lines. It has 4 fields per line:
file2
Code:
1    100208130       100208166       +
1    100208310       100208346       +
1    100217090       100217126       +
1    100231689       100231725       +

As you can see, fields 2 and 3 always differ by 36. I want to know how many times each line in file2 is contained within file1, specifically within the segments (remember that each line in file1 has a different number of segments, e.g. 6, 1, and 12, as given in field 5).
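The post uses both "contained within" and "overlaps", which are different tests. A minimal sketch of each follows, assuming that start and end coordinates are inclusive (the post does not say); the function names are hypothetical. For the four sample lines of file2, both tests give the same answer.

```python
def contained(read_start, read_end, seg_start, seg_end):
    """True if the read lies entirely inside the segment (inclusive ends assumed)."""
    return seg_start <= read_start and read_end <= seg_end


def overlaps(read_start, read_end, seg_start, seg_end):
    """True if the read and the segment share at least one position."""
    return read_start <= seg_end and seg_start <= read_end


# First segment of the first file1 line.
seg = (100208127, 100208306)

print(contained(100208130, 100208166, *seg))  # first file2 line
print(contained(100208310, 100208346, *seg))  # second file2 line
print(overlaps(100208310, 100208346, *seg))   # second file2 line, looser test
```

The first call prints `True`; the other two print `False`, because the second file2 line starts after the segment ends.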

So if I use these two files to generate my output, it would tell me that 3 lines from file2 match or overlap segments in file1 and 1 line from file2 does NOT match or overlap any segment in file1:

Code:
YES 1    100208130       100208166       +
NO 1    100208310       100208346       +
YES 1    100217090       100217126       +
YES 1    100231689       100231725       +

For this kind of computation, should I use a hash for the first file or for the second file, and if so, how would I set it up? Can someone assist here? Thanks!
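One possible setup, sketched in Python rather than a shell or Perl hash: since file1 is the small file, load its segments into a dictionary (hash) and stream file2 past it. This is a sketch under stated assumptions, not a definitive answer. It assumes that "match" means the file2 interval is fully contained in a single segment, that coordinates are inclusive, and that field 1 and the strand must both agree; the key `(chrom, strand)` and all function names are hypothetical. Each key maps to the sorted segment starts plus a running maximum of segment ends, so one binary search answers each query.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(file1_lines):
    """Map (chrom, strand) -> (sorted starts, running maximum of ends)."""
    by_key = defaultdict(list)
    for line in file1_lines:
        f = line.split()
        if len(f) < 7:
            continue  # skip blank or malformed lines
        starts = [int(x) for x in f[5].split(",") if x]
        ends = [int(x) for x in f[6].split(",") if x]
        by_key[(f[0], f[1])].extend(zip(starts, ends))
    index = {}
    for key, segs in by_key.items():
        segs.sort()
        starts = [s for s, _ in segs]
        max_ends, best = [], -1
        for _, e in segs:
            best = max(best, e)
            max_ends.append(best)  # largest end among segments seen so far
        index[key] = (starts, max_ends)
    return index


def is_contained(index, chrom, start, end, strand):
    """True if some segment has seg_start <= start and end <= seg_end."""
    entry = index.get((chrom, strand))
    if entry is None:
        return False
    starts, max_ends = entry
    i = bisect_right(starts, start)  # number of segments starting at or before 'start'
    return i > 0 and max_ends[i - 1] >= end


def classify(file1_lines, file2_lines):
    """Yield each file2 line prefixed with YES or NO."""
    index = build_index(file1_lines)
    for line in file2_lines:
        chrom, start, end, strand = line.split()
        hit = is_contained(index, chrom, int(start), int(end), strand)
        yield ("YES " if hit else "NO ") + line.strip()


# First two sample lines of file1 and all four sample lines of file2.
FILE1 = [
    "1 + 100208127 100261594 6 "
    "100208127,100231680,100237404,100245177,100249508,100260529, "
    "100208306,100231885,100237559,100245300,100249677,100261594,",
    "1 + 100217082 100217185 1 100217082, 100217185,",
]
FILE2 = [
    "1 100208130 100208166 +",
    "1 100208310 100208346 +",
    "1 100217090 100217126 +",
    "1 100231689 100231725 +",
]

for row in classify(FILE1, FILE2):
    print(row)
```

On the sample data this marks the lines YES, NO, YES, YES, in agreement with the desired output above. Indexing file1 keeps memory use small (about 31,000 lines) while file2 can be read line by line, which matters at 3 to 10 million lines. If strand should not matter, the key could be the first field alone.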
 
