UNIX for Dummies Questions & Answers: post 302385815 by uiop44 on Sunday, 10 January 2010, 05:55:23 AM
Compare 2 very large lists of different lengths

I have two very large datasets (>100 MB) in a simple vertical list format, one entry per line. They differ in size, order and formatting (e.g. whitespace and some other minor cruft that would thwart an easy regex).

Let's call them set1 and set2.

I want to check whether set2 contains any of the entries in set1. I think of this as a separate grep of set2 for each line of set1.

(NB: with some work, I could manipulate the sets so that their order and formatting match.)

In your opinion, what is the best tool for searching set2 for the entries in set1?

- comm?
- a looping shell script, or xargs, that calls grep?
- grep -f?
- diff?
- combine the sets (after making the formatting the same), then sort and print only the duplicate lines? (uniq -d, sed or awk)
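The last option is essentially a join on normalized lines. Below is a minimal Python sketch of the same idea. It replaces the sort/uniq -d step with a hash-set lookup and makes two assumptions: set1 fits in memory, and the only formatting difference is whitespace. The `normalize` and `common_entries` names are hypothetical. Each entry of set2 that also appears in set1 is reported once, in set2's order.

```python
from typing import Iterable, Iterator


def normalize(line: str) -> str:
    """Hypothetical normalizer: trim the line and collapse internal runs of
    whitespace. Extend it for whatever other cruft differs between the files."""
    return " ".join(line.split())


def common_entries(set1: Iterable[str], set2: Iterable[str]) -> Iterator[str]:
    """Yield each normalized entry of set2 that also occurs in set1, once,
    in set2's order.

    set1 is held in a hash set (average O(1) lookups); set2 is only streamed,
    so it never has to fit in memory or be sorted.
    """
    wanted = {normalize(line) for line in set1}
    wanted.discard("")  # blank lines are not entries
    reported = set()
    for line in set2:
        key = normalize(line)
        if key in wanted and key not in reported:
            reported.add(key)
            yield key
```

With both files opened as text, iterating over `common_entries(f1, f2)` produces the shared entries. Once the files are normalized, standard tools cover the same ground. `grep -Fxf set1 set2` treats each line of set1 as a fixed string that must match a whole line of set2. `comm -12 <(sort set1) <(sort set2)` uses bash process substitution and prints the lines common to both sorted inputs.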
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Compare lists of files

If I had a list of numbers in two different files, what would be the fastest and easiest way to find out which numbers in list B are not in list A without reading each number in list B one at a time and using grep thousands of times against list A? I have two very long lists of numbers and the... (4 Replies)
Discussion started by: keelba

2. UNIX for Dummies Questions & Answers

Sed working on lines of small length and not large length

Hi, I have a peculiar case, where my sed command is working on a file which contains lines of small length. sed "s/XYZ:1/XYZ:3/g" abc.txt > xyz.txt When abc.txt contains lines of small length (currently around 80 chars), this sed command is working fine. When abc.txt contains lines of... (3 Replies)
Discussion started by: thanuman

3. UNIX for Dummies Questions & Answers

Compare 2 lists using a full and/or partial match at beginning of line?

Hello all, I wonder if anybody might be able to help with this. I have file1 and file2. Both files may contain thousands of lines that have variable contents. file1 234GH 5234BTW 89er 678tfg 234 234YT tfg456 wert 78gt gh23444 (7 Replies)
Discussion started by: Garrred

4. Shell Programming and Scripting

How to make bash wrapper for java/groovy program with variable length arguments lists?

The following bash script does not work because the java/groovy code always thinks there are four arguments even if there are only 1 or 2. As you can see from my hideous backslashes, I am using Cygwin bash on Windows. export... (1 Reply)
Discussion started by: siegfried

5. Programming

Python: Compare 2 word lists

Hi. I am trying to write a Python programme that compares two different text files which both contain a list of words. Each word has its own line worda wordb wordc I want to compare textfile 2 with textfile 1, and if there's a word in textfile 2 that is NOT in textfile 1, I want to... (6 Replies)
Discussion started by: Bloomy

6. Shell Programming and Scripting

Comparison between 2 large lists with Getting VALUES from one into the other

Hi, I have 2 large lists: LIST A: contains 6 fields of many entries (VARIABLE number), like: 2011-07-10 | 18:19:47 | 38037300 | 9647808003122 | 2 | success LIST B: contains 3 fields & 183 entries (FIXED number), like: 9647805651885 9647805651885 SCP_10 What I want is a... (8 Replies)
Discussion started by: amurib

7. Shell Programming and Scripting

Bash script to compare two lists

Hi, I do little bash scripting, so sorry for my ignorance. How do I check whether the two variables do not match, and if they do not match, run a command? I was thinking of a for loop, but then I need another for loop for the 2nd list, and I do not think that would work, as in the real world there could... (2 Replies)
Discussion started by: GermanJulian

8. Shell Programming and Scripting

Compare two lists with perl

Hi everybody! I'm trying to delete some elements from a list with two elements on each row agreeing with the elements in another list. Practically, I want a perl script able to take each element of the second list (that is a single column list), compare it with both elements of each row from the... (3 Replies)
Discussion started by: gabrysfe

9. Shell Programming and Scripting

compare two lists on two files

I have two files A and B listing IP addresses. All the IP addresses in B are in A, and A includes other IP addresses. Now I want to get the list of the IP addresses that are in A but not in B. How can I achieve this? Thanks (1 Reply)
Discussion started by: esolvepolito

10. Homework & Coursework Questions

[Python] Compare 2 lists

Hello, I'm new to the python programming, and I have a question. I have to write a program that prints a receipt for a restaurant. The input is a list which looks like: product1 product3 product8 .... In the other input file there is a list which looks like: product1 coffee 5,00... (1 Reply)
Discussion started by: dagendy
DateTime::Span(3pm)          User Contributed Perl Documentation          DateTime::Span(3pm)

NAME
    DateTime::Span - Datetime spans

SYNOPSIS
    use DateTime;
    use DateTime::Span;

    $date1 = DateTime->new( year => 2002, month => 3, day => 11 );
    $date2 = DateTime->new( year => 2003, month => 4, day => 12 );
    $set2 = DateTime::Span->from_datetimes( start => $date1, end => $date2 );
    #  set2 = 2002-03-11 until 2003-04-12

    # here $set1 is assumed to be another DateTime::Span object
    $set = $set1->union( $set2 );         # like "OR", "insert", "both"
    $set = $set1->complement( $set2 );    # like "delete", "remove"
    $set = $set1->intersection( $set2 );  # like "AND", "while"
    $set = $set1->complement;             # like "NOT", "negate", "invert"

    if ( $set1->intersects( $set2 ) ) { ... }  # like "touches", "interferes"
    if ( $set1->contains( $set2 ) ) { ... }    # like "is-fully-inside"

    # data extraction
    $date = $set1->start;           # first date of the span
    $date = $set1->end;             # last date of the span

DESCRIPTION
"DateTime::Span" is a module for handling datetime spans, otherwise known as ranges or periods ("from X to Y, inclusive of all datetimes in between"). This is different from a "DateTime::Set", which is made of individual datetime points as opposed to a range. There is also a module "DateTime::SpanSet" to handle sets of spans. METHODS
    o from_datetimes

      Creates a new span based on a starting and ending datetime.

      A 'closed' span includes its end-dates:

        $span = DateTime::Span->from_datetimes( start => $dt1, end => $dt2 );

      An 'open' span does not include its end-dates:

        $span = DateTime::Span->from_datetimes( after => $dt1, before => $dt2 );

      A 'semi-open' span includes one of its end-dates:

        $span = DateTime::Span->from_datetimes( start => $dt1, before => $dt2 );
        $span = DateTime::Span->from_datetimes( after => $dt1, end => $dt2 );

      A span might have just a beginning date, or just an ending date. These spans end, or start, in an imaginary 'forever' date:

        $span = DateTime::Span->from_datetimes( start => $dt1 );
        $span = DateTime::Span->from_datetimes( end => $dt2 );
        $span = DateTime::Span->from_datetimes( after => $dt1 );
        $span = DateTime::Span->from_datetimes( before => $dt2 );

      You cannot give both a "start" and "after" argument, nor can you give both an "end" and "before" argument. Either of these conditions will cause the "from_datetimes()" method to die.

      To summarize, a datetime passed as either "start" or "end" is included in the span. A datetime passed as either "after" or "before" is excluded from the span.

    o from_datetime_and_duration

      Creates a new span.

        $span = DateTime::Span->from_datetime_and_duration( start => $dt1, duration => $dt_dur1 );
        $span = DateTime::Span->from_datetime_and_duration( after => $dt1, hours => 12 );

      The new "end of the set" is open by default.

    o clone

      This object method returns a replica of the given object.

    o set_time_zone( $tz )

      This method accepts either a time zone object or a string that can be passed as the "name" parameter to "DateTime::TimeZone->new()". If the new time zone's offset is different from the old time zone, then the local time is adjusted accordingly.

      If the old time zone was a floating time zone, then no adjustments to the local time are made, except to account for leap seconds.
      If the new time zone is floating, then the UTC time is adjusted in order to leave the local time untouched.

    o duration

      The total size of the set, as a "DateTime::Duration" object, or as a scalar containing infinity.

      Also available as "size()".

    o start
    o end

      First or last dates in the span.

      These methods may return a "DateTime::Infinite::Future" or a "DateTime::Infinite::Past" object.

      If the set ends "before" a date $dt, it returns $dt. Note that in this case $dt is not a set element, but it is a set boundary.

    o start_is_closed
    o end_is_closed

      Returns true if the first or last dates belong to the span ( begin <= x <= end ).

    o start_is_open
    o end_is_open

      Returns true if the first or last dates are excluded from the span ( begin < x < end ).

    o union
    o intersection
    o complement

      Set operations may be performed not only with "DateTime::Span" objects, but also with "DateTime::Set" and "DateTime::SpanSet" objects. These set operations always return a "DateTime::SpanSet" object.

        $set = $span->union( $set2 );         # like "OR", "insert", "both"
        $set = $span->complement( $set2 );    # like "delete", "remove"
        $set = $span->intersection( $set2 );  # like "AND", "while"
        $set = $span->complement;             # like "NOT", "negate", "invert"

    o intersects
    o contains

      These set functions return a boolean value.

        if ( $span->intersects( $set2 ) ) { ... }  # like "touches", "interferes"
        if ( $span->contains( $dt ) ) { ... }      # like "is-fully-inside"

      These methods can accept a "DateTime", "DateTime::Set", "DateTime::Span", or "DateTime::SpanSet" object as an argument.

SUPPORT
    Support is offered through the "datetime@perl.org" mailing list.

    Please report bugs using rt.cpan.org.

AUTHOR
    Flavio Soibelmann Glock <fglock@gmail.com>

    The API was developed together with Dave Rolsky and the DateTime Community.

COPYRIGHT
    Copyright (c) 2003-2006 Flavio Soibelmann Glock. All rights reserved. This program is free software; you can distribute it and/or modify it under the same terms as Perl itself.

    The full text of the license can be found in the LICENSE file included with this module.

SEE ALSO
    Set::Infinite

    For details on the Perl DateTime Suite project please see <http://datetime.perl.org>.

perl v5.12.4                        2011-08-22                        DateTime::Span(3pm)
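The endpoint rules described for from_datetimes() and the intersects() test can be illustrated with a small Python model. This is a sketch under stated assumptions, not the DateTime::Span API. The `Span` class, the `from_datetimes` function and the `contains_point` method are hypothetical stand-ins. They only mirror the documented semantics: start/end are inclusive, after/before are exclusive, a missing bound means 'forever', and passing start with after (or end with before) is an error.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical Python model of a datetime span (NOT the DateTime::Span API).

    A bound of None means the span is unbounded on that side (the man page's
    imaginary 'forever' date). A *_closed flag of True means the bound itself
    belongs to the span (start/end); False means it is excluded (after/before).
    """
    start: Optional[datetime] = None
    end: Optional[datetime] = None
    start_closed: bool = True
    end_closed: bool = True

    def contains_point(self, dt: datetime) -> bool:
        """True if dt lies inside the span, honoring open/closed bounds."""
        if self.start is not None:
            if dt < self.start or (dt == self.start and not self.start_closed):
                return False
        if self.end is not None:
            if dt > self.end or (dt == self.end and not self.end_closed):
                return False
        return True

    def intersects(self, other: "Span") -> bool:
        """True if the two spans share at least one instant."""
        return not (_ends_before(self, other) or _ends_before(other, self))


def _ends_before(a: Span, b: Span) -> bool:
    """True if span a finishes before span b begins."""
    if a.end is None or b.start is None:
        return False  # a runs forever forward or b runs forever back
    if a.end < b.start:
        return True
    # Touching bounds share an instant only if both are closed.
    return a.end == b.start and not (a.end_closed and b.start_closed)


def from_datetimes(*, start: Optional[datetime] = None,
                   after: Optional[datetime] = None,
                   end: Optional[datetime] = None,
                   before: Optional[datetime] = None) -> Span:
    """Build a Span following the documented argument rules."""
    if start is not None and after is not None:
        raise ValueError("cannot give both start and after")
    if end is not None and before is not None:
        raise ValueError("cannot give both end and before")
    return Span(
        start=start if start is not None else after,
        end=end if end is not None else before,
        start_closed=after is None,
        end_closed=before is None,
    )
```

In this model, two spans that meet at a single instant intersect only if both touching bounds are closed. That is the same open/closed distinction that start_is_closed and end_is_closed report. Empty spans (for example, after and before at the same instant) are not modeled.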
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.