UNIX for Dummies Questions & Answers, post 302385815 by uiop44 on Sunday 10th of January 2010 05:55:23 AM
compare 2 very large lists of different length

I have two very large datasets (>100MB), each in a simple vertical list format (one entry per line). They differ in size, order, and formatting (e.g. whitespace and some other minor cruft that would thwart an easy regex).

Let's call them set1 and set2.

I want to check set2 to see if it contains any of the data entries in set1. I think of this as individual greps of set2 using each line of set1.

(NB- I could, with some work, manipulate the sets to make the order and formatting the same.)
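A minimal sketch of that cleanup step, assuming the cruft is limited to carriage returns, leading and trailing whitespace, and empty lines (the post does not say what the other cruft is). The function name normalize and the sample data are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: normalize one list so it can be compared line by line.
# Assumption: the only cruft is CRs, leading/trailing blanks, and empty lines.
normalize() {
    tr -d '\r' < "$1" |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' |
        LC_ALL=C sort -u    # byte-order sort, duplicates removed
}

# Tiny demonstration on a throwaway file.
tmp=$(mktemp -d)
printf '  beta \r\nalpha\n\nbeta\n' > "$tmp/set1"
normalize "$tmp/set1" > "$tmp/set1.norm"
cat "$tmp/set1.norm"
```

Sorting with LC_ALL=C here means the output is already in the order that comm expects if it is run under the same locale.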

In your opinion, what is the best tool to use for this search of set2 using the data in set1?

- comm?
- a looping shell script, or xargs, that calls grep?
- grep -f?
- diff?
- combine the sets (after making the format the same), then sort and print only the duplicate lines with uniq -d, sed or awk?
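For comparison, two of the options listed above (comm and grep -f) might look like this. It is only a sketch on made-up sample data, and it assumes both files already hold one clean entry per line; the file names are placeholders.

```shell
#!/bin/sh
# Sketch: find entries of set1 that also appear in set2.
# Assumption: both files are already normalized to one clean entry per line.
tmp=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmp/set1"
printf 'delta\ngamma\nalpha\nepsilon\n' > "$tmp/set2"

# Option 1: sort both lists, then let comm print only the lines common to both.
LC_ALL=C sort -u "$tmp/set1" > "$tmp/set1.sorted"
LC_ALL=C sort -u "$tmp/set2" > "$tmp/set2.sorted"
LC_ALL=C comm -12 "$tmp/set1.sorted" "$tmp/set2.sorted" > "$tmp/common_comm"

# Option 2: no pre-sorting needed.
# -F fixed strings, -x whole-line matches only, -f read patterns from a file.
grep -Fxf "$tmp/set1" "$tmp/set2" | LC_ALL=C sort -u > "$tmp/common_grep"

cat "$tmp/common_comm"
```

Both options should report the same entries. With lists this large, the sort-then-comm route reads each sorted file only once, while grep -f has to hold every pattern from set1 in memory, which can get expensive.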
 

af_sets(3)                Attribute Filesystem (AtFS)                af_sets(3)

NAME
       af_initset, af_nrofkeys, af_setgkey, af_setaddkey, af_setrmkey,
       af_setposrmkey, af_sortset, af_subset, af_copyset, af_intersect,
       af_union, af_diff - AtFS operations on key sets

SYNOPSIS
       #include <atfs.h>

       int af_initset (Af_set *set)
       int af_nrofkeys (Af_set *set)
       int af_setgkey (Af_set *set, int position, Af_key *key)
       int af_setaddkey (Af_set *set, int position, Af_key *key)
       int af_setrmkey (Af_set *set, Af_key *key)
       int af_setposrmkey (Af_set *set, int position)
       int af_sortset (Af_set *set, char *attrname)
       int af_subset (Af_set *set, Af_attrs *attrbuf, Af_set *subset)
       int af_copyset (Af_set *source, Af_set *destination)
       int af_intersect (Af_set *set1, Af_set *set2, Af_set *newset)
       int af_union (Af_set *set1, Af_set *set2, Af_set *newset)
       int af_diff (Af_set *set1, Af_set *set2, Af_set *newset)

DESCRIPTION
       Sets in AtFS are ordered collections of keys. The structure of sets is
       the following:

              typedef struct {
                  int     af_nkeys;
                  int     af_setlen;
                  Af_key *af_klist;
              } Af_set;

       The list of keys in a set is a linear list, residing in allocated
       memory. The list has no holes, so that positions 0 through af_nkeys-1
       are occupied with valid keys.

       Set functions returning a set require a pointer to an empty set
       structure as argument.

       af_initset initializes a set.

       af_nrofkeys returns the number of valid keys in the given set.

       af_setgkey delivers the filekey stored at position position in the
       identified set. The result is passed in the buffer key. Typically you
       use af_setgkey to run through a set and perform a special action on
       each key. The following code sequence does this job:

              Af_key key;
              Af_set set;
              int    i;    /* loop index over the positions in the set */

              af_initset (&set);
              ...
              for (i = 0; i < af_nrofkeys (&set); i++) {
                  af_setgkey (&set, i, &key);
                  /* process key */
                  ...
              }

       af_setaddkey introduces a new filekey to an existing set at the given
       position. All following keys are moved back by one position. The
       constant AF_LASTPOS given as position argument leads to adding the new
       filekey at the end of the set.

       af_setrmkey (af_setposrmkey) removes the given filekey (the filekey at
       position position) from the specified set. Holes generated by deleting
       single keys from a set are eliminated by condensing the set. All
       following keys are moved one position forward in the set.

       af_sortset sorts a given set of object keys by the values of the named
       attribute. The set is sorted in increasing order, meaning that the
       lowest value occurs first in the set. Af_user structures are compared
       by username first and by userdomain if the names are equal (the user
       host is not taken into account). Version numbers are ordered in natural
       order, busy versions first. In atfs.h you can find a list of attribute
       names naming the standard attributes. All other attribute names are
       presumed to be user defined attributes.

       While sorting by the values of a user defined attribute, all ASOs that
       do not have the named attribute are added at the end of the resulting
       (sorted) set. Sorting of user defined attributes with multiple values
       is based on simple text comparison, with the order of the values taken
       as it is. The length of the given attribute name is limited. This limit
       is defined by the constant AF_UDANAMLEN in atfs.h.

       af_subset does a retrieve operation (similar to af_find - manual page
       af_retrieve(3)) on a given set of object keys. af_subset takes an
       attribute buffer (attrbuf) with all desired attributes set to an
       appropriate value as argument. The attribute buffer should be
       initialized by af_initattrs (manual page af_retrieve(3)) beforehand.
       af_subset returns its result in a new set; the original set remains
       unchanged.

       af_copyset copies sets (really! =:-).

       af_intersect, af_union and af_diff build intersections, unions, and
       differences between two sets. The result is a new set, where all keys
       taken from the first argument set (set1) occur first, and the keys
       from the second argument set (set2) afterwards. You may give one of
       set1 or set2 as the result set. In that case, the original set gets
       lost and is dropped implicitly.

       Sets generated by af_copyset, af_subset, af_intersect, af_union, or
       af_diff should be released by af_dropset as soon as they are not used
       any longer.

SEE ALSO
       af_retrieve(3)

DIAGNOSTICS
       Upon error, -1 or a nil pointer (depending on the return type) is
       returned and af_errno is set to the corresponding error number.

AtFS-1.71                  Fri Jun 25 14:33:20 1993                  af_sets(3)