05-20-2013
Compare multiple files, identify common records and combine unique values into one file
Good morning all,
I have a problem that is one step beyond a standard awk compare.
I would like to compare three files, each with several thousand records, against a fourth file. In all of them, each row has one value that is identical, and one value that may be duplicated in the three files vis-à-vis the fourth.
What I want to see is:
1) The number of records that are unique to each of the three files (not in any of the others);
2) The number of records that are not unique in each of the three;
3) The number of records in the fourth file that are NOT in any of the other three;
4) An output file with the full row of each unique record across all the files.
These are all text files.
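The following is a rough Python sketch of one way to get all four results. It makes two assumptions that the question does not state. First, the matching value is the first whitespace-separated column. Second, a record is "unique" when its key occurs in exactly one of the four files.

The approach counts how many files each key appears in, then splits each file's rows on that count. The name `summarize` and the `key_field` parameter are illustrative only; they do not come from the thread.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize(files: Dict[str, List[str]], key_field: int = 0) -> Tuple[dict, List[str]]:
    """Count unique / non-unique records per file and collect unique rows.

    files: maps a file name to its lines. key_field: index of the
    whitespace-separated column used as the key (an assumption; adjust
    to the real record layout).
    """
    def key(line: str) -> str:
        return line.split()[key_field]

    # Ignore blank lines.
    rows = {name: [line for line in lines if line.strip()] for name, lines in files.items()}

    # In how many files does each key occur? One set per file, so a key
    # repeated inside a single file still counts once for that file.
    presence = Counter()
    for lines in rows.values():
        presence.update({key(line) for line in lines})

    stats = {}
    unique_rows: List[str] = []
    for name, lines in rows.items():
        unique = [line for line in lines if presence[key(line)] == 1]
        stats[name] = {"unique": len(unique), "not_unique": len(lines) - len(unique)}
        unique_rows.extend(unique)  # full rows, for the combined output file
    return stats, unique_rows


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the three files plus the fourth.
    files = {
        "file1": ["k1 a", "k2 b"],
        "file2": ["k2 c", "k3 d"],
        "file3": ["k4 e"],
        "file4": ["k1 f", "k5 g"],
    }
    stats, unique_rows = summarize(files)
    for name, counts in stats.items():
        print(name, counts)
    print("\n".join(unique_rows))
```

For the fourth file, the "unique" count answers item 3, because it counts the records whose key is in none of the other three files. To run the sketch on real files, build the dictionary with something like `{p: open(p).read().splitlines() for p in paths}`, then write `unique_rows` to the output file. An awk version would follow the same shape: a first pass counts keys per file, and a second pass prints rows by count.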
10 More Discussions You Might Find Interesting
1. UNIX for Dummies Questions & Answers
I need to compile a large amount of data with a common string from individual text files throughout many directories.
An example data file is below. I want to search for the following string, "cc_sectors_1" and combine all the data from each file which contains this string, into one new... (2 Replies)
Discussion started by: GradStudent2010
2. Shell Programming and Scripting
Hi friends,
I have multiple files. For now, let's say I have two of the following style
cat 1.txt
cat 2.txt
output.txt
Please note that my files are not sorted and in the output file I need another extra column that says the file from which it is coming. I have more than 100... (19 Replies)
Discussion started by: jacobs.smith
3. Shell Programming and Scripting
- I have two files (File 1 and File 2) and the contents of the files are mentioned below.
- I am trying to compare the values of Column1 of File1 with Column1 of File2. If a match is found, print the corresponding value from Column2 of File1 in Column5 of File2.
- I tried to modify and use... (10 Replies)
Discussion started by: Santoshbn
4. Shell Programming and Scripting
Hi All,
I have multiple (5+) text files with single columns, and I would like to grep the values common to all the text files and write them to a new file. All the values are numerical. Please let me know how to do it using awk. (6 Replies)
Discussion started by: Lucky Ali
5. Shell Programming and Scripting
I can't decide whether I should use AWK or Perl. After poring over these forums for hours today, I decided I'd post something and see if I could get some advice.
I've got a text file full of hundreds of events in this format:
Record Number : 1
Records in Seq : ... (3 Replies)
Discussion started by: Mayday22
6. Shell Programming and Scripting
I have this code
awk 'NR==FNR{a[$1];next} ($1 in a)' file1 file2
which does what I need it to do, but for only two files. I want to make it so that I can have multiple files (for example 30) and the code will return only the items that are in every single one of those files and ignore the ones... (7 Replies)
Discussion started by: castrojc
7. Shell Programming and Scripting
Hi,
I have multiple files that each contain one column of strings:
File1:
123abc
456def
789ghi
File2:
123abc
456def
891jkl
File3:
234mno
123abc
456def
In total I have 25 files of this type. (5 Replies)
Discussion started by: owwow14
8. Shell Programming and Scripting
Hi,
I have 5 files with two columns. I need to merge all the 5 files based on column 1. If any of them are missing then corresponding 2nd column should be substituted by missing value.
I know how to do this for 2 files, but how can I implement it for 5 files? I tried this based on 5 files but it... (2 Replies)
Discussion started by: Diya123
9. Shell Programming and Scripting
Looking for a little help here.
I have thousands of text files within multiple folders, organized as:
YYYY/MM/<thousands of files>
E.g.:
2014/01/1000 files
2014/02/1237 files
2014/03/1400 files
There are folders for each year and each month, and within each monthly folder there are... (4 Replies)
Discussion started by: whegra
10. Shell Programming and Scripting
Hi,
I have a huge unsorted text file. We wanted to identify the unique field values in a line and consider those fields as a primary key for a table in upstream system.
Basically, the process or script should fetch the values from each line that are unique compared to the rest of the lines in... (13 Replies)
Discussion started by: manikandan23
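Discussions 4, 6 and 7 above all ask how to extend the two-file `NR==FNR` idiom to many files. One general approach is to count how many files each value occurs in, then keep only the values seen in all of them. The sketch below shows this in Python rather than awk.

It assumes that the value is the first whitespace-separated column and that file contents are already loaded as lists of lines. The function name `common_values` is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def common_values(files: Dict[str, List[str]]) -> List[str]:
    """Return first-column values that occur in every file.

    Values are returned in the order they first appear in the first file.
    """
    seen_in = Counter()
    for lines in files.values():
        # One set per file: a value repeated inside a file counts once.
        seen_in.update({line.split()[0] for line in lines if line.strip()})

    result: List[str] = []
    first_file = next(iter(files.values()), [])
    for line in first_file:
        if not line.strip():
            continue
        value = line.split()[0]
        if seen_in[value] == len(files) and value not in result:
            result.append(value)
    return result


if __name__ == "__main__":
    # Sample data from discussion 7 above.
    files = {
        "File1": ["123abc", "456def", "789ghi"],
        "File2": ["123abc", "456def", "891jkl"],
        "File3": ["234mno", "123abc", "456def"],
    }
    print(common_values(files))  # ['123abc', '456def']
```

The same counting idea carries over to awk. Count each key at most once per input file, then print the keys whose count equals the number of input files.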
LEARN ABOUT DEBIAN
Data::Compare::Plugins(3pm) User Contributed Perl Documentation Data::Compare::Plugins(3pm)
NAME
Data::Compare::Plugins - how to extend Data::Compare
DESCRIPTION
Data::Compare natively handles several built-in data types: scalars, references to scalars, references to arrays, references to hashes,
references to subroutines, compiled regular expressions, and globs. For objects, it tries to Do The Right Thing and compares the
underlying data type. However, this is not always what you want. This is especially true if you have complex objects which overload
stringification and/or numification.
Hence we allow for plugins.
FINDING PLUGINS
Data::Compare will try to load any module installed on your system under the various @INC/Data/Compare/Plugins/ directories. If there is a
problem loading any of them, an appropriate warning will be issued.
Because of how we find plugins, no plugins are available when running in "taint" mode.
WRITING PLUGINS
Internally, plugins are "require"d into Data::Compare. This means that they need to evaluate to true. We make use of that true value.
Where normally you just put:
1;
at the end of an included file, you should instead ensure that you return a reference to an array. This is treated as being true, so it
satisfies perl, and is a damned sight more useful.
Inside that array should be either a description of what this plugin is to do, or references to several arrays containing such
descriptions. A description consists of two or three items. First, a string naming the first data-type handled by your plugin.
Second (optional, defaulting to the same as the first), the second data-type to compare. To handle comparisons to ordinary scalars,
give the empty string for the data-type, i.e.:
['MyType', '', sub { ...}]
Third and last, we need a reference to the subroutine which does the comparison. That subroutine should expect to take two parameters,
which will be of the specified type. It should return 1 if they compare the same, or 0 if they compare different.
Be aware that while you might give a description like:
['Type1', 'Type2', sub { ... }]
this will handle both comparing Type1 to Type2 and comparing Type2 to Type1; i.e., comparison is commutative.
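To make the descriptor rules concrete, here is a Python analogy of the dispatch they describe. It covers the optional second type, the empty string for ordinary scalars, and commutative matching.

This is not Data::Compare's implementation or API. The names `build_registry`, `type_name`, `compare` and `Money` are hypothetical. One detail is an assumption drawn from "two parameters, which will be of the specified type": when a reversed pair is matched, the handler's arguments are swapped so it still receives them in its declared order.

```python
from typing import Any, Callable, Dict, List, Tuple

Handler = Callable[[Any, Any], int]


def build_registry(descriptions: List[tuple]) -> Dict[Tuple[str, str], Handler]:
    """Turn (type1, [type2,] handler) descriptions into a lookup table."""
    registry: Dict[Tuple[str, str], Handler] = {}
    for desc in descriptions:
        if len(desc) == 2:  # second type omitted: defaults to the first
            t1, handler = desc
            t2 = t1
        else:
            t1, t2, handler = desc
        registry[(t1, t2)] = handler
        # Commutativity: also answer (t2, t1), swapping the arguments so
        # the handler always sees them in its declared (t1, t2) order.
        registry.setdefault((t2, t1), lambda a, b, h=handler: h(b, a))
    return registry


def type_name(value: Any) -> str:
    # "" stands for an ordinary scalar, as in the descriptor format.
    if isinstance(value, (str, int, float)):
        return ""
    return type(value).__name__


def compare(a: Any, b: Any, registry: Dict[Tuple[str, str], Handler]) -> int:
    """Return 1 if a and b compare the same, else 0."""
    handler = registry.get((type_name(a), type_name(b)))
    if handler is None:
        return int(a == b)  # no plugin matches: fall back to plain equality
    return handler(a, b)


class Money:
    """Hypothetical object type that a plugin wants to handle."""

    def __init__(self, cents: int) -> None:
        self.cents = cents


registry = build_registry([
    ("Money", lambda x, y: int(x.cents == y.cents)),     # Money vs Money
    ("Money", "", lambda m, s: int(m.cents == int(s))),  # Money vs scalar
])

print(compare(Money(5), Money(5), registry))  # 1
print(compare("5", Money(5), registry))       # 1, via the swapped entry
print(compare(Money(5), "6", registry))       # 0
```

The table is keyed on ordered type pairs, and the reverse pair is registered automatically. A single ['Type1', 'Type2', ...] style entry therefore serves both argument orders, which matches the commutativity described above.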
If you want to use Data::Compare's own comparison function from within your handler (to, for example, compare a data structure that you
have stored somewhere in your object) then you will need to call it as Data::Compare::Compare. However, be careful to avoid infinite
recursion: if your handler calls D::C::Compare on data of the type it handles, D::C::Compare will call back into your handler.
The name of your plugins does not matter, only that it lives in one of those directories. Of course, giving it a sensible name means that
the usual installation mechanisms will put it in the right place, and meaningful names will make it easier to debug your code.
For an example, look at the plugin that handles Scalar::Properties objects, which is distributed with Data::Compare.
DISTRIBUTION
Provided that the above rules are followed I see no reason for you to not upload your plugin to the CPAN yourself. You will need to make
Data::Compare a pre-requisite, so that the CPAN.pm installer does the right thing.
Alternatively, if you would prefer me to roll your plugin in with the Data::Compare distribution, I'd be happy to do so provided that the
code is clear and well-commented, and that you include tests and documentation.
SEE ALSO
Data::Compare
Data::Compare::Plugins::Scalar::Properties
AUTHOR
Copyright (c) 2004 David Cantrell <david@cantrell.org.uk>. All rights reserved. This program is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.
perl v5.12.4 2009-03-07 Data::Compare::Plugins(3pm)