Merging two files by comparing three fields
Post 302323328 by durden_tyler on Saturday 6th of June 2009 04:40:11 PM
One way to do it with perl:

Code:
$
$ cat file1
Class1 Sports Ball 11 12 13
Class2 Academic Bat 21 22 23
Class3 Academic Pen 31 32 33
Class4 Gift Birthday 41 42 43
$
$ cat file2
Class1 Sports Ball 14 15
Class2 Academic Bat 24 25
Class3 Academic Pen 34 35
Class5 Books Maths 54 55
$
$ perl -ne 'BEGIN {open(F,"<","file2") or die "file2: $!"; while(<F>){@f=split; $x{"$f[0]:$f[1]:$f[2]"}=" $f[3] $f[4]"} close(F)}
>   { chomp; @f=split; $y="$f[0]:$f[1]:$f[2]"; print $_,defined $x{$y}?$x{$y}:" 0 0","\n" }' file1
Class1 Sports Ball 11 12 13 14 15
Class2 Academic Bat 21 22 23 24 25
Class3 Academic Pen 31 32 33 34 35
Class4 Gift Birthday 41 42 43 0 0
$
$

tyler_durden
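
For comparison, here is a rough awk sketch of the same lookup. It is not part of the original post: the function name merge_by_three_fields and the temporary directory are made up for illustration, and the sample data is copied from file1 and file2 above. The first file given is loaded into an array keyed on fields 1 to 3, then each line of the second file is printed with the matching extra fields, or "0 0" when there is no match.

```shell
#!/bin/sh
# Sketch only: awk version of the three-field merge shown above.
# merge_by_three_fields is a hypothetical helper name.
merge_by_three_fields() {
    # $1 = lookup file (file2), $2 = main file (file1)
    awk '
        # Lines of the lookup file: remember fields 4 and 5 under the 3-field key.
        FILENAME == ARGV[1] { extra[$1 FS $2 FS $3] = $4 " " $5; next }
        # Lines of the main file: append the stored fields, or "0 0" if no match.
        {
            key = $1 FS $2 FS $3
            print $0, ((key in extra) ? extra[key] : "0 0")
        }
    ' "$1" "$2"
}

# Recreate the sample files in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/file1" <<'EOF'
Class1 Sports Ball 11 12 13
Class2 Academic Bat 21 22 23
Class3 Academic Pen 31 32 33
Class4 Gift Birthday 41 42 43
EOF
cat > "$workdir/file2" <<'EOF'
Class1 Sports Ball 14 15
Class2 Academic Bat 24 25
Class3 Academic Pen 34 35
Class5 Books Maths 54 55
EOF

merged=$(merge_by_three_fields "$workdir/file2" "$workdir/file1")
printf '%s\n' "$merged"
rm -r "$workdir"
```

The test FILENAME == ARGV[1] is used instead of the more common NR == FNR so that an empty lookup file does not cause the main file to be read as the lookup table. As in the perl version, lines that exist only in file2 (Class5 here) are not printed.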
 

NUMDIFF(1)                          User Commands                          NUMDIFF(1)

NAME
       numdiff - compare similar files with numeric fields

DESCRIPTION
       Usage:

       numdiff -h|--help|-v|--version

       or

       numdiff [-s IFS][-a THRVAL[:RANGE|:RANGE1:RANGE2]]
               [-r THRVAL[:RANGE|:RANGE1:RANGE2]][-2][-F NUM][-# NUM][-P][-N][-I]
               [-c CURRNAME][-d C1C2][-t C1C2][-g N1N2][-p C1C2][-n C1C2][-e C1C2]
               [-i C1C2][-X 1:RANGE][-X 2:RANGE][-E][-D][-b][-V][-O[NUM]][-q][-S]
               [-z 1:RANGE][-z 2:RANGE][-Z 1:RANGE][-Z 2:RANGE][-m][-H][-f[NUM]]
               [-T][-B][-l PATH][-o PATH] FILE1 FILE2

       Compare putatively similar files line by line and field by field,
       ignoring small numeric differences or/and different numeric formats.

       RANGE, RANGE1 and RANGE2 stay for a positive integer value or for a
       range of integer values, like 1-, 3-5 or -7.

       The two arguments after the options are the names of the files to
       compare. The complete paths of the files should be given, a directory
       name is not accepted. The given paths cannot refer to the same file but
       one of them can be "-", which refers to stdin.

       Exit status: 1 if files differ, 0 if they are equal, -1 (255) in case
       of error

       -s, --separator=IFS
              Specify the set of characters to use to split the input lines
              into fields (The default set of characters is space, tab and
              newline). If IFS is prefixed with 1: or 2: then use the given
              character set only for the lines from the first or the second
              file respectively

       -a, --absolute-tolerance=THRVAL[:RANGE|:RANGE1:RANGE2]
              Set to THRVAL the maximum absolute difference permitted before
              that two numeric fields are regarded as different (The default
              value is zero). If a RANGE is given, use the specified threshold
              only when comparing fields whose positions lie in RANGE. If both
              RANGE1 and RANGE2 are given and have the same length, then use
              the specified threshold when comparing a field of FILE1 lying in
              RANGE1 with the corresponding field of FILE2 in RANGE2

       -r, --relative-tolerance=THRVAL[:RANGE|:RANGE1:RANGE2]
              Set to THRVAL the maximum relative difference permitted before
              that two numeric fields are regarded as different (The default
              value is zero). If a RANGE is given, use the specified threshold
              only when comparing fields whose positions lie in RANGE. If both
              RANGE1 and RANGE2 are given and have the same length, then use
              the specified threshold when comparing a field of FILE1 lying in
              RANGE1 with the corresponding field of FILE2 in RANGE2

       -2, --strict
              Consider two numerical values as equal only if both absolute and
              relative difference do not exceed the corresponding tolerance
              threshold

       -F, --formula=NUM
              Use the formula indicated by NUM to compute the relative errors.
              If 'NUM' is 0 use the classic formula. If 'NUM' is 1 compute the
              relative errors by considering the values in FILE1 as sample
              values. If 'NUM' is 2 compute the relative errors by considering
              the values in FILE2 as sample values.

       -#, --digits=NUM
              Set to NUM the number of digits in the significands used in
              multiple precision arithmetic

       -P, --positive-differences
              Ignore all differences due to numeric fields of the second file
              that are less than the corresponding numeric fields in the first
              file

       -N, --negative-differences
              Ignore all differences due to numeric fields of the second file
              that are greater than the corresponding numeric fields in the
              first file

       -I, --ignore-case
              Ignore changes in case while doing literal comparisons

       -c, --currency=CURRNAME
              Set to CURRNAME the currency name for the two files to compare.
              CURRNAME must be prefixed with 1: or 2: to specify the currency
              name only for the first or the second file

       -d, --decimal-point=C1C2
              Specify the characters representing the decimal point in the two
              files to compare

       -t, --thousands-separator=C1C2
              Specify the characters representing the thousands separator in
              the two files to compare

       -g, --group-length=N1N2
              Specify the number of digits forming each group of thousands in
              the two files to compare

       -p, --plus-prefix=C1C2
              Specify the (optional) prefixes for positive values used in the
              two files to compare

       -n, --minus-prefix=C1C2
              Specify the prefixes for negative values used in the two files to
              compare

       -e, --exponent-letter=C1C2
              Specify the exponent letters used in the two files to compare

       -i, --imaginary-unit=C1C2
              Specify the characters representing the imaginary unit in the two
              files to compare

       -X, --exclude=1:RANGE
              Select the fields of the first file that have to be ignored

       -X, --exclude=2:RANGE
              Select the fields of the second file that have to be ignored

       -E, --essential
              While printing the differences between the two compared files
              show only the numerical ones

       -D, --dummy
              While printing the differences between the two compared files
              neglect all the numerical ones (dummy mode)

       -b, --brief
              Suppress all messages concerning the differences discovered in
              the structures of the two files

       -V, --verbose
              For every couple of lines which differ in at least one field
              print an header to show how these lines appear in the two
              compared files

       -O, --overview[=NUM]
              Display a side by side difference listing of the two files
              showing which lines are present only in one file, which lines are
              present in both files but with one or more differing fields, and
              which lines are identical. If 'NUM' is zero or is not specified,
              output at most 130 columns per line. If 'NUM' is a positive
              number, output at most 'NUM' columns per line. If 'NUM' is a
              negative number, do not output common lines and display at most
              -'NUM' columns per line.

       -q, --quiet, --silent
              Suppress all the standard output

       -S, --statistics
              Add some statistics to the standard output

       -z, --blur-if-numerical=1:RANGE
              Select the fields of the first file that have to be blurred
              during the synchronization procedure only if they turn out to be
              numeric

       -z, --blur-if-numerical=2:RANGE
              Select the fields of the second file that have to be blurred
              during the synchronization procedure only if they turn out to be
              numeric

       -Z, --blur-unconditionally=1:RANGE
              Select the fields of the first file that have to be
              unconditionally blurred during the synchronization procedure

       -Z, --blur-unconditionally=2:RANGE
              Select the fields of the second file that have to be
              unconditionally blurred during the synchronization procedure

       -m, --minimal
              During synchronization try hard to find a smaller set of changes

       -H, --speed-large-files
              During synchronization assume large files and many scattered
              small changes

       -f, --test-filter[=NUM]
              Run only the filter and then show the results of its attempt to
              synchronize the two files. If 'NUM' is zero or is not specified,
              output at most 130 columns per line. If 'NUM' is a positive
              number, output at most 'NUM' columns per line. If 'NUM' is a
              negative number, do not output common lines and display at most
              -'NUM' columns per line.

       -T, --expand-tabs
              Expand tabs to spaces in output while displaying the results of
              the synchronization procedure (meaningful only together with
              option -O or -f)

       -B, --binary
              Treat both files as binary files (only meaningful under
              Doz/Windoz)

       -l, --warnings-to=PATH
              Redirect warning and error messages from stderr to the indicated
              file

       -o, --output=PATH
              Redirect output from stdout to the indicated file

       -h, --help
              Show help message and predefined settings

       -v, --version
              Show version number, Copyright, Distribution Terms and
              NO-Warranty

       Default numeric format (for both files to compare):
              Currency name = ""
              Decimal point = `.'
              Thousands separator = `,'
              Number of digits in each thousands group = 3
              Leading positive sign = `+'
              Leading negative sign = `-'
              Prefix for decimal exponent = `e'
              Symbol used to denote the imaginary unit = `i'

COPYRIGHT
       Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012 Ivano
       Primi <ivprimi@libero.it>
       License GPLv3+: GNU GPL version 3 or later, see
       <http://gnu.org/licenses/gpl.html>.
       This is free software: you are free to change and redistribute it.
       There is NO WARRANTY, to the extent permitted by law.

SEE ALSO
       The full documentation for numdiff is maintained as a Texinfo manual.
       If the info and numdiff programs are properly installed at your site,
       the command

              info numdiff

       should give you access to the complete manual.

numdiff 5.6.0                        January 2012                          NUMDIFF(1)
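
To make the idea behind the -a (absolute tolerance) option concrete, here is a small awk sketch. It is not how numdiff is implemented and it covers none of the other options. The helper name abs_tol_diff, its argument order, its message format and its simple definition of "numeric" are all assumptions made for this illustration. It compares two files line by line and field by field, treats two numeric fields as equal when they differ by no more than the tolerance, and returns 1 if the files differ and 0 if they do not, in the spirit of the exit status documented above.

```shell
#!/bin/sh
# Sketch only: field-by-field comparison with an absolute tolerance.
# abs_tol_diff is a hypothetical helper, not part of numdiff.
abs_tol_diff() {
    # $1 = tolerance, $2 = first file, $3 = second file
    awk -v tol="$1" '
        # Simplified notion of a number: optional sign, digits,
        # optional fraction, optional exponent.
        function is_num(s) { return s ~ /^[-+]?[0-9]+(\.[0-9]*)?([eE][-+]?[0-9]+)?$/ }
        function abs(v)    { return v < 0 ? -v : v }

        # First file: keep every line for later comparison.
        FILENAME == ARGV[1] { first[FNR] = $0; n1 = FNR; next }

        # Second file: compare against the line with the same number.
        {
            n2 = FNR
            na = split(first[FNR], a)
            if (na != NF) {
                printf "line %d: field count differs\n", FNR
                differ = 1
                next
            }
            for (i = 1; i <= NF; i++) {
                if (is_num(a[i]) && is_num($i)) {
                    if (abs(a[i] - $i) > tol) {
                        printf "line %d field %d: %s vs %s\n", FNR, i, a[i], $i
                        differ = 1
                    }
                } else if (a[i] != $i) {
                    printf "line %d field %d: %s vs %s\n", FNR, i, a[i], $i
                    differ = 1
                }
            }
        }

        # Different line counts also count as a difference.
        END { if (n1 != n2) differ = 1; exit (differ ? 1 : 0) }
    ' "$2" "$3"
}

# Small demonstration with made-up data.
demo=$(mktemp -d)
printf 'x 1.00 2.0\n'  > "$demo/a"
printf 'x 1.004 2.5\n' > "$demo/b"
if abs_tol_diff 0.01 "$demo/a" "$demo/b"; then
    echo "equal within 0.01"
else
    echo "differ at 0.01"
fi
rm -r "$demo"
```

With a tolerance of 0.01 the second fields (1.00 and 1.004) are accepted as equal, while the third fields (2.0 and 2.5) are reported as different.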