Sorry to be a wet blanket, but neither the grep nor the uniq approach will fulfill the requirement, even if the data were in sorted order (which it isn't).
1) Do both files have exactly the same number of records and are you just looking for records which have changed? Does the order of the output into file3 matter?
2) If there can be more or fewer records in file2 than in file1, does the order of the output into file3 matter?
Are you also interested in records which exist in file1 but do not exist in file2?
3) What percentage of differences do you expect? (This is really a performance question because some approaches would involve multiple lookups).
4) If this proves too difficult for shell programming, do you have a mainstream database engine?
---------- Post updated at 15:05 ---------- Previous update was at 14:20 ----------
One shell approach if the order of the output does not matter.
Tried with two files of approx. 5 million records and 500 MB each. It took about 5 minutes to run, and the output shows only the mismatched records from file2. Actual performance will depend on how fast your computer is and how much memory you can give to sort.
When sorting large files be sure to set $TMPDIR to somewhere with enough space for at least twice the size of the file being sorted.
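The script from the original post is not preserved here, so the following is only a minimal sketch of one sort-based approach of this kind, assuming whole-line comparison is what is wanted. The sample records and the working directory are made up for illustration; file1, file2 and file3 follow the names used in the thread.

```shell
#!/bin/sh
# Sketch only: the sample records below are illustrative, not the poster's data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'id3 gamma' 'id1 alpha' 'id2 beta' > file1
printf '%s\n' 'id2 beta' 'id3 GAMMA' 'id4 delta' 'id1 alpha' > file2

# Point sort's scratch space at a directory with enough free room.
TMPDIR=$workdir
export TMPDIR

# Use one collation order for both sort and comm.
LC_ALL=C
export LC_ALL

sort file1 > file1.sorted
sort file2 > file2.sorted

# -1 drops lines only in file1, -3 drops common lines,
# leaving the records that appear only in file2.
comm -13 file1.sorted file2.sorted > file3
cat file3
```

The output order is the sorted order, not the original order of file2, which is why this only suits the case where output order does not matter.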
LEARN ABOUT OSF1
comm
comm(1) General Commands Manual comm(1)
NAME
comm - Compares two sorted files.
SYNOPSIS
comm [-123] file1 file2
STANDARDS
Interfaces documented on this reference page conform to industry standards as follows:
command: XCU5.0
Refer to the standards(5) reference page for more information about industry standards and associated tags.
OPTIONS
-1  Suppresses output of the first column (lines in file1 only).
-2  Suppresses output of the second column (lines in file2 only).
-3  Suppresses output of the third column (lines common to file1 and file2).

The command comm -123 produces no output.
OPERANDS
file1  A pathname of the first file to be compared. If file1 is a hyphen (-), the standard input is used.
file2  A pathname of the second file to be compared. If file2 is a hyphen (-), the standard input is used.

If both file1 and file2 refer to standard input or to the same FIFO special, block special or character special file, the results are undefined.
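As a small illustration of the hyphen operand, the sketch below sorts the second input on the fly and feeds it to comm on standard input. The file names sorted_list and unsorted_list and their contents are hypothetical.

```shell
#!/bin/sh
# Sketch only: file names and contents are made up for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
LC_ALL=C
export LC_ALL

printf '%s\n' apple banana cherry > sorted_list
printf '%s\n' cherry apple durian > unsorted_list

# "-" as file2 tells comm to read the second input from standard input.
common=$(sort unsorted_list | comm -12 sorted_list -)
printf '%s\n' "$common"
```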
DESCRIPTION
The comm command reads file1 and file2 and writes three columns to standard output, showing which lines are common to the files and which
are unique to each.
The leftmost column of standard output includes lines that are in file1 only. The middle column includes lines that are in file2 only.
The rightmost column includes lines that are in both file1 and file2.
If you specify a hyphen (-) in place of one of the file names, comm reads standard input.
Generally, file1 and file2 should be sorted according to the collating sequence specified by the LC_COLLATE environment variable. (See
sort(1).) If the input files are not sorted properly, the output of comm might not be useful.
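One way to guard against unsorted input is to check each file with sort -c and sort only when the check fails. This is a sketch under the assumption that the C locale is acceptable for both tools; raw1, raw2 and result are hypothetical names.

```shell
#!/bin/sh
# Sketch only: file names and contents are made up for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Fix the collating sequence so sort and comm agree on the order.
LC_ALL=C
export LC_ALL

printf '%s\n' pear apple fig > raw1    # not sorted
printf '%s\n' apple fig kiwi > raw2    # already sorted

for f in raw1 raw2; do
    # sort -c exits non-zero if the file is not already in order.
    if sort -c "$f" 2>/dev/null; then
        cp "$f" "$f.sorted"
    else
        sort "$f" > "$f.sorted"
    fi
done

comm raw1.sorted raw2.sorted > result
cat result
```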
EXIT STATUS
0    Successful completion.
>0   Error occurred.
EXAMPLES
In the following examples, file1 contains the following sorted list of North American cities:
    Anaheim
    Baltimore
    Boston
    Chicago
    Cleveland
    Dallas
    Detroit
    Kansas City
    Milwaukee
    Minneapolis
    New York
    Oakland
    Seattle
    Toronto
The second file, file2, contains this sorted list:
    Atlanta
    Chicago
    Cincinnati
    Houston
    Los Angeles
    Montreal
    New York
    Philadelphia
    Pittsburgh
    San Diego
    San Francisco
    St. Louis
To display the lines unique to each file and common to the two files, enter:

    comm file1 file2

This command results in the following output:

    Anaheim
            Atlanta
    Baltimore
    Boston
                    Chicago
            Cincinnati
    Cleveland
    Dallas
    Detroit
            Houston
    Kansas City
            Los Angeles
    Milwaukee
    Minneapolis
            Montreal
                    New York
    Oakland
            Philadelphia
            Pittsburgh
            San Diego
            San Francisco
    Seattle
            St. Louis
    Toronto
The leftmost column contains lines in file1 only, the middle column contains lines in file2 only, and the rightmost column contains lines common to both files. To display any one or two of the three output columns, include the appropriate flags to suppress the columns you do not want. For example, the following command displays columns 1 and 2 only:

    comm -3 file1 file2

    Anaheim
            Atlanta
    Baltimore
    Boston
            Cincinnati
    Cleveland
    Dallas
    Detroit
            Houston
    Kansas City
            Los Angeles
    Milwaukee
    Minneapolis
            Montreal
    Oakland
            Philadelphia
            Pittsburgh
            San Diego
            San Francisco
    Seattle
            St. Louis
    Toronto
The following command displays output from only the second column:

    comm -13 file1 file2

    Atlanta
    Cincinnati
    Houston
    Los Angeles
    Montreal
    Philadelphia
    Pittsburgh
    San Diego
    San Francisco
    St. Louis
The following command displays output from only the third column:

    comm -12 file1 file2

    Chicago
    New York
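The remaining single-column case, lines found only in file1, is not shown above. As a sketch, it can be produced with the -2 and -3 flags together. The script below recreates the two city lists from this page in a scratch directory; the variable name only_in_file1 is an illustrative choice.

```shell
#!/bin/sh
# Sketch only: rebuilds the example city lists in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
LC_ALL=C
export LC_ALL

printf '%s\n' Anaheim Baltimore Boston Chicago Cleveland Dallas Detroit \
    'Kansas City' Milwaukee Minneapolis 'New York' Oakland Seattle Toronto > file1
printf '%s\n' Atlanta Chicago Cincinnati Houston 'Los Angeles' Montreal \
    'New York' Philadelphia Pittsburgh 'San Diego' 'San Francisco' 'St. Louis' > file2

# -2 drops lines only in file2, -3 drops common lines,
# leaving only the first column (lines in file1 only).
only_in_file1=$(comm -23 file1 file2)
printf '%s\n' "$only_in_file1"
```

Chicago and New York are absent from the result because they appear in both files.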
SEE ALSO
Commands: cmp(1), diff(1), sdiff(1), sort(1), uniq(1)