Thanks for your time on this, it's much appreciated.
1) Do both files have exactly the same number of records and are you just looking for records which have changed? Does the order of the output into file3 matter?
File1 has 1803077 records
file2 has 1795370 records
2) If there can be more or fewer records in file2 than file1, does the order of the output into file3 matter?
I would prefer the 1st row in file3 to come from file1, the 2nd row from file2, and so on.
Are you also interested in records which exist in file1 but do not exist in file2?
Yes, and vice versa. It would be good if we could copy those records to different files, say recordsonlyonfile1.txt and recordsonlyonfile2.txt.
3) What percentage of differences do you expect? (This is really a performance question because some approaches would involve multiple lookups).
There are huge changes in the file; it could be over 50%.
4) If this proves too difficult for shell programming, do you have a mainstream database engine?
I have an Informix database, but I am not sure it would help me, as there is no unique key in the records.
---------- Post updated at 15:05 ---------- Previous update was at 14:20 ----------
One shell approach if the order of the output does not matter.
Tried with two files of approx 5 million records and 500 Mb each. Took about 5 mins to run, and the output only shows the mismatched records from file2. Actual performance will depend on how fast your computer is and how much memory you can give to sort.
When sorting large files be sure to set $TMPDIR to somewhere with enough space for at least twice the size of the file being sorted.
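The script itself did not survive in the quoted post. A minimal sketch of one sort-based approach follows, assuming a record is a whole line (there is no unique key to pair changed records) and that output order does not matter. The function name split_unique is hypothetical; the output file names are the ones asked for above.

```shell
#!/bin/sh
# Sketch: split two record files into "only in file1" and "only in file2".
# Assumption: records are compared as whole lines.
split_unique() {
    f1=$1
    f2=$2
    # comm needs both inputs sorted in the same collating order;
    # LC_ALL=C makes sort and comm agree on byte order.
    LC_ALL=C sort "$f1" > "$f1.sorted" || return 1
    LC_ALL=C sort "$f2" > "$f2.sorted" || return 1
    # -23 keeps column 1 only: lines found in file1 but not in file2.
    LC_ALL=C comm -23 "$f1.sorted" "$f2.sorted" > recordsonlyonfile1.txt
    # -13 keeps column 2 only: lines found in file2 but not in file1.
    LC_ALL=C comm -13 "$f1.sorted" "$f2.sorted" > recordsonlyonfile2.txt
    rm -f "$f1.sorted" "$f2.sorted"
}
```

Called as split_unique file1 file2, this writes both result files into the current directory. The sorted copies sit next to the inputs, so that directory needs room for them, and $TMPDIR still applies to sort's own temporary files.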
Hey guys, I have two files, both with two columns. I have already created an awk code to ignore certain lines (e.g. lines that start with 963) as they would begin with a certain string; however, the rest I have added together and calculated the average.
At the moment the code also displays... (3 Replies)
Hi guys,
I need some help to come up with a solution. I have seven such files but I am showing only three for convenience.
filea
a5 20
a8 16
fileb
a3 42
a7 14
filec
a5 23
a3 07
The output file should contain the data in table form showing first field of... (7 Replies)
You have two files to compare by searching keyword from one file into another file
File A
23 >pp_ANSWER
24 >aa hello
25 >jau head wear
66 >jss oops
872 >aqq olps ploww oww sss
722 >GG_KILLER
..... large files
File B
Beta done
KILLER
John Mayor
calix meyers
... (5 Replies)
All,
Please can you help me with a shell script which can compare two xml files and print the difference to an output file.
I have attached one such file for your reference.
<Group>
<Member ID=":Year_Quad:41501" childCount="4" fullPath="PEPSICO Year-Quad-Wk : FOLDER.52 Weeks Ending Dec... (2 Replies)
Hello. I have two files. FILE1 was extracted from FILE2 and modified thanks to help from this post. Now I need to replace the extracted, modified lines into the original file (FILE2) to produce the FILE3.
FILE1
1466 55.27433 14.72050 -2.52E+03 3.00E-01 1.05E+04 2.57E+04
1467 55.27433... (1 Reply)
Compare two flat files using awk, but the 4th field contains a non-ordered substring. How to do that?
file1.txt
john|0.0|4|**:25;JP:50;UY:25
file2.txt
andy|0.0|4|JP:50;**:25;UY:25 (4 Replies)
Hi,
I have multiple files that each contain one column of strings:
File1:
123abc
456def
789ghi
File2:
123abc
456def
891jkl
File3:
234mno
123abc
456def
In total I have 25 files of this type. (5 Replies)
Hi,
I want to compare two columns from file1 with another two columns of file2 and print matched and unmatched columns like this
File1
1 rs1 abc
3 rs4 xyz
1 rs3 stu
File2
1 kkk rs1 AA 10
1 aaa rs2 DD 20
1 ccc ... (2 Replies)
Hi All,
I am trying to compare two files in CentOS 6.
F1: /tmp/d21
NAME="xvda" TYPE="disk" SIZE="40G" OWNER="root" GROUP="disk" MODE="brw-rw----" MOUNTPOINT=""
NAME="xvda1" TYPE="part" SIZE="500M" OWNER="root" GROUP="disk" MODE="brw-rw----" MOUNTPOINT="/boot"
NAME="xvda2" TYPE="part"... (2 Replies)
Discussion started by: balu1234
join
JOIN(1) General Commands Manual JOIN(1)
NAME
join - relational database operator
SYNOPSIS
join [ options ] file1 file2
DESCRIPTION
Join forms, on the standard output, a join of the two relations specified by the lines of file1 and file2. If one of the file names is -, the
standard input is used.
File1 and file2 must be sorted in increasing ASCII collating sequence on the fields on which they are to be joined, normally the first in
each line.
There is one line in the output for each pair of lines in file1 and file2 that have identical join fields. The output line normally
consists of the common field, then the rest of the line from file1, then the rest of the line from file2.
Input fields are normally separated by spaces or tabs; output fields by space. In this case, multiple separators count as one, and leading
separators are discarded.
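
A minimal sketch of the default behaviour described above, written for a POSIX or GNU join rather than the Plan 9 binary. The function name demo_join, the file names left and right, and their contents are made up for illustration.

```shell
#!/bin/sh
# Sketch: default join on the first field of two small relations.
# File names and contents are hypothetical sample data.
demo_join() {
    dir=$(mktemp -d) || return 1
    # Both inputs are already sorted on field 1, as join requires.
    printf 'a1 red\na2 green\na4 blue\n' > "$dir/left"
    printf 'a1 10\na3 30\na4 40\n'       > "$dir/right"
    # One output line per matching pair: join field, rest of left, rest of right.
    LC_ALL=C join "$dir/left" "$dir/right"
    rm -rf "$dir"
}
```

Running demo_join prints "a1 red 10" and "a4 blue 40"; the unmatched a2 and a3 lines are dropped because no options ask for them.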
The following options are recognized, with POSIX syntax.
-a n In addition to the normal output, produce a line for each unpairable line in file n, where n is 1 or 2.
-v n Like -a, omitting output for paired lines.
-e s Replace empty output fields by string s.
-1 m
-2 m Join on the mth field of file1 or file2.
-jn m Archaic equivalent for -n m.
-ofields
Each output line comprises the designated fields. The comma-separated field designators are either 0, meaning the join field, or
have the form n.m, where n is a file number and m is a field number. Archaic usage allows separate arguments for field designators.
-tc Use character c as the only separator (tab character) on input and output. Every appearance of c in a line is significant.
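
A hedged sketch of how -v, -a, -e and -o combine, again assuming a POSIX or GNU join. The function name demo_unpaired, the sample data and the NONE filler string are illustrative choices, not part of the manual.

```shell
#!/bin/sh
# Sketch: unpairable lines with -v, then a full outer join with -a, -e and -o.
# File names and contents are hypothetical sample data.
demo_unpaired() {
    dir=$(mktemp -d) || return 1
    printf 'a1 red\na2 green\na4 blue\n' > "$dir/left"
    printf 'a1 10\na3 30\na4 40\n'       > "$dir/right"
    echo "only in left:"
    LC_ALL=C join -v 1 "$dir/left" "$dir/right"
    echo "only in right:"
    LC_ALL=C join -v 2 "$dir/left" "$dir/right"
    echo "outer join:"
    # -a 1 -a 2 keeps unpairable lines from both files;
    # -o fixes the column layout and -e fills the empty columns.
    LC_ALL=C join -a 1 -a 2 -e NONE -o 0,1.2,2.2 "$dir/left" "$dir/right"
    rm -rf "$dir"
}
```

With -o giving a fixed layout, the lines that have no partner show up with NONE in the missing column, which makes the result easy to filter afterwards.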
EXAMPLES
sort /adm/users | join -t: -a 1 -e "" - bdays
Add birthdays to password information, leaving unknown birthdays empty. The layout of /adm/users is given in users(6); bdays contains sorted
lines like
tr : ' ' </adm/users | sort -k 3 3 >temp
join -1 3 -2 3 -o 1.1,2.1 temp temp | awk '$1 < $2'
Print all pairs of users with identical userids.
SOURCE
/sys/src/cmd/join.c
SEE ALSO
sort(1), comm(1), awk(1)
BUGS
With default field separation, the collating sequence is that of sort -b -ky,y; with -t, the sequence is that of sort -tx -ky,y.
One of the files must be randomly accessible.
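
A small sketch of the collating point above, assuming a POSIX or GNU sort and join: when joining with -t on a field other than the first, sort each input with the same separator and key first. The function name demo_sorted_join and the colon-separated sample data are hypothetical.

```shell
#!/bin/sh
# Sketch: sort colon-separated files on field 2, then join on that field.
# File names and contents are hypothetical sample data.
demo_sorted_join() {
    dir=$(mktemp -d) || return 1
    printf 'x:k2:one\ny:k1:two\n' > "$dir/a"
    printf 'p:k1:uno\nq:k2:dos\n' > "$dir/b"
    # Same separator and key for sort as for join, in the same locale.
    LC_ALL=C sort -t: -k2,2 "$dir/a" > "$dir/a.sorted"
    LC_ALL=C sort -t: -k2,2 "$dir/b" > "$dir/b.sorted"
    LC_ALL=C join -t: -1 2 -2 2 "$dir/a.sorted" "$dir/b.sorted"
    rm -rf "$dir"
}
```

The join field is printed first, followed by the remaining fields of each file, so this prints "k1:y:two:p:uno" and "k2:x:one:q:dos".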
JOIN(1)