Solution for the Massive Comparison Operation
Post 302428591 by jim mcnamara on Thursday 10th of June 2010 10:08:46 AM
We have a similar problem. Are you running diff? That would take forever.

Use a tool with associative (hashed) arrays, such as awk or perl. Assuming you have several files, each with an "old" version and a "new" version, that should take less than an hour.
You can search the forums here for awk and perl examples of how to find file differences.
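
For readers who want a starting point, here is a minimal sketch of the hashed-lookup idea. It is written in Python, with a set standing in for an awk or perl associative array, and it is not the code we run in production. The function name compare_lines and the choice to ignore line order are assumptions made for illustration.

```python
def compare_lines(old_lines, new_lines):
    """Report lines only in new (added) and lines only in old (removed).

    Line order is ignored. A set gives hashed lookups, playing the role
    of an awk or perl associative array.
    """
    # Load every line of the old input into a hashed set.
    old = {line.rstrip("\n") for line in old_lines}

    matched = set()   # old lines that also appear in the new input
    added = []        # new lines never seen in the old input
    for line in new_lines:
        line = line.rstrip("\n")
        if line in old:
            matched.add(line)
        else:
            added.append(line)

    # Whatever was never matched exists only in the old input.
    removed = sorted(old - matched)
    return added, removed

# Usage sketch (file names are placeholders):
# with open("old.txt") as o, open("new.txt") as n:
#     added, removed = compare_lines(o, n)
```

Only the old file is held in memory; the new file is streamed one line at a time, which is why the amount of memory matters more than CPU speed for this approach.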

You need a lot of virtual memory; we run on a Solaris 9 SPARC V440 with 32GB of memory.
We finish comparing 1.5GB (250K-line) files in about 5 minutes. We do them 12 at a time: 6 old vs 6 new.

I hope this is what you were asking....

5 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Looking for AWK Solution for column comparison in a single file

- I am looking for a different kind of awk solution which I don't think has been mentioned before in these forums. The number of rows in the file is fixed. There are two columns in file1.txt 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 I am looking for 3... (1 Reply)
Discussion started by: softwarekids23
1 Replies

2. Shell Programming and Scripting

Column operation : cosne and sine operation

I have a txt file with several columns and I want to perform an operation on two columns and output it to a new txt file. file.txt 900.00000 1 1 1 500.00000 500.00000 100000.000 4 4 1.45257346E-07 899.10834 ... (4 Replies)
Discussion started by: shashi792
4 Replies

3. Homework & Coursework Questions

having massive trouble with 5 questions about egrep!

Hi all! I need help to do a few things with a .txt file using egrep. 1. I need to list all sequences where the vowel letters 'a, e, i, o, u' occur in that order, possibly separated by characters other than a, e, i, o, u; consisting of one or more complete words, possibly including punctuation. ... (1 Reply)
Discussion started by: dindiqotu
1 Replies

4. Shell Programming and Scripting

Massive Copy With Base Directory

I have a script that I am using to copy around 40-70k files to a NFS NAS. I have posted my code below in hopes that someone can help me figure out a faster way of achieving this. At the end of the script i need to have all the files in the list, copied over to the nas with source directory... (8 Replies)
Discussion started by: nitrobass24
8 Replies

5. Shell Programming and Scripting

Massive ftp

friends good morning FTP works perfect but I have a doubt if I want to transport 10 files, I imagine that I should not open 10 connections as I can transfer more than 1 file? ftp -n <<!EOF open caburga user ephfact ephfact cd /users/efactura/docONE/entrada bin mput EPH`date... (16 Replies)
Discussion started by: tricampeon81
16 Replies
bdiff(1)                         User Commands                         bdiff(1)

NAME
     bdiff - big diff

SYNOPSIS
     bdiff filename1 filename2 [n] [-s]

DESCRIPTION
     bdiff is used in a manner analogous to diff to find which lines in
     filename1 and filename2 must be changed to bring the files into
     agreement. Its purpose is to allow processing of files too large for
     diff. If filename1 (filename2) is -, the standard input is read.

     bdiff ignores lines common to the beginning of both files, splits the
     remainder of each file into n-line segments, and invokes diff on
     corresponding segments. If both optional arguments are specified, they
     must appear in the order indicated above.

     The output of bdiff is exactly that of diff, with line numbers adjusted
     to account for the segmenting of the files (that is, to make it look as
     if the files had been processed whole).

     Note: Because of the segmenting of the files, bdiff does not necessarily
     find a smallest sufficient set of file differences.

OPTIONS
     n    The number of line segments. The value of n is 3500 by default. If
          the optional third argument is given and it is numeric, it is used
          as the value for n. This is useful in those cases in which
          3500-line segments are too large for diff, causing it to fail.

     -s   Specifies that no diagnostics are to be printed by bdiff (silent
          option). Note: However, this does not suppress possible diagnostic
          messages from diff, which bdiff calls.

USAGE
     See largefile(5) for the description of the behavior of bdiff when
     encountering files greater than or equal to 2 Gbyte (2**31 bytes).

FILES
     /tmp/bd?????

ATTRIBUTES
     See attributes(5) for descriptions of the following attributes:

     +-----------------------------+-----------------------------+
     |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
     +-----------------------------+-----------------------------+
     |Availability                 |SUNWesu                      |
     +-----------------------------+-----------------------------+
     |CSI                          |enabled                      |
     +-----------------------------+-----------------------------+

SEE ALSO
     diff(1), attributes(5), largefile(5)

DIAGNOSTICS
     Use help for explanations.

SunOS 5.10                         14 Sep 1992                         bdiff(1)
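
The segmenting described in the man page above can be sketched roughly as follows. This is an illustration in Python, not bdiff's actual implementation: difflib.SequenceMatcher stands in for the external diff command, the function name segmented_diff is made up, and the result is a list of index ranges rather than diff-formatted output.

```python
from difflib import SequenceMatcher

def segmented_diff(old, new, n=3500):
    """Yield (tag, i1, i2, j1, j2) for changed ranges, as 0-based indices
    into the full 'old' and 'new' line lists."""
    # Step 1: ignore lines common to the beginning of both inputs.
    prefix = 0
    while prefix < len(old) and prefix < len(new) and old[prefix] == new[prefix]:
        prefix += 1

    # Step 2: walk the remainders in n-line segments.
    longest = max(len(old), len(new))
    for start in range(prefix, longest, n):
        a_start = min(start, len(old))   # clamp when one input is shorter
        b_start = min(start, len(new))
        a = old[a_start:a_start + n]
        b = new[b_start:b_start + n]

        # Step 3: compare corresponding segments (difflib replaces diff here).
        matcher = SequenceMatcher(None, a, b, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":
                # Step 4: shift segment-local indices back to whole-file indices.
                yield tag, a_start + i1, a_start + i2, b_start + j1, b_start + j2
```

Because each segment is compared only against the segment at the same position in the other file, an insertion that shifts later lines across a segment boundary is reported as extra changes. That is the reason for the man page's note that the result is not necessarily a smallest set of differences.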