09-10-2008
Difference between two huge files
Hi,
As per my requirement, I need to compute the difference between two big files (around 6.5 GB) and write it to an output file without any line numbers or '<' / '>' markers in front of each line.
As the diff command won't work on files this big, I tried to use bdiff instead.
However, I am getting an incorrect number of records.
I have done the following test:
I have a .dat file with a few million records in it, and to generate another file I used sed '1,100d' oldfile > newfile
Then I ran bdiff oldfile newfile | sed -n '/^</p' > DIFF.DAT
The output (DIFF.DAT) should have 100 records in it, but I am getting many more records than that.
Could anyone help me out of this situation?
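A likely explanation, based on how bdiff is documented on System V-derived Unixes: it skips the lines common to the start of both files, cuts the remainder into fixed-size segments (3500 lines by default) and runs diff on each pair of corresponding segments. Deleting lines 1-100 shifts every segment boundary by 100 lines, so each segment pair reports roughly 100 '<' lines of its own, and the total grows with the file size instead of staying at 100. Separately, the filter '/^</p' prints matching lines unchanged, so the '< ' marker is still in the output. A minimal sketch of a filter that keeps only removed lines with the marker stripped, assuming diff-style output where removed lines start with exactly '< ' (the function name removed_lines is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: read diff/bdiff output on stdin and print only the
# lines marked as removed ("< "), with the two-character marker stripped.
# Change headers such as "1,100d0", "---" separators and "> " lines are dropped.
removed_lines() {
    sed -n 's/^< //p'
}
```

For example, bdiff oldfile newfile | removed_lines > DIFF.DAT. This only fixes the markers; it cannot fix the spurious differences caused by bdiff's segmenting.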
Thanks
Sue
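One way to avoid diff altogether, assuming the order of records does not matter for the comparison: sort both files and let comm print the lines found only in the first one. sort implementations spill to temporary files rather than holding everything in memory, and comm reads both sorted inputs in a single pass, so neither is limited the way diff is. The function name diff_removed and the temporary file names are hypothetical, and -T is widely supported but not required by POSIX, so check your sort; expect to need free disk space of a few times the input size.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of file $1 that do not occur in file $2,
# with no line numbers and no '<' / '>' markers. Output is in sorted order,
# and duplicate lines are matched one for one (comm's behaviour on sorted input).
diff_removed() {
    tmp=${TMPDIR:-/tmp}
    old_s="$tmp/diff_removed.old.$$"
    new_s="$tmp/diff_removed.new.$$"
    # LC_ALL=C gives plain byte ordering so sort and comm agree on collation;
    # -T puts sort's temporary merge files in $tmp.
    if LC_ALL=C sort -T "$tmp" "$1" > "$old_s" &&
       LC_ALL=C sort -T "$tmp" "$2" > "$new_s"; then
        # -2 hides lines only in $2, -3 hides lines in both,
        # leaving only the lines unique to $1.
        LC_ALL=C comm -23 "$old_s" "$new_s"
        rc=$?
    else
        rc=1
    fi
    # Remove the sorted copies whether or not the comparison succeeded.
    rm -f "$old_s" "$new_s"
    return $rc
}
```

For the test above, diff_removed oldfile newfile > DIFF.DAT should contain exactly the 100 deleted records, in sorted rather than original order; swapping the arguments lists the records that were added instead.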