Comparing two huge files Post: 302231756

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

comparing Huge Files - Performance is very bad

Hi All, Can you please help me resolve the following problem? My requirement is like this: 1) I have two files, YESTERDAY_FILE and TODAY_FILE. Each one has nearly two million records. 2) I need to check each record of TODAY_FILE against YESTERDAY_FILE. If it exists we can skip that by...

2. UNIX for Dummies Questions & Answers

Difference between two huge files

Hi, As per my requirement, I need to take the difference between two big files (around 6.5 GB) and write the difference to an output file without any line numbers or '<' or '>' in front of each new line. As the diff command won't work for big files, I tried to use bdiff instead. I am getting incorrect...

3. UNIX for Advanced & Expert Users

Huge files manipulation

Hi, I need a fast way to delete duplicate entries from very large files (> 2 GB); these files are plain text. I tried all the usual methods (awk / sort / uniq / sed / grep ...) but it always ended with the same result (memory core dump). I'm using large HP-UX servers. Any advice will...

4. Shell Programming and Scripting

Compare 2 folders to find several missing files among huge amounts of files.

Hi, all: I've got two folders, say, "folder1" and "folder2". Under each, there are thousands of files. It's quite obvious that there are some files missing in each. I just would like to find them. I believe this can be done by "diff" command. However, if I change the above question a...

5. Shell Programming and Scripting

Comparing two huge files on field basis.

Hi all, I have two large files and I want a field-by-field comparison for each record in them. All fields are tab separated. file1: Email SELVAKUMAR RAMACHANDRAN Email SHILPA SAHU Web NIYATI SONI Web NIYATI SONI Email VIINII DOSHI Web RAJNISH KUMAR Web ...

6. Shell Programming and Scripting

Comparing 2 huge text files

I have these 2 files: k5login sanwar@systems.nyfix.com jjamnik@systems.nyfix.com nisha@SYSTEMS.NYFIX.COM rdpena@SYSTEMS.NYFIX.COM service/backups-ora@SYSTEMS.NYFIX.COM ivanr@SYSTEMS.NYFIX.COM nasapova@SYSTEMS.NYFIX.COM tpulay@SYSTEMS.NYFIX.COM rsueno@SYSTEMS.NYFIX.COM...

7. Shell Programming and Scripting

Perl: Need help comparing huge files

What do I need to do to have the Perl program below load 205-million-record files into the hash? It currently works on smaller files, but not on huge files. Any idea what I need to modify to make it work with huge files: #!/usr/bin/perl $ot1=$ARGV; $ot2=$ARGV; open(mfileot1,...

8. Shell Programming and Scripting

awk to parse huge files

Hello All, I have a situation as below: (1) Read a source file (a single file of 1.2 million rows in it ) (2) Read Destination files one by one and replace the content ( few fields in it ) with the corresponding matching field from source file. I tried as below: ( please note I am not...

9. Shell Programming and Scripting

Work with huge Zipped files

Hello dear members, I have one general and one specific question, and I would be very grateful if you could help me with them. Let's start with my general question: 1. I am working on a cluster computer shared with other people and I need to manipulate a big zipped text file of 13 GB. There is...

10. Shell Programming and Scripting

Aggregation of Huge files

Hi Friends !! I am facing a hash total issue while performing over a set of files of huge volume: Command used: tail -n +2 <File_Name> |nawk -F"|" -v '%.2f' qq='"' '{gsub(qq,"");sa+=($156<0)?-$156:$156}END{print sa}' OFMT='%.5f' Pipe delimited file and 156 column is for hash totalling....

lr_deanonymize

LR_DEANONYMIZE.IN(1)					  LogReport's Lire Documentation				      LR_DEANONYMIZE.IN(1)

NAME

       lr_deanonymize - restore anonymized data, using a dump as produced by lr_anonymize(1)

SYNOPSIS

       lr_deanonymize dumpfilestem

DESCRIPTION

       lr_deanonymize is typically used when receiving anonymized reports from a responder.  See the section on "Processing The Responder's
       Results" in the chapter on "Using A Responder" in the Lire User Manual for usage examples.

       lr_deanonymize reads a file containing anonymized email addresses, IP numbers, and hostnames (typically a report generated from a logfile
       of an internet service) from stdin, and prints a "deanonymized" version of this file to stdout. The information needed to do this comes
       from a set of Berkeley DB databases, stored in files whose names are derived from dumpfilestem, as produced by lr_anonymize(1).

EXAMPLE

       A 'logfile' such as

	blaat fkrf 1.2.3.4.in-addr.arpa] pietje@bigcompany.com bla 1 2 3 lj;agas;gag
	blaat 1.2.3.4 fkrf 3.2.3.4.in-addr.arpa] bla 1 www.hotsex.com 2 3 lj;agas;gag
	jan@blaat.frut.com agagag
	blaat fkrf 4.2.3.4.in-addr.arpa] bla pietje@bigcompany.com www.hotsex.com
	234.34.2.0 jan@blaat.frut.com 4.2.3.4.in-addr.arpa1 2 3 lj;agas;gag
	blaat fkrf tweede 3.2.3.4.in-addr.arpa] bla 1.2.3.4 1 blablabla.com
	2 mdcc.cx
	3 lj;agas;gag

       will get anonymized to

	blaat fkrf 1.0.0.10.in-addr.arpa] john.doe.1@example.com bla 1 2 3 lj;agas;gag
	blaat 10.0.0.1 fkrf 2.0.0.10.in-addr.arpa] bla 1 1.example.com 2 3 lj;agas;gag
	john.doe.2@example.com agagag
	blaat fkrf 3.0.0.10.in-addr.arpa] bla john.doe.1@example.com 1.example.com
	10.0.0.2 john.doe.2@example.com 3.0.0.10.in-addr.arpa1 2 3 lj;agas;gag
	blaat fkrf tweede 2.0.0.10.in-addr.arpa] bla 10.0.0.1 1 2.example.com
	2 3.example.com
	3 lj;agas;gag

       The dump will look like

	ip 234.34.2.0 10.0.0.2
	ip 1.2.3.4 10.0.0.1
	inaddr 3.2.3.4.in-addr.arpa 2.0.0.10.in-addr.arpa
	inaddr 1.2.3.4.in-addr.arpa 1.0.0.10.in-addr.arpa
	inaddr 4.2.3.4.in-addr.arpa 3.0.0.10.in-addr.arpa
	domain mdcc.cx 3.example.com
	domain blablabla.com 2.example.com
	domain www.hotsex.com 1.example.com
	email jan@blaat.frut.com john.doe.2@example.com
	email pietje@bigcompany.com john.doe.1@example.com
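
The dump listing above suggests how the reverse mapping can be applied. What follows is a minimal Python sketch, not the actual lr_deanonymize implementation (which, per the description, reads Berkeley DB files rather than a text dump). It assumes every dump line has the form "type original anonymized", as in the listing; the names parse_dump, deanonymize and DUMP are hypothetical and exist only for this illustration.

```python
import re

# Text dump copied from the example above: "<type> <original> <anonymized>".
DUMP = """\
ip 234.34.2.0 10.0.0.2
ip 1.2.3.4 10.0.0.1
inaddr 3.2.3.4.in-addr.arpa 2.0.0.10.in-addr.arpa
inaddr 1.2.3.4.in-addr.arpa 1.0.0.10.in-addr.arpa
inaddr 4.2.3.4.in-addr.arpa 3.0.0.10.in-addr.arpa
domain mdcc.cx 3.example.com
domain blablabla.com 2.example.com
domain www.hotsex.com 1.example.com
email jan@blaat.frut.com john.doe.2@example.com
email pietje@bigcompany.com john.doe.1@example.com
"""


def parse_dump(dump_text):
    """Return a dict mapping each anonymized token to its original value."""
    mapping = {}
    for line in dump_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        _kind, original, anonymized = fields
        mapping[anonymized] = original
    return mapping


def deanonymize(text, mapping):
    """Replace every anonymized token in text with its original value."""
    if not mapping:
        return text
    # Longest tokens first, so a longer token wins over any shorter token
    # that happens to match at the same position.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    # A single pass means restored values are never scanned again.
    return pattern.sub(lambda match: mapping[match.group(0)], text)


if __name__ == "__main__":
    table = parse_dump(DUMP)
    line = "blaat 10.0.0.1 fkrf 2.0.0.10.in-addr.arpa] bla 1 1.example.com 2 3 lj;agas;gag"
    print(deanonymize(line, table))
```

Doing the replacement in one pass keeps an already restored value from being substituted a second time. The sketch matches tokens anywhere in the text, without word boundaries, so it can over-match on input where one anonymized token is embedded in an unrelated longer string.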

SEE ALSO

       lr_anonymize(1)

VERSION

       $Id: lr_deanonymize.in,v 1.4 2006/07/23 13:16:32 vanbaal Exp $

COPYRIGHT

       Copyright (C) 2000-2001 Stichting LogReport Foundation LogReport@LogReport.org

       This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
       the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

       This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.

       You should have received a copy of the GNU General Public License along with this program (see COPYING); if not, check with
       http://www.gnu.org/copyleft/gpl.html.

AUTHOR

       Joost van Baal <joostvb@logreport.org>

Lire 2.1.1							    2006-07-23						      LR_DEANONYMIZE.IN(1)
