09-03-2008
You don't need to modify fileB.
Here's how I'd do it; I think it should be very quick.
osscl1head01 1447>cat fileA
// 223 missing
223,Jan,ee,bla,bla
// data not found
254-11,Jan,ee,bla,bla
// data rejected
214-1,Jan,ee,bla,bla
osscl1head01 1448>cat fileB
aaaa,bbbb,ccc,dddd,20054-11,fff,ggg...
aaaa,bbbb,ccc,dddd,254-11,fff,ggg...
aaaa,bbbb,ccc,dddd,2545456-1,fff,ggg...
osscl1head01 1449>grep . fileA | grep -v / | awk -F, '{print $1}' > fileC
osscl1head01 1450>cat fileC
223
254-11
214-1
osscl1head01 1451>join -1 1 -2 5 -t, fileC fileB > fileD
osscl1head01 1452>cat fileD
254-11,aaaa,bbbb,ccc,dddd,fff,ggg...
osscl1head01 1453>
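Because join only works on input sorted by the join field, here is a different technique (not the one used in the session above) that avoids sorting altogether. awk loads the IDs from fileA into an associative array, then reads fileB once and prints the lines whose 5th field is a known ID. fileB's lines come out unchanged and in their original order. This is a minimal sketch, assuming the IDs are field 1 of fileA and field 5 of fileB as in the sample. The function name select_rows is hypothetical, and the demo works in a mktemp -d scratch directory so existing files are not overwritten.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of $2 whose 5th comma-separated field
# is one of the IDs in field 1 of $1 (blank lines and "//" comments skipped).
select_rows() {
    awk -F, '
        # NR == FNR only while reading the first file: remember each ID.
        NR == FNR { if (NF && $0 !~ /^\/\//) want[$1] = 1; next }
        # Second file: print the line unchanged if its 5th field was remembered.
        $5 in want
    ' "$1" "$2"
}

# Work in a scratch directory so existing files are not overwritten.
cd "$(mktemp -d)" || exit 1

# Sample data copied from the session above.
cat > fileA <<'EOF'
// 223 missing
223,Jan,ee,bla,bla
// data not found
254-11,Jan,ee,bla,bla
// data rejected
214-1,Jan,ee,bla,bla
EOF

cat > fileB <<'EOF'
aaaa,bbbb,ccc,dddd,20054-11,fff,ggg...
aaaa,bbbb,ccc,dddd,254-11,fff,ggg...
aaaa,bbbb,ccc,dddd,2545456-1,fff,ggg...
EOF

select_rows fileA fileB > fileD
cat fileD
```

Only the ID list is held in memory, so fileB can be arbitrarily large. One caveat of the NR == FNR idiom: if fileA is empty, awk treats fileB as the first file and prints nothing.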
EDIT: join needs both input files sorted on the join field (the identifier), but that *should* be straightforward enough:
sort -t, -k5,5 fileB > fileBsorted
sort fileC > fileCsorted
join -1 1 -2 5 -t, fileCsorted fileBsorted > fileD
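Put together, the sorted version of the pipeline might look like the sketch below. Two details go beyond the post and are assumptions on my part. First, it sets LC_ALL=C so that sort and join agree on collation order, since join can miss matches when its input was sorted under a different locale. Second, it uses the POSIX key -k5,5 rather than the obsolete +4 form, which POSIX no longer specifies. The sample fileA and fileB from the session are written into a mktemp -d scratch directory.

```shell
#!/bin/sh
# Sketch: the session's pipeline with both join inputs sorted first.
# Assumption: LC_ALL=C so sort and join use the same byte-wise ordering.
export LC_ALL=C

# Work in a scratch directory so existing files are not overwritten.
cd "$(mktemp -d)" || exit 1

# Sample data copied from the session above.
cat > fileA <<'EOF'
// 223 missing
223,Jan,ee,bla,bla
// data not found
254-11,Jan,ee,bla,bla
// data rejected
214-1,Jan,ee,bla,bla
EOF

cat > fileB <<'EOF'
aaaa,bbbb,ccc,dddd,20054-11,fff,ggg...
aaaa,bbbb,ccc,dddd,254-11,fff,ggg...
aaaa,bbbb,ccc,dddd,2545456-1,fff,ggg...
EOF

# IDs from fileA (blank lines and lines containing "/" dropped), sorted.
grep . fileA | grep -v / | awk -F, '{print $1}' | sort > fileCsorted

# Sort fileB on field 5 only; the obsolete "sort +4" meant
# "from field 5 to the end of the line" (i.e. -k5).
sort -t, -k5,5 fileB > fileBsorted

# Match field 1 of fileCsorted against field 5 of fileBsorted.
join -t, -1 1 -2 5 fileCsorted fileBsorted > fileD
cat fileD
```

For the sample data this reproduces the single matching line from the session, with the ID moved to the front.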
join puts the identifier at the front of each output line; if the original column order matters, you can probably use awk to move it back to field 5.
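join writes the join field first and then the remaining fileB fields, so the ID moves from column 5 to column 1. A minimal awk sketch that moves it back, assuming every fileD line has at least five fields as in the sample output (the function name restore_order is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: read join output on stdin (ID first) and move the ID
# back to field 5 so the columns match fileB's original layout.
restore_order() {
    awk -F, -v OFS=, '{
        id = $1
        # Shift fields 2-5 one place to the left (into 1-4); assigning to a
        # field makes awk rebuild the line using OFS.
        for (i = 1; i < 5; i++) $i = $(i + 1)
        $5 = id
        print
    }'
}

# Sample line taken from fileD in the session above.
printf '%s\n' '254-11,aaaa,bbbb,ccc,dddd,fff,ggg...' | restore_order
```

On the real output this would be run as restore_order < fileD, redirected to whatever file you want.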
Last edited by Digby; 09-03-2008 at 07:56 AM.