Solution for the Massive Comparison Operation


 
# 1  
Old 06-10-2010

Hi

We have 50 million records in a mainframe DB2 database, and we have a requirement to capture Change Data Capture (CDC) records, i.e. new or updated records added to DB2. Unfortunately we don't have any column indicators that describe the changes made to the records.

So we decided to export the table to flat files on a UNIX box every day, then compare against the previous day's file and extract the changed rows using UNIX text-processing tools.

The problem is that a huge amount of data changes every day, approximately 40 million records.

Can anyone suggest a better way of handling this, or the hardware requirements for a UNIX server that could process such a huge comparison faster?

Thanks
# 2  
Old 06-10-2010
We have a similar problem. Are you running diff? That would take forever.

Use something that has associative (hashed) arrays, like awk or perl. Assuming you have several files, an "old" set and a "new" set, that should take less than an hour.
You can search this forum for examples of both kinds of code for finding file differences.

You need a lot of virtual memory; we run on a Solaris 9 SPARC V440 with 32GB of memory.
We finish comparing 1.5GB (250K-line) files in about 5 minutes. We do them 12 at a time: 6 old vs 6 new.
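For what it's worth, the awk version of that approach can be sketched in one line (old.txt and new.txt are hypothetical file names standing in for the previous day's and current day's exports; it prints every row of the new file that does not appear verbatim in the old file):

```shell
# Pass 1 (NR==FNR): load every line of old.txt into an associative array.
# Pass 2: print only the lines of new.txt never seen in old.txt,
# i.e. rows that are new or were changed since yesterday.
awk 'NR == FNR { seen[$0] = 1; next } !($0 in seen)' old.txt new.txt > changed.txt
```

Memory usage is roughly proportional to the size of the old file, which is why a box with plenty of RAM matters at these volumes.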

I hope this is what you were asking....
This User Gave Thanks to jim mcnamara For This Post:
# 3  
Old 06-11-2010
Thank you very much for your help. Yes, that is exactly what I was looking for.
# 4  
Old 06-13-2010
Another idea for the same solution

Hi

Thanks for the solution. We have come up with an approach for comparing the huge data sets.

Since we are comparing huge numbers of flat-file records, the following can be done: apply a hash function, as you mentioned, to each row of the flat file, making the comparison easier.

But is there a utility hash function in UNIX, similar to ORA_HASH in Oracle, that would reduce each new row to a short, unique code of a few characters or numbers? Then we could compare only those hash codes against the previous day's hash codes, which would make the processing faster too.
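Once each day's file has been reduced to one hash per row, the day-over-day comparison itself is cheap with standard tools. A minimal sketch (today.hashes and yesterday.hashes are hypothetical file names, one hash per line):

```shell
# comm needs sorted input; -23 keeps lines unique to the first file,
# i.e. hashes present today that were not present yesterday.
sort today.hashes > today.sorted
sort yesterday.hashes > yesterday.sorted
comm -23 today.sorted yesterday.sorted > new_or_changed.hashes
```

Note this only tells you which hashes are new; to recover the actual rows, you would keep the hash and the row side by side in the same file and carry the row column through the comparison.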
# 5  
Old 06-13-2010
Glibc has an extensive hash-table library (hcreate/hsearch in <search.h>).

See: Hash Tables (in the GNU C Library manual)
This User Gave Thanks to jim mcnamara For This Post:
# 6  
Old 06-14-2010
Hi Jim, thanks for the reference, but I need help implementing it.

For example, flat file 1 contains:
AAA-BBB-CCC
XXX-YYY-ZZZ

I want to hash each row in such a way that the result is shorter than the original row and unique to its data, i.e. each row of the flat file above would be reduced to something like:

XYS4358
ABC4385

Can anyone help me with this? I'm using AIX.
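Not an AIX-specific answer, but one sketch using only POSIX tools: cksum reduces each row to a CRC checksum (a number, not the letter-and-digit codes shown above). Two caveats: like any short hash, it is not guaranteed unique, since distinct rows can occasionally collide; and spawning a process per row will be far too slow for 50 million rows, so at that scale something like perl with Digest::MD5, hashing inside one process, would be the practical choice. This is an illustration of the idea, not a production recipe:

```shell
# Emit one CRC per input row; identical rows always map to the same code.
# file1.txt / file1.hashes are the example names from the post above.
while IFS= read -r row; do
    printf '%s\n' "$row" | cksum | awk '{ print $1 }'
done < file1.txt > file1.hashes
```

The resulting file1.hashes can then be compared line-for-line (or via sort/comm) against the previous day's hash file.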
# 7  
Old 06-14-2010
Have you considered adding a trigger to the DB2 database?
This User Gave Thanks to methyl For This Post: