Solution for the Massive Comparison Operation


 
# 1  
Old 06-10-2010

Hi

We have 50 million records in a mainframe DB2 database, and we have a requirement to capture Change Data Capture (CDC) records, i.e. new or updated records added to DB2. Unfortunately we don't have any column indicators that describe the changes made to the records.

So we decided to export the table to flat files on a UNIX box every day, then compare against the previous day's file and extract the changed rows using UNIX text-processing tools.

The problem is that a huge amount of data changes every day, approximately 40 million records.

Can anyone suggest a better way of handling this, or the hardware requirements for a UNIX server that could process such a huge comparison faster?

Thanks
# 2  
Old 06-10-2010
We have a similar problem. Are you running diff? That would take forever.

Use something that has associative (hashed) arrays, like awk or perl. Assuming you have several files, an "old" set and a "new" set, that should take less than an hour.
You can search this forum for examples of both kinds of code for finding file differences.

You need a lot of virtual memory; we run on a Solaris 9 SPARC V440 with 32GB of memory.
We finish comparing 1.5GB (250K-line) files in about 5 minutes. We do them 12 at a time: 6 old vs 6 new.
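For what it's worth, the awk version of that approach can be sketched in one line (old.txt and new.txt are hypothetical file names standing in for the previous day's and current day's exports; it prints every row of the new file that does not appear verbatim in the old file):

```shell
# Pass 1 (NR==FNR): load every line of old.txt into an associative array.
# Pass 2: print only the lines of new.txt never seen in old.txt,
# i.e. rows that are new or were changed since yesterday.
awk 'NR == FNR { seen[$0] = 1; next } !($0 in seen)' old.txt new.txt > changed.txt
```

Memory usage is roughly proportional to the size of the old file, which is why a box with plenty of RAM matters at these volumes.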

I hope this is what you were asking....
This User Gave Thanks to jim mcnamara For This Post:
# 3  
Old 06-11-2010
Thank you very much for your help. Yes, that is exactly what I was looking for.
# 4  
Old 06-13-2010
Another idea for the same solution

Hi

Thanks for the solution. We have come up with an approach for comparing the huge data sets.

Since we are comparing huge numbers of flat-file records, the following can be done: apply a hash function, as you mentioned, to each row of the flat file, making the comparison easier.

But is there a utility hash function in UNIX, similar to ORA_HASH in Oracle, that would reduce each new row to a short, unique code of a few characters or numbers? Then we could compare only those hash codes against the previous day's hash codes, which would make the processing faster too.
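Once each day's file has been reduced to one hash per row, the day-over-day comparison itself is cheap with standard tools. A minimal sketch (today.hashes and yesterday.hashes are hypothetical file names, one hash per line):

```shell
# comm needs sorted input; -23 keeps lines unique to the first file,
# i.e. hashes present today that were not present yesterday.
sort today.hashes > today.sorted
sort yesterday.hashes > yesterday.sorted
comm -23 today.sorted yesterday.sorted > new_or_changed.hashes
```

Note this only tells you which hashes are new; to recover the actual rows, you would keep the hash and the row side by side in the same file and carry the row column through the comparison.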
# 5  
Old 06-13-2010
Glibc has an extensive hash-table library (hcreate/hsearch in <search.h>).

See: Hash Tables (in the GNU C Library manual)
This User Gave Thanks to jim mcnamara For This Post:
# 6  
Old 06-14-2010
Hi Jim, thanks for the reference, but I need help implementing it.

For example, flat file 1 contains:
AAA-BBB-CCC
XXX-YYY-ZZZ

I want to hash each row in such a way that the result is shorter than the original row and unique to its data, i.e. each row of the flat file above would be reduced to something like:

XYS4358
ABC4385

Can anyone help me with this? I'm using AIX.
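Not an AIX-specific answer, but one sketch using only POSIX tools: cksum reduces each row to a CRC checksum (a number, not the letter-and-digit codes shown above). Two caveats: like any short hash, it is not guaranteed unique, since distinct rows can occasionally collide; and spawning a process per row will be far too slow for 50 million rows, so at that scale something like perl with Digest::MD5, hashing inside one process, would be the practical choice. This is an illustration of the idea, not a production recipe:

```shell
# Emit one CRC per input row; identical rows always map to the same code.
# file1.txt / file1.hashes are the example names from the post above.
while IFS= read -r row; do
    printf '%s\n' "$row" | cksum | awk '{ print $1 }'
done < file1.txt > file1.hashes
```

The resulting file1.hashes can then be compared line-for-line (or via sort/comm) against the previous day's hash file.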
# 7  
Old 06-14-2010
Have you considered adding a trigger to the DB2 database?
This User Gave Thanks to methyl For This Post: