Solution for the Massive Comparison Operation


 
# 8  
Old 06-15-2010

Hi methyl, we had considered that option too, but since we are dealing with a huge amount of data, there is a possibility that it would hamper the database.
# 9  
Old 06-17-2010
Assuming that:
  1. your file has a unique key (like some sort of account id, order id etc.)
  2. it is sorted by that key (in ascending order)
Here is an approach that I use when comparing data in flat files (it's pseudo code):

Code:
open today
open yesterday
while true
{
  ## Check if we need to read a new record
  if(!today_rec)
  {
    today_rec = read_next_record(today)
    today_key = get_key(today_rec)
  }

  ## Check if we need to read a new record
  if(!yesterday_rec)
  {
    yesterday_rec = read_next_record(yesterday)
    yesterday_key = get_key(yesterday_rec)
  }

  ## If both files are done, exit the processing loop
  if (today_rec == NULL and yesterday_rec == NULL)
    break;

  if (today_key < yesterday_key)
  {
    ## today_key is missing from the yesterday file, so it's an insert
    report_inserted_record(today_rec)     
    today_rec = NULL
    continue
  }
  else if (today_key > yesterday_key)
  {
    ## yesterday_key is missing from the today file, so it's a delete
    report_deleted_record(yesterday_rec)
    yesterday_rec = NULL
    continue
  }
  ## keys match in both files: report an update if the record contents differ
  else if (compare_records(today_rec, yesterday_rec) != 0)
    report_changed_record(today_rec, yesterday_rec)
  
  today_rec = NULL
  yesterday_rec = NULL
}

The performance boost comes from the fact that each file is traversed exactly once in this approach.
You will have to handle boundary conditions (especially end-of-file) and other errors properly in the code.
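
To make the end-of-file handling concrete, here is a minimal runnable sketch of the same single-pass merge in Python. The pipe delimiter, the key-in-the-first-field layout, and the file-name arguments are assumptions for illustration only, not part of the original post; keys are compared as plain strings, so both files must be sorted in lexical order (e.g. with plain sort, not sort -n).

Code:
#!/usr/bin/env python3
# Minimal sketch of the merge-compare loop above.
# Assumptions (for illustration only): plain-text files, one record per
# line, pipe-delimited, unique key in the first field, both files sorted
# ascending by that key as strings.

import sys


def get_key(rec):
    # Extract the key (first pipe-delimited field); None means end-of-file.
    return None if rec is None else rec.split("|", 1)[0]


def read_next_record(fh):
    # Return the next line without its newline, or None at end-of-file.
    line = fh.readline()
    return line.rstrip("\n") if line else None


def compare_files(today_path, yesterday_path):
    with open(today_path) as today, open(yesterday_path) as yesterday:
        today_rec = read_next_record(today)
        yesterday_rec = read_next_record(yesterday)

        while today_rec is not None or yesterday_rec is not None:
            today_key = get_key(today_rec)
            yesterday_key = get_key(yesterday_rec)

            # An exhausted file sorts "after" everything, so the other
            # file drains out as inserts or deletes.
            if yesterday_key is None or (today_key is not None
                                         and today_key < yesterday_key):
                print("INSERTED:", today_rec)
                today_rec = read_next_record(today)
            elif today_key is None or today_key > yesterday_key:
                print("DELETED:", yesterday_rec)
                yesterday_rec = read_next_record(yesterday)
            else:
                # Same key in both files: report only if contents changed.
                if today_rec != yesterday_rec:
                    print("CHANGED:", yesterday_rec, "->", today_rec)
                today_rec = read_next_record(today)
                yesterday_rec = read_next_record(yesterday)


if __name__ == "__main__":
    compare_files(sys.argv[1], sys.argv[2])

Run it as, for example, python3 compare_files.py today.txt yesterday.txt (the script name is arbitrary). Treating an exhausted file's key as greater than any real key is what lets the loop drain the remaining records of the other file as inserts or deletes without a separate post-loop pass.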

~A Programmer
