The UNIX and Linux Forums  
UNIX for Dummies Questions & Answers

Posted 10-10-2006 by madhukalyan
Comparing Huge Files - Performance Is Very Bad

Hi All,

Can you please help me resolve the following problem?

My requirement is as follows:

1) I have two files, YESTERDAY_FILE and TODAY_FILE. Each one has nearly two million records.
2) I need to check whether each record of TODAY_FILE exists in YESTERDAY_FILE. If it exists, we can skip it as a duplicate.
3) If it does not exist, I need to check the primary key fields (for example, the first 3 fields). If a match is found, I need to check the data part (for example, the 5th and 6th fields). If the data part does not match, I need to write the record to a new file, say OUTPUT_FILE, with a prefix of 'C' (Change). If the data part matches, skip the record.
4) If no match is found on the primary key fields, I need to write the record with a prefix of 'A' (Append).
5) After the above process, I need to check each record of YESTERDAY_FILE against TODAY_FILE. If its primary key is not found there, I need to write the record with a prefix of 'D' (Delete).
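The steps above amount to a keyed diff of two files, which can be done with a single read of each file if yesterday's records are held in an associative array. The following is a minimal sketch in awk wrapped in a shell function, not a tested drop-in replacement. The function name `make_delta` and its argument order are hypothetical; it assumes primary keys are unique within each file, that yesterday's file fits in memory, that `|` never appears inside a field, and that the data part runs from field fields_to_skip + 1 to the end of the record (as in the example data). It reuses the `|I|` to `|U|` and `|I|` to `|D|` flag rewrites from the original script.

```sh
#!/bin/sh
# Hypothetical helper: make_delta YESTERDAY_FILE TODAY_FILE NKEY NSKIP
#   NKEY  = number of leading fields forming the primary key (fields_to_compare)
#   NSKIP = number of leading fields before the data part   (fields_to_skip)
# Assumes unique keys per file, no '|' inside fields, and that the first
# file fits in memory.
make_delta() {
    awk -F'|' -v nkey="$3" -v nskip="$4" '
    # Join fields a..b of the current record back together with "|"
    function span(a, b,    s, i) {
        s = $a
        for (i = a + 1; i <= b; i++) s = s FS $i
        return s
    }
    # Pass 1: index the records of the first file by key
    FILENAME == ARGV[1] {
        key = span(1, nkey)
        if (!(key in yrec)) order[++n] = key   # first-seen order, used for D lines
        yrec[key] = $0
        ydata[key] = span(nskip + 1, NF)
        next
    }
    # Pass 2: classify each record of the second file
    {
        key = span(1, nkey)
        seen[key] = 1
        if (!(key in yrec)) {
            print "A|" $0                      # new key: Append
        } else if (span(nskip + 1, NF) != ydata[key]) {
            line = $0
            gsub(/\|I\|/, "|U|", line)         # same flag rewrite as the original sed
            print "C|" line                    # same key, different data: Change
        }                                      # same key, same data: skip
    }
    # Keys present in the first file but absent from the second: Delete
    END {
        for (i = 1; i <= n; i++) {
            key = order[i]
            if (!(key in seen)) {
                line = yrec[key]
                gsub(/\|I\|/, "|D|", line)
                print "D|" line
            }
        }
    }' "$1" "$2"
}
```

For example, `make_delta yester_file today_file 1 3 > deltafile` should write Change and Append lines in today's file order, followed by Delete lines in yesterday's file order. Each file is read once, so the cost grows roughly linearly with the number of records instead of with their product.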

I developed the following logic, but it takes far too long to execute: it produces only about 100 records per minute. The performance is very bad. Can anyone please help me out?
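For the loop-based code below, a rough estimate shows why it cannot scale. Assuming about $n = m = 2\times10^{6}$ records per file, and noting that `grep` without `-q` reads the whole file it searches for every call, each of the two loops performs at least on the order of

$$
n \cdot m = (2\times10^{6})^{2} = 4\times10^{12}
$$

line comparisons, about $8\times10^{12}$ in total, on top of several forked processes (`cut`, `grep`, `sed`) per record. An approach that reads each file once and looks keys up in a hash table needs on the order of $n + m = 4\times10^{6}$ record reads instead.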

My code is:
Code:
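while IFS= read -r record
do
        # Key = first fields_to_compare fields; data part = fields after the first fields_to_skip
        primary_key_fields=$(printf '%s\n' "$record" | cut -d "|" -f 1-"${fields_to_compare}")
        data_part=$(printf '%s\n' "$record" | cut -d "|" -f "$((fields_to_skip + 1))"-)

        # Identical record (fixed-string, whole-line match) in yesterday's file: duplicate, skip
        if ! grep -qxF -- "$record" "$yesterday_file_name"; then
                # First record in yesterday's file with exactly the same key fields
                flag_tmp=$(awk -F'|' -v k="$primary_key_fields" -v n="$fields_to_compare" '
                        { key = $1; for (i = 2; i <= n; i++) key = key FS $i
                          if (key == k) { print; exit } }' "$yesterday_file_name")
                current_record=""
                if [ -z "$flag_tmp" ]; then
                        current_record="A|${record}"
                else
                        yesterday_data_part=$(printf '%s\n' "$flag_tmp" | cut -d "|" -f "$((fields_to_skip + 1))"-)
                        if [ "$yesterday_data_part" != "$data_part" ]; then
                                current_record="C|$(printf '%s\n' "$record" | sed 's/|I|/|U|/g')"
                        fi
                fi
                # Write only when the record is new or changed
                if [ -n "$current_record" ]; then
                        printf '%s\n' "$current_record" >> "$delta_file_name"
                fi
        fi
done < "$file_name"

while IFS= read -r record
do
        primary_key_fields=$(printf '%s\n' "$record" | cut -d "|" -f 1-"${fields_to_compare}")
        # Key missing from today's file (exact match on the key fields): Delete
        if ! cut -d "|" -f 1-"${fields_to_compare}" "$file_name" | grep -qxF -- "$primary_key_fields"; then
                current_record="D|$(printf '%s\n' "$record" | sed 's/|I|/|D|/g')"
                printf '%s\n' "$current_record" >> "$delta_file_name"
        fi
done < "$yesterday_file_name"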
fields_to_compare, fields_to_skip and file_name are the parameters passed to the script. For the example below, they are:

fields_to_compare=1 (primary key field: aaa, bbb, etc.)
fields_to_skip=3 (the data part starts at the 4th field)
file_name=today_file
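As a quick check of how these parameters map onto `cut` field ranges, here is a minimal illustration using a hypothetical record shaped like the sample data. Since fields_to_skip counts the fields to skip, the data part starts at field fields_to_skip + 1; `cut -f 3-` would also pull in the `I` flag field.

```sh
#!/bin/sh
# Illustration only: split one sample-style record into key and data part
record='aaa|xxxxxxxxxxxxxxxxxxxxxxxxx|I|mmmmmmmmm'
fields_to_compare=1   # number of leading key fields
fields_to_skip=3      # number of leading fields before the data part

key=$(printf '%s\n' "$record" | cut -d '|' -f 1-"$fields_to_compare")
data=$(printf '%s\n' "$record" | cut -d '|' -f "$((fields_to_skip + 1))"-)

printf 'key=%s data=%s\n' "$key" "$data"   # prints: key=aaa data=mmmmmmmmm
```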

My input is:

Yesterday's File (yester_file)

aaa|xxxxxxxxxxxxxxxxxxxxxxxxx|I|mmmmmmmmm
bbb|xxxxxxxxxxxxxxxxxxxxxxxxx|I|nnnnnnnnnnnnnn
ccc|xxxxxxxxxxxxxxxxxxxxxxxxx|I|bbbbbbbbbbbbbbb

Today's File (today_file)

aaa|xxxxxxxxxxxxxxxxxxxxxxxxx|I|vvvvvvvvvvvvvvvvvvv
bbb|xxxxxxxxxxxxxxxxxxxxxxxxx|I|kkkkkkkkkkkkkkkk
ddd|xxxxxxxxxxxxxxxxxxxxxxxxx|I|zzzzzzzzzzzzzzzzz

Output File (deltafile)

C|aaa|xxxxxxxxxxxxxxxxxxxxxxxxx|U|vvvvvvvvvvvvvvvvvvv
C|bbb|xxxxxxxxxxxxxxxxxxxxxxxxx|U|kkkkkkkkkkkkkkkk
A|ddd|xxxxxxxxxxxxxxxxxxxxxxxxx|I|zzzzzzzzzzzzzzzzz
D|ccc|xxxxxxxxxxxxxxxxxxxxxxxxx|D|bbbbbbbbbbbbbbb
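If holding two million records in memory is a concern, the Append and Delete lines of an output like this can also be found with a sort-merge approach using `sort` and `join`. The sketch below is a hypothetical helper (`delta_keys`) that assumes the key is the first field only (fields_to_compare=1, as in the example) and handles only A and D; detecting C lines would still require comparing the data parts of paired lines. It sets `LC_ALL=C` so that `sort` and `join` agree on collation order.

```sh
#!/bin/sh
# Hypothetical helper: delta_keys YESTERDAY_FILE TODAY_FILE
# Key = field 1 only; emits Append and Delete lines, not Change lines.
delta_keys() {
    y_sorted=$(mktemp) || return 1
    t_sorted=$(mktemp) || return 1
    # join needs both inputs sorted on the join field with the same collation
    LC_ALL=C sort -t'|' -k1,1 "$1" > "$y_sorted"
    LC_ALL=C sort -t'|' -k1,1 "$2" > "$t_sorted"
    # Today lines whose key has no partner in yesterday: Append
    LC_ALL=C join -t'|' -v 2 "$y_sorted" "$t_sorted" | sed 's/^/A|/'
    # Yesterday lines whose key has no partner in today: Delete (flag I becomes D)
    LC_ALL=C join -t'|' -v 1 "$y_sorted" "$t_sorted" | sed 's/|I|/|D|/g; s/^/D|/'
    rm -f "$y_sorted" "$t_sorted"
}
```

Running `delta_keys yester_file today_file` on the sample files should print the A line for ddd and then the D line for ccc. Because `sort` can spill to temporary files, this approach trades the in-memory hash table for disk-based sorting.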

Last edited by reborg; 10-10-2006 at 04:06 PM.
 

The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.
