comparing Huge Files - Performance is very bad
Hi All,
Can you please help me resolve the following problem? My requirement is:

1) I have two files, YESTERDAY_FILE and TODAY_FILE. Each one has nearly two million records.
2) I need to check each record of TODAY_FILE against YESTERDAY_FILE. If it exists, it is a duplicate and can be skipped.
3) If it does not exist, I need to check the primary key fields (for example, the first 3 fields). If a match is found, I need to check the data part (for example, the 5th and 6th fields). If the data part does not match, I need to write the record to a new file, say OUTPUT_FILE, with a prefix of 'C' (Change). If the data part matches, the record is skipped.
4) If no match is found on the primary key fields, I need to write the record with a prefix of 'A' (Append).
5) After the above process, I need to check each record of YESTERDAY_FILE against TODAY_FILE. If it does not exist, I need to write the record with a prefix of 'D' (Delete).

I developed the following logic, which takes too much time to execute: it creates 100 records in one minute. Performance is too bad. Can anyone please help me out? My code is:

Code:
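# Pass 1: classify each record of today's file as duplicate, Append (A) or Change (C).
while IFS= read -r record
do
    primary_key_fields=`echo "$record" | cut -d "|" -f 1-${fields_to_compare}`
    data_part=`echo "$record" | cut -d "|" -f ${fields_to_skip}-`
    # -F: fixed string, -x: whole line, so only an exact duplicate matches
    flag=`grep -Fx -- "${record}" "${yesterday_file_name}"`
    if [ -z "${flag}" ]; then
        # Anchor the key at the start of the line so it cannot match inside the data part
        flag_tmp=`grep "^${primary_key_fields}|" "$yesterday_file_name"`
        yesterday_data_part=`echo "${flag_tmp}" | cut -d "|" -f ${fields_to_skip}-`
        current_record=""
        if [ -z "${flag_tmp}" ] ; then
            current_record="A|${record}"
        elif [ "${yesterday_data_part}" != "${data_part}" ] ; then
            current_record="C|"`echo "${record}" | sed "s/|I|/|U|/g"`
        fi
        # Write only classified records; a matching data part leaves current_record empty
        [ -n "${current_record}" ] && echo "${current_record}" >> "$delta_file_name"
    fi
done < "$file_name"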
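# Pass 2: records whose key is missing from today's file are Deletes (D).
while IFS= read -r record
do
    primary_key_fields=`echo "$record" | cut -d "|" -f 1-${fields_to_compare}`
    # Anchor the key at the start of the line so it cannot match inside the data part
    flag=`grep "^${primary_key_fields}|" "${file_name}"`
    if [ -z "${flag}" ]; then
        current_record="D|"`echo "${record}" | sed "s/|I|/|D|/g"`
        echo "${current_record}" >> "$delta_file_name"
    fi
done < "${yesterday_file_name}"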
fields_to_compare=1 (primary key fields: aaa, bbb etc.)
fields_to_skip=3 (from the 4th field I need to consider as data part)
file_name=today_file

My input is:

Yesterday's file (yester_file)
aaa|xxxxxxxxxxxxxxxxxxxxxxxxx|I|mmmmmmmmm
bbb|xxxxxxxxxxxxxxxxxxxxxxxxx|I|nnnnnnnnnnnnnn
ccc|xxxxxxxxxxxxxxxxxxxxxxxxx|I|bbbbbbbbbbbbbbb

Today's file (today_file)
aaa|xxxxxxxxxxxxxxxxxxxxxxxxx|I|vvvvvvvvvvvvvvvvvvv
bbb|xxxxxxxxxxxxxxxxxxxxxxxxx|I|kkkkkkkkkkkkkkkk
ddd|xxxxxxxxxxxxxxxxxxxxxxxxx|I|zzzzzzzzzzzzzzzzz

Output file (deltafile)
C|aaa|xxxxxxxxxxxxxxxxxxxxxxxxx|U|vvvvvvvvvvvvvvvvvvv
C|bbb|xxxxxxxxxxxxxxxxxxxxxxxxx|U|kkkkkkkkkkkkkkkk
A|ddd|xxxxxxxxxxxxxxxxxxxxxxxxx|I|zzzzzzzzzzzzzzzzz
D|ccc|xxxxxxxxxxxxxxxxxxxxxxxxx|D|bbbbbbbbbbbbbbb

Last edited by reborg; 10-10-2006 at 04:06 PM.