Performance issue in UNIX while generating .dat file from large text file


 
# 1  
04-20-2009

Hello Gurus,

We are facing a performance issue in UNIX. If anyone has faced this kind of issue before, please share your suggestions.

Problem Definition:
A few of the load processes of our Finance Application run slowly in UNIX when they use a shell script containing the code below. The code reads an input file and writes its records into .dat files. The performance problem appears when the input file contains a large volume of data.
For example, 200,000 records take 38 minutes to be appended/written to the .dat files, which lengthens the whole load process. We need to improve the performance of this step by reducing the time it takes to append/write the records.
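For scale, the reported timing works out to roughly 11 ms per record (back-of-envelope arithmetic on the figures above, not a measurement from the thread):

```latex
\[
  \frac{38 \times 60\ \mathrm{s}}{200\,000\ \text{records}}
  = \frac{2280\ \mathrm{s}}{200\,000\ \text{records}}
  \approx 0.0114\ \mathrm{s} = 11.4\ \mathrm{ms}\ \text{per record}
\]
```

A per-line cost of that size is plausibly dominated by process start-up overhead (each backquoted `echo ... | cut` forks at least one process) rather than by the cost of appending one line to a file.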
Portion of code from the shell script:

Code:
m_arr_ctr=1
cat ${m_recv_dir}/${m_glb_d92_nm}${m_glb_file_seq} | while read d92_line
do
    m_brch_cd=`echo "${d92_line}" | cut -c166-168`
    # This is the case when we reach the last line '*/'; we just skip that line
    if [ "${m_brch_cd}" = "" ]
    then
        continue
    fi
    if [ "${m_brch_cd}" = "400" ]
    then
        m_jv_cd=`echo "${d92_line}" | cut -c190-192`
    else
        m_jv_cd=${m_brch_cd}
    fi
    if [ ! -s tmp_d92${m_brch_cd}z${m_jv_cd} ]
    then
        echo "TMP" > tmp_d92${m_brch_cd}z${m_jv_cd}
        m_a_d92_list[$m_arr_ctr]=tmp_d92${m_brch_cd}z${m_jv_cd}
        m_a_d92_files[$m_arr_ctr]=${m_recv_dir}/gd${m_brch_cd}x${m_jv_cd}${m_glb_rate_cd}.dat
        m_arr_ctr=`expr $m_arr_ctr + 1`
        m_touched="N"
    else
        m_touched="Y"
    fi
    if [ "${m_touched}" = "N" ]
    then
        echo "${d92_line}" > ${m_recv_dir}/gd${m_brch_cd}${m_jv_cd}${m_glb_rate_cd}.dat
    else
        echo "${d92_line}" >> ${m_recv_dir}/gd${m_brch_cd}${m_jv_cd}${m_glb_rate_cd}.dat
    fi
done
for m_file_name in `echo ${m_a_d92_files[*]}`
do
    if [[ `grep "*/" ${m_file_name} | wc -l` = 0 ]]
    then
        echo "*/" >> ${m_file_name}
    fi
done
for m_file_name in `echo ${m_a_d92_list[*]}`
do
    rm -f $m_file_name
done

Please provide your valuable suggestions. Also, is there a faster way to append the output, for example using sed?


# 2  
04-20-2009
Quote:
Originally Posted by KRAMA
We are facing a performance issue in UNIX. If anyone has faced this kind of issue before, please share your suggestions.

Problem Definition:
A few of the load processes of our Finance Application run slowly in UNIX when they use a shell script containing the code below. The code reads an input file and writes its records into .dat files. The performance problem appears when the input file contains a large volume of data.
For example, 200,000 records take 38 minutes to be appended/written to the .dat files, which lengthens the whole load process. We need to improve the performance of this step by reducing the time it takes to append/write the records.

With a file that size, you should really be using awk.
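As a rough illustration of that advice, here is a minimal sketch of the whole split done in a single awk process instead of several processes per line. It assumes the record layout from the script above (branch code in columns 166-168, JV code in columns 190-192 when the branch is 400) and names each output file gd<branch><jv><rate>.dat, as the loop's write statements do (the original array builds the name with an extra x, so check which name the loader expects). `split_d92` and its three arguments are hypothetical names. Because each output file is created fresh in this pass, the '*/' marker is simply written once at the end of every file.

```shell
# Hypothetical helper: split a fixed-width detail file into one .dat file
# per branch/JV key in ONE awk pass (no per-line echo, cut or expr).
# ASSUMED layout: branch in columns 166-168; JV in columns 190-192 when
# the branch is 400, otherwise JV = branch (same rule as the shell loop).
split_d92() {
    in_file=$1      # e.g. ${m_recv_dir}/${m_glb_d92_nm}${m_glb_file_seq}
    out_dir=$2      # e.g. ${m_recv_dir}
    rate_cd=$3      # e.g. ${m_glb_rate_cd}

    awk -v dir="$out_dir" -v rate="$rate_cd" '
    {
        brch = substr($0, 166, 3)
        if (brch == "") next                    # short line, e.g. the trailing "*/"
        jv = (brch == "400") ? substr($0, 190, 3) : brch
        f = dir "/gd" brch jv rate ".dat"
        if (!(f in seen)) { seen[f] = 1; order[++n] = f }
        print > f                               # first ">" truncates; file then stays open
    }
    END {
        for (i = 1; i <= n; i++) {
            print "*/" > order[i]               # end-of-file marker expected by the loader
            close(order[i])
        }
    }' "$in_file"
}
```

On some older systems the default awk predates POSIX awk (for example, without -v support), and nawk may be needed instead. Some implementations also limit how many files can be open at once; with many branch/JV combinations, the sketch would need to close() each file and re-open it with >>.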
Quote:
Portion of Code from Shell Script:

Please put code inside [code] tags.
Quote:
Code:
m_arr_ctr=1
cat ${m_recv_dir}/${m_glb_d92_nm}${m_glb_file_seq} |while read d92_line


That cat is an unnecessary external command, but since it is only run once, eliminating it will make very little difference.
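A minimal sketch of reading the file through a redirection instead of `cat file | while read ...`. The `IFS=` and `-r` options are an editorial assumption: for fixed-width records they keep leading blanks and backslashes intact, which plain `read` would strip or interpret, shifting the column offsets. In bash (unlike ksh), the last command of a pipeline runs in a subshell, so arrays filled inside a piped loop would be empty after `done`; the redirection form keeps the loop in the current shell in both. `read_records` and `recs` are hypothetical names.

```shell
# Hypothetical helper: load every record of a file into the array "recs".
# The loop reads from a redirection, so it runs in the current shell and
# "recs" is still populated after "done" (in ksh and in bash).
read_records() {
    unset recs
    i=0
    while IFS= read -r rec      # IFS= keeps leading blanks, -r keeps backslashes
    do
        (( i += 1 ))            # builtin arithmetic, no expr
        recs[$i]=$rec
    done < "$1"
}
```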

Part of the slowness comes from calling several external commands for every line, many of which are unnecessary (there's no need for expr, for example, as the shell can do its own arithmetic).
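To illustrate the point about expr, a minimal sketch: ksh88, ksh93 and bash all evaluate `(( ... ))` as a builtin, so the counter update no longer starts an external process. The `bump` function is a hypothetical wrapper used only to exercise the increment.

```shell
# Hypothetical wrapper: increment m_arr_ctr N times using only builtins.
bump() {
    m_arr_ctr=1
    k=0
    while (( k < $1 ))
    do
        (( m_arr_ctr = m_arr_ctr + 1 ))   # replaces: m_arr_ctr=`expr $m_arr_ctr + 1`
        (( k = k + 1 ))
    done
}
```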
Quote:
Code:
do
m_brch_cd=`echo "${d92_line}" |cut -c166-168`


What shell are you using? If it's bash or ksh93, you can replace the call to cut:

Code:
m_brch_cd=${d92_line:165:3}
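A minimal sketch of both extractions from the loop done with substring expansion (bash or ksh93 only; ksh88 does not support it). The offset is 0-based, so cut's columns 166-168 become offset 165 and columns 190-192 become offset 189. `extract_codes` is a hypothetical helper name.

```shell
# Hypothetical helper: set m_brch_cd and m_jv_cd from one fixed-width record
# without calling cut (bash/ksh93 substring expansion, 0-based offsets).
extract_codes() {
    line=$1
    m_brch_cd=${line:165:3}        # same as: echo "$line" | cut -c166-168
    if [ "$m_brch_cd" = "400" ]
    then
        m_jv_cd=${line:189:3}      # same as: echo "$line" | cut -c190-192
    else
        m_jv_cd=$m_brch_cd
    fi
}
```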

Quote:
Code:
for m_file_name in `echo ${m_a_d92_files[*]}`


An unnecessary subshell (here and later) can add a significant amount of time. Use:

Code:
for m_file_name in "${m_a_d92_files[@]}"
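A minimal sketch of the loop body with the quoted array expansion: `"${arr[@]}"` yields one word per element (even when an element contains spaces) without any command substitution, whereas the backquoted `echo ${arr[*]}` form re-splits the result. `list_files` is a hypothetical helper name.

```shell
# Hypothetical helper: print each element of m_a_d92_files on its own line.
list_files() {
    for m_file_name in "${m_a_d92_files[@]}"   # no `echo`, no re-splitting
    do
        printf '%s\n' "$m_file_name"
    done
}
```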

Quote:
Code:
do
if [[ `grep "*/" ${m_file_name} | wc -l` = 0 ]]


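You don't need wc as well as grep; test grep's exit status directly. The marker must be appended only when it is not found, so use || rather than a plain if:

Code:
grep '\*/' "${m_file_name}" > /dev/null || echo "*/" >> "${m_file_name}"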

# 3  
04-20-2009
Hi Johnson,

Thanks for your advice. I will try to implement your suggestions and check the performance. Also, the shell used here is ksh.
# 4  
04-20-2009

Which version of ksh?
# 5  
04-21-2009
Hi John,

The ksh version is 88f. I implemented the commands you gave, but the one that replaces cut (i.e. m_brch_cd=${d92_line:165:3}) did not work since, as you said, it only works in bash or ksh93. The rest of the changes did not improve the performance much (about 1-2 minutes faster). Can you please help me with the suggestion of using awk? I am very new to awk.

Last edited by KRAMA; 04-21-2009 at 04:50 PM..
# 6  
04-21-2009
Quote:
Originally Posted by KRAMA
Can you please help me with the suggestion of using awk? I am very new to awk...

Please describe exactly what the script needs to do.

What files does it use for input? What is the format of those files?

What is the format of the output?
# 7  
04-21-2009
Hi John,

Please find the answers as below:

Please describe exactly what the script needs to do.
This script splits the data from the detail files (i.e. the input files for the shell script, in .txt format). In this case the detail file is located at ${m_recv_dir}/${m_glb_d92_nm}${m_glb_file_seq}, which is read at the start of the code I posted.

The script reads the data line by line from the text file and prepares the output .DAT files.

Once a .DAT file is created, the script puts the '*/' end-of-file marker at the bottom of it. Once the .DAT output files are generated, another shell script loads the data from these .DAT files into work tables in the database using SQL*Loader.

What files does it use for input? What is the format of those files?

The format of the input file is .txt.

What is the format of the output?

The output format is .DAT.

Please let me know what other information you need so you can help me with this.