Performance issue in shell script


 
# 1  
07-08-2014
Performance issue in shell script

Hi All,


I am facing a performance issue while running a Linux shell script.

I have file1 and file2. file1 is the source file and file2 is the lookup file. If a pattern from file2 matches in file1, I need to replace it using file2.
The order of the lookup file is important: as soon as a record matches, I exit the loop, stop searching for that record, and continue with the next record.
Code:
file1
------
one|xxxx|111111NEW YORK|abcd
two|yyy|TEXAS 222222TEXASTEXAS|defg
three|zzzz|CALIFORNIA TEXAS TEXAS 3333 CALIFORNIA|defg
four|kkkk|DALLAS DALLAS|defg
 
file2
-----
NEW YORK,NY
CALIFORNIA,CA
TEXAS,TX

If the 1st field of a file2 record matches text in the 3rd field of a file1 record, I need to do the following:
  1. If the string is present only once, don't replace it; just append field 2 from the lookup file and |2|N at the end of the line.
  2. If the string is present more than once, leave the first occurrence and replace the remaining occurrences, then append field 2 from the lookup file and |2|Y at the end of the line.
If there is no match, just append a space (as a blank field) and |2|N at the end of the line.

So output is below.

Code:
 
one|xxxx|111111NEW YORK|abcd|NY|2|N (NEW YORK matched but is present only once, so it is not replaced. As a match was found, exit the loop; no need to search and replace further.)
two|yyy|TEXAS 222222TXTX|defg|TX|2|Y (TEXAS is present more than once, so it is replaced from the 2nd occurrence on, leaving the first occurrence.)
three|zzzz|CALIFORNIA TEXAS TEXAS 3333 CA|defg|CA|2|Y (Only the 2nd occurrence of CALIFORNIA is replaced. TEXAS is not replaced because a match (CALIFORNIA) was already found, so the loop exits.)
four|kkkk|DALLAS DALLAS|defg| |2|N (no match, so nothing is replaced)
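Rule 2 (keep the first occurrence, replace the rest) can be illustrated in pure bash without starting a sed process. This is a minimal sketch: the function name replace_after_first and the RESULT variable are hypothetical, and the key is matched as a literal string rather than a regular expression.

```shell
#!/usr/bin/env bash
# Hypothetical helper for rule 2: keep the first occurrence of KEY in TEXT and
# replace every later occurrence with ABBR. The result goes into RESULT, so
# calling it needs no $(...) subshell and no sed process.
replace_after_first() {
  local text=$1 key=$2 abbr=$3 head rest
  RESULT=$text
  # Nothing to do if the key is empty or absent (literal match, not a regex).
  [[ -n $key && $text == *"$key"* ]] || return 0
  head=${text%%"$key"*}        # everything before the first occurrence
  rest=${text#*"$key"}         # everything after the first occurrence
  RESULT=$head$key${rest//"$key"/"$abbr"}
}

replace_after_first 'TEXAS 222222TEXASTEXAS' 'TEXAS' 'TX'
printf '%s\n' "$RESULT"        # prints: TEXAS 222222TXTX
```

When the key occurs only once, RESULT is the unchanged text, which matches rule 1.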

I have tested the code below and it works fine, but it takes too long. It processes about 1 record per second, and I have 1,000,000 records to process (roughly 11.6 days at that rate).
Can anyone help me tune this script?

CODE is below

Code:
echo "Replace the string if it matches only once, or replace ALL except the FIRST occurrence." >>"$LOG"
tot_cnt=`wc -l < "$REP_FILE_PATH/$REP_FILE"`
del_tmp_files

# IFS='' read -r preserves leading and trailing spaces
while IFS='' read -r line; do
i=0
while read rep_line; do
field[1]=`cut -d',' -f1 <<<"$line"`
field[2]=`cut -d',' -f2 <<<"$line"`
cnt=`echo -n "$line" | grep -o "${field[1]}" | wc -l`
if [[ "$cnt" -eq 1 ]] ; then
sed -e "s/\$/|${field[2]}|2|N/" <<<"$line" >> tmp.txt
break
fi
if [[ "$cnt" -gt 1 ]] ; then
sed -e "s/${field[1]}/${field[2]}/2g" -e "s/\$/|${field[2]}|2|Y/" <<<"$line" >> tmp.txt
break
fi
let i++
if [[ "$cnt" -eq 0 && "$tot_cnt" -eq $i ]] ; then
sed -e 's/$/| |2|N/' <<<"$line" >> tmp.txt
fi
done < file2.txt
done < file1.txt
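For comparison, a minimal sketch of the same job done in a single awk process, which avoids starting cut, grep, wc and sed for every record. It assumes file2 lines look like "KEY,ABBR", that the text to search is field 3 of the pipe-delimited file1 (as the rules above describe), and that keys are matched as literal strings. The function name replace_states is hypothetical. It should reproduce the expected output for the sample data above, but it has not been timed on 1,000,000 records.

```shell
#!/bin/sh
# Sketch: one awk process does the whole job instead of forking
# cut/grep/wc/sed per record. replace_states is a hypothetical name.
# Usage: replace_states LOOKUP_FILE INPUT_FILE > OUTPUT_FILE
replace_states() {
  awk -F'|' -v OFS='|' '
    # Pass 1 (lookup file): store keys and abbreviations in file order,
    # because the first matching lookup line must win.
    NR == FNR {
      if (split($0, kv, ",") < 2 || kv[1] == "") next   # skip blank or bad lines
      n++; key[n] = kv[1]; abbr[n] = kv[2]
      next
    }
    # Pass 2 (input file): search field 3 for each key as literal text.
    {
      suffix = "| |2|N"                            # default when nothing matches
      for (i = 1; i <= n; i++) {
        k = key[i]
        pos = index($3, k)
        if (pos == 0) continue                     # key absent: try the next one
        head = substr($3, 1, pos + length(k) - 1)  # up to the end of 1st occurrence
        rest = substr($3, pos + length(k))
        out = ""; cnt = 1
        while ((p = index(rest, k)) > 0) {         # replace every later occurrence
          out = out substr(rest, 1, p - 1) abbr[i]
          rest = substr(rest, p + length(k))
          cnt++
        }
        if (cnt > 1) {
          $3 = head out rest                       # rebuilds $0 with OFS="|"
          suffix = "|" abbr[i] "|2|Y"
        } else {
          suffix = "|" abbr[i] "|2|N"
        }
        break                                      # first match wins
      }
      print $0 suffix
    }
  ' "$1" "$2"
}
```

Usage would be something like `replace_states file2.txt file1.txt > tmp.txt`. The NR == FNR idiom assumes the lookup file is not empty.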


Last edited by rbatte1; 08-11-2014 at 12:49 PM.. Reason: Added codes
# 2  
07-08-2014
Welcome ureddy,

Please wrap your code & input/output in CODE tags. Highlight the text and press the CODE button or do this:-

[CODE]Here is my code[/CODE]

...to produce:-
Code:
Here is my code

The problem I think you are having is that you are starting many sub-processes for every line of your input file. Calls such as cut, sed, grep and wc each have a cost to set up the process. Because you call them inside nested loops, that cost is paid for every pairing of a file1 line with a file2 line, so 1,000,000 records add up to millions of process launches.
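A minimal sketch of that idea: the two cut calls and the grep | wc -l pipeline replaced with bash parameter expansion, which runs inside the shell and starts no new process. The function names and the global variables KEY, ABBR and COUNT are hypothetical. Results are returned in globals rather than through $(...), because command substitution forks a subshell even for a shell function. Unlike grep, the key is matched as a literal string, not a regular expression.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; results go into globals to avoid $(...) subshells.

# Split a lookup line such as "NEW YORK,NY" (replaces the two cut calls).
split_lookup() {
  KEY=${1%%,*}    # text before the first comma, like cut -d',' -f1
  ABBR=${1#*,}    # text after the first comma, like cut -d',' -f2 for 2-field lines
}

# Count non-overlapping literal occurrences of $2 in $1 (replaces grep -o | wc -l).
count_occurrences() {
  local text=$1 key=$2 stripped
  COUNT=0
  [[ -n $key ]] || return 0
  stripped=${text//"$key"/}                         # delete every occurrence
  COUNT=$(( (${#text} - ${#stripped}) / ${#key} ))  # removed length / key length
}

split_lookup 'TEXAS,TX'
count_occurrences 'two|yyy|TEXAS 222222TEXASTEXAS|defg' "$KEY"
echo "$KEY -> $ABBR found $COUNT times"   # prints: TEXAS -> TX found 3 times
```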

If you can wrap your code in CODE tags, then it will be far more readable and I will have a go at it.



Thanks, in advance,
Robin

Last edited by rbatte1; 07-08-2014 at 07:21 AM.. Reason: Undo double posting
# 3  
07-08-2014
Thanks for looking into this, Robin. I have added CODE tags as you specified.
# 4  
07-08-2014
Thanks for marking up the code. Can you do the same with the input and output? If there are multiple spaces, these get compressed when displayed as normal text, and that might be important.

For your inner loop reading file2.txt, where do you plan to use the value read in as rep_line? It's not used anywhere else in your script.

I'm also unclear with the if.....then....break.....fi section and what is actually required here. Are you simply looking to not complete the remaining if...then.... sections? There are better ways to code that.
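As a sketch of one such alternative: a single if/elif chain whose exit status tells the caller whether to break out of the lookup loop. The function name choose_suffix and the SUFFIX variable are hypothetical; the suffix formats follow the expected output in the first post.

```shell
#!/usr/bin/env bash
# Sets SUFFIX and returns 0 when the key was found (the caller should break),
# or returns 1 when it was not found (the caller tries the next lookup line).
choose_suffix() {
  local cnt=$1 abbr=$2
  if (( cnt == 1 )); then
    SUFFIX="|$abbr|2|N"     # present once: append only, no replacement
  elif (( cnt > 1 )); then
    SUFFIX="|$abbr|2|Y"     # present more than once: replace from the 2nd on
  else
    return 1                # not present
  fi
}

# Typical use inside the lookup loop:
#   choose_suffix "$cnt" "${field[2]}" && break
```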

Can you write your logic out in words like this:-
  • For every line in file1.txt
    • Read every line in file2.txt
    • Compare them so that
      • if condition A matches I take action A
      • if condition B matches I take action B
      • if condition C matches I take action C
      • or I do action D
    • I write the output built from the input lines in format F
    • End loop
  • End loop
To get the bullet list, write your text first, then highlight the block and press the bullet list button. Having lists within lists produces the indentation to make it easier to read.



Thanks,
Robin
# 5  
07-08-2014
..................
# 6  
07-08-2014
Um, where did your post go?

If you post again, I will be happy to take a look.


Robin
# 7  
07-08-2014
Actually, I was reformatting my code (removing temp files and parameters) to make it run faster, and it looks like I deleted "rep_line" by mistake. Here is the corrected code.
Code:
echo "Replace the string if it matches only once, or replace ALL except the FIRST occurrence."
tot_cnt=$(wc -l < "$REP_FILE_PATH/$REP_FILE")   # number of lines in the lookup file
del_tmp_files

# IFS='' read -r preserves leading and trailing spaces
while IFS='' read -r line; do
    i=0
    while IFS='' read -r line_1; do
        field[1]=$(cut -d',' -f1 <<<"$line_1")
        field[2]=$(cut -d',' -f2 <<<"$line_1")
        cnt=$(printf '%s' "$line" | grep -o "${field[1]}" | wc -l)
        if [[ "$cnt" -eq 1 ]]; then
            sed -e "s/\$/|${field[2]}|2|N/" <<<"$line" >> tmp.txt
            break
        fi
        if [[ "$cnt" -gt 1 ]]; then
            sed -e "s/${field[1]}/${field[2]}/2g" -e "s/\$/|${field[2]}|2|Y/" <<<"$line" >> tmp.txt
            break
        fi
        let i++
        if [[ "$cnt" -eq 0 && "$tot_cnt" -eq $i ]]; then
            sed -e 's/$/| |2|N/' <<<"$line" >> tmp.txt
        fi
    done < file2.txt
done < file1.txt

To answer your question: once there is a match, I need to stop searching, because no further matching is needed, and continue with the next line.
Here is a description of what I'm doing:
Code:
For every line in file1.txt
 Read every line in file2.txt
 Compare them so that
  if the string is present only once, don't replace it; just append field 2 from file2.txt and |2|N at the end of the file1 line and write the output to the temp file. Don't check the rest of the lines in file2.txt, as a match has already been found.
  if the string is present more than once, leave the first occurrence and replace the remaining occurrences in field 3 of the file1 line with field 2 of file2.txt, then append field 2 and |2|Y at the end of the line and write the output to the temp file. Don't check the rest of the lines in file2.txt, as a match has already been found.
  if there is no match at all, just append a space and |2|N at the end of the file1 line and write the output to the temp file.
 End loop
End loop
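Following that description, a sketch of the whole job in pure bash: file2.txt is read once into arrays, field 3 is split out with parameter expansion, and nothing inside the loop forks cut, grep or sed. The name process_files is hypothetical, and it assumes "KEY,ABBR" lookup lines and at least three pipe-delimited fields per input line. Bash string handling is still slow on 1,000,000 lines, so an awk-based approach is likely to be faster; treat this as an illustration of the logic.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: same logic as the description, without per-line forks.
# Usage: process_files file2.txt file1.txt > tmp.txt
process_files() {
  local lookup=$1 input=$2
  local rep_line line prefix tmp text tail key abbr cnt stripped head after suffix i
  local -a keys=() abbrs=()

  # Read the lookup file once, keeping its order (the first match must win).
  while IFS= read -r rep_line; do
    key=${rep_line%%,*}
    [[ -n $key ]] || continue              # skip blank lines and empty keys
    keys+=("$key")
    abbrs+=("${rep_line#*,}")
  done < "$lookup"

  while IFS= read -r line; do
    # Split "f1|f2|f3|rest" so that only field 3 is searched.
    prefix="${line%%|*}|";  tmp="${line#*|}"
    prefix+="${tmp%%|*}|";  tmp="${tmp#*|}"
    text="${tmp%%|*}";      tail="${tmp#"$text"}"

    suffix='| |2|N'                        # no match: blank field, then |2|N
    for i in "${!keys[@]}"; do
      key=${keys[i]}
      [[ $text == *"$key"* ]] || continue
      abbr=${abbrs[i]}
      stripped=${text//"$key"/}
      cnt=$(( (${#text} - ${#stripped}) / ${#key} ))
      if (( cnt == 1 )); then
        suffix="|$abbr|2|N"                # present once: no replacement
      else
        head=${text%%"$key"*}              # keep the first occurrence,
        after=${text#*"$key"}
        text=$head$key${after//"$key"/"$abbr"}   # replace the later ones
        suffix="|$abbr|2|Y"
      fi
      break                                # stop searching file2 for this line
    done
    printf '%s\n' "$prefix$text$tail$suffix"
  done < "$input"
}
```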

Sorry for the inconvenience in reading my post; I'm a new user of this forum.

Last edited by ureddy; 07-08-2014 at 10:55 AM..