Performance issue - to read line by line


 
# 1  
Old 03-14-2017

All- We have a performance issue when reading a file line by line. The script is posted below. It currently takes about 45 minutes to process 512444 lines.

Could you please have a look at it and suggest ways to improve the performance?

Thanks,
Balu


------------------- start of the code --------------------------
Code:
echo " start of the script:`date`"
process_each_record()
{
  record=$1
  #record_type=`echo $record | sed 's/\(^.....\).*/\1/'`
  record_type=${record%"${record#?????}"}

   case $record_type  in
      'FHEAD')  record=`echo ${record_type}${VER}${INPUT_FILE}`;
                ;;

      'THEAD')
                REGISTER=`echo $record | cut -c 16-20 |tr "?" " "| tr -d ' '`;
                TRAN_NO=`echo $record | cut -c 35-44  |tr "?" " "| tr -d ' '`;
                TRAN_HEAD_SEQ_NO=$(( TRAN_HEAD_SEQ_NO + 1 ));
                TRAN_TRAN_DISC_SEQ_NO=0
                TRAN_ITEM_SEQ_NO=0
                TRAN_ITEM_DISC_SEQ_NO=0
                TRAN_ITEM_TAX_SEQ_NO=0
                TRAN_TENDER_SEQ_NO=0
                TRAN_CUSTOMER_SEQ_NO=0
                ;;

      'IDISC')
                TRAN_ITEM_DISC_SEQ_NO=$(( TRAN_ITEM_DISC_SEQ_NO + 1 ));
                ;;
      'TITEM')
                TRAN_ITEM_SEQ_NO=$(( TRAN_ITEM_SEQ_NO + 1 ));
                TRAN_ITEM_DISC_SEQ_NO=0
                TRAN_ITEM_TAX_SEQ_NO=0
                ;;


      'IGTAX')
                TRAN_ITEM_TAX_SEQ_NO=$(( TRAN_ITEM_TAX_SEQ_NO + 1 ));
                ;;

      'TTEND')
                TRAN_TENDER_SEQ_NO=$(( TRAN_TENDER_SEQ_NO + 1 ));
                ;;

      'TCUST')
                TRAN_CUSTOMER_SEQ_NO=$(( TRAN_CUSTOMER_SEQ_NO + 1 ));
                ;;

      *)
                ;;

  esac
        echo "${LINE_NO}${TRANS_SOURCE}${STORE_DAY_SEQ_NO}${STORE}${BUSINESS_DATE}${TRAN_HEAD_SEQ_NO}${REGISTER}${TRAN_NO}${SALESPERSON}${TRAN_TRAN_DISC_SEQ_NO}${TRAN_ITEM_SEQ_NO}${TRAN_ITEM_DISC_SEQ_NO}${TRAN_ITEM_TAX_SEQ_NO}${TRAN_TENDER_SEQ_NO}${TRAN_CUSTOMER_SEQ_NO}${TRAN_SEQ_FUTURE_USE}${record}" >> ${Test_output_data}

}


########### define the variables to append to all the files ###
typeset -Z10 LINE_NO=0
export TRANS_SOURCE='C'
typeset -Z10 STORE_DAY_SEQ_NO=999901
typeset -Z4  STORE=9999
typeset -Z8  BUSINESS_DATE=20170314
typeset -Z10 TRAN_HEAD_SEQ_NO=0
typeset -Z10 REGISTER=0
typeset -Z10 TRAN_NO=0
typeset -Z11  SALESPERSON=0
typeset -Z3  TRAN_TRAN_DISC_SEQ_NO=0
typeset -Z4  TRAN_ITEM_SEQ_NO=0
typeset -Z4  TRAN_ITEM_DISC_SEQ_NO=0
typeset -Z4  TRAN_ITEM_TAX_SEQ_NO=0
typeset -Z4  TRAN_TENDER_SEQ_NO=0
typeset -Z3  TRAN_CUSTOMER_SEQ_NO=0
typeset -Z4  TRAN_SEQ_FUTURE_USE=0

export VER='CC'

export Test_output_data='test_output_data.log'
export INPUT_FILE='test_input_data.txt'


while read line1
do
  LINE_NO=$((LINE_NO + 1))
  process_each_record "${line1}"
done < ${INPUT_FILE}


echo " end of the script:`date`"

------------------- end of the code --------------------------

Last edited by Scrutinizer; 03-14-2017 at 03:11 PM.. Reason: code tags
# 2  
Old 03-14-2017
Hi, IMO this is the biggest culprit:
Code:
                REGISTER=`echo $record | cut -c 16-20 |tr "?" " "| tr -d ' '`;
                TRAN_NO=`echo $record | cut -c 35-44  |tr "?" " "| tr -d ' '`;

What are your OS and version, and which shell are you using?
# 3  
Old 03-14-2017
AIX nmrmsdbint01 1 7 00C801E74C00 (uname -a)
and korn shell
# 4  
Old 03-14-2017
I second Scrutinizer: for those two lines, 2 * 3 * 73656 = 441936 processes have to be created, which is costly. Fortunately, those are the only lines running external programs; all the remaining calculations are done using shell internals. Recent shells offer "parameter expansions" such as "substring expansion" and "pattern substitution", so presumably no external programs would be required at all. Not sure why you translate ? to a space and then delete all spaces. You can delete several chars in one go with tr.
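To illustrate the parameter expansions mentioned above, here is a minimal sketch. It assumes ksh93 or bash; plain ksh88 does not have these two expansions, so check which ksh your AIX box actually runs. The record is a shortened copy of the THEAD sample line in this thread.

```shell
# Sketch for ksh93 or bash (assumption); ksh88 lacks ${var:off:len} and ${var//pat/}.
# Shortened THEAD record, for illustration only.
record='THEAD00000000028050?201703130000000000000001????'

# Substring expansion: the offset is 0-based, so columns 16-20 start at offset 15
# and columns 35-44 start at offset 34.
REGISTER=${record:15:5}
TRAN_NO=${record:34:10}

# Pattern substitution: delete every "?" and every space without starting a process.
REGISTER=${REGISTER//[? ]/}
TRAN_NO=${TRAN_NO//[? ]/}

# If an external tool is kept, a single tr can delete both characters at once:
#   echo "$record" | cut -c 16-20 | tr -d '? '

echo "REGISTER=$REGISTER TRAN_NO=$TRAN_NO"
```

Both assignments run entirely inside the shell, so the per-record process creation disappears.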
Plus, the output file is opened and closed 512444 times, once per record, because the >> redirection sits inside the function.
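One way to avoid the repeated open and close is to redirect the whole loop once. A minimal sketch with made-up demo records and temporary file names (both are assumptions, not the real data); in the original script the echo inside process_each_record would simply lose its >> redirection:

```shell
# Demo data: three made-up records standing in for the real input file.
tmpdir=$(mktemp -d)
printf '%s\n' 'FHEADdemo' 'THEADdemo' 'TITEMdemo' > "$tmpdir/input.txt"

LINE_NO=0
while IFS= read -r line1
do
  LINE_NO=$((LINE_NO + 1))
  # Write to stdout; there is no ">> file" inside the loop.
  echo "${LINE_NO}${line1}"
done < "$tmpdir/input.txt" > "$tmpdir/output.log"   # output file opened once for the whole loop

# Collect a few facts about the result, then clean up the demo files.
lines_written=$(wc -l < "$tmpdir/output.log")
first_line=$(head -n 1 "$tmpdir/output.log")
echo "wrote $lines_written lines, first: $first_line"
rm -rf "$tmpdir"
```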

And, all THEAD records seem to be identical?

To come to a conclusion, I think shell is not the tool of choice when it comes to analysing large text files. Use tailored tools, awk or the like.

Last edited by RudiC; 03-14-2017 at 07:33 PM.. Reason: Added the comment on repeated redirection.
# 5  
Old 03-14-2017
Hi- Can you please help me write the same logic as a single awk command?

Note: I just copied THEAD multiple times to set the input data.
# 6  
Old 03-14-2017
Only if you give us some sample output to work from. I'm not going to run a script for 45 min to find out what the target should be.
# 7  
Old 03-14-2017
Not sure I fully and correctly understood and interpreted your script, but you could try this and report back:
Code:
awk '
BEGIN           {TRANS_SOURCE = "C"
                 STORE_DAY_SEQ_NO = 999901
                 STORE = 9999
                 BUSINESS_DATE = 20170314
                 TRAN_HEAD_SEQ_NO = 0
                 REGISTER = 0
                 TRAN_NO = 0
                 SALESPERSON = 0
                 TRAN_TRAN_DISC_SEQ_NO = 0
                 TRAN_ITEM_SEQ_NO = 0
                 TRAN_ITEM_DISC_SEQ_NO = 0
                 TRAN_ITEM_TAX_SEQ_NO = 0
                 TRAN_TENDER_SEQ_NO = 0
                 TRAN_CUSTOMER_SEQ_NO = 0
                 TRAN_SEQ_FUTURE_USE = 0
                 VER = "CC"
                }


                {RECORD   = $0
                 TYPE     = substr ($0, 1, 5)
                 if (TYPE == "FHEAD")    RECORD   = TYPE VER FILENAME

                 if (TYPE == "THEAD")   {REGISTER = substr ($0, 16,  5); gsub (/[? ]/, "", REGISTER)
                                         TRAN_NO  = substr ($0, 35, 10); gsub (/[? ]/, "", TRAN_NO)
                                         TRAN_HEAD_SEQ_NO++
                                         TRAN_TRAN_DISC_SEQ_NO = 0
                                         TRAN_ITEM_SEQ_NO      = 0
                                         TRAN_ITEM_DISC_SEQ_NO = 0
                                         TRAN_ITEM_TAX_SEQ_NO  = 0
                                         TRAN_TENDER_SEQ_NO    = 0
                                         TRAN_CUSTOMER_SEQ_NO  = 0
                                        }

                 if (TYPE == "IDISC")    TRAN_ITEM_DISC_SEQ_NO++

                 if (TYPE == "TITEM")   {TRAN_ITEM_SEQ_NO++
                                         TRAN_ITEM_DISC_SEQ_NO = 0
                                         TRAN_ITEM_TAX_SEQ_NO  = 0
                                        }

                 if (TYPE == "IGTAX")    TRAN_ITEM_TAX_SEQ_NO++

                 if (TYPE == "TTEND")    TRAN_TENDER_SEQ_NO++

                 if (TYPE == "TCUST")    TRAN_CUSTOMER_SEQ_NO++

                 printf "%10d%1c%10d%4d%8d%10d%10d%10d%11d%3d%4d%4d%4d%4d%3d%4d%s\n",   NR, TRANS_SOURCE, STORE_DAY_SEQ_NO, STORE, BUSINESS_DATE, TRAN_HEAD_SEQ_NO, 
                                                                                        REGISTER, TRAN_NO, SALESPERSON, TRAN_TRAN_DISC_SEQ_NO, TRAN_ITEM_SEQ_NO, 
                                                                                        TRAN_ITEM_DISC_SEQ_NO, TRAN_ITEM_TAX_SEQ_NO, TRAN_TENDER_SEQ_NO, TRAN_CUSTOMER_SEQ_NO, 
                                                                                        TRAN_SEQ_FUTURE_USE, RECORD
                }
' file
         1C    999901999920170314         0         0         0          0  0   0   0   0   0  0   0FHEADCCfile
         2C    999901999920170314         1      8050         1          0  0   0   0   0   0  0   0THEAD00000000028050?201703130000000000000001????????????????????SALE??SEND??0000000000?????1???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????10055969-1??????????????????????????????????????????????????P00000000000000000000????P00000000000000000000P00000000000000000000??????????HANNAH?PAINE??????????????????FORT?WORTH?TX?76177?????????????????????????????????????????OMS3????????????1005596927.00????????????????????????08:30:5810055969-1??E4X001034989357????????????????????????????sdfds9fwfww????????sdfds9fwfww????????
         3C    999901999920170314         1      8050         1          0  0   0   0   0   0  1   0TCUST000000000353973005????????test?123?456????????????????????????????????????????????????????????????????????????????????????????????????????????????32A?dsfsfs?erewrw????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????sdfdsf?WEST????????????????????????????????????????????????????????????????????????????????????????????????????????????42423??????????????????????????SDFDSFS???????????WERWERW???????????h.SDFDSFS@GMAIL.com???????????????????????????????????????????????????????????????????????????????????????????????00001
.
.
.
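One difference from the ksh script may matter: it declares its fields with typeset -Z, which in ksh zero-fills on the left, whereas %10d in awk pads with blanks, as the sample output shows. If whatever consumes the file expects zeros, a 0 flag in the format is one option. This is a cut-down sketch with only three of the fields; the two input lines are shortened stand-ins, and it has not been run against the real data:

```shell
# Two shortened stand-in records (assumption); only three fields are printed.
out=$(printf '%s\n' 'FHEADdemo' 'THEAD00000000028050?201703130000000000000001' |
awk '
{
    TYPE = substr($0, 1, 5)
    if (TYPE == "THEAD") {
        REGISTER = substr($0, 16, 5)
        gsub(/[? ]/, "", REGISTER)    # third argument: edit REGISTER, not $0
        TRAN_HEAD_SEQ_NO++
    }
    # %010d pads with zeros where %10d pads with blanks
    printf "%010d%010d%010d%s\n", NR, TRAN_HEAD_SEQ_NO, REGISTER, TYPE
}')
printf '%s\n' "$out"
```

The same flag can be applied to each %Nd in the full printf if zero-filled fields are what the target format needs.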


Last edited by RudiC; 03-14-2017 at 07:30 PM..