Performance issue to read line by line (UNIX for Beginners Questions & Answers)
Post 302972393 by Don Cragun on Wednesday 4th of May 2016 10:51:52 PM
If we take your original code (after converting the DOS <carriage-return><linefeed> character-pair line terminators into UNIX <linefeed> (AKA <newline>) single-character line terminators, and converting your sample input file (test2.txt) the same way) and time 7 runs of your script on a MacBook Pro built about 2 years ago, with a 2.8GHz Intel Core i7 processor and a 1TB SSD, running OS X El Capitan Version 10.11.4, the average time output looks like:
Code:
real	1m13.54s
user	0m25.80s
sys	0m45.76s

(i.e., 73.54 seconds).

If we modify your code using the suggestions Scrutinizer supplied (using a logical equivalent of:
Code:
record_type=${record%"${record#????}"}

to extract the first four characters of each record), get rid of the test for the existence of the output file, redirect the output of the whole read loop (which opens and closes the output file once) instead of opening and closing the output file once for each line read from your input file (getting rid of 9,999 opens and closes when processing your sample input), and time the following script:
Code:
#!/bin/ksh
#set -x

process_each_record() {
	###### extract first four characters  ##############
	case ${1%"${1#????}"} in
	(1111)	a1=100; a2=0; a3=0; a4=0; a5=0; a6=0; a7=0; a8=0; a9=0
		echo "$line_no$a1$a2$a3$a4$a5$a6$a7$a8$a9$1"
		;;
	(1112)	a2=$((a2 + 1)); a3=$((a3 + 2))
		echo "$line_no$a1$a2$a3$a4$a5$a6$a7$a8$a9$1"
		;;
	(1113)	a7=$((a7 + 1)); a5=$((a5 + 3))
		echo "$line_no$a1$a2$a3$a4$a5$a6$a7$a8$a9$1"
		;;
	(1114)	a4=$((a4 + 3)); a6=$((a6 + 4))
		echo "$line_no$a1$a2$a3$a4$a5$a6$a7$a8$a9$1"
		;;
	(1115)	a7=$((a7 + 1)); a9=$((a9 + 3))
		echo "$line_no$a1$a2$a3$a4$a5$a6$a7$a8$a9$1"
		;;
	(1116)	a8=$((a8 + 1)); a5=$((a5 + 1))
		echo "$line_no$a1$a2$a3$a4$a5$a6$a7$a8$a9$1"
		;;
	(2221)	a6=0; a7=0; a8=0; a9=0
		echo "$line_no$a1$a2$a3$a4$a5$a6$a7$a8$a9$1"
		;;
	(2222)	a3=$((a3 + 1)); a7=$((a7 + 3))
		echo "$line_no$a1$a2$a3$a4$a5$a6$a7$a8$a9$1"
		;;
	(3333)	a8=$((a8 + 1)); a9=$((a9 + 5))
		echo "$line_no$a1$a2$a3$a4$a5$a6$a7$a8$a9$1"
		;;
	(5555)	a1=$((a1 + 1)); a2=$((a2 + 3)); a3=$((a3 + 1)); a4=$((a4 + 1))
		echo "$line_no$a1$a2$a3$a4$a5$a6$a7$a8$a9$1"
		;;
	(*)	echo "$line_no$a1$a2$a3$a4$a5$a6$a7$a8$a9$1"
		;;
	esac
}

######## define variables #####
typeset -Z10	a1
typeset -Z7	a2
typeset -Z3	a3
typeset -Z6	a4
typeset -Z2	a5
typeset -Z7	a6
typeset -Z9	a7
typeset -Z5	a8
typeset -Z2	a8	# NOTE: a8 is declared twice; this later width normally overrides the -Z5 above
typeset -Z4	a9
typeset -Z10	line_no

######## initialize variables #####
a1=0; a2=0; a3=0; a4=0; a5=0; a6=0; a7=0; a8=0; a9=0; line_no=0

######## loop through the input #####
while read line1
do	line_no=$((line_no + 1))
	process_each_record "$line1"
done < test2.txt > test1_all_data.log

we get average time output:
Code:
real	0m0.32s
user	0m0.29s
sys	0m0.02s
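
To see why that nested expansion yields the first four characters, it can help to split it into two steps. This is only a sketch: the helper name first4 and the sample record are made up for illustration and are not part of your script.

```shell
#!/bin/sh
# Two-step version of ${record%"${record#????}"} (illustrative helper).
first4() {
	rest=${1#????}		# drop the first four characters, keep the remainder
	head=${1%"$rest"}	# drop that remainder from the end, keeping the prefix
	printf '%s\n' "$head"
}

first4 '1112some payload'	# prints 1112
```

Note that a record shorter than four characters does not match the ???? pattern, so the whole record is removed in the second step and the result is an empty string. In the script above such a record simply falls through to the (*) branch.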

You didn't say which version of the Korn shell you're using. The above code works with any Korn shell. If you have a 1993 or later version of ksh, you can change the line:
Code:
	case ${1%"${1#????}"} in

to:
Code:
	case ${1:0:4} in

and further reduce the average running time to:
Code:
real	0m0.17s
user	0m0.14s
sys	0m0.02s

That is better than a 99.75% reduction from your original script's running time.
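
If you want to check those percentages yourself, here is a small sketch that uses only shell integer arithmetic (no external utilities). The function name reduction is hypothetical, and the times are passed in hundredths of a second (73.54s becomes 7354).

```shell
#!/bin/sh
# Percentage reduction in elapsed time, rounded to two decimal places.
# $1 = old time, $2 = new time, both in hundredths of a second.
reduction() {
	old=$1 new=$2
	# Work in hundredths of a percent; adding old/2 rounds to nearest.
	r=$(( ((old - new) * 10000 + old / 2) / old ))
	printf '%d.%02d\n' "$((r / 100))" "$((r % 100))"
}

reduction 7354 32	# parameter-expansion version: prints 99.56
reduction 7354 17	# ${1:0:4} version: prints 99.77
```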

If you are using a 1988 vintage ksh and don't have a /bin/ksh93 that you can use, we can still incorporate Scrutinizer's 2nd suggestion, changing the above case statement to just:
Code:
	case $1 in

and change the patterns from the form:
Code:
	(1111)	assignments...

to:
Code:
	(1111*)	assignments...

and still reduce the average running time to:
Code:
real	0m0.28s
user	0m0.24s
sys	0m0.03s

which is still about a 99.62% reduction from your original script's running time and also works with any version of the Korn shell.
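
As a rough sketch of that last variant, the case patterns themselves can match on the leading characters, so no prefix needs to be extracted first. The function name record_kind and the labels it prints are purely illustrative; your script would do its assignments and echo in each branch instead.

```shell
#!/bin/sh
# Dispatch on the start of a record using glob patterns in the case labels.
record_kind() {
	case $1 in
	(1111*)	echo "type 1111" ;;
	(1112*)	echo "type 1112" ;;
	(*)	echo "other" ;;
	esac
}

record_kind '1111 first record'	# prints: type 1111
record_kind '9999 unknown'	# prints: other
```

Records shorter than four characters still land in the (*) branch, just as they do with the prefix-extraction form.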

I hope this gives you some idea of how significant the improvement in running time can be when you get rid of unneeded invocations of external utilities and unneeded output file opens and closes.
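
To make the output-file point concrete, here is a minimal sketch of the two loop shapes side by side. The function names, the "seen:" prefix, and the temporary files are assumptions for illustration only (mktemp is assumed to be available); both functions produce the same output, but the first reopens the output file for every input line.

```shell
#!/bin/sh
# Slow shape: the output file is opened and closed once per input line.
write_per_line() {	# $1 = input file, $2 = output file
	: > "$2"
	while read -r line
	do	printf 'seen:%s\n' "$line" >> "$2"
	done < "$1"
}

# Fast shape: the whole loop is redirected, so the file is opened once.
write_once() {		# $1 = input file, $2 = output file
	while read -r line
	do	printf 'seen:%s\n' "$line"
	done < "$1" > "$2"
}

src=$(mktemp) slow=$(mktemp) fast=$(mktemp)
printf '%s\n' '1111aaa' '1112bbb' '1113ccc' > "$src"
write_per_line "$src" "$slow"
write_once "$src" "$fast"
cmp -s "$slow" "$fast" && echo "identical output"
rm -f "$src" "$slow" "$fast"
```

Running each function under time against a large input file is one way to see how much the per-line opens cost on your own system.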
 
