Linux shell programming performance issue
Post 302915099 by RudiC on Sunday 31st of August 2014 04:00:15 PM
You don't need awk (or similar) to improve the performance of your script. Just by looking at it, you can see that you run six commands (i.e. six new processes) in the inner loop, times 50 for the lines in file2, times millions for the lines in file1. On top of that, file2 is opened millions of times (even though it is buffered/cached).
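To get a feel for why the number of started processes dominates, here is a rough sketch (not from the original thread) that runs the same counting loop twice: once starting an external command on every pass, once using shell arithmetic only. On typical systems the first is much slower, but the absolute numbers depend on the machine, so none are claimed here. The iteration count of 500 is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Same counting loop twice: once starting an external command per pass,
# once using bash arithmetic only. iterations=500 is an arbitrary choice.

iterations=500

with_external() {
    local n=0
    while [ "$n" -lt "$iterations" ]; do
        n=$(expr "$n" + 1)        # command substitution + expr: a new process every pass
    done
    echo "$n"
}

with_builtin() {
    local n=0
    while (( n < iterations )); do
        n=$(( n + 1 ))            # evaluated inside bash, no new process
    done
    echo "$n"
}

time with_external                # bash's time keyword reports to stderr
time with_builtin
```

Both functions print the same count; only the elapsed time differs, and that difference is what gets multiplied by 6 × 50 × millions in the original script.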

With your input data, and after cleaning out a few quirks in your code snippet, I find
Code:
time . XX
real    0m0.308s
user    0m0.192s
sys    0m0.119s

, while
Code:
time . YY
real    0m0.014s
user    0m0.012s
sys    0m0.000s

with YY being
Code:
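while IFS='' read -r line
do      while IFS=, read -r field1 field2
        do      [ -n "$field1" ] || continue                # skip empty keys (avoids division by zero)
                TMP=${line//"$field1"}                      # delete every literal occurrence of field1
                if (( (${#line} - ${#TMP}) / ${#field1} > 1 ))   # field1 occurs more than once?
                then    sed "s/$field1/$field2/2g" <<<"$line" >> tmp.txt   # GNU sed: replace 2nd and later matches
                        break                               # first matching key in file2 wins
                fi
        done < file2
done < file1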
cat tmp.txt
TEXAS CALIFORNIA TX
DALLAS CALIFORNIA CALIFORNIA DA DA TEXAS
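The occurrence test and the sed call in YY can be checked by hand on a single line. A small worked example, assuming an input line "DALLAS CALIFORNIA CALIFORNIA DALLAS DALLAS TEXAS" and a file2 entry "DALLAS,DA" (neither is shown in the thread; they are merely consistent with the second result line above). Note that the "2g" flag (leave the first match, replace the second and every later one) is a GNU sed extension.

```bash
#!/usr/bin/env bash
# Worked example of the occurrence test and sed's "2g" flag.
# The input line and the DALLAS,DA pair are assumed for illustration.

line="DALLAS CALIFORNIA CALIFORNIA DALLAS DALLAS TEXAS"
field1=DALLAS
field2=DA

TMP=${line//"$field1"}                 # quoted: field1 is matched literally, not as a glob
echo "${#line} ${#TMP}"                # 48 30
echo $(( (${#line} - ${#TMP}) / ${#field1} ))   # (48 - 30) / 6 = 3 occurrences

# GNU sed: "2g" keeps the first match and replaces the 2nd and all later ones.
sed "s/$field1/$field2/2g" <<<"$line"  # DALLAS CALIFORNIA CALIFORNIA DA DA TEXAS
```

The removed length divided by the key length gives the number of occurrences without starting any process; only lines where that number exceeds 1 pay for a sed call.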

An even faster solution might be to read file2's contents into an array once, then have the outer loop read file1 and an inner loop iterate over the array doing the comparisons/modifications.

---------- Post updated at 22:00 ---------- Previous update was at 21:36 ----------

Modification using arrays; adapt to taste:
Code:
unset field1 field2
while IFS=, read -r key val                     # load file2 once into two parallel arrays
do      [ -n "$key" ] || continue               # skip blank lines / empty keys
        field1+=("$key"); field2+=("$val")
done < file2
while IFS='' read -r line
do      for (( i=0; i<${#field1[@]}; i++ ))     # arrays are indexed from 0
        do      TMP=${line//"${field1[i]}"}
                if (( (${#line} - ${#TMP}) / ${#field1[i]} > 1 ))
                then    sed "s/${field1[i]}/${field2[i]}/2g" <<<"$line" >> tmp.txt
                        break
                fi
        done
done < file1

Timing is similar to the first version; it looks like the disk cache is quite effective:
Code:
time . ZZ

real    0m0.015s
user    0m0.003s
sys    0m0.013s
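Following the same reasoning one step further: the only external command left in the loop is sed, started once per modified line. A minimal sketch that does the "second and later occurrences" replacement with bash parameter expansion instead. It assumes bash 4+, "key,value" lines in file2, and keys meant literally (sed treats them as regular expressions, so results can differ for keys containing regex characters). The function names replace_from_second and process are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: load file2 once into arrays, then scan file1 with no external
# command inside the loops. Assumes bash 4+, "key,value" lines in file2,
# and literal keys.

# Hypothetical helper: print $1 with every occurrence of $2 after the
# first one replaced by $3 (what sed 's/KEY/VAL/2g' does for literal keys).
replace_from_second() {
    local line=$1 key=$2 val=$3 head tail
    head=${line%%"$key"*}            # text before the first occurrence
    tail=${line#*"$key"}             # text after the first occurrence
    tail=${tail//"$key"/"$val"}      # replace every occurrence in the rest
    printf '%s\n' "$head$key$tail"
}

# Hypothetical driver: process FILE1 FILE2, results on stdout.
process() {
    local keys=() vals=() k v line stripped i
    while IFS=, read -r k v || [ -n "$k" ]; do
        [ -n "$k" ] || continue      # skip blank lines / empty keys
        keys+=("$k"); vals+=("$v")
    done < "$2"
    while IFS='' read -r line || [ -n "$line" ]; do
        for (( i=0; i<${#keys[@]}; i++ )); do
            stripped=${line//"${keys[i]}"}
            # first key (in file2 order) occurring more than once wins
            if (( (${#line} - ${#stripped}) / ${#keys[i]} > 1 )); then
                replace_from_second "$line" "${keys[i]}" "${vals[i]}"
                break
            fi
        done
    done < "$1"
}
```

Usage would be along the lines of "process file1 file2 > tmp.txt". As in the versions above, lines where no key occurs more than once produce no output. Whether this is measurably faster than the sed version depends on how many lines actually match; no timings are claimed here.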

All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.