Performance Issue - Shell Script


 
# 1  
Old 07-28-2016

Hi,

I am a beginner in shell scripting. I have written a script to parse file(s) with a large number of lines, each containing multiple comma-separated strings.
But the script is very slow: it took more than 30 minutes to parse a 120MB file (523564 lines). The script code is below.
Code:
#!/bin/sh
start=`date +%s`
for FILE in $*
do
`/usr/bin/dos2unix -q $FILE`
for i in $(/bin/cat $FILE); do
        counter=1;
        islng=true;
        seq=1;
        for j in $(echo $i|/bin/sed "s/,/ /g")
        do
                if [[ "$counter" = "1" ]]
                then
                        fm=$j;
                elif [[ "$counter" = "2" ]]
                then
                        to=$j;
                else
                        if [[ "$islng" = "true" ]]
                        then
                                islng=false;
                                lng=$j;
                        else
                                islng=true;
                                lat=$(echo $j|/bin/sed "s/^M//g");
                                echo $fm"|"$to"|"$seq"|"$lng"|"$lat
                                #seq=`expr $seq + 1`;
                                (( seq++ ))
                        fi
                fi
                #counter=`expr $counter + 1`;
                (( counter++ ))
        done
done
done
end=`date +%s`
runtime=$((end-start))
echo $runtime

Input File Example:
---------------------
Code:
[oracle@IE1FUX004 crh]$ cat af5_nosr1.int
4888,4891,19.2076,-34.23549,19.2049,-34.23539
4855,4891,19.2026,-34.23579
4888,4893,19.2135,-34.23559,19.2145,-34.23559,19.2152,-34.23559,19.2164,-34.23549,19.2182,-34.23529,19.2191,-34.23519
4706,4893,19.2199,-34.24119,19.2197,-34.24049,19.2195,-34.23989,19.2193,-34.23919,19.2189,-34.23849,19.2189,-34.23809,19.2189,-34.23729,19.2189,-34.23629,19.2189,-34.23619,19.219,-34.23589,19.2192,-34.23569,19.2195,-34.23539
4897,4916,19.256,-34.23519,19.2552,-34.23529,19.254,-34.23519,19.2524,-34.23479,19.25,-34.23429,19.2495,-34.23409,19.2489,-34.23399,19.2479,-34.23369,19.2458,-34.23319,19.2439,-34.23269,19.242,-34.23219,19.2407,-34.23189,19.24,-34.23179,19.2394,-34.23179,19.2388,-34.23179,19.2384,-34.23189,19.2379,-34.23209,19.2374,-34.23229,19.2365,-34.23279,19.2356,-34.23329,19.2348,-34.23369,19.2342,-34.23389,19.2334,-34.23399

Note: 1. The example file above has 5 lines.
2. Each line begins with 2 non-decimal numbers, followed by pairs of decimal values.

Expected Output(considering only first 3 lines above):
-------------------------------------------------------
Code:
4888|4891|1|19.2076|-34.23549
4888|4891|2|19.2049|-34.23539
4855|4891|1|19.2026|-34.23579
4888|4893|1|19.2135|-34.23559
4888|4893|2|19.2145|-34.23559
4888|4893|3|19.2152|-34.23559
4888|4893|4|19.2164|-34.23549
4888|4893|5|19.2182|-34.23529
4888|4893|6|19.2191|-34.23519


The actual file size is much larger than 120MB, so I need to fix this issue. Please suggest!

Thanks,
Imran.




Moderator's Comments:
Mod Comment Please use code (not html) tags as required by forum rules!

Last edited by RudiC; 07-28-2016 at 06:45 AM.. Reason: Added/changed code tags.
# 2  
Old 07-28-2016
No surprise that the script is slow on large files: it is overcomplicated, duplicates part of what it does (dos2unix already removes the carriage returns that the inner sed strips again), and runs external commands (cat, sed) where shell builtins would do.
Does it have to be executed by sh, or would a more advanced shell (bash, ksh) be available? Did you consider a text processing tool (like awk)?
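To illustrate the builtins point, here is a minimal sketch that does the same transformation without starting cat, sed or expr for every line or field. It sticks to POSIX sh features, so it should also run under bash or ksh. The function name parse_file and the variable names are made up for this example, and it assumes every line has the "from,to,lng,lat,lng,lat,..." layout shown in post #1. A text processing tool like awk will probably still be faster on very large files.

```shell
#!/bin/sh
# Sketch only: parse_file is a hypothetical helper, not part of the original script.
# The body runs in a subshell "( ... )" so the IFS and "set -f" changes
# do not leak into the caller.
parse_file() (
    file=$1
    cr=$(printf '\r')          # one fork per file, not per line
    set -f                     # no filename globbing while splitting
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%"$cr"}     # drop a trailing DOS carriage return, if any
        IFS=,
        set -- $line           # split the line on commas into $1, $2, ...
        [ "$#" -ge 4 ] || continue   # need from, to and at least one pair
        fm=$1
        to=$2
        shift 2
        seq=0
        while [ "$#" -ge 2 ]; do
            seq=$((seq + 1))
            printf '%s|%s|%s|%s|%s\n' "$fm" "$to" "$seq" "$1" "$2"
            shift 2
        done
    done < "$file"
)

# Process every file named on the command line, like the original "for FILE in $*".
for f in "$@"; do
    parse_file "$f"
done
```

The main saving comes from splitting each line with IFS and "set --" instead of piping every line and field through sed, so no new process is started inside the loops.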
# 3  
Old 07-28-2016
Code:
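#!/usr/bin/perl
use strict;
use warnings;

# Input file name taken from the example in post #1.
my $file = "af5_nosr1.int";
open(my $fh, "<", $file) or die "Cannot open $file: $!";
while (my $line = <$fh>) {
    $line =~ s/\r?\n$//;    # strip the newline and any DOS carriage return
    my @elements = split(/,/, $line);
    my $seq = 1;
    # Elements 0 and 1 are the leading IDs; the rest are value pairs.
    for (my $i = 2; $i < $#elements; $i += 2) {
        print "$elements[0]|$elements[1]|$seq|$elements[$i]|$elements[$i+1]\n";
        $seq++;
    }
}
close($fh);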

# 4  
Old 07-28-2016
How about
Code:
awk '{SQ=0; for (i=3; i<=NF; i+=2) print $1, $2, ++SQ, $i, $(i+1)}' FS=, OFS="|" file
4888|4891|1|19.2076|-34.23549
4888|4891|2|19.2049|-34.23539
4855|4891|1|19.2026|-34.23579
4888|4893|1|19.2135|-34.23559
4888|4893|2|19.2145|-34.23559
4888|4893|3|19.2152|-34.23559
4888|4893|4|19.2164|-34.23549
4888|4893|5|19.2182|-34.23529
4888|4893|6|19.2191|-34.23519
4706|4893|1|19.2199|-34.24119
4706|4893|2|19.2197|-34.24049
.
.
.

Should there be problems with DOS line terminators (<CR>, \r, 0x0D), remove them by adding sub (/\r$/, ""); in front of the SQ=0 statement.
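Put together with the carriage return fix, the command could look like the sketch below. The wrapper name to_pipe_rows is invented for this example. It accepts several input files at once (awk reads them one after another), and the loop stops at i < NF so that a line with an odd trailing value does not print an empty last column. With the sub() in place, the separate dos2unix step from the original script should no longer be needed.

```shell
#!/bin/sh
# Sketch only: to_pipe_rows is a hypothetical wrapper name.
to_pipe_rows() {
    awk -F, -v OFS='|' '
        { sub(/\r$/, "") }     # strip a DOS carriage return; awk re-splits $0
        {
            SQ = 0
            for (i = 3; i < NF; i += 2)
                print $1, $2, ++SQ, $i, $(i + 1)
        }
    ' "$@"
}
```

Usage would be something like: to_pipe_rows file1.int file2.int > result.txt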
# 5  
Old 07-28-2016
Thank you RudiC.

I was sure that awk could be used here, but I was unaware of how to use it.

Thanks for the response.