Speeding up processing a file


 
# 1  
Old 07-19-2008
Speeding up processing a file

Hi guys, I'm hoping you can help me here. I've knocked up a script that looks at a (huge) log file, and pulls from each line the hour of each transaction and how long each transaction took.

The data is stored sequentially as:

07:01 blah blah blah 12456 blah
07:03 blah blah blah 234 blah
08:02 blah blah blah 9 blah


My script works, but because it searches through the whole file for each occurrence of hour x, it reads the file 24 times and takes about 45 seconds to complete.

The pseudo code for what I have written is:

HOUR=0
while (( HOUR < 24 )); do
    # read the file line by line
    # append column 5 of every line matching $HOUR to an array
    (( HOUR += 1 ))
done


Obviously my actual code is a lot more complex than that (I can add it if it helps), but I think the above simplifies my request for help.

Does anyone have any idea how to speed up the way I am doing this? I had a go at doing it like this:

while read -r line; do
    HOUR=${line%%:*}        # first 2 chars of the line (the hour)
    set -- $line
    TRANSACTIONTIME=$5      # transaction time (column 5)
done < ANYOLDFILE


But that seemed to take just as long, and I'm sure there must be some way to speed this up.

Using ksh by the way.

Thanks in advance for any help.
# 2  
Old 07-19-2008
To calculate the totals per hour:

nawk -f dl.awk ANYOLDFILE

dl.awk:
Code:
{
  # accumulate column 5, keyed on the hour (first 2 characters of column 1)
  tot[substr($1,1,2)] += $5
}
END {
  # print one total per hour seen in the file
  for (i in tot)
    printf("Hour->[%s]: [%d]\n", i, tot[i])
}
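
Run against the sample data in post #1, that prints one line per hour, along the lines of (the traversal order of for (i in tot) is unspecified in awk):

Hour->[07]: [12690]
Hour->[08]: [9]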

# 3  
Old 07-19-2008
Hi, thanks for that, really handy script, but I was probably a bit vague in my request (my head's a bit fried from getting it working, albeit too slowly).

The output I require is along the lines of:


Hour     Count (trans time <1 sec)   Count (trans time <2 sec)   etc.
07-08    54                          02
08-09    23                          07
09-10    00                          04
...
23-24    14                          25

So what it displays is the total number of transactions taking less than x seconds between 7 and 8 am, between 8 and 9 am, and so on.

# 4  
Old 07-19-2008
I'm a bit lost about what the input columns mean AND what you're trying to report on.
Can you elaborate on the meaning of the input columns and the desired output? You're quoting a desired output, but it doesn't jibe with your sample input, and the description of the output is somewhat hard to decipher.

Another sample input and its corresponding output would help, along with a more detailed explanation.
# 5  
Old 07-19-2008
Ok, I'll try and explain it a little better.

The input file contains details of transactions, including the time of each transaction and the total time it took. The desired output is the number of transactions for each duration (in seconds), broken down into hourly periods.

An example of the input would be:

07:01 blah blah blah 3 blah
07:03 blah blah blah 2 blah
08:02 blah blah blah 1 blah
08:05 blah blah blah 1 blah
09:10 blah blah blah 3 blah

And the desired output would be:

Time             1 second    2 second    3 second
07:00 - 07:59    0           1           1
08:00 - 08:59    2           0           0
09:00 - 09:59    0           0           1

This shows that between 7 and 8 there was one transaction that took 2 seconds and one that took 3 seconds, between 8 and 9 there were two transactions that took 1 second each, and between 9 and 10 there was one transaction that took 3 seconds.

Hope that clarifies it a bit.
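
Building on the nawk approach in post #2, a single-pass sketch that produces this bucketed report might look like the following. It assumes the transaction times in column 5 are whole seconds (fractional times would need rounding into buckets with int()); the file name dl2.awk is just illustrative.

nawk -f dl2.awk ANYOLDFILE

dl2.awk:
Code:
{
  hour = substr($1, 1, 2)            # "07" from "07:01"
  secs = $5 + 0                      # transaction time in seconds (column 5)
  count[hour, secs]++                # one counter per (hour, duration) pair
  seen[hour] = 1                     # remember which hours occur at all
  if (secs > maxsecs) maxsecs = secs # widest duration bucket seen so far
}
END {
  # header row: one column per duration bucket
  printf("%-16s", "Time")
  for (s = 1; s <= maxsecs; s++)
    printf("%-12s", s " second")
  printf("\n")
  # one row per hour present in the log, in chronological order
  for (h = 0; h < 24; h++) {
    hh = sprintf("%02d", h)
    if (!(hh in seen))
      continue
    printf("%-16s", hh ":00 - " hh ":59")
    for (s = 1; s <= maxsecs; s++)
      printf("%-12d", count[hh, s] + 0)
    printf("\n")
  }
}

Because it reads the log once instead of 24 times, this should cut the 45-second run down substantially; all the per-hour and per-duration bookkeeping happens in memory.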