The UNIX and Linux Forums  
#1  07-19-2008  dlam (Registered User)
Speeding up processing a file

Hi guys, I'm hoping you can help me here. I've knocked up a script that looks at a (huge) log file, and pulls from each line the hour of each transaction and how long each transaction took.

The data is stored sequentially as:

07:01 blah blah blah 12456 blah
07:03 blah blah blah 234 blah
08:02 blah blah blah 9 blah


My script works, but because it scans the whole file for each occurrence of hour x, it reads the file 24 times and takes about 45 seconds to complete.

The pseudo code for what I have written is:

HOUR=0
while HOUR < 24; do
read in file line by line
find line matching $HOUR and pass column 5 to array
HOUR = HOUR + 1
done


Obviously my actual code is a lot more complex than that. I can add it if it helps, but I think the above simplifies my request for help.

Does anyone have any idea how to speed up the way I am doing this? I had a go at doing it like:

for variable in `cat ANYOLDFILE`
do
HOUR = first 2 chars of line
TRANSACTIONTIME = column 5
done


But that seemed to take just as long, and I'm sure there must be some way to speed this up.

Using ksh by the way.

Thanks in advance for any help.
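
A minimal sketch of a single-pass version in plain shell, assuming the layout in the sample lines (hour in the first two characters of field 1, transaction time in field 5). The function name extract_hour_and_time is made up for illustration. One thing worth noting: `for variable in $(cat FILE)` iterates over whitespace-separated words rather than lines, so a `while read` loop is used here instead.

```shell
# Hypothetical helper: read the log once from stdin and print
# "hour transaction-time" for every line.
# Assumes field 1 looks like HH:MM and field 5 holds the time taken.
extract_hour_and_time() {
    while read -r stamp f2 f3 f4 secs rest; do
        [ -n "$stamp" ] || continue             # skip blank lines
        printf '%s %s\n' "${stamp%%:*}" "$secs" # strip ":MM" to get the hour
    done
}
```

Called as `extract_hour_and_time < ANYOLDFILE`, this reads the file once instead of 24 times. The shell loop itself is still slow on very large files, so it mainly illustrates the one-pass idea.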
#2  07-19-2008  vgersh99 (Moderator)
To calculate the totals per hour:

nawk -f dl.awk ANYOLDFILE

dl.awk:
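{
  # key on the hour (first two characters of field 1) and sum field 5
  tot[substr($1,1,2)] += $5
}
END {
  # one line per hour seen; for-in order is unspecified
  for(i in tot)
    printf("Hour->[%s]: [%d]\n", i, tot[i])
}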
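
A small usage sketch of the same idea, wrapped in a shell function so the output can be sorted by hour (awk's `for (i in tot)` does not guarantee any order). The name hour_totals is made up for illustration, and plain `awk` is assumed to behave like `nawk` here.

```shell
# Hypothetical wrapper around the per-hour totals script above.
# Reads the named files, or stdin when no file is given.
hour_totals() {
    awk '
    { tot[substr($1, 1, 2)] += $5 }   # sum field 5 per hour
    END {
        for (h in tot)
            printf("Hour->[%s]: [%d]\n", h, tot[h])
    }' "$@" | sort                    # sort so hours come out in order
}
```

With the three sample lines from the first post, `hour_totals ANYOLDFILE` should give a total of 12690 for hour 07 and 9 for hour 08.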
#3  07-19-2008  dlam (Registered User)
Hi, thanks for that, really handy script, but I was probably a bit vague in my request (my head's a bit fried from getting it working, albeit too slowly).

The output I require is along the lines of:


Hour     Count, trans time <1 sec     Count, trans time <2 sec     etc.
07-08    54                           02
08-09    23                           07
09-10    00                           04
...
23-24    14                           25

So what it displays is total number of transactions taking less than x seconds between 7 and 8 am and between 8 and 9 etc.

Last edited by dlam; 07-19-2008 at 10:06 AM..
#4  07-19-2008  vgersh99 (Moderator)
I'm a bit lost as to what the input columns mean and what you're trying to report on.
Can you elaborate on the meaning of the input columns and the desired output? You're quoting a desired output, but it does not jibe with your sample input, and the description of the output is somewhat hard to decipher.

Another sample input and the corresponding output would help, along with a more detailed explanation.
#5  07-19-2008  dlam (Registered User)
Ok, I'll try and explain it a little better.

The input file contains details of transactions including time of transaction and total time taken. The desired output is the total number of transactions per time taken, broken down into hourly periods.

An example of the input would be:

07:01 blah blah blah 3 blah
07:03 blah blah blah 2 blah
08:02 blah blah blah 1 blah
08:05 blah blah blah 1 blah
09:10 blah blah blah 3 blah

And the desired output would be:

Time             1 second    2 seconds    3 seconds
07:00 - 07:59    0           1            1
08:00 - 08:59    2           0            0
09:00 - 09:59    0           0            1

Which shows that between 7 and 8 there was one transaction that took 2 seconds and one transaction that took 3 seconds. Between 8 and 9 there were two transactions that took 1 second, and between 9 and 10 there was one transaction that took 3 seconds.

Hope that clarifies it a bit.
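
One possible way to get a table like the one above in a single pass, sketched with awk. It assumes field 5 is a small whole number of seconds, as in this sample (the earlier sample had values like 12456, which would produce far too many columns). The function name hourly_breakdown and the exact column layout are made up for illustration.

```shell
# Hypothetical sketch: count transactions per hour and per duration
# in one pass. Reads the named files, or stdin when none is given.
hourly_breakdown() {
    awk '
    {
        hour = substr($1, 1, 2)      # "07" from "07:01"
        secs = $5 + 0                # time taken, assumed whole seconds
        count[hour, secs]++
        hours[hour] = 1
        if (secs > max) max = secs   # widest column needed
    }
    END {
        printf("%-15s", "Time")
        for (s = 1; s <= max; s++) printf(" %4ds", s)
        printf("\n")
        # walk the hours in order; only hours seen in the input are shown
        for (i = 0; i < 24; i++) {
            h = sprintf("%02d", i)
            if (!(h in hours)) continue
            printf("%-15s", sprintf("%s:00 - %s:59", h, h))
            for (s = 1; s <= max; s++) printf(" %5d", count[h, s])
            printf("\n")
        }
    }' "$@"
}
```

Run as `hourly_breakdown ANYOLDFILE`. Because every line updates a counter keyed on hour and duration, the file is read once rather than once per hour.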