Speeding up processing a file


 
# 1  
Old 07-19-2008
Speeding up processing a file

Hi guys, I'm hoping you can help me here. I've knocked up a script that looks at a (huge) log file, and pulls from each line the hour of each transaction and how long each transaction took.

The data is stored sequentially as:

07:01 blah blah blah 12456 blah
07:03 blah blah blah 234 blah
08:02 blah blah blah 9 blah


My script works, but because it searches through the whole file for each occurrence of hour x, it reads the file 24 times and takes about 45 seconds to complete.

The pseudo code for what I have written is:

HOUR=0
while (( HOUR < 24 )); do
    # read the file line by line
    # append column 5 of every line matching $HOUR to an array
    (( HOUR += 1 ))
done


Obviously my actual code is a lot more complex than that (I can add it if it helps), but I think the above simplifies my request for help.

Does anyone have any idea how to speed up the way I am doing this? I had a go at doing it like this:

while read -r line; do
    HOUR=${line%%:*}        # first 2 chars of the line (the hour)
    set -- $line
    TRANSACTIONTIME=$5      # transaction time (column 5)
done < ANYOLDFILE


But that seemed to take just as long, and I'm sure there must be some way to speed this up.

Using ksh by the way.

Thanks in advance for any help.
# 2  
Old 07-19-2008
To calculate the totals per hour:

nawk -f dl.awk ANYOLDFILE

dl.awk:
Code:
{
  # accumulate column 5, keyed on the hour (first 2 characters of column 1)
  tot[substr($1,1,2)] += $5
}
END {
  # print one total per hour seen in the file
  for (i in tot)
    printf("Hour->[%s]: [%d]\n", i, tot[i])
}
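
Run against the sample data in post #1, that prints one line per hour, along the lines of (the traversal order of for (i in tot) is unspecified in awk):

Hour->[07]: [12690]
Hour->[08]: [9]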

# 3  
Old 07-19-2008
Hi, thanks for that, really handy script, but I was probably a bit vague in my request (my head's a bit fried from getting it working, albeit too slowly).

The output I require is along the lines of:


Hour     Count (trans time <1 sec)   Count (trans time <2 sec)   etc.
07-08    54                          02
08-09    23                          07
09-10    00                          04
...
23-24    14                          25

So what it displays is the total number of transactions taking less than x seconds between 7 and 8 am, between 8 and 9 am, and so on.

# 4  
Old 07-19-2008
I'm a bit lost about what the input columns mean AND what you're trying to report on.
Can you elaborate on the meaning of the input columns and the desired output? You're quoting a desired output, but it doesn't jibe with your sample input, and the description of the output is somewhat hard to decipher.

Another sample input and its corresponding output would help, along with a more detailed explanation.
# 5  
Old 07-19-2008
Ok, I'll try and explain it a little better.

The input file contains details of transactions, including the time of each transaction and the total time it took. The desired output is the number of transactions for each duration (in seconds), broken down into hourly periods.

An example of the input would be:

07:01 blah blah blah 3 blah
07:03 blah blah blah 2 blah
08:02 blah blah blah 1 blah
08:05 blah blah blah 1 blah
09:10 blah blah blah 3 blah

And the desired output would be:

Time             1 second    2 second    3 second
07:00 - 07:59    0           1           1
08:00 - 08:59    2           0           0
09:00 - 09:59    0           0           1

This shows that between 7 and 8 there was one transaction that took 2 seconds and one that took 3 seconds, between 8 and 9 there were two transactions that took 1 second each, and between 9 and 10 there was one transaction that took 3 seconds.

Hope that clarifies it a bit.
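
Building on the nawk approach in post #2, a single-pass sketch that produces this bucketed report might look like the following. It assumes the transaction times in column 5 are whole seconds (fractional times would need rounding into buckets with int()); the file name dl2.awk is just illustrative.

nawk -f dl2.awk ANYOLDFILE

dl2.awk:
Code:
{
  hour = substr($1, 1, 2)            # "07" from "07:01"
  secs = $5 + 0                      # transaction time in seconds (column 5)
  count[hour, secs]++                # one counter per (hour, duration) pair
  seen[hour] = 1                     # remember which hours occur at all
  if (secs > maxsecs) maxsecs = secs # widest duration bucket seen so far
}
END {
  # header row: one column per duration bucket
  printf("%-16s", "Time")
  for (s = 1; s <= maxsecs; s++)
    printf("%-12s", s " second")
  printf("\n")
  # one row per hour present in the log, in chronological order
  for (h = 0; h < 24; h++) {
    hh = sprintf("%02d", h)
    if (!(hh in seen))
      continue
    printf("%-16s", hh ":00 - " hh ":59")
    for (s = 1; s <= maxsecs; s++)
      printf("%-12d", count[hh, s] + 0)
    printf("\n")
  }
}

Because it reads the log once instead of 24 times, this should cut the 45-second run down substantially; all the per-hour and per-duration bookkeeping happens in memory.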