Big data file - sed/grep/awk?
Morning guys. Another day another question.
I am knocking up a script to pull some data from a file. The problem is the file is very big (up to 1 gig in size), so this solution:
Code:
for results in `grep "^\[$STARTHOUR" ANYOLDFILE | awk -F'|' '{print $4}'`
do
stuff ...
works, but takes ages (we're talking minutes) to run. The data is held in this format:
Code:
[06:26] [200806] [INFO] |58|33|81|UserID : 00012345|
[07:26] [200806] [INFO] |63|72|79|UserID : 00012345|
[08:26] [200806] [INFO] |41|34|32|UserID : 00012345|
[09:26] [200806] [INFO] |54|55|44|UserID : 00012345|
I'm guessing that instead of the grep part I should be using a stream editor, but I'm struggling to find out which is best, and what the syntax would be. Any ideas?
Just getting rid of the grep should save you some cycles. Also reading the result into backticks and then looping over the result is wasteful (although some shells probably optimize that into a loop internally).
Code:
awk -F'|' "/^\[$STARTHOUR/"'{print $4}' ANYOLDFILE |
while read results; do
stuff
done
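The quote splicing above (a double-quoted pattern glued onto a single-quoted action) is easy to get wrong. A minimal sketch of an alternative, assuming the line format from the first post: pass the hour in with awk's -v option and match it as a literal prefix with index(). The function name fourth_field_for_hour and the awk variable hour are made up for illustration.

```shell
# Hypothetical helper: print field 4 of every line that starts with "[HOUR".
# $1 = hour prefix (e.g. 06), $2 = file to scan.
fourth_field_for_hour() {
    awk -F'|' -v hour="$1" 'index($0, "[" hour) == 1 { print $4 }' "$2"
}

# Usage: feed the values straight into the loop, no backticks needed.
# fourth_field_for_hour "$STARTHOUR" ANYOLDFILE | while read -r results; do
#     stuff
# done
```

Because the hour is compared as a plain string, nothing in it is interpreted as a regular expression, and the awk program can stay entirely inside single quotes.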
Thanks, I'll implement that bit now and see if that makes much difference. The "stuff" part is:
Code:
for results in `grep "^\[$STARTHOUR" ANYOLDFILE | awk -F'|' '{print $4}'`
do
    if [ $results -gt 9999 ]
    then
        (( NUMOFSECONDS[10]=NUMOFSECONDS[10]+1 ))
    else
        typeset -Z4 results
        AMOUNTOFSECONDSFINDER=`echo $results| cut -c1`
        (( NUMOFSECONDS[AMOUNTOFSECONDSFINDER]=NUMOFSECONDS[AMOUNTOFSECONDSFINDER]+1 ))
    fi
done
... so I'm not sure what fat can be trimmed from here. Feel free to tell me if I'm missing anything obvious!
Hmmm, looks like you were right. Reading the file in the way you suggested actually slows it down slightly, so the problem is obviously in the "stuff" part.
If the loop only has a few records to handle it's fast; once it gets to a few thousand it slows to a crawl. Curses! Anyone got any thoughts on ways to improve the performance of the loop?
Last edited by dlam; 06-11-2008 at 02:47 AM.
This is the line of code that you need to find a way of optimizing. As the number of results increases, this line of code is going to take longer and longer to execute.
Code:
AMOUNTOFSECONDSFINDER=`echo $results| cut -c1`
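If this line is the bottleneck, one likely reason is that every pass through the loop forks a pipeline and runs the external cut command. A sketch of one way to avoid that, assuming results is always a non-negative integer without leading zeros: the first digit of the zero-padded four-digit value is just the value divided by 1000, which the shell can compute itself. The names set_bucket and bucket are hypothetical.

```shell
# Hypothetical helper: set the global variable "bucket" to the NUMOFSECONDS
# index for the value in $1, using only shell builtins (no echo | cut).
# Values above 9999 go to bucket 10, as in the original script.
set_bucket() {
    if [ "$1" -gt 9999 ]; then
        bucket=10
    else
        bucket=$(( $1 / 1000 ))   # integer division gives the thousands digit
    fi
}

# Usage inside the loop (ksh or bash arrays):
# set_bucket "$results"
# (( NUMOFSECONDS[bucket] = NUMOFSECONDS[bucket] + 1 ))
```

The result is returned in a variable rather than printed, since capturing output with backticks would bring the fork back.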
@fpmurphy: I don't think it's growing, it's just looping over the fourth field. As far as I can tell, the root cause would seem to be that the shell's arrays are not scaling nicely.
Most of what you're doing can be accomplished in awk directly just as well. typeset -Z4 appears to be a kshism to pad a number with leading zeros to the specified width, correct? I don't know if this captures all the nuances of your script, but perhaps it can be refined to do what you need.
Code:
awk -F '|' "/^\[$STARTHOUR/"'{
if ($4 > 9999) $4=10000; ++m[int($4/1000)]}
END { for (i=0; i<=10; ++i) printf ("NUMOFSECONDS[%i]=%04i\n", i, m[i]) }' ANYOLDFILE
Last edited by era; 06-11-2008 at 03:41 AM. Reason: Use int() to truncate division
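A sketch of how the counts might be fed back into the calling script, assuming the line format from the first post and a shell with arrays (ksh93 or bash) on the receiving end. The wrapper name count_buckets is hypothetical, the hour is passed with -v instead of being spliced into the program text, and plain %d is used so the values carry no leading zeros (bash arithmetic would read those as octal).

```shell
# Hypothetical wrapper: count field-4 values per bucket for one hour.
# $1 = hour prefix (e.g. 06), $2 = file to scan.
count_buckets() {
    awk -F'|' -v hour="$1" '
        index($0, "[" hour) == 1 {
            n = ($4 > 9999) ? 10 : int($4 / 1000)   # bucket 0..10
            ++m[n]
        }
        END { for (i = 0; i <= 10; ++i) printf("NUMOFSECONDS[%d]=%d\n", i, m[i]) }
    ' "$2"
}

# Usage in ksh93 or bash: each output line is a valid array assignment.
# eval "$(count_buckets "$STARTHOUR" ANYOLDFILE)"
# echo "${NUMOFSECONDS[0]}"
```

This way the big file is read once by awk and the shell only handles eleven lines of output.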
Thanks guys. A little bit of editing has shown it is definitely that line causing the problem. But as era says, it shouldn't be the size that is causing it, because it's just holding one variable at a time, so the arrays not performing well could certainly be the reason.
I'll have a play with your script and see if I can slot it in. Thanks again.