

UNIX for Dummies Questions & Answers If you're not sure where to post a UNIX or Linux question, post it here. All UNIX and Linux newbies welcome !!

    #1  
Old 07-05-2012
Registered User
 
Join Date: Jul 2009
Posts: 69
Thanks: 24
Thanked 1 Time in 1 Post
Performance analysis sed vs awk

Hi Guys,

I've wondered for some time how sed and awk compare in performance. Say I want to print lines from a very large file, for example a file with 100,000 records. I want to print lines 25,000 to 26,000, and I can do so with either of the following commands:


Code:
sed -n '25000,26000 p' filename


Code:
awk 'NR==25000,NR==26000' filename

Both yield the same results, but which one is better, or is there such a thing?
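
Before comparing speed, it is worth confirming that the two commands really do select the same lines. The following is a minimal sketch, not part of the original post: it assumes a generated test file in which line N contains the number N, and the variable names (file, sed_out, awk_out) are made up for illustration.

```shell
#!/bin/sh
# Build a 100,000-line test file: line N contains the number N.
file=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 100000; i++) print i }' > "$file"

# Capture the output of each command from the post.
sed_out=$(sed -n '25000,26000 p' "$file")
awk_out=$(awk 'NR==25000,NR==26000' "$file")

# Compare the two results.
if [ "$sed_out" = "$awk_out" ]; then
    echo "identical output"
else
    echo "outputs differ"
fi

# Count the selected lines; the range is inclusive at both ends.
printf '%s\n' "$sed_out" | wc -l

rm -f "$file"
```

Because the range includes both endpoints, each command selects 1,001 lines, not 1,000.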

Thanks
    #2  
Old 07-05-2012
Mead Rotor
 
Join Date: Aug 2005
Location: Saskatchewan
Posts: 16,373
Thanks: 490
Thanked 2,535 Times in 2,418 Posts
It depends on your implementations of sed and awk, so there is no definite answer. Results can vary quite a lot, and ours may not be relevant to your system.

I'd be curious whether this is faster than your other awk expression: awk '(NR>=25000)&&(NR<=26000)' filename

The surefire way to find out is to try...

And, of course, it may be possible to alter your program's logic such that you don't need to eat 25,000 useless lines before your program can start working... What exactly are you trying to do here?
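
One small change in the same spirit: none of the commands shown so far stops reading after line 26,000, so each still scans the remaining 74,000 lines. Below is a sketch of early-exit variants, assuming the same kind of generated test file as a stand-in for the real data; the variable names are hypothetical. This does not avoid reading the first 25,000 lines, and whether it helps measurably depends on the implementation and the file size.

```shell
#!/bin/sh
# Build a 100,000-line test file: line N contains the number N.
file=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 100000; i++) print i }' > "$file"

# sed: print the range, then quit as soon as line 26000 has been printed.
sed_quit=$(sed -n '25000,26000p;26000q' "$file")

# awk: stop reading once past the range; otherwise print lines in range.
awk_quit=$(awk '
    NR > 26000 { exit }   # nothing left to print, stop reading input
    NR >= 25000           # no action given, so the default (print) applies
' "$file")

# Both variants should select the same lines as the original commands.
[ "$sed_quit" = "$awk_quit" ] && echo "early-exit variants agree"

rm -f "$file"
```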
The Following User Says Thank You to Corona688 For This Useful Post:
Irishboy24 (07-05-2012)
    #3  
Old 07-05-2012
Registered User
 
Join Date: Jul 2009
Posts: 69
Thanks: 24
Thanked 1 Time in 1 Post
I think your awk expression would be much faster:


Code:
awk '(NR>=25000)&&(NR<=26000)' filename

Well, I'm not really trying to achieve anything, but I've been asked this performance question many times, so I thought I'd post it on the forums to find a good explanation.

Thanks
    #4  
Old 07-05-2012
Mead Rotor
 
Join Date: Aug 2005
Location: Saskatchewan
Posts: 16,373
Thanks: 490
Thanked 2,535 Times in 2,418 Posts
I saw a thread recently where a similar question was asked, with timings shown for something like 4 or 5 different awk and sed implementations. The numbers were quite odd. There didn't seem to be any clear answer.

So, it really just comes down to what works better for you.
    #5  
Old 07-05-2012
Registered User
 
Join Date: Jul 2009
Posts: 69
Thanks: 24
Thanked 1 Time in 1 Post
Agreed, mate. Thanks for indulging me though!
    #6  
Old 07-06-2012
Registered User
 
Join Date: Feb 2009
Location: high ground
Posts: 30
Thanks: 1
Thanked 3 Times in 3 Posts
You could always try a comparison script to see how these perform on your machine, for example:


Code:
#!/bin/sh
time `sed -n '25000,26000 p' filename`
time `awk 'NR==25000,NR==26000' filename`
exit 0

It would be trivial to add more information, or to run this from cron and take samples several times a day for a week. While sed and awk are among the more mature chunks of code in a modern Unix system, and one would figure that both are about as quick and elegant as they will ever get, there are still plenty of variables that could make a difference. You may find that the speed difference varies wildly at certain times of day or when certain other processes are running.
The Following User Says Thank You to zer0sig For This Useful Post:
Irishboy24 (07-06-2012)
    #7  
Old 07-06-2012
Scrutinizer's Avatar
Moderator
 
Join Date: Nov 2008
Location: Amsterdam
Posts: 7,345
Thanks: 144
Thanked 1,754 Times in 1,591 Posts
When using the shell's "time", the backticks are not required, as "time" is part of the shell syntax. Also, it is good to direct the output to /dev/null while testing. The system cache needs to be taken into account as well: either make sure all reads come from an already cached file (for example, perform all tests twice and take the latter results), or make provisions so that there is no caching, or the cache is reset, for every test.
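
Putting those points together, a revised timing script might look like the sketch below. It assumes bash, where "time" is a reserved word, and a generated test file standing in for the real data; the variable name is hypothetical. It takes the "already cached" route by reading the file once before the timed runs.

```shell
#!/bin/bash
# Build a 100,000-line test file: line N contains the number N.
file=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 100000; i++) print i }' > "$file"

# Warm-up read so that both timed runs read from an already cached file.
cat "$file" > /dev/null

# No backticks: "time" measures the command directly.
# Output goes to /dev/null so terminal speed does not affect the result.
echo "sed:"
time sed -n '25000,26000 p' "$file" > /dev/null

echo "awk:"
time awk 'NR==25000,NR==26000' "$file" > /dev/null
```

The timings are written to standard error. A single run is only a rough indication, so repeating the script several times gives a better picture.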


--
The thread with different awks and greps can be found here. The speed difference between the various awk implementations can vary wildly.
The Following User Says Thank You to Scrutinizer For This Useful Post:
Irishboy24 (07-06-2012)