Performance analysis sed vs awk


 
# 1  
Old 07-05-2012

Hi Guys,

I've wondered for some time about the relative performance of sed and awk. Say I want to print lines from a very large file, for example one with 100,000 records. If I want to print lines 25,000 to 26,000, I can do so with either of the following commands:

Code:
sed -n '25000,26000 p' filename

Code:
awk 'NR==25000,NR==26000' filename

Both will yield the same results, but is one better than the other, or is there no meaningful difference?

Thanks
# 2  
Old 07-05-2012
It depends on your implementations of sed and awk, so there's no certain answer. It can vary quite a lot, and our results may not be relevant to your system.

I'd be curious whether this is faster than your other awk expression: awk '(NR>=25000)&&(NR<=26000)' filename

The surefire way to find out is to try...

And, of course, it may be possible to alter your program's logic such that you don't need to eat 25,000 useless lines before your program can start working... What exactly are you trying to do here?
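One way to at least avoid reading the rest of the file is an explicit exit once the range is done. A sketch on a generated sample file (not benchmarked here, just illustrating the idea):

```shell
# Generate a 100,000-line sample file for the demo
seq 100000 > bigfile
# Print lines 25000-26000, then stop reading entirely;
# without the exit, awk would still scan lines 26001-100000
awk 'NR > 26000 { exit } NR >= 25000' bigfile > range.txt
```

On a file where the range sits near the start, skipping the tail of the file can matter more than any sed-vs-awk difference.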
This User Gave Thanks to Corona688 For This Post:
# 3  
Old 07-05-2012
I think your awk expression would be much faster:

Code:
awk '(NR>=25000)&&(NR<=26000)' filename

Well, I'm not really trying to achieve anything in particular, but I've been asked this question many times with regard to performance, so I thought I'd post it on the forums to find a good explanation.

Thanks
# 4  
Old 07-05-2012
I saw a thread recently where a similar question was asked, with timings shown for four or five different awk and sed implementations. The numbers were quite odd, and there didn't seem to be any clear winner.

So, it really just comes down to what works better for you.
# 5  
Old 07-05-2012
Agreed, mate. Thanks for indulging me though!
# 6  
Old 07-06-2012
You could always try a comparison script to see how these perform on your machine, e.g.:

Code:
#!/bin/sh
time `sed -n '25000,26000 p' filename`
time `awk 'NR==25000,NR==26000' filename`
exit 0

It would be trivial to add more information, or to cron this and take samples several times a day for a week. While sed and awk are among the more mature chunks of code in a modern Unix system, and one would figure that both are about as quick and elegant as they will ever get, there are still plenty of variables that could make a difference. You may find that during certain times of day, or when certain other processes are running, the speed differences vary wildly.
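In a crontab it might look something like this (script and log paths are just placeholders) to take a sample every six hours:

```
# m  h    dom mon dow  command
0    */6  *   *   *    /path/to/timetest.sh >> /path/to/timetest.log 2>&1
```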
This User Gave Thanks to zer0sig For This Post:
# 7  
Old 07-06-2012
When using the shell's "time", the backticks are not required, as "time" is part of the shell syntax. Also, it is good to direct the output to /dev/null while testing. The system cache needs to be taken into account too: either ensure all reads come from an already cached state (for example, perform every test twice and take the latter result), or make provisions so that there is no caching, or the cache is reset before every test.
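Putting those points together, a revised version of the earlier script might look like this (a sketch on a generated sample file; adjust the name and sizes to your case):

```shell
#!/bin/sh
# Generate a reproducible 100,000-line sample file
seq 100000 > bigfile
# Warm the cache with a throwaway read first
cat bigfile > /dev/null
# Time both commands, discarding output so terminal I/O
# does not skew the measurements
time sed -n '25000,26000 p' bigfile > /dev/null
time awk 'NR==25000,NR==26000' bigfile > /dev/null
```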


--
The thread with different awks and greps can be found here. The speed difference between the various awk implementations can vary wildly.
This User Gave Thanks to Scrutinizer For This Post:
 