sed performance


 
Top Forums UNIX for Advanced & Expert Users sed performance
# 1  
Old 03-11-2008
sed performance

hello experts,

i am trying to replace a line in a 100+ MB text file. the structure is similar to the passwd file: id:value1:value2 and so on. using the sed command

Code:
sed -i 's/\(123\):\([^:]\{1,\}\):/\1:bar:/' data.txt

works nicely: the line "123:foo:" is replaced by "123:bar:". however, it takes about 3 seconds.

using the grep command with environment variable LC_ALL set to "C" brings me the result instantly:

Code:
export LC_ALL=C
grep -Eh -- '(123):([^:]{1,}):' data.txt

now, the occurrences i'm looking for are all near the end of the file, so the i/o loss could be avoided: the whole file would not need to be rewritten. does anyone have an idea how sed could be accelerated? several ideas pop into my mind:
  • make sed somehow seek to the byte offset grep can deliver rapidly
  • only pipe the tail starting at the occurrence to sed and somehow write the tail back to the file
  • find out why sed is so slow, maybe it is matching in unicode (without LC_ALL=C, grep is slow too)
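the first two ideas above can be sketched together in shell. this is only a sketch under assumptions: GNU userland (`grep -b -m1`, `truncate` from coreutils), and the small `data.txt` created here stands in for the real file. the idea is to let grep report the byte offset of the match, edit only the tail from that offset, then truncate the original and append the edited tail.

```shell
#!/bin/sh
export LC_ALL=C

# small stand-in for the real 100+ MB file, in the thread's format
printf 'a:one:\n123:foo:\nz:two:\n' > data.txt

# byte offset of the first matching line; grep -b prints "offset:line"
off=$(grep -b -m1 '^123:' data.txt | cut -d: -f1)

if [ -n "$off" ]; then
    # edit only the tail starting at the match (tail -c +N is 1-based)
    tail -c +$((off + 1)) data.txt \
        | sed 's/^\(123\):[^:]\{1,\}:/\1:bar:/' > tail.tmp
    # truncate the original at the offset, then append the edited tail
    truncate -s "$off" data.txt
    cat tail.tmp >> data.txt
    rm -f tail.tmp
fi
```

on a file where the match really sits near the end, this only rewrites the tail instead of the whole file; the trade-off is that it is no longer atomic the way sed's temp-file rename is.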

any hints or leads would be greatly appreciated.

cheers,

-f3k.
# 2  
Old 03-11-2008
take note that your sed syntax with -i makes an in-place edit, while grep doesn't rewrite the file at all. you can consider that part of the reason why it's slow.
if you roughly know where the pattern near the end is, you can give address ranges
Code:
# e.g. apply the substitution only from line 2000 to end of file
sed -i '2000,$ s/old/new/' file
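
a sketch combining the two commands from this thread: let grep (fast with LC_ALL=C) find the line number, then hand sed an address range so it skips the regex work on everything before it. note that -i still rewrites the whole file, so only the matching effort is saved; the `data.txt` here is a small stand-in.

```shell
#!/bin/sh
export LC_ALL=C
printf 'a:one:\n123:foo:\nz:two:\n' > data.txt   # stand-in file

# first matching line number; grep -n prints "lineno:line"
n=$(grep -n -m1 '^123:' data.txt | cut -d: -f1)

# substitute only from that line to end of file
[ -n "$n" ] && sed -i "${n},\$ s/^\(123\):[^:]\{1,\}:/\1:bar:/" data.txt
```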

# 3  
Old 03-11-2008
thanks for the rapid answer!

i'm not entirely sure why sed is slow; it's either because the search takes longer, or because it writes a lot of data, as you stated. i already tried the address parameter with the line number; it doesn't help much, so i guess it is indeed the fact that it writes the whole file back to disk.

i just did a quick test where i read the byte offset from grep, fseek to that position, read the rest from there, do the replace, and write the result back to the same offset. the whole process took 25 ms. however, it's not bash.
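for what it's worth, the fseek-and-write-back test described above can be approximated in shell with `dd conv=notrunc`, which overwrites bytes in place without truncating the file. this sketch assumes the replacement has the same length as the match ("foo" → "bar" here), otherwise the bytes after it would be corrupted; `data.txt` is again a small stand-in.

```shell
#!/bin/sh
export LC_ALL=C
printf 'a:one:\n123:foo:\nz:two:\n' > data.txt   # stand-in file

# byte offset of the matching line
off=$(grep -b -m1 '^123:' data.txt | cut -d: -f1)

# extract just that line, apply the same-length replacement
line=$(tail -c +$((off + 1)) data.txt | head -n 1)
patched=$(printf '%s\n' "$line" | sed 's/^\(123\):[^:]\{1,\}:/\1:bar:/')

# overwrite the line's bytes in place; conv=notrunc keeps the rest intact
printf '%s' "$patched" | dd of=data.txt bs=1 seek="$off" conv=notrunc 2>/dev/null
```

this only touches a handful of bytes instead of rewriting 100 MB, which matches the 25 ms result reported above.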
# 4  
Old 03-11-2008
seriously, a 100 MB+ file is not big, so i think you shouldn't really worry about it... unless it's really very time critical or something.
# 5  
Old 03-11-2008
still, there's a difference between 3 s and 25 ms.
i don't consider it very large either, but well... it actually could be time critical, yes. maybe there is a way to tell sed to write only from the point where it replaced, and not the whole file?
# 6  
Old 03-11-2008
The reason is how sed works: it never changes the file it is working on but, by default, puts its results on <stdout>. The "-i" option, as ghostdog74 has pointed out, is a non-standard extension to sed, and it probably works by producing an intermediary file and then replacing the original.

So the difference between what sed has to do and grep has to do is:

Code:
grep                          sed
----                          ---
read the file                 read the file
parse it                      parse it / change it
output to <stdout>            output to temp file
-                             replace original file with temp file

You could probably "even the score" by having sed put its output on <stdout> too and comparing the times then, or, even better (as it eliminates the output delay completely), directing both grep's and sed's output to /dev/null and comparing the times.
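Concretely, the fair comparison would look something like this; `data.txt` is a small stand-in for the real file, and the patterns are the ones from this thread:

```shell
#!/bin/sh
export LC_ALL=C
printf 'a:one:\n123:foo:\nz:two:\n' > data.txt   # stand-in file

# both commands pay only for reading and matching, not for writing a file
time sed 's/^\(123\):[^:]\{1,\}:/\1:bar:/' data.txt > /dev/null
time grep -E '^(123):([^:]{1,}):' data.txt > /dev/null
```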

I hope this helps.
# 7  
Old 03-11-2008
A totally different perspective

Could it be that the first time you tried to edit the file you read it from disk, and the second time it was in cache, so there was no I/O? It will take a couple of seconds to read a 100 MB file off disk.
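A quick way to check the cache theory is to time the same read twice: the first pass may hit disk, the second should be served from the page cache and be much faster. (Forcing a cold cache on Linux needs root: `sync; echo 3 > /proc/sys/vm/drop_caches`.) The tiny file here is a stand-in; on a real 100 MB file the difference is obvious.

```shell
#!/bin/sh
printf 'hello\n' > data.txt   # stand-in for the real 100 MB file

# first read: possibly from disk; second read: from the page cache
time wc -c data.txt
time wc -c data.txt
```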

If this is something you've never paid any attention to before, you should run a tool like collectl (on SourceForge), which can show you what's going on even at the sub-second level on your system. It's amazing how often people just look at how long an operation takes to perform vs what the system is doing. With collectl you'll also be able to watch the CPU and memory during your tests...

-mark