sed or awk editing help


 
# 22  
Old 11-01-2010
@ygemici: There is no fundamental difference between a loop that runs twice and two or more consecutive search-and-replace commands, and now you come up with one that has three search-and-replace commands. I have tried to explain this many times in this thread. One more example: I could write this:
Code:
sed -r ':a;s/(,|^)  *(,|$)/\1\2/g;ta'

or this:
Code:
sed -r 's/(,|^)  *(,|$)/\1\2/g;s/(,|^)  *(,|$)/\1\2/g'

It does not matter: they are practically equivalent. The first one is shorter though.
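
A quick way to check the equivalence claim is to run both variants on the same input and compare the results. This is only a sketch: the function names `strip_loop` and `strip_twice` are made up for illustration, the sample rows are borrowed from elsewhere in this thread, and it assumes a sed that accepts `-r` (GNU sed does).

```shell
# Sample rows; any comma-separated text with blank fields will do.
sample='aaa,   bbb,ccc   ,    ,    ,dddd
    ,ee  ee,   '

# Variant 1: repeat the substitution until a pass changes nothing.
strip_loop()  { sed -r ':a;s/(,|^)  *(,|$)/\1\2/g;ta'; }
# Variant 2: the same substitution written out twice.
strip_twice() { sed -r 's/(,|^)  *(,|$)/\1\2/g;s/(,|^)  *(,|$)/\1\2/g'; }

out_loop=$(printf '%s\n' "$sample" | strip_loop)
out_twice=$(printf '%s\n' "$sample" | strip_twice)

if [ "$out_loop" = "$out_twice" ]; then
  echo "same output"
else
  echo "outputs differ"
fi
```

Two passes are enough here because a single pass with the g flag consumes both commas of a match, so it can only skip every other blank field; the second pass picks up the ones that were skipped.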

Last edited by Scrutinizer; 11-01-2010 at 05:01 PM..
# 23  
Old 11-01-2010
A loop saves pasting the command in a second time, so it trades runtime for a very minor saving in coding effort. Each pass is relatively expensive, so if one pass is not enough and two are, looping around for a third try is silly (and with a loop you always have a dry pass). Actually, sometimes even two passes are silly:
Code:
$ sed -r '
  s/(,|^)  *(,|$)/,,/g
  t n
  b
  :n
  s/(,|^)  *(,|$)/,,/g
 ' <<! | cat -e
aaa,   bbb,ccc   ,    ,    ,dddd
    ,ee  ee,   
!
aaa,   bbb,ccc   ,,,dddd$
,,ee  ee,,$
 
$

Of course, I have embraced some very fat data sets! :)
# 24  
Old 11-01-2010
On the third pass there are no further matches (the dry pass, as you call it), so that pass is relatively cheap. I found only a small performance difference between the two approaches (<8%). Another consideration is that the loop keeps the code short and easy to understand.

---------- Post updated at 21:22 ---------- Previous update was at 21:05 ----------

We could simplify the two statements since we can do the first field in the first run and the last field in the second run:
Code:
sed -r 's/(,|^)  *,/\1,/g;s/,  *(,|$)/,\1/g'

This runs 20% faster.

---------- Post updated at 21:30 ---------- Previous update was at 21:22 ----------

OK, this really is a lot faster; it takes only 1/5th of the time!!
Code:
sed -r 's/^  *,/,/;s/,  *,/,,/g;s/,  *,/,,/g;s/,  *$/,/'

but it is not because of the absence of the loop, since:
Code:
sed -r 's/^  *,/,/;:a;s/,  *,/,,/g;ta;s/,  *$/,/'

this is only 10% slower than the fast solution above.

Obviously we can now drop the -r:
Code:
sed 's/^  *,/,/;s/,  *,/,,/g;s/,  *,/,,/g;s/,  *$/,/'

Code:
sed 's/^  *,/,/;:a;s/,  *,/,,/g;ta;s/,  *$/,/'

But that made no difference in performance.

==========
Apparently it is the lack of capturing groups, alternation and back references that makes all the difference!!
==========
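
For anyone who wants to repeat the comparison, a rough timing harness might look like the sketch below. The helper name `time_sed`, the temporary data file and the row count are assumptions made for illustration, not what was used for the figures in this post; it also assumes GNU sed and GNU date (for `%N`, nanoseconds). Absolute numbers will vary by machine and sed version.

```shell
# Build a throwaway data file by repeating two sample rows (size is arbitrary).
data=$(mktemp)
trap 'rm -f "$data"' EXIT
awk 'BEGIN {
  for (i = 0; i < 10000; i++) {
    print "aaa,   bbb,ccc   ,    ,    ,dddd"
    print "    ,ee  ee,   "
  }
}' > "$data"

# Run one sed invocation over the data and report wall-clock milliseconds.
time_sed() {
  start=$(date +%s%N)
  sed "$@" "$data" > /dev/null
  end=$(date +%s%N)
  printf '%6d ms: sed %s\n' "$(( (end - start) / 1000000 ))" "$*"
}

time_sed -r ':a;s/(,|^)  *(,|$)/\1\2/g;ta'
time_sed -r 's/(,|^)  *,/\1,/g;s/,  *(,|$)/,\1/g'
time_sed 's/^  *,/,/;s/,  *,/,,/g;s/,  *,/,,/g;s/,  *$/,/'
time_sed 's/^  *,/,/;:a;s/,  *,/,,/g;ta;s/,  *$/,/'
```

Running each variant a few times and taking the best figure helps to even out disk cache and scheduling noise.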

Last edited by Scrutinizer; 11-01-2010 at 05:48 PM..
# 25  
Old 11-01-2010
I created some extra commas with my last script. Yes, lack of iteration speeds things up! Does the "t n b :n" construct improve things over just two passes?
Code:
$ sed '
  s/^  *,/,/
  s/,  *$/,/
  s/,  *,/,,/g
  t n
  b
  :n
  s/,  *,/,,/g
 ' <<! |cat -e
aaa,   bbb,ccc   ,    ,    ,dddd
    ,ee  ee,   
!
aaa,   bbb,ccc   ,,,dddd$
,ee  ee,$ 
$
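
For readers less familiar with sed branching, here is the same script again with the control flow spelled out in comments. It is only an annotated sketch; the wrapper name `trim_fields` is invented for illustration.

```shell
trim_fields() {
  sed '
    # Leading and trailing blank fields need one anchored pass each.
    s/^  *,/,/
    s/,  *$/,/
    # First pass over blank fields that sit between two commas.
    s/,  *,/,,/g
    # t: jump to label n if any s command has succeeded on this line.
    t n
    # Nothing was substituted: b without a label ends the script for this line.
    b
    :n
    # Second pass catches the blank fields the first pass skipped over.
    s/,  *,/,,/g
  '
}

printf '%s\n' 'aaa,   bbb,ccc   ,    ,    ,dddd' '    ,ee  ee,   ' | trim_fields
```

Note that t tests every substitution made since the line was read, so the two anchored commands at the top also trigger the second pass, even on lines where the comma-to-comma pattern never matched.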

PS: How big did you expand the data set for the timings?

Last edited by DGPickett; 11-01-2010 at 06:03 PM..
# 26  
Old 11-01-2010
DG, what I am finding is that it is not the lack of iteration, but the lack of capturing groups, alternation and back references that speeds things up dramatically.

The "t n b :n" bit in your suggestion performs on par with my last version that uses "ta", i.e. about 10% slower than the version without the loop.

Quote:
Originally Posted by DGPickett
PS: How big did you expand the data set for the timings?

I used 128K lines.

Last edited by Scrutinizer; 11-01-2010 at 06:02 PM..
# 27  
Old 11-01-2010
Iteration in the alternation? Yes, the (|) looks like a time sponge. So, are back references inherently slow?
# 28  
Old 11-01-2010
Not even iteration in the alternation. The real culprit appears to be the use of capturing groups! To test this I used this:
Code:
sed -r 's/^  *,/,/;s/,  *,/,,/g;s/,  *,/,,/g;s/,  *$/,/'

and changed it to this:
Code:
sed -r 's/(^)  *,/,/;s/(,)  *,/,,/g;s/(,)  *,/,,/g;s/,  *($)/,/'

So only capturing groups without alternation or back references.
=======
The latter took 5x as long!!
=======
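
A sketch of how such a test could be repeated: time both scripts on the same file and print a checksum of each output, so it is visible that the groups change the speed and not the result. The variable names, the temporary file and the row count are illustrative assumptions; GNU sed and GNU date (`%N`) are assumed, and the measured time includes the small cost of `cksum`, which is the same for both runs.

```shell
plain='s/^  *,/,/;s/,  *,/,,/g;s/,  *,/,,/g;s/,  *$/,/'
grouped='s/(^)  *,/,/;s/(,)  *,/,,/g;s/(,)  *,/,,/g;s/,  *($)/,/'

# Throwaway data file; the row count is arbitrary.
data=$(mktemp)
trap 'rm -f "$data"' EXIT
awk 'BEGIN { for (i = 0; i < 20000; i++) print "aaa,   bbb,ccc   ,    ,    ,dddd" }' > "$data"

for script in "$plain" "$grouped"; do
  start=$(date +%s%N)
  sum=$(sed -r "$script" "$data" | cksum)
  end=$(date +%s%N)
  # Print elapsed milliseconds, the output checksum, and the script tested.
  printf '%6d ms  cksum=%s  %s\n' "$(( (end - start) / 1000000 ))" "${sum%% *}" "$script"
done
```

The ratio between the two timings depends on the sed version and its regex engine, so it may well differ from the 5x reported here.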
