sed or awk editing help

11-02-2010

Registered User

4,673, 588

Join Date: Oct 2010

Last Activity: 1 February 2016, 3:35 PM EST

Location: Southern NJ, USA (Nord)

Posts: 4,673

Thanks Given: 8

Thanked 588 Times in 561 Posts

Like many features, these are pretty but not fast, so you avoid them for heavy lifting where possible; sometimes, though, you need them even when they are slower.

Of course, if the user wanted to trim leading and trailing spaces, these 4 would run fast, but the ones that search for a space first should go last, as sed works its regexes left to right, so it is best to have a selective string on the left end (I guess an Arabic/Hebrew language sed would work right to left?):

Code:

sed '
  s/^  *//
  s/,  */,/g
  s/  *,/,/g
  s/  *$//
 '
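
A quick way to try this out, as a sketch only: the function name `trim_fields` and the sample line are made up for illustration, and the `g` flag on the two comma rules is my assumption that every field should be trimmed, not just the first one.

```shell
# trim_fields: hypothetical wrapper name, reads lines on stdin.
# Assumption: the g flag on the comma rules, so all fields get trimmed.
trim_fields() {
  sed '
    s/^  *//
    s/,  */,/g
    s/  *,/,/g
    s/  *$//
  '
}

# Sample input invented for this illustration.
printf '  a ,  b,c  ,d  \n' | trim_fields
# prints: a,b,c,d
```

Spaces inside a field (as in `b c`) are left alone, since none of the four patterns can match them.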

---------- Post updated at 04:08 PM ---------- Previous update was at 04:00 PM ----------

Quote:

Originally Posted by ctsgnb

Maybe it is the parsing step: when ambiguous grouping is specified, it goes through a kind of "auto completion" step.
Or maybe it uses a memory copy instead of a memory mapping?

Early sed had a limited line size, and was faster with less indirection, but GNU sed and later seds seem to have very big or realloc()'d buffers. I don't think many programmers use a mmap()'d tmp file for the buffer. Sed is a pipe-oriented stream editor, so it cannot always map the input file, and even when it can, it cannot write there, and of course it needs to scan the intermediate product in multi-command scripts. So I am not thinking memory map and sed at the same time. I suspect that as it rewrites a line, it has an input pointer and an output pointer, and if the line is expanding, then when the pointers collide, there must be either mid-substitute moves in one buffer or copying between two buffers. I am not going to read the code, though!

DGPickett

11-02-2010

Registered User

2,977, 644

Join Date: Oct 2010

Last Activity: 14 September 2019, 1:15 PM EDT

Location: France

Posts: 2,977

Thanks Given: 88

Thanked 644 Times in 613 Posts

Yeah! Thanks for these clarifications, DGnitPick.

By the way, here is a thread in which I posted, a few days ago, an example of what I call "ambiguous grouping":
https://www.unix.com/shell-programmin...#post302464388 (see post #21)

Consider how it behaves with ambiguous matching, and how \1 and & are "auto-completed", and \2 as well if it is the last to appear in the line (that was on SunOS 5.9, but I got the same results on a GNU/Linux machine):

Code:

# echo MPMTR20100706043000.txt|sed -n -e 's/\([0-9][0-9]\).*\(3[0-9][0-9]\)/\1,\2/p'
MPMTR20,3000.txt
# echo MPMTR20100706043000.txt|sed -n -e 's/\([0-9][0-9]\).*\(3[0-9][0-9]\)/\1,\2,&/p'
MPMTR20,300,20100706043000.txt

Last edited by ctsgnb; 11-02-2010 at 05:29 PM..

ctsgnb

11-02-2010

Hey, I love mmap(), and mmap64() more: a golden door into unlimited VM and RAM use and random access to application data. But sed is all about the (expandable) microcosm of two lines. The activity of early sed, with its small buffers, fit in the small L1 caches of earlier days. I rarely nitpick below the bit level (the pixel, maybe, but not the bit)! It is about seeing all the choices, weighing all the choices, and making informed choices, investing in knowing the right technique for the next time, investing in yourself, investing in your new friends.

DGPickett

11-02-2010

Quote:

there must be either mid-substitute moves in one buffer or copying between two buffers

Since I am not a C coder, I trust your intuition about that, dude; we have no better answer so far.

ctsgnb

11-02-2010

If you have Solaris, truss with the -u'*' option shows you more than you want to know about the libc and other library calls a running process is making. Java object creation does a lot of memcpy()! The truss and tusc commands are very educational even if you do not have the code or do not read C/C++, and they work even if the process is already running. truss shows all the kernel calls even without the -u'*' feature.

DGPickett

11-02-2010

Registered User

5,690, 630

Join Date: Jan 2007

Last Activity: 9 January 2017, 4:40 AM EST

Location: Варна, България / Milano, Italia

Posts: 5,690

Thanks Given: 184

Thanked 630 Times in 587 Posts

I'm sure the multipass solutions will be faster, so I am posting this just to illustrate some Perl constructs. It does the job in one pass (if I'm not missing something):

Code:

perl -ple'
  s/
    ((?<=,)|(?<=^))
    \s+
    ((?=,)|(?=$))
    //xg  
  ' infile

Last edited by radoulov; 11-02-2010 at 07:07 PM..

radoulov

11-02-2010

Moderator

12,296, 3,792

Join Date: Nov 2008

Last Activity: 1 January 2021, 1:47 AM EST

Location: Amsterdam

Posts: 12,296

Thanks Given: 679

Thanked 3,792 Times in 3,282 Posts

The single-pass Perl solution takes 2.5 times longer than the fastest sed solution, but it is twice as fast as the solutions that employed grouping. Interestingly, when I tried this:

Code:

perl -ple 's/^ +,/,/;s/, +,/,,/g;s/, +,/,,/g;s/, +$/,/'

It was much faster and only about 20% slower than the equivalent fastest sed solution.

Last edited by Scrutinizer; 11-02-2010 at 07:32 PM..

Scrutinizer
