Good sed Book?


11-12-2012


I am beginning to write many Korn shell scripts these days, and was wondering which book is good for sed. I know there is an O'Reilly book covering both sed and awk, but was wondering if there is a decent book on sed alone.

I have this for awk, which has been around for a while but still seems to be the favorite amongst many developers:

The AWK Programming Language, by Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger (ISBN 9780201079814)

Any advice will be greatly appreciated!

MIA651


11-12-2012


Standard sed doesn't support that many functions and all of its functionality is covered by a reasonably brief document: sed @ opengroup

If you feel that you need more examples, you can grep through your system's start-up scripts, search the web for sed one-liners and longer scripts, and peruse this very forum.

While I did nothing more than skim them briefly, these two links seem like a good source of examples:
http://www.grymoire.com/Unix/Sed.html
http://sed.sourceforge.net/grabbag/scripts/

Regards,
Alister

alister


11-12-2012


Well, sed is so simple, I taught myself. I find there are two flavors of script, loopers and filters. Loopers read most or all lines but the first using N, so they can see multiple lines in the buffer, and loop using branch features. Filters just do things on each line, as one mainly thinks of sed doing.
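To make the two flavors concrete, here is a minimal sketch, not taken from any particular script. The function names strip_trailing and join_continued are made up for illustration. The first is a filter that treats each line on its own; the second is a looper that uses N and a branch to join lines ending in a backslash. The N is guarded with '$!' so it never runs on the last line.

```shell
# Filter: each line is handled on its own.
strip_trailing() {
  sed 's/[ ]*$//'
}

# Looper: while the pattern space ends in a backslash, pull in the
# next line with N, drop the backslash-newline pair, and try again.
join_continued() {
  sed '
    :again
    /\\$/{
      $!N
      s/\\\n//
      t again
    }
  '
}

printf 'one  \ntwo\n' | strip_trailing
printf 'a \\\nb \\\nc\nd\n' | join_continued
```

The second call should print "a b c" and then "d": the three continued lines are glued together inside one cycle before anything is written.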

Awk has variables, arrays, hash tables, and a sense of fields and records divorced from the line feed, all of which sed lacks, but it is complex enough that one might just as well learn Perl, which is more able and orthogonal. Adding to the pitfalls of awk is that there was an old awk and a new awk, sometimes called nawk, so examples may vary in dialect.

Most of sed is also usable in grep, vi, ex, and on the command line of ksh and bash when in 'set -o vi' mode, my favorite. I started on regex line editors with qedx in Multics, then used them in GCOS with University of Waterloo FRED, the friendly editor, before I arrived in UNIX/vi land.

Regexes are extended in several flavors, so what works in sed is more than basic grep but less than egrep/"grep -E", and that is different again from awk.
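As a rough illustration of the flavor difference, the sketch below writes the same idea twice, once in basic syntax (backslashed groups and intervals, as plain sed and grep take them) and once in extended syntax for "grep -E". The function names are invented for the example, and it assumes a POSIX-style sed and grep.

```shell
# Basic syntax: groups and intervals are backslashed.
swap_bre() {
  sed 's/\([a-z][a-z]*\)=\([0-9]\{1,\}\)/\2=\1/'
}

count_bre() {
  grep -c '^\(ab\)\{1,\}$'
}

# Extended syntax: the same pattern with bare ( ) and +.
count_ere() {
  grep -E -c '^(ab)+$'
}

printf 'width=80\n' | swap_bre
printf 'abab\nabc\nab\n' | count_bre
printf 'abab\nabc\nab\n' | count_ere
```

Both counts should come out as 2, since only "abab" and "ab" consist purely of repeated "ab".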

You can make pure sed or awk files executable if the first line is "#!/bin/sed -f" (or the awk equivalent). I usually write sed right on the command line, barefoot inside single quotes, like this remover of extra blank (all white space) lines, a classic looper (\t is a real tab, \f a real form feed):

Code:

sed '
  s/[ \t\f]*$//
  :loop
  $b
  N
  s/[ \t\f]*$//
  t clear
  :clear
  s/^\n$//
  t loop
  P
  s/.*\n//
  t loop
 '

The '$b' ensures that N is not run at EOF, which loses a line on some early, defective sed implementations. Early proprietary sed had a fixed buffer but was faster than GNU and later seds with realloc()-able indirect buffers. I often kept both around for script compatibility, renaming GNU sed 'gsed'. In one app, I turned whole pages into lines using 'tr' to swap line feeds and form feeds, so I could mark up the pages in sed to insert them into the database as big strings. I am not shy about running multiple seds in a row, as looper sed and filter sed are not friends, and I get some multiprocessing in the deal:

Code:

sed '
  s/[ \t\f]*$//
 ' | sed '
  :loop
  $b
  N
  s/^\n$//
  t loop
  P
  s/.*\n//
  t loop
 '
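For the "#!/bin/sed -f" style mentioned earlier, a small sketch of building such a file follows. The helper name make_sed_script and the file name rstrip.sed are hypothetical, and the interpreter path is looked up with 'command -v' because sed does not live in /bin on every system.

```shell
# Write a one-command sed script to the file named in $1 and make it
# executable. The "#!" line uses whatever path the local sed has.
make_sed_script() {
  sed_path=$(command -v sed) || return 1
  printf '#!%s -f\ns/[ ]*$//\n' "$sed_path" >"$1" &&
  chmod +x "$1"
}

script_dir=$(mktemp -d) &&
make_sed_script "$script_dir/rstrip.sed" &&
printf 'padded   \n' | "$script_dir/rstrip.sed"
rm -rf "$script_dir"
```

Once the file is executable it can sit in a pipeline like any other command, with the script body read by sed through the -f on the first line.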

DGPickett


11-12-2012


Quote:

Originally Posted by DGPickett

The '$b' ensures that N is not run at EOF, losing a line on some early, defective sed implementations

N not printing the pattern space when EOF is encountered is not a sign of a defective implementation. It is a sign of POSIX compliance.

To quote POSIX:

Quote:

If no next line of input is available, the N command verb shall branch to the end of the script and quit without starting a new cycle or copying the pattern space to standard output.

That is how nearly all sed implementations work. GNU sed is the defective exception (with regard to historical practice and the standard), not the rule.

Years ago, intending to report this "bug", I found that apparently enough people had reported it that the Free Software Foundation felt it necessary to address the matter in their sed bug reporting page, Reporting Bugs - sed, a stream editor:

Quote:

Originally Posted by FSF

This choice is by design.
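One way to stay out of this difference altogether is to guard N with the '$!' address, so it is never attempted on the last line. The sketch below (join_pairs is an invented name) joins lines in pairs; with a bare N, a sed that follows the POSIX wording quoted above would drop the final odd line, while GNU sed would print it.

```shell
# Join each pair of lines. The $! address keeps N from running on the
# last line, so a trailing odd line is printed either way.
join_pairs() {
  sed '$!N
s/\n/ /'
}

printf '1\n2\n3\n' | join_pairs
```

This should print "1 2" followed by "3" regardless of which behavior the local sed has for N at end of input.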

Regards,
Alister

Last edited by alister; 11-12-2012 at 06:46 PM..

alister


11-15-2012


So, it is an area where behavior is not trustworthy, and thus I never go there! Files with no line feed right before EOF tend to have that last "line" ignored by sed. Maybe that's POSIX, too. I think EOF, new line and form feed should all be treated as end of line, but it is a bit late, never mind those Mac people with just carriage return and the DOS people with both. Both made sense for teletype: the carriage took more time to return 80 columns than the platen to rise one line, so it was sent on its way first.

Since sed is pretty easy about white space, I put sed on different lines than shell, indented meaningfully, and so have never needed one or more -e options! You, too, are worthy of well formatted code, reducing your errors, potential confusion and that of future maintainers.

I have never used 'G' and 'H' or the exchange command, but 'g' and 'h' are nice for parsing situations where something is missing, so you want to annotate the original line with an error prefix and write it to a reject log. You 'h' it on the way in, in case of rejection, and upon rejection, 'g' it back before annotating it on the way out. Similarly, I usually do not use 'D' but 's/.*\n//', so the second line is not released.
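A minimal sketch of that reject-log pattern follows. Everything specific in it is an assumption made for illustration: the function name split_rejects, the "key = value" input format, the "REJECT: " prefix, and the log path passed as $1 (which must not contain a newline, since 'w' takes the rest of the line as the file name).

```shell
# Save each line with h, normalise it, and print it if it parses.
# Otherwise restore the untouched original with g, prefix it, and
# write it to the reject log named in $1.
split_rejects() {
  sed -n '
    h
    s/[ ]*=[ ]*/=/
    /^[A-Za-z_][A-Za-z0-9_]*=./{
      p
      d
    }
    g
    s/^/REJECT: /
    w '"$1"'
  '
}

reject_log=$(mktemp) &&
printf 'name = bob\nbad key = 5\nx=1\n' | split_rejects "$reject_log"
cat "$reject_log"
rm -f "$reject_log"
```

The rejected line reaches the log as it arrived, spaces around the '=' included, because the 'g' throws away the half-finished edit.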

The 't' is a great time saver, as an 's' can both recognize what had been there, like a '/.../' address, and modify it with one regex search. Just make sure, especially in a looper, that the flag gets cleaned out before you rely on it, as it reflects every 's' since the last 't' or automatic read.
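The clean-out can be done with a 't' that branches to a label on the very next line: it goes nowhere useful, but it resets the flag. In this sketch (tag_b_lines and the labels are invented names) the first 's' always succeeds, because its pattern can match nothing at all, so without the reset every line would be tagged.

```shell
# The first s always "succeeds", so the flag is cleared with a
# do-nothing branch before the s whose result we care about.
tag_b_lines() {
  sed '
    s/^ *//
    t reset
    :reset
    s/b/B/
    t found
    b
    :found
    s/$/ <- had b/
  '
}

printf '  abc\nxyz\n' | tag_b_lines
```

Only the first line should come out tagged, as "aBc <- had b"; "xyz" passes through unchanged.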

I have been warming up to the -n and 's/.../.../p' lately, as it fits many situations (frequency not variety), but initially I ignored them as I was interested in the most versatile tactics.
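A small sketch of the -n with 's/.../.../p' combination: only lines where the substitution succeeded are printed, so one command both selects and edits. The function name and the "ERROR: " prefix are made up for the example.

```shell
# Print only the lines that carried the prefix, with the prefix removed.
error_texts() {
  sed -n 's/^ERROR: //p'
}

printf 'ERROR: disk full\nINFO: ok\nERROR: no route\n' | error_texts
```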

I would note that many sed flavors do not tolerate comments ('# whatever'), which is a shame. Inline documentation can help maintenance. In C, C++, Java, shell and SQL I like the switch/case/when/then/else, as each case can be commented neatly!

In data warehouses and similar places with crushingly big data sets, sed's lack of temp files and near-C speed are very well respected. It has a very important role to play in a pipe-oriented shell programming paradigm, where there are no intermediate or temp files, or any temp files are managed by the tool like sort. This results in lower latency and pipeline parallel multiprocessing, as many steps run concurrently.

Using literal '|' and named pipes (/sbin/mknod p p, where one of those p's is a file name), especially the self-managed named pipes '<(...)' and '>(...)' in bash and luckier systems' ksh, you can build a tree of pipelines working on one or many inputs to produce one or many outputs. (On unlucky systems, bash makes named pipes somewhere under /var/tmp that accumulate, a bug I reported.) Unfortunately for sed, the self-managed named pipes '<(...)' and '>(...)' are parsed as words in ksh (according to David Korn) and probably bash; they have virtual spaces around them that you cannot erase without passing them through a shell function call or the like. Life is sometimes excessively complicated! In the following example, the first '>(...)' after a 'w' command in $1 does not work: it might resolve to, essentially, ' /dev/fd/3 ' (the writable fd number from a pipe() call), so the pipe's file name '/dev/fd/3' becomes $2, an unrecognized sed command-line argument, the next part of the sed script is $3, and the next named pipe, perhaps '/dev/fd/5', is $4:

Code:

$ sed '
  /xyz/w '>( sort -u >file_1 )'
  /abc/w '>( sort -u >file_2 )'
 .... '
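One possible way around the word-splitting problem is to fall back on ordinary named pipes, whose paths can be written into the sed script as plain text. This is only a sketch under stated assumptions: split_sorted and its three arguments are invented names, mkfifo is used in place of mknod, and the pipe directory comes from mktemp, so its path is assumed to be free of newlines.

```shell
# $1: input file, $2: output for /xyz/ lines, $3: output for /abc/ lines.
split_sorted() {
  fifo_dir=$(mktemp -d) || return 1
  mkfifo "$fifo_dir/xyz" "$fifo_dir/abc" || return 1

  # Start the readers first; each blocks until sed opens its pipe.
  sort -u <"$fifo_dir/xyz" >"$2" &
  reader_1=$!
  sort -u <"$fifo_dir/abc" >"$3" &
  reader_2=$!

  # The pipe paths are part of the script text, so they stay in $1.
  sed -n '
    /xyz/w '"$fifo_dir"'/xyz
    /abc/w '"$fifo_dir"'/abc
  ' "$1"

  # sed has closed both pipes; let the sorts finish before cleaning up.
  wait "$reader_1" "$reader_2"
  rm -rf "$fifo_dir"
}
```

A call such as 'split_sorted input.txt file_1 file_2' (file names hypothetical) runs both sorts concurrently with sed, and no intermediate data file is written.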

Last edited by DGPickett; 11-15-2012 at 05:21 PM..

DGPickett


11-15-2012


Quote:

Originally Posted by DGPickett

Files with no line feed right before EOF tend to have that last "line" ignored by sed. Maybe that's POSIX, too.

If a non-empty character sequence doesn't end with a newline, POSIX sed considers it invalid input. The result of such a scenario is undefined and implementation dependent.

POSIX Definitions:

Quote:

3.205 Line

A sequence of zero or more non-<newline> characters plus a terminating <newline> character.

... <snip> ...

3.395 Text File

A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.

POSIX sed man page:

Quote:

INPUT FILES

The input files shall be text files.
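Since the result is undefined, one defensive option is to repair the file before sed ever sees it. The helper below is a sketch with an invented name, ensure_final_newline; it modifies the file in place and relies on command substitution stripping a trailing newline.

```shell
# Append a newline to the file named in $1 only if its last byte is
# something other than a newline. An empty file is left alone.
ensure_final_newline() {
  # $(...) strips a trailing newline, so the result is non-empty only
  # when the last byte of the file is some other character.
  if [ -n "$(tail -c 1 "$1")" ]; then
    printf '\n' >>"$1"
  fi
}

sample=$(mktemp) &&
printf '1\n2' >"$sample" &&
ensure_final_newline "$sample" &&
sed 's/^/line /' "$sample"
rm -f "$sample"
```

After the repair the input is a text file in the POSIX sense, so both lines should be processed the same way by any sed.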

Regards,
Alister

alister


11-16-2012


Yes, it is just a bit of a contrast with vi, which generally shows that last line and just warns and adds the missing line feed on the save.

Code:

$ echo '1
2\c'>no-final-lf
$ vi no-final-lf
1
2
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
"no-final-lf" [Incomplete last line] 2 lines, 3 characters
:wq
1
2
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
"no-final-lf" 2 lines, 4 characters 
$ od -bc no-final-lf       
0000000   1  \n   2  \n
        061 012 062 012
0000004
$

DGPickett
