I am beginning to write many korn shell scripts these days, and was wondering what book is good as far as sed goes. I know there is a book on both sed and awk from O'Reilly, but was wondering if there is a decent book on sed alone.
I have this for awk, which has been around for a while but still seems to be the favorite amongst many developers:
Standard sed doesn't support that many functions and all of its functionality is covered by a reasonably brief document: sed @ opengroup
If you feel that you need more examples, you can grep through your system's start-up scripts, search the web for sed one-liners and longer scripts, and peruse this very forum.
Well, sed is so simple, I taught myself. I find there are two flavors of script, loopers and filters. Loopers read most or all lines but the first using N, so they can see multiple lines in the buffer, and loop using branch features. Filters just do things on each line, as one mainly thinks of sed doing.
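To make the two flavors concrete, here is a minimal sketch. The function names are made up for illustration, and the scripts are generic examples rather than anything from a particular system. The filter treats each line on its own; the looper uses N and a branch to work on several lines at once.

```shell
# Filter: each line is handled independently (strip trailing spaces/tabs).
strip_trailing() {
  sed 's/[[:blank:]]*$//'
}

# Looper: while the line ends in a backslash, pull in the next line with N,
# remove the backslash-newline pair, and branch back to test again.
join_continued() {
  sed '
    :again
    /\\$/{
      $!N
      s/\\\n//
      t again
    }
  '
}
```

For example, `printf 'a \\\nb\nc\n' | join_continued` should join the first two lines and leave the third alone.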
Awk has variables, arrays, hash tables, and a sense of fields and records divorced from the line feed, all of which sed lacks, but it is complex enough that one might just as well learn Perl, which is more able and orthogonal. Adding to the pitfalls of awk, there was an old awk and a new awk, sometimes called nawk, so examples may vary in dialect.
Most of sed is also usable in grep, vi, ex, and on the command line of ksh and bash when in 'set -o vi' mode, my favorite. I started on regex line editors with qedx in Multics, then used them in GCOS with University of Waterloo FRED, the friendly editor, before I arrived in UNIX/vi land.
Regexes are extended in several flavors, so what works in sed is more than basic grep but less than egrep/"grep -E", and that in turn is different from awk.
You can make executable pure sed or awk files if the first line is "#!/bin/sed -f" (or the corresponding awk line). I usually write sed right on the command line, barefoot inside ', like this extra blank (all white space) line remover, a classic looper (\t is a real tab, \f a real form feed):
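A minimal sketch of such a looper, not necessarily the script the poster had in mind: it squeezes each run of white-space-only lines down to one. It assumes the POSIX class [[:blank:]] (space and tab) in place of a literal tab; a literal form feed would have to be added to the bracket by hand. The function name is made up.

```shell
# Squeeze runs of blank (spaces/tabs only) lines to a single line.
# $b skips N on the last line; P prints the first line of the pair unless
# both are blank; D drops that first line and restarts with the second.
squeeze_blank() {
  sed '
    $b
    N
    /^[[:blank:]]*\n[[:blank:]]*$/!P
    D
  '
}
```

The same four commands, without the quotes and function wrapper, could go in a file whose first line is "#!/bin/sed -f".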
The '$b' ensures that N is not run at EOF, losing a line on some early, defective sed implementations. Early proprietary sed was fixed buffer but faster than GNU and later sed's with realloc()-able indirect buffers. I often kept both around for script compatibility, renaming GNU sed 'gsed'. In one app, I turned whole pages into lines using 'tr' to swap line feeds and form feeds, so I could mark up the pages in sed to insert them into the database as big strings. I am not shy about running multiple sed's in a row, as looper sed and filter sed are not friends, and get some multiprocessing in the deal:
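The page trick and the chain of seds might look roughly like this. It is only a sketch under stated assumptions: every page, including the last, ends in a form feed, the "PAGE: " prefix stands in for whatever markup the database load needs, and the function name is hypothetical.

```shell
# Turn each form-feed-terminated page into one line, then run two small
# sed filters in a row; each stage is its own process in the pipeline.
pages_to_records() {
  ff=$(printf '\f')              # a real form feed for use inside sed
  tr '\n\f' '\f\n' |             # swap line feeds and form feeds
    sed "s/${ff}*\$//" |         # drop the trailing ex-line-feed(s)
    sed 's/^/PAGE: /'            # mark up each page (placeholder markup)
}
```

Running the same tr again on the output swaps the characters back if the pages are needed as ordinary lines later.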
If no next line of input is available, the N command verb shall branch to the end of the script and quit without starting a new cycle or copying the pattern space to standard output.
That is how nearly all sed implementations work. GNU sed is the defective exception (with regard to historical practice and the standard), not the rule.
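A small sketch of the portable guard: writing '$!N' instead of a bare N means N is never attempted on the last line, so the standard behavior and the GNU behavior give the same output. The function name is made up for illustration.

```shell
# Join lines in pairs. With an odd number of lines, the guard leaves the
# final line to be printed by the normal end-of-cycle autoprint.
join_pairs() {
  sed '$!N; s/\n/ /'
}
```

With a bare N instead, a sed that follows the quoted text would drop the unpaired last line, while GNU sed would still print it.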
Years ago, intending to report this "bug", I found that apparently enough people had reported it that the Free Software Foundation felt it necessary to address the matter in their sed bug reporting page, Reporting Bugs - sed, a stream editor:
So, it is an area where behavior is not trustworthy, and thus I never go there! Files with no line feed right before EOF tend to have that last "line" ignored by sed. Maybe that's POSIX, too. I think EOF, new line and form feed should all be treated as end of line, but it is a bit late, never mind those Mac people with just carriage return and the DOS people with both. Both made sense for teletype: the carriage took more time to return 80 columns than the platen to rise one line, so it was sent on its way first.
Since sed is pretty easy about white space, I put sed on different lines than shell, indented meaningfully, and so have never needed one or more -e options! You, too, are worthy of well formatted code, reducing your errors, potential confusion and that of future maintainers.
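As an illustration of that layout (the function name and the '#'-comment input format are assumptions for the example), a multi-command script with one command per line and no -e:

```shell
# Strip '#' comments and trailing blanks, then delete lines left empty.
# The explanation lives in the shell comment, not inside the sed script.
clean_config() {
  sed '
    s/#.*//
    s/[[:blank:]]*$//
    /^$/d
  '
}
```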
I have never used 'G' and 'H' or the hold-space exchange 'x', but g and h are nice for parsing situations where something is missing, so you want to annotate the original line with an error prefix and write it to a reject log. You h it on the way in, in case of rejections, and upon rejection, g it before annotation on the way out. Similarly, I usually do not use 'D', but 's/.*\n//', so the second line is not released.
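A sketch of the h-on-the-way-in, g-on-rejection pattern. The "name,quantity" record format, the "REJECT: " prefix and the function name are all invented for the example.

```shell
# $1 is the reject log. Good records come out as "qty name" on stdout;
# bad ones are restored from the hold space, annotated, and logged.
parse_records() {
  rejects=$1
  sed '
    h
    s/[[:blank:]]//g
    /^[^,][^,]*,[0-9][0-9]*$/!{
      g
      s/^/REJECT: /
      w '"$rejects"'
      d
    }
    s/^\([^,]*\),\(.*\)$/\2 \1/
  '
}
```

Because of the g, the reject log shows the line as it arrived, not the half-edited version.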
The 't' is a great time saver, as one s can both modify the line and, like a '/../' address, recognize what had been there, all with a single regex search. Just make sure, especially in a looper, that the flag gets cleared before reuse, as it reflects every s since the last t or automatic read.
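Two small sketches of 't' at work; the function names and the "ERROR:" input format are made up. The first loops on a single s until it stops matching. The second shows the clean-out step: a throwaway 't' to a label on the next line clears the flag left by an earlier s.

```shell
# Looper: keep inserting a comma before the last three leading digits
# until the s no longer matches.
commify() {
  sed '
    :again
    s/^\([0-9][0-9]*\)\([0-9]\{3\}\)/\1,\2/
    t again
  '
}

# The first s always succeeds (it can match nothing at end of line), so
# "t reset" is there only to clear the flag before the s that matters.
errors_only() {
  sed '
    s/[[:blank:]]*$//
    t reset
    :reset
    s/^ERROR:[[:blank:]]*/!! /
    t keep
    d
    :keep
  '
}
```

Without the reset pair, the 't keep' would fire on every line and nothing would be deleted.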
I have been warming up to the -n and 's/.../.../p' lately, as it fits many situations (frequency not variety), but initially I ignored them as I was interested in the most versatile tactics.
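A typical case where that pairing fits, sketched with an invented "user=..." input format and function name: print only the lines where the substitution succeeded, already trimmed down to the part wanted.

```shell
# -n suppresses the automatic print; the p flag prints only on a match.
list_users() {
  sed -n 's/^user=\(.*\)$/\1/p'
}
```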
I would note that many sed flavors do not tolerate comments ('# whatever'), which is a shame. Inline documentation can help maintenance. In C, C++, Java, shell and SQL I like the switch/case/when/then/else, as each case can be commented neatly!
In data warehouses and similar places with crushingly big data sets, sed's lack of temp files and near-C speed are very well respected. It has a very important role to play in a pipe-oriented shell programming paradigm, where there are no intermediate or temp files, or any temp files are managed by the tool like sort. This results in lower latency and pipeline parallel multiprocessing, as many steps run concurrently.
Using literal '|' and named pipes (/sbin/mknod p p -- one of those p's is a file name), especially the self-managed named pipes '<(...)' and '>(...)' in bash and luckier systems' ksh, you can build a tree of pipelines working one or many inputs to produce one or many outputs. (On unlucky systems, bash makes named pipes somewhere under /var/tmp that accumulate, a bug I reported.) Unfortunately for sed, the self-managed named pipes '<(...)' and '>(...)' are parsed as words in ksh (according to David Korn) and probably bash; they have virtual spaces around them that you cannot erase without passing them through a shell function call or the like. Life is sometimes excessively complicated! In the following example, the first '>(...)' after a 'w' command in $1 does not work: it might resolve to, essentially, ' /dev/fd/3 ' (the writable fd number from a pipe() call), so the pipe's file name '/dev/fd/3' becomes $2, an unrecognized sed command line option, the next part of the sed script is $3, and the next named pipe, perhaps '/dev/fd/5', is $4:
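One way around the word-splitting problem, sketched here as an assumption rather than as the poster's method, is an explicit named pipe whose name can sit directly after the 'w' inside the script. It uses mkfifo instead of mknod; the function name, the "ERROR" prefix and the choice of sort as the side consumer are all made up for the example.

```shell
# $1 receives the sorted ERROR lines; everything else goes to stdout.
split_errors() {
  errfile=$1
  tmpdir=$(mktemp -d) || return 1
  mkfifo "$tmpdir/errors" || return 1
  # Side branch of the pipeline tree: reads the named pipe concurrently.
  sort <"$tmpdir/errors" >"$errfile" &
  sed -n '
    /^ERROR/w '"$tmpdir/errors"'
    /^ERROR/!p
  '
  wait                 # let the side branch finish before cleaning up
  rm -rf "$tmpdir"
}
```

The pipe has to be created and removed by hand, which is exactly the housekeeping '>(...)' would otherwise do.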
Last edited by DGPickett; 11-15-2012 at 05:21 PM..
Files with no line feed right before EOF tend to have that last "line" ignored by sed. Maybe that's POSIX, too.
If a non-empty character sequence doesn't end with a newline, POSIX sed considers it invalid input. The result of such a scenario is undefined and implementation dependent.
A sequence of zero or more non- <newline> characters plus a terminating <newline> character.
... <snip> ...
3.395 Text File
A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.
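Since the result is implementation dependent, one defensive option is to make sure a file ends in a newline before sed sees it. A minimal sketch with a made-up function name; it modifies the file in place, so it assumes the file is writable and that appending a newline is acceptable.

```shell
# Append a newline if the file is non-empty and its last byte is not one.
# Command substitution strips a trailing newline, so the test string is
# empty exactly when the last byte already is a newline.
ensure_final_newline() {
  if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
    printf '\n' >>"$1"
  fi
}
```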