Speculative Shell Feature Brainstorming
Posted by tetsujin, 03-24-2011

Hi - little introductory post for this thread:

The discussion started in the "What's your most useful shell?" poll thread, and I think it's gone on long enough that I don't want new posts related to it going there any more. It's a big discussion and it only gets bigger. So if a mod moves the other posts of the discussion, I guess this is the place they should go.

Anyway, the discussion started because my answer to "what's your most useful shell?" is "the one that hasn't been written yet" - I feel that there is room for improvement in the shell. I think the shell's usefulness has been hindered by some of its long-standing design decisions and that what we should expect out of a shell is a programming environment that's every bit as useful and nice to work with as the commonly-used "scripting languages" such as Python or Ruby. I like to think about how I would address some of those problems.

One of the things I suggested was that the shell's mechanisms for handling open files could be improved: it's only fairly recently (I think) that bash has provided support for opening a file on a numeric FD without the user having to explicitly select that FD number and ensure that it's a safe choice. I also considered the idea of tying FDs to variable lifetimes, providing scoping for them this way, and looked at different kinds of syntax for handing these files off to processes to use them.
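
For reference, the feature I mean looks like this in bash 4.1 and later (a minimal sketch):

Code:
# The shell picks a free descriptor (10 or above) and stores its number in $fd,
# instead of the user hard-coding something like "exec 3<notes.txt":
$ exec {fd}<notes.txt
$ read -r first_line <&$fd
$ exec {fd}>&-   # close it when finished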

That's kind of where the discussion resumes:
Quote:
Originally Posted by Corona688
Quote:
(I had suggested that shell variables could hold "open file handle" objects resulting from a file open operation - and that you could then redirect to/from these open filehandles with syntax like the following.)

Code:
$ cmd <& $fd

Maybe it's "conceptionally" the same but it's not actually the same. That's kind of more important.
Why? Programming languages are representations of ideas. As long as the ideas are conceptually sound, why does it matter if the back-end implementation is different? We use the same operators for integer math as we do for floating point math, right?

Quote:
But code written for the Bourne shell ought to and does work in any of them -- but might not work in your "shell" because your extensions break compatibility with basic, basic, basic Bourne shell features. Pick something else.
My system already has five different Bourne-compatible shells installed. If I need a Bourne-compatible shell, I can run one of those.

"Pick something else" is not as easy as it sounds. Believe me, I have looked at the problem of extending shell syntax, there's not a lot of free space, actually, short of bringing in more exotic "special characters" that wouldn't be on my keyboard (let alone international keyboards...) Bash gets pretty crazy with its syntax extensions to get around this: things like the optional Kleene star operator feature in its globbing syntax - it's kind of crazy, and it's not completely compatible either, which is why it's optional... It seems to be a choice between abandoning Bourne-shell compatibility altogether, or else making the new syntax very, very ugly and/or inconvenient. I'd choose the former.

Quote:
Quote:
(I asked, basically, what's wrong with my suggested approach to file descriptors as a special type of variable...)
It changes the meaning of existing code and existing variables, that's how. Even worse, it does it implicitly. It also completely changes the language from a weakly typed one into some bizarre mixture of weak and strong types.
The only way it breaks existing code is if
1: Someone actually tries to run a Bourne shell script in this shell (why would they do this? I don't take bash scripts and expect them to run correctly in mksh... And if I had a script that really was strictly Bourne-shell compatible, I'd probably run it in dash or something to save resources.)
2: The syntax used to open the files is the same syntax used to open files in Bourne shell.

#2 is a big one, because the Bourne shell syntax for opening a file isn't applicable to creating a "special file descriptor" thing. Since Bourne shell syntax for opening a file normally applies it to a specific, numbered file descriptor, a mechanism that creates these "special file descriptor" objects would almost certainly use a different syntax. (It could potentially use the "{varname}<filename" syntax, but there's no reason to do that - as you say it'd just cause headaches.)

Quote:
Current shells do a fairly good job of combining everything under the umbrella of "string". If you start adding variables that have no sensible meanings as strings, a Perl-like mess is what you get.
Perl is messy because it was designed to be messy: a quick and dirty means of solving problems, using syntax people would be familiar with from other tools. As such, it's kind of ugly but generally regarded as being quite effective. IMO the shell is seriously limited by this "everything is a string" philosophy. The shell should be a powerful tool for tying other tools together - but instead it's barely adequate for the job.

Shells already do support variable types that "don't translate nicely to strings". They do so because it's a useful feature. Arrays in general, associative arrays in particular.
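
To make that concrete, here's bash's associative-array behavior (bash 4+); note how the structure simply vanishes when you expand the variable as a plain string:

Code:
$ declare -A ports=([http]=80 [https]=443)
$ echo "${ports[https]}"
443
$ echo "$ports"    # a bare string expansion yields ${ports[0]}, i.e. nothing here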

Quote:
Quote:
(I had questioned the value of explicitly opening a file on a specific numeric file descriptor)
It's one of the most important features of the shell and a fundamental part of how inter-process communication works in a shell. It's very powerful even if you don't think you need it.
I really don't think it is. But I could be missing something...

I mean, it's very common to redirect to one of the commonly used file descriptors. Redirecting #0 and #1 is so common that you don't even have to specify the numbers explicitly. Redirecting #2 is less common but still very important. Redirecting any other number is (as far as I can tell) exceedingly rare. The only programs I know of that explicitly support it are screen and xterm - though obviously one could always write a program or shell script that uses a file descriptor specified numerically on the command line... In practice it's rarely done.

So consider how rare it is to redirect a file descriptor other than #0, #1, or #2, and then think about how frequently it's actually useful to pick one of those other file descriptors, open a file on it, and have that file remain open on every job you run until you specify otherwise. If you open a file on FD #7, most programs you run aren't even going to know, or care, that the descriptor is open. The examples I cited (screen and xterm) will only do something with it if you explicitly tell them to via a command-line option.

This is the basis of my argument that the current design of file operations in the shell is wrong. The case where a job actually does use that FD is the exception, not the rule - so I think the syntax should reflect that. Rather than opening a file on FD #7 and then running eight jobs that don't care about FD #7 and two that do, I think it's better to open the file on a dynamically-assigned FD that isn't exported to child processes by default (close-on-exec or whatever) - and then explicitly pass that FD to the child process, via numeric FD redirection or some other mechanism, in the cases where it's needed. This also makes the relationship between the open file and the programs using it explicit in the code: you can see that a specific program is using a specific file because that file appears on the line where the job is run.
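
To illustrate the difference - the first half is plain bash, the second half is hypothetical syntax with made-up command names:

Code:
# Today: FD 7 stays open for every subsequent job, whether it cares or not.
$ exec 7>audit.log
$ job-that-logs >&7    # actually uses it
$ job-that-does-not    # inherits it anyway

# Proposed: open on a dynamically-chosen, shell-private FD (close-on-exec
# by default in this scheme) and hand it only to the job that needs it:
$ exec {log}>audit.log
$ job-that-logs >&$log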

Of course, FDs #0, #1, and #2 are another story. Opening a file on FD #2 (via "exec 2> filename" or whatever) is a convenient way to specify a log file for a bunch of jobs you're going to run in a script, and you could do the same to apply a file as stdin or stdout for a whole set of jobs. These are, I'd say, the exception to my rule - the one case where attaching a file to a specific file descriptor is useful, because these descriptors have well-established meanings that apply to nearly every job.
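
For example, in any Bourne-style shell (placeholder command names, script context):

Code:
# From here to the end of the script, every job's stderr lands in the log:
exec 2>script.log
step1
step2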

I think the way I'd address that - and the rare case where it is useful to attach a file to a specific numeric FD for a whole set of jobs - would be to stick those jobs in a code block and redirect the FDs in question for the whole block. This has some limitations: it's more work to take an existing script and apply this kind of redirection, because you have to introduce a block around the code before you can apply a blanket redirection for a specific numeric FD. It's also subject to the other criticisms I made when you suggested block redirection as a solution to providing "scoped file descriptors" (the redirection has to happen in a specific place rather than wherever it's convenient, you have to introduce another level of nesting any time you want to change the FD binding, etc.) - so it may be worth considering other approaches. But it strikes me as a very clean way to solve the problem, and it makes it very clear where the FD redirection applies.
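
That block-redirection pattern, for reference, already works in today's shells:

Code:
# FD 2 is redirected only for the commands grouped by the braces:
{
    step1
    step2
} 2>phase1.log
step3    # back on the original stderr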

Quote:
If you don't want it, you don't have to use a shell language.
I believe characteristics of the shell can be changed, without turning it into something other than a "shell". I do want to use a shell language, and I'm a little sick of you telling me that I don't - I just want to think about what a shell could be, beyond what it is.

Quote:
I don't think you're really reading what I'm writing here.
Don't confuse disagreement with a lack of understanding. You have raised some valid points (for instance, about redirection into loop blocks as a way to scope an open file) and I have done my best to address these sensibly (i.e. acknowledging where it works, pointing out apparent shortcomings, etc.). Where I am wrong, where my understanding of the existing shell features is incomplete, I try to be straightforward about this. Where my ideas may have shortcomings, I try to be realistic and pragmatic about these shortcomings.


---


Now, how about some fresh fodder for discussion? One thing I've been thinking about lately is how multiple redirections per process could be better handled. And just in general, how the syntax of a shell could be improved to support more complicated arrangements of jobs, jobs incorporating networks of processes instead of linear chains, doing things like feedback loops and so on...

For the purposes of this example I'm going to assume a shell that does remain Bourne compatible, at least to the extent that Bash does. So none of that "special file descriptor object" stuff here - and syntax that's basically Bash syntax, but doing things Bash can't...

First, a basic example: handling stderr. I guess this doesn't work?
Code:
$ write-error 2| error-filter | stdout-filter

What a drag. I think it'd be neat to send the error log through one pipeline and stdout through another. Something like this would be cool:
Code:
# Suppose "cmd1 2| cmd2" pipes stderr of cmd1 to stdin of cmd2, AND stdout of cmd2 to stderr:
$ cmd1 2| error-filter | stdout-filter | another 2>logfile
# Output of error-filter, and any stderr from the other processes in the job, is collected in logfile.
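
For completeness: today's bash can approximate the simple split with the classic FD-swap trick, though it's hardly readable - FD 3 temporarily parks the original stdout so the two streams can trade places (a sketch, reusing the made-up command names from above):

Code:
$ { write-error 2>&1 1>&3 | error-filter >&2; } 3>&1 | stdout-filter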

One could get around this by creating a fifo:
Code:
$ mkfifo fifo
# write-error's stderr goes into the fifo; concurrently, a second job reads
# the fifo, filters it, and puts the result back on stderr:
$ (write-error 2> fifo | stdout-filter & < ./fifo stderr-filter >&2)
$ rm fifo   # cleanup

The second line assumes we want to filter stdout and stderr separately, and put the output of the respective filters on the desired FDs. You could do this without the parentheses - which have the undesired side-effect of treating the enclosed code as if it were run in a subshell, i.e. you can't change variable definitions inside the parens and have them affect things outside - but some kind of grouping is needed if you want to treat the thing as a unit, for instance to apply additional stdout redirections without (for whatever reason) putting them to the left of that ampersand. A brace group gets you the grouping without the subshell, as in the sketch below.
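
Code:
# Same structure with a brace group: still one unit syntactically, but no
# subshell for the group itself (the backgrounded half of it is, of course,
# still a separate process):
$ { write-error 2>fifo | stdout-filter & stderr-filter <fifo >&2; }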

Of course, there are some shortcomings here: for starters, you're creating a file on the filesystem for no good reason. One could get around that with a bit of syntax added to the shell:
Code:
# In the vein of "exec {f}<filename", which opens "filename" for reading, picks a numeric FD for it, and assigns that numeric FD to $f:
$ exec {in}|&{out}
# This calls pipe() and assigns the resulting numeric file descriptors to $in and $out
# Alternately, something like this could be implemented as a shell builtin without introducing new syntax.

The example then looks more like this:
Code:
$ exec {err_in}|&{err_out}
$ (write-error 2>&$err_in | stdout-filter & <&$err_out stderr-filter >&2)
# cleanup:
$ exec {err_in}>&- {err_out}<&-   # the {var}>&- form closes the FD stored in $var

The advantage is that there's no impact on the filesystem: the pipe only ever exists as a pair of open file descriptors in the shell process. But the cleanup is a bit messier, since two file descriptors have to be closed, instead of one file being removed.

Another subtle difference: the pipe stays open until it's explicitly closed, unlike the mkfifo example, which essentially creates a new pipe each time the fifo is opened (and which blocks attempts to open the fifo for writing or reading until the opposite end is opened by another process). That limitation makes a fifo hard to use as a general-purpose stand-in for pipe() FD pairs in the shell: it only really works when two separate jobs open the two ends asynchronously. Opening both ends in the shell process up front means you can apply them to multiple jobs via redirection.
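
Incidentally, bash's coproc builtin already does the pipe()-in-the-shell-process part of this, though it insists on wrapping the pipe around a child process rather than just handing you the FD pair:

Code:
$ coproc CAT { cat; }           # ${CAT[1]} is the write end, ${CAT[0]} the read end
$ echo hello >&"${CAT[1]}"
$ read -r line <&"${CAT[0]}"
$ echo "$line"
hello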

So this could be used for feedback processes:
Code:
# I can't actually think of why you'd want to calculate the Fibonacci sequence in the shell
# but if you did, and wanted a list-processing style of loop...
$ exec {pin}|&{pout}
$ (echo -e "1\n1"; read prev; while true; do read cur; echo $((prev + cur)); prev=$cur; done) <&$pout | tee /dev/fd/$pin

This could be further improved by localizing the definitions of $pin and $pout, as well as the pipe they represent, to the single job being run. (I think this would just require another set of parens around the whole two-line invocation...) Of course, this also means you can't perform any other actions that would affect the top-level environment of the shell, either...

What I'm trying to build toward here is a setup where one could run a bunch of jobs asynchronously and link them together in a non-linear fashion via pipes. To make this truly useful, there may be points where multiple sources of input need to be merged (for instance, if multiple jobs produced input for a single job, they couldn't simply all output to the same FIFO, they'd each need their own, and some process would have to read the data they generated and synchronize it somehow...)

Since the shell knows nothing about the format of the messages, it's not in a position to take on this job itself. (Though the shell could provide some standard ones, like synchronizing on line breaks - but it's just as easily provided by an external tool) But one solution could be to provide a syntax that allows you to define a pipe and attach one end of it to a process, all in one go:

Code:
# This is kind of a mix of process substitution syntax with the pipe syntax I introduced above...
# We have a program called message-merge which takes "filenames" of files 
# (rather, FIFOs, TCP connections - anything that can be read but can also block on a read operation)
# and which attempts to read a whole message from any one of them before writing that message to its output.
# We also have an open pipe, whose input FD is given in the environment variable $pin
$ message-merge <({m1}|) <({m2}|) >&$pin &

To make some sense of this:
"{m1}|" specifies the creation of a pipe. Since the right-hand side of the pipe doesn't go anywhere, it goes to stdout. Since the left-hand side is a name in brackets, it's a dynamically-assigned numeric FD, whose numeric value is stored in the variable $m1. An alternate might be to write "{m1}|&1" - "create a pipe whose input FD is stored in $m1, and whose output goes to FD #1 (stdout)"
"<()" is the (pre-existing) process substitution syntax. The "substitution" is a filename in /dev/fd/ - this is what message-merge sees as its command-line argument. The corresponding file descriptor (the output end of the pipe) is also made available to the message-merge process when it is run. In this case the parens also serve to terminate the pipe syntax, so it's clear that pipe "goes nowhere"
The job is run in the background because it's one of many processes in this "processing network".
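
For anyone who hasn't run into it, the existing bash form being extended here looks like this - each <(...) becomes a /dev/fd/N pathname passed to the command as an ordinary argument:

Code:
$ diff <(sort a.txt) <(sort b.txt)

The new part is putting a pipe endpoint, rather than a command, inside the parens.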

Of course, for that to work properly, the code inside "<()" must not be considered a "subshell environment" - (presently, in bash, it is a subshell) otherwise, newly-opened files and new variable definitions won't make it out of that context...

Defining some elaborate, multi-job process network like this introduces another problem: cleanup of the processes, variables, and open files.

Cleanup of the open files and variables is relatively simple: stick the whole thing in parens, it'll be treated as a "subshell" evaluation (though it won't necessarily spawn a new shell process) - the jobs could be another matter. Collecting PIDs and waiting on those PIDs is a bit awkward (a job can contain a lot of processes if it's a pipeline...)
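
Concretely, the status quo looks something like this (made-up command names):

Code:
# $! only captures the last process of a backgrounded pipeline:
$ producer | transform & pids="$!"
$ consumer & pids="$pids $!"
$ wait $pids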

I guess if you could count on the processes terminating via SIGPIPE, then closing the pipes could be sufficient. Otherwise it might be helpful in cases like that to have a way of killing (or waiting on) all the jobs you spawned in a particular block of code. A more comprehensive solution might be something that says "run this asynchronously but treat it, and the other asynchronously-run jobs as foreground jobs" - so the script would block waiting for those jobs to finish before terminating, and would propagate termination signals to the child jobs, without the programmer needing to explicitly code in that behavior for each individual job...
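
For comparison, the closest existing approximations I know of:

Code:
$ wait                                      # blocks until every background job finishes
$ trap 'kill $(jobs -p) 2>/dev/null' EXIT   # propagate termination to surviving jobs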
