Speculative Shell Feature Brainstorming


 
# 1  
Old 03-24-2011
Speculative Shell Feature Brainstorming

Hi - little introductory post for this thread:

The discussion started in the "What's your most useful shell?" poll thread and I think the discussion's gone on long enough that I don't want new posts related to that discussion to go there any more. It's a big discussion and it only gets bigger. So if a mod moves the other posts of the discussion, I guess this is the place it should go.

Anyway, the discussion started because my answer to "what's your most useful shell?" is "the one that hasn't been written yet" - I feel that there is room for improvement in the shell. I think the shell's usefulness has been hindered by some of its long-standing design decisions and that what we should expect out of a shell is a programming environment that's every bit as useful and nice to work with as the commonly-used "scripting languages" such as Python or Ruby. I like to think about how I would address some of those problems.

One of the things I suggested was that the shell's mechanisms for handling open files could be improved: it's only fairly recently (I think) that bash has provided support for opening a file on a numeric FD without the user having to explicitly select that FD number and ensure that it's a safe choice. I also considered the idea of tying FDs to variable lifetimes, providing scoping for them this way, and looked at different kinds of syntax for handing these files off to processes to use them.
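
(For reference, the bash mechanism I mean looks something like this - it's a fairly recent addition, bash 4.1 if I remember right; "somefile" is just a stand-in name:)

Code:
# Ask the shell to pick a free FD and store its number in $fd:
$ exec {fd}<somefile
# Use it by number...
$ read -u "$fd" line
# ...and close it when done:
$ exec {fd}<&-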

That's kind of where the discussion resumes:
Quote:
Originally Posted by Corona688
Quote:
(I had suggested that shell variables could hold "open file handle" objects resulting from a file open operation - and that you could then redirect to/from these open filehandles with syntax like the example below.)

Code:
$ cmd <& $fd

Maybe it's "conceptionally" the same but it's not actually the same. That's kind of more important.
Why? Programming languages are representations of ideas. As long as the ideas are conceptually sound, why does it matter if the back-end implementation is different? We use the same operators for integer math as we do for floating point math, right?

Quote:
But code written for the Bourne shell ought to and does work in any of them -- but might not work in your "shell" because your extensions break compatibility with basic, basic, basic Bourne shell features. Pick something else.
My system already has five different Bourne-compatible shells installed. If I need a Bourne-compatible shell, I can run one of those.

"Pick something else" is not as easy as it sounds. Believe me, I have looked at the problem of extending shell syntax, there's not a lot of free space, actually, short of bringing in more exotic "special characters" that wouldn't be on my keyboard (let alone international keyboards...) Bash gets pretty crazy with its syntax extensions to get around this: things like the optional Kleene star operator feature in its globbing syntax - it's kind of crazy, and it's not completely compatible either, which is why it's optional... It seems to be a choice between abandoning Bourne-shell compatibility altogether, or else making the new syntax very, very ugly and/or inconvenient. I'd choose the former.

Quote:
Quote:
(I asked, basically, what's wrong with my suggested approach to file descriptors as a special type of variable...)
It changes the meaning of existing code and existing variables, that's how. Even worse, it does it implicitly. It also completely changes the language from a weakly typed one into some bizarre mixture of weak and strong types.
The only way it breaks existing code is if
1: Someone actually tries to run a Bourne shell script in this shell (why would they do this? I don't take bash scripts and expect them to run correctly in mksh... And if I had a script that really was strictly Bourne-shell compatible, I'd probably run it in dash or something to save resources.)
2: The syntax used to open the files is the same syntax used to open files in Bourne shell.

#2 is a big one, because the Bourne shell syntax for opening a file isn't applicable to creating a "special file descriptor" thing. Since Bourne shell syntax for opening a file normally applies it to a specific, numbered file descriptor, a mechanism that creates these "special file descriptor" objects would almost certainly use a different syntax. (It could potentially use the "{varname}<filename" syntax, but there's no reason to do that - as you say it'd just cause headaches.)

Quote:
Current shells do a fairly good job of combining everything under the umbrella of "string". If you start adding variables that have no sensible meanings as strings, a Perl-like mess is what you get.
Perl is messy because it was designed to be messy: a quick and dirty means of solving problems, using syntax people would be familiar with from other tools. As such, it's kind of ugly but generally regarded as being quite effective. IMO the shell is seriously limited by this "everything is a string" philosophy. The shell should be a powerful tool for tying other tools together - but instead it's barely adequate for the job.

Shells already do support variable types that "don't translate nicely to strings". They do so because it's a useful feature. Arrays in general, associative arrays in particular.

Quote:
Quote:
(I had questioned the value of explicitly opening a file on a specific numeric file descriptor)
It's one of the most important features of the shell and a fundamental part of how inter-process communication works in a shell. It's very powerful even if you don't think you need it.
I really don't think it is. But I could be missing something...

I mean, it's very common to redirect to one of the commonly used file descriptors. Redirecting #0 and #1 is so common that you don't even have to specify the numbers explicitly. Redirecting #2 is less common but still very important. Redirecting any other number is (as far as I can tell) exceedingly rare. The only programs I know of that explicitly support it are screen and xterm - though obviously one could always write a program or shell script that uses a file descriptor specified numerically on the command line... In practice it's rarely done.

So look at how rare it is to redirect a file descriptor other than #0 or #1, and then think about how frequently it's useful to pick one of those other file descriptors, open a file on it, and have that file remain open on every job you run until you specify otherwise. If you open a file on FD #7, most programs you run aren't even going to know, or care, that that file descriptor is open. The examples I cited (screen and xterm) will only do something with this FD if you explicitly tell them to via command-line option.

This is the basis of my argument that the current design of file operations in the shell is wrong. The case where one of those jobs actually does use that FD is the exception, not the rule - therefore I think it's more sensible for the syntax to reflect this. Rather than opening a file on FD #7, running eight jobs that don't care about FD #7 and two that do, I think it's better to open a file on a dynamically-assigned FD that's not exported to child processes by default (close-on-exec or whatever) - and then explicitly pass that FD to the child process, either via numeric FD redirection or another mechanism, in those cases where it's needed. This also makes the relationship between the open file and the programs using it more explicit in the code: you can see that a specific program is using a specific file because that file appears on the line where that job is run.

Of course, FDs #0, #1, #2 are another story. Opening a file on FD #2 (via "exec 2> filename" or whatever) is a convenient way to specify a log file for a bunch of jobs you're going to run in a script. And you could do similarly to apply something as stdin or stdout for a whole set of jobs. They are, I'd say, the exception to my rule, the one case where applying a file to a specific file descriptor is useful, because these file descriptors have well-established meanings that apply to nearly every job.

I think the way I'd address that, and the rare case where it's useful to attach some file to a specific numeric FD for a whole set of jobs being run, would be to stick those jobs in a code block and redirect the FDs in question for the whole block. This does have some limitations (i.e. it's more work to take an existing script and apply this kind of redirection - because you have to introduce a block around the code before you can apply a blanket redirection for a specific numeric FD) - plus it's subject to all the other criticisms I made when you suggested block redirection as a solution to providing "scoped file descriptors" (i.e. the redirection has to happen in a specific place rather than wherever it's convenient, you have to introduce another level of nesting any time you want another change to the FD binding, etc.) - so it may be worth considering other approaches - but it strikes me as a very clean way to solve the problem, and to make it very clear where that FD redirection applies.
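
(In today's bash, that block approach is already expressible - a sketch, with made-up job names:)

Code:
# FD 2 is redirected for every job in the block, and only for the block:
{
    job1
    job2
} 2>logfile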

Quote:
If you don't want it, you don't have to use a shell language.
I believe characteristics of the shell can be changed, without turning it into something other than a "shell". I do want to use a shell language, and I'm a little sick of you telling me that I don't - I just want to think about what a shell could be, beyond what it is.

Quote:
I don't think you're really reading what I'm writing here.
Don't confuse disagreement with a lack of understanding. You have raised some valid points (for instance, about redirection into loop blocks as a way to scope an open file) and I have done my best to address these sensibly (i.e. acknowledging where it works, pointing out apparent shortcomings, etc.). Where I am wrong, where my understanding of the existing shell features is incomplete, I try to be straightforward about this. Where my ideas may have shortcomings, I try to be realistic and pragmatic about these shortcomings.


---


Now, how about some fresh fodder for discussion? One thing I've been thinking about lately is how multiple redirections per process could be better handled. And just in general, how the syntax of a shell could be improved to support more complicated arrangements of jobs, jobs incorporating networks of processes instead of linear chains, doing things like feedback loops and so on...

For the purposes of this example I'm going to assume a shell that does remain Bourne compatible, at least to the extent that Bash does. So none of that "special file descriptor object" stuff here - and syntax that's basically Bash syntax, but doing things Bash can't...

First, a basic example: handling stderr. I guess this doesn't work?
Code:
$ write-error 2| error-filter | stdout-filter

What a drag. I think it'd be neat to send the error log through one pipeline and stdout through another. Something like this would be cool:
Code:
# Suppose "cmd1 2| cmd2" pipes stderr of cmd1 to stdin of cmd2, AND stdout of cmd2 to stderr:
$ cmd1 2| error-filter | stdout-filter | another 2>logfile
# Output of error-filter, and any stderr from the other processes in the job, is collected in logfile.

One could get around this by creating a fifo:
Code:
$ mkfifo fifo
$ (write-error 2> fifo | stdout-filter & < ./fifo stderr-filter >&2)
$ rm fifo   # cleanup

The second line assumes we want to filter stdout and stderr separately, and put the output of the respective filters on the desired FDs. You could do this without the parentheses (which have the undesired side-effect of treating the code evaluated within as if it were being run in a subshell - i.e. you can't change variable definitions inside the parens and have them affect things outside) - but the parens are syntactically necessary if you want to treat the thing as a unit, for instance if you want to provide additional stdout redirections but (for whatever reason) don't want to put them to the left of that ampersand.

Of course, there's some shortcomings here: For starters, you're creating a file on the filesystem for no good reason. One could get around that with a bit of syntax added to the shell:
Code:
# In the vein of "exec {f}<filename", which opens "filename" for reading, picks a numeric FD for it, and assigns that numeric FD to $f:
$ exec {in}|&{out}
# This calls pipe() and assigns the resulting numeric file descriptors to $in and $out
# Alternately, something like this could be implemented as a shell builtin without introducing new syntax.

The example then looks more like this:
Code:
$ exec {err_in}|&{err_out}
$ (write-error 2>&$err_in | stdout-filter & <&$err_out stderr-filter >&2)
# cleanup:
$ exec $err_in>&- $err_out<&-

The advantage is that there's no impact on the filesystem: the pipe only ever exists as a pair of open file descriptors in the shell process. But the cleanup is a bit messier, since two file descriptors have to be closed, instead of one file being removed.

Another subtle difference is that the pipe is always open until it's closed: unlike the mkfifo example, which essentially creates a new pipe each time the fifo file object is opened (and which blocks attempts to open the fifo for writing or reading until the opposite end of the fifo is opened by another process). This limitation of mkfifo makes it difficult to use as a general-purpose mechanism for creating pipe() FD pairs in the shell: you can only really use it if two separate jobs are opening the different ends of the fifo asynchronously. Opening both ends in the shell process simultaneously means you can apply them to multiple jobs via redirection.
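
(Bash's coproc builtin - bash 4.0 and later, I believe - gets part of the way there: it calls pipe() and hands you the FD numbers in an array, though it insists on attaching the other ends to a child process rather than giving you a bare pipe. A bare-pipe primitive like the one above would be more general:)

Code:
$ coproc CAT { cat; }
# ${CAT[1]} writes into cat's stdin; ${CAT[0]} reads from its stdout:
$ echo hello >&"${CAT[1]}"
$ read -u "${CAT[0]}" line; echo "$line"
hello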

So this could be used for feedback processes:
Code:
# I can't actually think of why you'd want to calculate the Fibonacci sequence in the shell
# but if you did, and wanted a list-processing style of loop...
$ exec {pin}|&{pout}
$ <&$pout (echo -e "1\n1"; read prev; while true; do read cur; echo $((prev + cur)); prev=$cur; done) | tee /dev/fd/$pin

This could be further improved by localizing the definitions of $pin and $pout, as well as the pipe they represent, to the single job being run. (I think this would just require another set of parens around the whole two-line invocation...) Of course, this also means you can't perform any other actions that would affect the top-level environment of the shell, either...

What I'm trying to build toward here is a setup where one could run a bunch of jobs asynchronously and link them together in a non-linear fashion via pipes. To make this truly useful, there may be points where multiple sources of input need to be merged (for instance, if multiple jobs produced input for a single job, they couldn't simply all output to the same FIFO, they'd each need their own, and some process would have to read the data they generated and synchronize it somehow...)

Since the shell knows nothing about the format of the messages, it's not in a position to take on this job itself. (Though the shell could provide some standard ones, like synchronizing on line breaks - but it's just as easily provided by an external tool) But one solution could be to provide a syntax that allows you to define a pipe and attach one end of it to a process, all in one go:

Code:
# This is kind of a mix of process substitution syntax with the pipe syntax I introduced above...
# We have a program called message-merge which takes "filenames" of files 
# (rather, FIFOs, TCP connections - anything that can be read but can also block on a read operation)
# and which attempts to read a whole message from any one of them before writing that message to its output.
# We also have an open pipe, whose input FD is given in the environment variable $pin
$ message-merge <({m1}|) <({m2}|) >&$pin &

To make some sense of this:
"{m1}|" specifies the creation of a pipe. Since the right-hand side of the pipe doesn't go anywhere, it goes to stdout. Since the left-hand side is a name in brackets, it's a dynamically-assigned numeric FD, whose numeric value is stored in the variable $m1. An alternate might be to write "{m1}|&1" - "create a pipe whose input FD is stored in $m1, and whose output goes to FD #1 (stdout)"
"<()" is the (pre-existing) process substitution syntax. The "substitution" is a filename in /dev/fd/ - this is what message-merge sees as its command-line argument. The corresponding file descriptor (the output end of the pipe) is also made available to the message-merge process when it is run. In this case the parens also serve to terminate the pipe syntax, so it's clear that pipe "goes nowhere"
The job is run in the background because it's one of many processes in this "processing network".

Of course, for that to work properly, the code inside "<()" must not be considered a "subshell environment" - (presently, in bash, it is a subshell) otherwise, newly-opened files and new variable definitions won't make it out of that context...

Defining some elaborate, multi-job process network like this introduces another problem: cleanup of the processes, variables, and open files.

Cleanup of the open files and variables is relatively simple: stick the whole thing in parens, it'll be treated as a "subshell" evaluation (though it won't necessarily spawn a new shell process) - the jobs could be another matter. Collecting PIDs and waiting on those PIDs is a bit awkward (a job can contain a lot of processes if it's a pipeline...)

I guess if you could count on the processes terminating via SIGPIPE, then closing the pipes could be sufficient. Otherwise it might be helpful in cases like that to have a way of killing (or waiting on) all the jobs you spawned in a particular block of code. A more comprehensive solution might be something that says "run this asynchronously but treat it, and the other asynchronously-run jobs as foreground jobs" - so the script would block waiting for those jobs to finish before terminating, and would propagate termination signals to the child jobs, without the programmer needing to explicitly code in that behavior for each individual job...
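
(Some of this exists in rough form today: a bare "wait" blocks on all background jobs, and a trap can forward signals to them. A sketch, with hypothetical job names:)

Code:
job1 & job2 & job3 &
# Forward termination signals to everything in this shell's job table:
trap 'kill $(jobs -p) 2>/dev/null' TERM INT
# Block until every background job has finished:
wait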

# 2  
Old 03-24-2011
Maybe there's a better way I can explain myself. I originally started programming in gwbasic.

Code:
LET B=1
PRINT "Okay, B is " + B

Almost immediately I wanted to feed PRINT's output into a string. There's no direct way to do it. I ended up saving its output into a file then reading it back, which worked.

Eventually I graduated to quickbasic, which let me make subroutines. Out of curiosity, I tried to see if I could make my own PRINT.

Turns out, you can't. Print's a builtin, which makes it "special".
  1. You can't make a sub that takes an unknown number of arguments.
  2. Arguments have to be separated by "," so a sub will never work quite the same way as PRINT.
  3. All subroutines must be called with CALL. You can't just say the name and have it work.

You can't have a variable that does anything either, even though some are kludged into BASIC as builtins -- like DATE$, which sets the date (and causes havoc for vbscript writers who aren't aware of this behavior).

Later I tried to make a subroutine that evaluated expressions. Not flexible enough. What I eventually had to do was convert the expression it was fed into a .BAS file, and execute another instance of qbasic to evaluate it. Basic's not orthogonal: You can't use the things you build for it the same way you can use the things that're built in.

Even the worst Bourne shell makes these problems almost trivial, because -- for all its faults -- the Bourne shell is very orthogonal. There's almost no corners where environment vars, global vars, local vars, files, strings, builtins, aliases, functions or external programs have to be used in one particular way over another, and it can feed its own output and expressions back into itself seamlessly and Do The Right Thing(tm).

Now imagine it had three mutually exclusive ways of handling things -- one for stdin and stdout, one for all other files, and one for strings. By building those things as independent features you've inadvertently built walls between them which programmers will be hitting like grasshoppers on your windshield unless you kludge bridges over them somehow.

Quote:
Originally Posted by tetsujin
Why? Programming languages are representations of ideas. As long as the ideas are conceptually sound, why does it matter if the back-end implementation is different? We use the same operators for integer math as we do for floating point math, right?
In a strongly typed language, that's easy. It doesn't have to decide on the fly what routines to use, and can even warn you if you mix and match them in problematic ways.

In the shell it's still doable because they can all be stored as strings. With some work you can still keep it orthogonal, make it do the right thing under any circumstances -- so there's not a conflict.

Files would be a whole new kettle of fish. Suddenly variables are special things that might be strings, might be files, might be syntax errors, might be I/O errors, or might just block. Everything gets more complicated.

How about a new special character like @filehandle? Then you won't get files showing up in places you didn't expect. I wouldn't go too overboard like Perl did, but just one shouldn't end the world.
Quote:
My system already has five different Bourne-compatible shells installed. If I need a Bourne-compatible shell, I can run one of those.
If you're inventing a totally new language, I wouldn't just extend a shell, I'd rip out a ton of old stuff too.

Which syntax for arithmetic do you prefer?

Code:
# old-fashioned hack
C=`expr $B + 5`
# good old-fashioned BASIC
let C=B+5
# newer C-like
((C=B+5))

etc, etc, etc. There's tons of redundancy that could be stripped out.

Almost anything but pipes could be revamped. I know you'll be tempted to turn <file into an anonymous filehandle instead of stdin, but consider how many programs read from stdin and you might want to keep stdin/stdout as easily available as they are already.

Another neat operator could be something like @*.txt | command ...to make shell globbing print to stdout, like find does. Argument limit, what argument limit?
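
You can fake that one today, actually, since builtins aren't subject to the exec argument limit:

Code:
# printf is a builtin, so the glob never goes near ARG_MAX:
printf '%s\n' *.txt | command
# Or NUL-separated, for filenames containing newlines:
printf '%s\0' *.txt | xargs -0 command
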
Quote:
#2 is a big one, because the Bourne shell syntax for opening a file isn't applicable to creating a "special file descriptor" thing.
What is the difference between an ordinary file and a special file? Presently, nothing -- which is extremely nice. Remember you're building a language which people other than you will use. If you don't allow possibilities you didn't think of, users can't program in ways you didn't think of.
Quote:
Perl is messy because it was designed to be messy: a quick and dirty means of solving problems, using syntax people would be familiar with from other tools. As such, it's kind of ugly but generally regarded as being quite effective. IMO the shell is seriously limited by this "everything is a string" philosophy. The shell should be a powerful tool for tying other tools together - but instead it's barely adequate for the job.
I think you just have a lot more to learn about it.
Quote:
Shells already do support variable types that "don't translate nicely to strings". They do so because it's a useful feature. Arrays in general, associative arrays in particular.
Arrays translate very nicely and usefully into strings.

Code:
$ A[0]=1
$ A[1]=2
$ A[2]=3
$ IFS=","
$ echo "${A[*]}"
1,2,3
$

...which you can also use backwards:

Code:
$ IFS=","
$ STR="a,b,c"
$ A=( $STR )
$ echo "${A[0]}"
a
$

This was in BASH but should work in KSH too. Even an associative array will work.
Quote:
I mean, it's very common to redirect to one of the commonly used file descriptors.
"It's very common to print to the screen, or maybe a file. But who'd ever want to print to a string?" That line of thought left a huge hole in the BASIC language that had to be kludged around.
Quote:
Redirecting #0 and #1 is so common that you don't even have to specify the numbers explicitly. Redirecting #2 is less common but still very important. Redirecting any other number is (as far as I can tell) exceedingly rare. The only programs I know of that explicitly support it are screen and xterm - though obviously one could always write a program or shell script that uses a file descriptor specified numerically on the command line...
Don't forget the read builtin. Also, the linux flock utility. Also don't forget how any program taking a filename can get /proc/self/fd/23 shoehorned into it. Or you can just redirect anything you want into stdin. Once again, the shell's flexibility is key.
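
The flock(1) man page's own idiom depends on an explicitly chosen FD, and read takes one through -u (lock-file path here is just an example):

Code:
# Locking idiom along the lines of the flock(1) man page:
(
    flock -n 9 || exit 1
    # ...commands that need the lock...
) 9>/var/lock/mylockfile
# And read can pull from an arbitrary numbered FD:
exec 3< datafile
read -u 3 firstline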

Getting rid of that feature wouldn't be a problem for most of my shell scripts. But it would be impossible to write a few small but really important ones.
Quote:
In practice it's rarely done.
That's a self-fulfilling prophecy -- partly because one file-descriptor is generally enough, and partly because of the syntax limits you want to fix. But I think it's very, very important to fix it in a way that doesn't rob the shell of any features it had before, and doesn't build any new walls.
Quote:
So you look at how rare it is to redirect a file descriptor other than #0 or #1
? You do it all the time.

When you do command < filename do you think the shell opens filename directly as FD 0? No, it lands on some random FD, and the shell just duplicates it over stdin.
Quote:
...and then think about how frequently it's useful to pick one of those other file descriptors, open a file on it, and have that file remain open on every job you run until you specify otherwise.
You're beginning to contradict yourself: you just told me almost nothing cares about other open files.
Quote:
This is the basis of my argument that the current design of file operations in the shell is wrong. The case where one of those jobs actually does use that FD is the exception,. not the rule - therefore I think it's more sensible for the syntax to reflect this.
It already does.
Quote:
Rather than opening a file on FD #7, running eight jobs that don't care about FD #7 and two that do, I think it's better to open a file on a dynamically-assigned FD that's not exported to child processes by default (close-on-exec or whatever) - and then explicitly pass that FD to the child process, either via numeric FD redirection or another mechanism, in those cases where it's needed.
...except for special files because they're special.

All this new special syntax is wholly unnecessary. Some sort of exec-substitute that returns a file-descriptor number in the form of a string gets you everything you wanted and more, because leaving it general-purpose lets you use it in ways you didn't originally think of.

I guarantee that if you obscure file descriptors behind anonymous handles, you'll have to kludge a way around that. Maybe not today, maybe not tomorrow, but soon, and for the rest of your life. See man fileno.

# 3  
Old 03-24-2011
You say I may have a lot to learn - I acknowledge the possibility. If I did not have the conviction to test my ideas like this, I would miss an opportunity to find out where my knowledge is lacking.

I've made a lot of progress along these lines already. Things I thought the shell couldn't do, things I thought it wasn't appropriate for - as I learned more about the advanced features of the shell and how they're used in practice, I came to see that some of my assumptions were wrong.

This is very important from my perspective. If I don't know what the shell currently provides, I can't hope to successfully "improve" upon it. If I change the shell syntax, and lose various features in the process, I have to understand what those features are and how serious the loss is.

Bash's diverse variable substitution syntax is a good example. It tends to strike me as a giant mess. But it's powerful stuff, too. Things like "This variable, unless it's unset in which case this string instead" or "this variable, but trim a bit of text out of it first" - it's good stuff.

Quote:
Originally Posted by Corona688
Now imagine it had three mutually exclusive ways of handling things -- one for stdin and stdout, one for all other files, and one for strings. By building those things as independent features you've inadvertently built walls between them which programmers will be hitting like grasshoppers on your windshield unless you kludge bridges over them somehow.
You do make a good point here, I think... It may be a good argument against the idea that things I've described can coexist with the more standard shell approach.

Quote:
In a strongly typed language, that's easy. It doesn't have to decide on the fly what routines to use, and can even warn you if you mix and match them in problematic ways.

In the shell it's still doable because they can all be stored as strings. With some work you can still keep it orthogonal, make it do the right thing under any circumstances -- so there's not a conflict.
If everything's stored as a string, or if you pretend it is, then you can't specialize for different data types. For polymorphism to work you need a clear idea of the "class" that defines the object's behavior. If one sticks to the idea that everything should have an automatic translation back to string, then any time you do an operation on that thing there's the question "did you mean this to be a string operation?"

For instance, a fairly common expectation is that string concatenation (which, of course, is not addition) should use the addition operator. In Perl this wouldn't work, because that would create an ambiguity. "1" + "2" = "3" and not "12" - so they have a separate operator instead.

You see this a lot in bash as well. "Is that numeric comparison or lexicographical string comparison?" I kind of hate that, honestly.
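
(The classic trap, for anyone who hasn't hit it:)

Code:
$ [[ 10 -lt 9 ]] && echo yes   # numeric comparison: prints nothing
$ [[ 10 < 9 ]] && echo yes     # string comparison: "1" sorts before "9"
yes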

Quote:
How about a new special character like @filehandle? Then you won't get files showing up in places you didn't expect. I wouldn't go too overboard like Perl did, but just one shouldn't end the world.
I think there will be other types of "special" variable. I can't give each one its own special character...

Quote:
If you're inventing a totally new language, I wouldn't just extend a shell, I'd rip out a ton of old stuff too.
Pretty much that's what I'm going for, yeah. A totally new shell, but with familiar elements of Bourne-derived shells in the syntax and operation.

Arithmetic is actually a good example IMO. I feel like the shell should be an environment where that kind of operation comes naturally - it should be easier to express in the shell syntax, I think.

Quote:
Almost anything but pipes could be revamped. I know you'll be tempted to turn <file into an anonymous filehandle instead of stdin, but consider how many programs read from stdin and you might want to keep stdin/stdout as easily available as they are already.
I don't think I've described anything that affects stdin/stdout...

Quote:
Another neat operator could be something like @*.txt | command ...to make shell globbing print to stdout, like find does. Argument limit, what argument limit?
Didn't they get rid of the argument limit? (In Linux, anyway...?) And did it even apply to builtin echo?

Quote:
What is the difference between an ordinary file and a special file? Presently, nothing -- which is extremely nice.
What I've described as "special file handle objects" - I mostly use that phrase to tag places where I'm talking about some representation of an open file that's not how things currently operate in the shell. I have, in a few places, considered the possibility that these objects could coexist with the current mechanism for opening files (as in "exec 3<filename" or "exec {f}<filename" for the current mechanism, something else for this "special" crap) but if I'm not going for Bourne compatibility I think there's no point.

It's the difference between having "open this file on this specific descriptor table index, leave it open and pass it on to all child processes until I tell you otherwise" and "open this thing, I don't care where, close it when I'm not using it any more and pass it on only when I say so". I think the latter makes a lot more sense.

Quote:
Arrays translate very nicely and usefully into strings.

Code:
$ A[0]=1
$ A[1]=2
$ A[2]=3
$ IFS=","
$ echo "${A[*]}"
1,2,3

Yeeg, there's an awkward distinction...
Code:
$ echo ${A[*]}
1 2 3
$ echo "${A[*]}"
1,2,3
$ x="${A[*]}"
$ echo $x
1 2 3
$ echo "$x"
1,2,3

That is just screwy. (Bash 4 - I don't know what you're using...)

I mean, I get the distinction in the first two commands, even appreciate it: without quotes, ${A[*]} expands each array element to a positional argument, preserving the distinction between neighboring elements - while if you quote it, it's applying $IFS. But what about $x? I set it equal to the quoted version of the contents of A but it preserved that delimitation, as if $x were an array and as if I hadn't included those quotes...

(Obviously when I find things that I think are wrong with bash, that doesn't undermine the merits of bash's basic approach - I don't mean it that way... Just... this seems very inconsistent!)

Consider this:

Code:
# $IFS is still ","
$ A[1]="4,5"
$ echo "${A[*]}"
1,4,5,3

You have lost information in the translation... Hence, I think it's inappropriate to treat arrays as string-equivalent.

Quote:
Quote:
I mean, it's very common to redirect to one of the commonly used file descriptors.
"It's very common to print to the screen, or maybe a file. But who'd ever want to print to a string?" That line of thought left a huge hole in the BASIC language that had to be kludged around.
I'm not saying redirection to other file descriptors shouldn't be provided:
In fact, my personal feeling is quite the opposite. I believe this feature is essential.

However, it's not the common case that someone will need to open a file on a specific FD and have all child processes inherit that file on that specific numeric FD. As such, I think it's reasonable to fit the syntax to what I believe is the more common case of opening a file: within the shell, you don't care what numeric FD it winds up on. It only matters when you launch a child process that uses that numeric FD - and most child processes won't.

Quote:
Don't forget the read builtin. Also, the linux flock utility.
What of them?

Quote:
Also don't forget how any program taking a filename can get /proc/self/fd/23 shoehorned into it.
I actually used this in several examples already... (Well, /dev/fd/ - but it's the same thing)

Quote:
Getting rid of that feature wouldn't be a problem for most of my shell scripts. But it would be impossible to write a few small but really important ones.
Let's be clear here: what feature?

Getting rid of numeric redirection?
Code:
cmd 4< file

No. This is valuable.

Getting rid of the traditional mechanism for opening files in the shell, in which the user explicitly selects a numeric FD?
Code:
exec 4<file
cmd --do-something-with-fd=4

Maybe. I know of two programs that operate like this. As such I think this kind of blanket redirection doesn't make a lot of sense. At the very least it shouldn't be the mechanism for opening files in the shell.

Quote:
Quote:
So you look at how rare it is to redirect a file descriptor other than #0 or #1
? You do it all the time.

When you do command < filename do you think the shell opens filename directly as FD 0? No, it lands on some random FD, and the shell just duplicates it over stdin.
Give me a little credit here. I know how open(), dup(), etc. work. I've read APUE. Best $70 I ever spent. And yes, I actually do need to read it again. (The first couple times through I had trouble understanding controlling terminals and job control... I think I got it eventually but I'm getting rusty on the details)

But the relevant point is that when the user runs cmd < filename they don't care what that call to open() returned. They don't need to. They never need to provide or obtain the answer to that question. All they care about is that the file is open for reading, "cmd" sees it as FD 0, and the file is closed when the job is done.

Quote:
Quote:
...and then think about how frequently it's useful to pick one of those other file descriptors, open a file on it, and have that file remain open on every job you run until you specify otherwise.
You're beginning to contradict yourself, you just told me almost nothing cares about other open files.
Poor choice of words, perhaps. I was inviting you to think about how frequent that usage is (as in "measure frequency of this occurrence" rather than "boy, is that occurrence frequent!"), with the suggestion that it isn't common. Seriously, do you do that a lot? Say "open this file on descriptor 7" and then actually use descriptor 7 in a bunch of programs, as opposed to redirecting the file to some other FD?

Quote:
...except for special files because they're special.
What are you getting at here? I don't get it. I read it and it just looks like you're being rude, giving me a hard time for using the word "special" or something. I don't know if that's your intent.

Quote:
All this new special syntax is wholly unnecessary. Some sort of exec-substitute that returns a file-descriptor number in the form of a string gets you everything you wanted and more, because leaving it general-purpose lets you use it in ways you didn't originally think of.
It doesn't get me scoped file handles, or control over which child processes inherit that open file descriptor, which is kind of where this whole thing started.

As for fileno():
Within the shell it wouldn't normally be important where an open file resides. The use case would be to dup the file descriptor to some other number via redirection syntax:
Code:
# $fh is some "file handle" - a managed open file descriptor.
# User doesn't generally care what its number is, and its behavior in substitutions is not entirely straightforward.
# Dup it to FD 0 to use it with read:
$ read <& $fh
# Need it on another FD?  No problem:
$ some-proc --using-fd=7 7<& $fh

Mostly there's no point in knowing where it is as long as you can control where it's going to be.

But there's no reason a feature couldn't be included to determine where the file is. There's not a lot of value to that (debugging the shell would be one possibility, I guess?) but it's doable.
# 4  
Old 03-25-2011
Quote:
Originally Posted by tetsujin
You do make a good point here, I think... It may be a good argument against the idea that things I've described can coexist with the more standard shell approach.
There's still a contradiction in that.
Quote:
If everything's stored as a string, or if you pretend it is, then you can't specialize for different data types. For polymorphism to work you need a clear idea of the "class" that defines the object's behavior.
If you want these features, don't use a shell language. You can't stick a polymorphic object into a commandline argument.
Quote:
If one sticks to the idea that everything should have an automatic translation back to string, then any time you do an operation on that thing there's the question "did you mean this to be a string operation?"
There's no ambiguity. The answer is always, always "yes". Remember that this is a shell. You have to put these things into commandline programs. Strings are the only things that make sense.
Quote:
For instance, a fairly common expectation is that string concatenation (which, of course, is not addition) should use the addition operator.
I haven't seriously used a language that did that since I stopped writing in BASIC.
Quote:
You see this a lot in bash as well. "Is that numeric comparison or lexicographical string comparison?" I kind of hate that, honestly.
Well, nobody forced you to use a shell language.
Quote:
I think there will be other types of "special" variable. I can't give each one its own special character...
And again you've missed the point. Every time you decide something deserves to be a special case and give it its own, special types and syntax you create more and more walls between your builtins' syntax and what you allow the programmer to do. Don't specialize, generalize.
Quote:
I don't think I've described anything that affects stdin/stdout...
Yeah, just everything else. More and more special magic things.
Quote:
Didn't they get rid of the argument limit? (In Linux, anyway...?)
No.
Quote:
And did it even apply to builtin echo?
Who says you're running "echo"? Feeding them all into echo like that would have side-effects on the contents of the string.

Besides, imagine what you'd be able to do by fiddling IFS. You could make it produce all the strings separated by commas, or by NULLs, without needing an external program.
Quote:
I have, in a few places, considered the possibility that these objects could coexist with the current mechanism for opening files (as in "exec 3<filename" or "exec {f}<filename" for the current mechanism, something else for this "special" crap) but if I'm not going for Bourne compatibility I think there's no point.
The point is to not deny yourself features. Just because you don't need it this instant doesn't mean you won't need it the instant you make it opaque. This has been tried many times: stdio tried to wall everything off behind an opaque pointer because you don't "need" anything else, and ended up needing fdopen() and fileno() because sometimes you actually, genuinely need it. C++ tried hiding it all behind iostreams and people wouldn't have it, resulting in a stew of nonstandard iostream extensions for getting, or at least setting, the file number. Stop deciding what programmers "really need" and just hand out the stupid fileno. All you have to do is shove it in a string variable and it's as tidy as you could want.
Quote:
It's the difference between having "open this file on this specific descriptor table index, leave it open and pass it on to all child processes until I tell you otherwise" and "open this thing, I don't care where, close it when I'm not using it any more and pass it on only when I say so". I think the latter makes a lot more sense.
False dilemma. You don't have to have it 100% your way to get everything you want. Just an exec-like command that puts a file descriptor number in a string does what you want, and a shell builtin to set the kernel's close-on-exec flag (FD_CLOEXEC, via fcntl) gives you the rest. You could even have that as a flag when opening the file. But as the default? Remember that people other than you might have to program in this language, and might expect "open the file" to just open the file without cleaning their laundry too.
Quote:
Yeeg, there's an awkward distinction...
It makes perfect sense when you think about it -- IFS holds the characters the shell splits strings on! How do you stop a string from being split? Quote it, of course.
Quote:
I mean, I get the distinction in the first two commands, even appreciate it: without quotes, ${A[*]} expands each array element to a positional argument, preserving the distinction between neighboring elements - while if you quote it, it's applying $IFS.
There's nothing inconsistent about it, it's always substituting in the commas -- but if you don't quote it, it splits on the commas just like it'd usually split on spaces! This is what IFS does -- it's the characters the shell splits on, not just in arrays, but most substitution.
Quote:
(Obviously when I find things that I think are wrong with bash, that doesn't undermine the merits of bash's basic approach - I don't mean it that way... Just... this seems very inconsistent!)
Don't jump to conclusions, take a closer look.
Quote:
...You have lost information in the translation.
Only because you intentionally threw it away.
Quote:
Hence, I think it's inappropriate to treat arrays as string-equivalent.
That's quite all right, but if you don't want to program in a shell language, you don't have to program in a shell language. It's all about strings and splitting strings and can't really be anything else because that's the only thing you can put in a commandline.
Quote:
However, it's not the common case that someone will need to open a file on a specific FD and have all child processes inherit that file on that specific numeric FD.
That's good, because they don't:
Code:
command < file | command2 | command3

Only the first command of that line gets it. You have exceedingly fine control over it already.
Quote:
As such, I think it's reasonable to fit the syntax to what I believe is the more common case of opening a file: within the shell, you don't care what numeric FD it winds up on.
You always care what FD it winds up on: Generally zero.
Quote:
It's only when you launch a child process that uses that numeric FD - and most child processes won't.
You know what they say about assumptions: They make an ass of you and me.

How users want to use files is a question of programming style. Don't give them fewer options, give them more!


Quote:
It doesn't get me scoped file handles
Yes it can.

Code:
(
        open FOUT>filename
        open FIN<filename
        # How about this syntax, or something like it, for an implied close-on-exec?
        open -FD<filename
        
        do_stuff_with <$FIN >$FOUT
)
# How about a new kind of bracket that scopes files but not variables?

${
        open -FD<filename
        stuff <$FD
$}

Quote:
Getting rid of the traditional mechanism for opening files in the shell, in which the user explicitly selects a numeric FD?
Code:
exec 4<file
cmd --do-something-with-fd=4

Maybe. I know of two programs that operate like this.
The point is, you don't need to throw away this feature. You can get what you want without discarding it and without adding 9 new variable types plus operator overloading. That you don't find it personally useful doesn't mean it's useless.
Quote:
As such I think this kind of blanket redirection doesn't make a lot of sense.
This kind of "blanket redirection" is the foundation of how files, pipes, redirection, and terminals all behave. If you want to close that all off, make these "special cases" all part of your language instead of part of the environment, fine, but that's not a shell.
Quote:
Give me a little credit here. I know how open(), dup(), etc. work.
...but I'm not sure you understand why.
Quote:
But the relevant point is that when the user runs cmd < filename they don't care what that call to open() returned.
So what?
Quote:
They don't need to. They never need to provide or obtain the answer to that question.
We've been asked on these forums before, how to do exactly that.
Quote:
Poor choice of words, perhaps. I was inviting you to think about how frequent that usage is (as in "measure frequency of this occurrence" rather than "boy, is that occurrence frequent!"), with the suggestion that it isn't common. Seriously, do you do that a lot?
Occasionally. If you removed that feature from the shell I'd have to hack it back in since there's things that can't be done without it.
Quote:
What are you getting at here? I don't get it. I read it and it just looks like you're being rude, giving me a hard time for using the word "special" or something. I don't know if that's your intent.
Think back to how I started my post, explaining about a language crammed with special cases that had to be worked around in very awkward ways. The more special cases you add, the more awkward corners you add.
Quote:
It doesn't get me scoped file handles, or control over which child processes inherit that open file descriptor, which is kind of where this whole thing started.
You can get that kind of control without completely obliterating the model we have.
Quote:
As for fileno():
Within the shell it wouldn't normally be important where an open file resides.
It's not "usually" relevant in a C program either, but it's quite important enough.
Quote:
The use case would be to dup the file descriptor to some other number via redirection syntax:
Code:
# $fh is some "file handle" - a managed open file descriptor.
# User doesn't generally care what its number is, and its behavior in substitutions is not entirely straightforward.
$ read <& $fh

I'd like to point out, yet again, that this is already valid shell syntax. You don't need to add new variable types to make this work. Set the string fh to a valid file descriptor number, use a builtin to set close-on-exec, and you've got it.
Quote:
Mostly there's no point in knowing where it is as long as you can control where it's going to be.

But there's no reason a feature couldn't be included to determine where the file is.
My question is: Why make it a new special type of variable in the first place? File descriptors are actually, really, genuinely integers. You're actually making it harder on yourself intentionally hiding the integer from view instead of just storing it in a variable where anything, shell included, can use it.

# 5  
Old 03-25-2011
Quote:
Originally Posted by Corona688
If you want these features, don't use a shell language.
Well, nobody forced you to use a shell language.
That's quite all right, but if you don't want to program in a shell language, you don't have to program in a shell language.
One can take this argument to the extreme: say anything that's not part of one of the "standard" commonly-used shells is not a feature of a "shell language". If it wasn't invented at least 15 years ago and implemented as part of something with "sh" in the name, it's no good.

One could also take the broader view: if features are identified that could be of value, find a way to incorporate them while still keeping the shell as conceptually sound as possible.

Quote:
You can't stick a polymorphic object into a commandline argument.
True, just like you can't stick an array into a command line argument. You can stick a representation of an array into a command line argument, you can stick array elements into separate command line arguments - but not the array itself, as a single element.

Quote:
Quote:
If one sticks to the idea that everything should have an automatic translation back to string, then any time you do an operation on that thing there's the question "did you mean this to be a string operation?"
There's no ambiguity. The answer is always, always "yes".
Unless you're doing a mathematical operation, for instance.


Quote:
Quote:
For instance, a fairly common expectation is that string concatenation (which, of course, is not addition) should use the addition operator.
I haven't seriously used a language that did that since I stopped writing in BASIC.
Shell doesn't use the addition operator for string concatenation, so there you go.

Quote:
Besides, imagine what you'd be able to do by fiddling IFS. You could make it produce all the strings separated by commas, or by NULLs, without needing an external program.
The latter would be great - but as far as I can tell you can't store a null byte in a shell variable - and therefore you can't specify the null byte as $IFS.

Though, again, that could be fixed without revolutionary change: just make it possible to create a variable in the shell whose contents include a zero byte. You wouldn't be able to export it, of course ('cause the zero byte would be taken as a terminator) but it'd still be useful internally as $IFS.
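
(Bash does sneak NUL-delimited input in through a side door, though: read -d '' treats the empty delimiter as a NUL byte:)

Code:
# Consume NUL-separated records, e.g. from find -print0:
find . -name '*.txt' -print0 |
while IFS= read -r -d '' f; do
    printf 'got: %s\n' "$f"
done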

Quote:
The point is to not deny yourself features. Just because you don't need it this instant doesn't mean you're not going to need it the instant you make it opaque.
Removing features is sometimes necessary to incorporate new ideas and arrive at a coherent whole. If every design decision is hobbled by a strict "break nothing that already exists", the process goes nowhere.

Quote:
Remember that people other than you might have to program in this language, and might expect "open the file" to just open the file without cleaning their laundry too.
Other people might expect lots of things. If they're unwilling to learn a new set of rules, nobody is forcing them to use my shell language, you know? Obviously not everyone is going to think the "improvements" I choose to make are improvements. Different people have different ideas. But progress is not made unless ideas are tested.

Quote:
Quote:
I mean, I get the distinction in the first two commands, even appreciate it: without quotes, ${A[*]} expands each array element to a positional argument, preserving the distinction between neighboring elements - while if you quote it, it's applying $IFS.
There's nothing inconsistent about it, it's always substituting in the commas -- but if you don't quote it, it splits on the commas just like it'd usually split on spaces!
That is not what it's doing.

If you don't quote it, it expands the variable's elements to separate arguments on the command where it appears. (If an array element contains $IFS, the shell does not split that argument when the array reference is expanded.)

The bit that's really screwy is if you do this:
Code:
$ b="${A[*]}"
$ echo $b

$b is not an array, and I've explicitly set it to the quoted (that is, delimited) form of A's contents. But in this case 'echo $b' produces the same result as 'echo ${A[*]}' - $b is expanded to positional arguments as if it were an array (even though it's not an array, and was initialized with a quoted string!)

I don't see how that can be anything but a bug...

But as I said - I don't want to leave the impression that I'm taking bugs as evidence that there's something wrong with the fundamental approach. That wouldn't be sensible or fair.

Quote:
Don't jump to conclusions, take a closer look.
I am taking a closer look. And I already have. I know enough about bash that I can answer the kinds of questions about it that most people don't ask. And sometimes, in answering them, I learn a bit more about Bash's strengths.

Quote:
Quote:
...You have lost information in the translation. (From array to flat string)
Only because you intentionally threw it away.
On the contrary: you have to bend over backward to translate arrays to strings in a way that preserves their structure. To do it right requires translating the array contents to syntax.

I did, after all, translate the array to a string in exactly the same way as you did. Hence, my argument that arrays don't translate to strings. They are an exception to the common rule in the shell that everything should translate neatly to string form.
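
(Bash itself seems to concede the point: the one lossless array-to-string translation it ships is declare -p, which literally renders the array as shell syntax to be eval'ed back - exact output format varies a bit by version:)

Code:
$ A=(1 "4,5" 3)
$ declare -p A
declare -a A=([0]="1" [1]="4,5" [2]="3")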

Quote:
Quote:
However, it's not the common case that someone will need to open a file on a specific FD and have all child processes inherit that file on that specific numeric FD.
That's good, because they don't:
Code:
command < file | command2 | command3

Only the first command of that line gets it.

That isn't the case I'm talking about.

Code:
# open the file in the shell environment
$ exec 7<file
$ cmd1 | cmd2 | cmd3
# Which processes have FD 7 open?  Which ones needed it?

Quote:
You always care what FD it winds up on: Generally zero.
Ugh. This is the problem with no nested quotes.

You had pointed out that when you open a file and immediately redirect it to FD 0 (or whatever) the open() call initially puts it on a different FD, and a dup() call moves it to FD 0 for the context to which the redirection applies.

I pointed out that this has no bearing on anything. Nobody cares what that "intermediate" FD number was, they only care about the final one that applies to their process.

Quote:
You know what they say about assumptions: They make an ass of you and me.
Strict adherence to this rule is the path to madness via solipsism.

So, yeah, I'll make assumptions here and there. That's how language design works, you start with an idea of how things could or should work. It doesn't mean it's always true, and the real proof of whether the idea was any good is in the implementation and how it plays out in actual use.

Quote:
How users want to use files is a question of programming style. Don't give them fewer options, give them more!
Sometimes "more options" really does just make the language a mess. Remember how we were talking about Perl a little while ago?

Quote:
Quote:
It doesn't get me scoped file handles
Yes it can.

Code:
(
        open FOUT>filename
        open FIN<filename
        # How about this syntax, or something like it, for an implied close-on-exec?
        open -FD<filename
        
        do_stuff_with <$FIN >$FOUT
)

You're scoping the file descriptors and their variables by using subshell syntax. That means you are unavoidably limiting the scope of everything else, too. If you wanted to use this block to read a file and populate a variable, you're out of luck, 'cause that variable binding won't make it outside of the "subshell" block.

Quote:
This kind of "blanket redirection" is the foundation of how files, pipes, redirection, and terminals all behave. If you want to close that all off
You're taking this all-or-nothing kind of position and you haven't justified it. I can remove "exec 7<file" from shell syntax and replace it with improved support for redirecting code blocks (i.e. provide code blocks that aren't loops or subshells) - I don't see how anything of value is lost.

Quote:
I'd like to point out, yet again, that this is already valid shell syntax.
I know it's already valid shell syntax. That's why I'm using it as an example of how things could work. As pre-existing syntax, it's something that you can read and know what it's supposed to be doing. I don't have to explain what it does, 'cause you already know.

Quote:
My question is: Why make it a new special type of variable in the first place?
Because if file descriptors are "just integers" then the shell can't manage their lifetime. Scoping is a worthwhile feature IMO.
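To put the lifetime point concretely, this is what manual FD management looks like today (filename illustrative):

Code:
exec 7<data.txt     # open
read -r first <&7   # use
exec 7<&-           # close - forget this line and FD 7 leaks
                    # into every child the shell spawns afterward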
# 6  
Old 03-29-2011
Time to move on perhaps? The whole concept of storing "objects" in shell variables is a bit outside of the usual expectations for a Unix shell, so it's bound to be a contentious idea - I don't think the argument about it is getting us anywhere.

Another feature I've been thinking about: parallelizable loops.

Code:
# This version of "for"/"do" can take a "-j" argument, similar to that of make, etc.
# The -j argument tells the loop how many parallel processes to spawn.
for f in ./*.avi; do -j4
    # Copy the streams into an .mp4 container without re-encoding.
    ffmpeg -i "$f" -vcodec copy -acodec copy "${f%.avi}.mp4"
done
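(For comparison, the closest approximation with existing tools is probably GNU xargs with -P - workable, but the loop body has to be smuggled into an sh -c string:)

Code:
printf '%s\0' ./*.avi |
    xargs -0 -P4 -I{} sh -c 'f=$1; ffmpeg -i "$f" -vcodec copy -acodec copy "${f%.avi}.mp4"' _ {}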

That much is pretty simple (though, of course, one may want the "-j" argument on "for" instead of "do"...) but there are some complications.

The most apparent complication is that the jobs run in parallel won't be synchronized in their use of stdout and the tty - and sharing stdin presents a similar problem. There are different ways these issues could be addressed:

First, for stdout: the shell doesn't know anything about what kind of output a program generates, but if you give it a hint there are different strategies for stdout multiplexing that could work. For instance, if the command being run generates a list of values separated by newlines, and you're OK with values from the different sources being interleaved, the shell could line-buffer the output of each loop iteration and interleave output as full lines are ready. Or if the command's stdout isn't really something you can interleave sensibly, the shell could let one iteration's output through at a time, but buffer the others - though this would have the limitation that those "others" could wind up blocking on output if their buffers fill up before the first iteration terminates. More advanced methods for interleaving values could be accomplished by use of external tools:

Code:
for f in *; do --sync=mysyncfunction -j4
...

"mysyncfunction" could be an external command or script or a function defined within the shell: it takes as its input a copy of the output from a loop iteration, and as its output it provides numeric values specifying how many bytes of that output to take as a "single value". The shell spawns a copy of this command for each loop iteration, "tee"'s a copy of the loop output to it - buffers a copy of that output for itself, and then reads the numeric values that come out of the sync function to see how many bytes to take...

Another problem is that, if you parallelize the shell's loop construct, then (presumably) code inside the loop can no longer affect the shell's environment. A feature like this might be implemented by spawning a new shell process for each concurrently running iteration, which would isolate those processes from the original shell's environment... but even if you didn't implement things that way, parallel evaluation still introduces synchronization problems.
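(Today's shells already exhibit this problem in miniature: put a loop on the downstream end of a pipe and, in most shells, it runs in a subshell, so its variable assignments never reach the parent:)

Code:
count=0
printf 'a\nb\nc\n' | while read -r line; do
    count=$((count + 1))
done
echo "$count"    # prints 0 in bash; ksh93 and zsh would print 3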

An alternate approach could be to have the sync function actually perform the task of merging the outputs of the loop iterations itself. To do this, the sync function would be implemented as a command that takes an arbitrary number of filenames as arguments, reads them, implements the desired synchronization behavior, and produces the desired final output. The shell would, of course, hold the read end of a pipe corresponding to each loop iteration process: it would hand these file descriptors off to the sync function when running it, providing /dev/fd/ paths corresponding to these FDs as its arguments. (This approach puts more burden on the sync function - it must not only determine where synchronization should occur, but actually implement it for an arbitrary number of inputs...)
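The shell already has a precedent for handing /dev/fd/ paths to a command - process substitution, as in diff <(cmd1) <(cmd2) - so the plumbing isn't exotic. A trivial merge-style sync function might just drain its inputs in iteration order (with the same blocking caveat as before):

Code:
#!/bin/sh
# Hypothetical merge-style sync function: one /dev/fd path per loop
# iteration, in iteration order; produce the merged output directly.
for src in "$@"; do
    cat "$src"    # later iterations block once their pipes fill
done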

Synchronizing input could take a similar path: assuming the input is a big collection of "stuff" meant to be distributed to different loop iterations, you could implement different ways of splitting it up - simple strategies like splitting on newlines or on a delimiter character could be built in, and more complicated ones could be implemented as external filters. It's also possible that in some cases one would want the input "mirrored" to all loop iterations (though this would be kind of odd - you'd be buffering up the whole input stream and dispatching copies of it either to each iteration or to each iteration process; dispatching to each iteration would probably make the most sense, but it'd require so much buffering that it'd be hard to justify...)
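(GNU coreutils already has a building block for the round-robin case, if I remember its semantics right - split can distribute streaming input across parallel filter processes; "worker.sh" here is a hypothetical consumer:)

Code:
# Distribute stdin round-robin, one line at a time, across 4 workers.
split -n r/4 --filter='./worker.sh'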

Then there's the case where the jobs inside the loop use the TTY, or treat stdout like a TTY: in those cases, the best option might be to hand off TTY synchronization to a program like "screen" (or spawn xterms if in a GUI):
Code:
while whatever; do -j4 -tty-sync=screen-wrapper.sh;
...

screen-wrapper.sh would have to accept either PTY master names or file descriptor identification (/dev/fd/ probably) and initiate some kind of display-sharing for the concurrent iterations of the loop. (I'm not actually sure if screen can connect to an already-open PTY master the way xterm can... I know it can resume a disconnected session which is kind of the same idea... But for the sake of this, imagine that it can.)

Input/output synchronization and TTY sharing/instantiation would not necessarily be mutually exclusive, though I think the case where someone would use both at once would be relatively rare. So in cases where the shell's stdout is a TTY and the loop's input/output isn't being redirected (and, thus, stdin/stdout is still the TTY from the loop's perspective) - there are different cases to consider:
  • Neither stdio nor TTY synchronization mechanisms are specified: loop iteration I/O is handled as if the loop iterations were regular background jobs.
  • stdio is unsynchronized, TTY is synchronized (via screen or whatever): stdin/stdout for each iteration job is attached to that job's respective TTY (a PTY slave created by the shell for the loop)
  • stdio is synchronized, TTY is unsynchronized: all jobs get the shell's TTY as their TTY, and the ends of the multiplexers (the pre-splitting stdin, the post-merge stdout) are also attached to the TTY.
  • both stdio and TTY are synchronized: this may be an invalid case, if the shell's stdout really is the TTY... The shell could handle it by creating an additional TTY and attaching the ends of the multiplexers to that. (So if the loop had both stdin and stdout multiplexers specified, and you were running four loop iterations in parallel, "screen" or whatever would show five windows: four would be connected to /dev/tty of one of the four loop jobs, while the fifth would be connected to the merged stdin and/or merged stdout.)


In cases where stdin/stdout aren't connected to the TTY in the first place, of course, it's much simpler: stdin/stdout and TTY can simply be treated as separate channels which don't interact.

Of course, for cases that don't require specialized behavior, a sensible default behavior would be to treat each loop iteration like a background job: meaning it can't get stdin or read from the TTY, and stdout/TTY output is an unsynchronized free-for-all...

There are also various commands that aren't full-on TTY apps but still use the TTY (progress bars in wget and so on). More specialized TTY-sharing strategies could be developed for cases where each job just emits simple line-oriented output plus display codes - but I don't think you can get around issuing each loop iteration its own PTY slave if you want to share the display: a program has to see a TTY on its output or it won't emit TTY-style output at all, and if its output were the shared TTY directly, the shell would never get the opportunity to rewrite those display codes. So in most cases dispatching to "screen" or similar is probably the way to go.

Dispatching to "screen" raises another issue, of course: $TERM has to be set properly... I don't have a great solution to that one, honestly. Smilie

Last edited by tetsujin; 03-29-2011 at 02:57 PM..
# 7  
Old 03-29-2011
Quote:
Originally Posted by tetsujin
One can take this argument to the extreme: say anything that's not part of one of the "standard" commonly-used shells is not a feature of a "shell language". If it wasn't invented at least 15 years ago and implemented as part of something with "sh" in the name, it's no good.

One could also take the broader view: if features are identified that could be of value, find a way to incorporate them while still keeping the shell as conceptually sound as possible.
I've already suggested several ways.

I think the argument comes down to this.
Quote:
Because if file descriptors are "just integers" then the shell can't manage their lifetime.
Yes you can. Yes, you, can.

The main thing you wanted, close-on-exec, is a kernel feature. One fcntl() call (F_SETFD, FD_CLOEXEC) and it's done. The FD isn't just sufficient for it -- it's mandatory.

Any other kind of scoping you could possibly want is still doable too. The shell just has to keep track of it internally, like local variables, instead of crutching it with a "special" variable.

It might be easier for you as the programmer of the language -- on first glance, anyway -- to add all these new special kinds of variables, but I think you'll paint yourself into a corner really quickly. And paint other people into a corner besides.

---------- Post updated at 12:37 PM ---------- Previous update was at 12:05 PM ----------

Quote:
Originally Posted by tetsujin
The whole concept of storing "objects" in shell variables is a bit outside of the usual expectations for a Unix shell, so it's bound to be a contentious idea - I don't think the argument about it is getting us anywhere.
The point is, you don't need to. There are ways of doing what you want without breaking the very concept of a shell variable.

Anyway.
Quote:
Another feature I've been thinking about: parallelizable loops.
Of course, you could just add an & to the end of that to get what you want, but I think I know what you're getting at. I've thought about that a fair bit myself, and noticed the same problems.

One possible solution, I think, would be pipes. Instead of giving everything a raw stdin/stdout, give them pipes. The shell would read output from their stdout pipes and print it in loop order, enforcing the proper ordering of their outputs that way. Completely silent commands will run completely parallel. Ones that aren't might still accomplish some work before the pipe blocks. (You could use temp files instead of pipes for stdout to make a larger 'buffer'.)
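A rough emulation of the temp-file variant with today's shell (command and file names illustrative):

Code:
i=0
for f in ./*.avi; do
    i=$((i + 1))
    some_command "$f" > "/tmp/out.$i" &   # each job gets its own 'buffer'
done
wait                                      # let every job finish...
j=1
while [ "$j" -le "$i" ]; do               # ...then replay in loop order
    cat "/tmp/out.$j"; rm -f "/tmp/out.$j"
    j=$((j + 1))
done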

stdin would work similarly. As long as process #1 is alive, the shell feeds its own stdin into that process's input, and only moves on to process #2 once process #1 dies. There might also be situations where you'd want it to work in a round-robin fashion: first line going into process 1, second line into process 2, third into 3, fourth into 1, etc, etc.

Interactive commands wouldn't work right, but that goes without saying. To make interactive commands work in parallel, each one would need their own independent virtual terminal -- which is possible but the CPU costs of that could start adding up.

You don't need screen to do that, by the way. Any C program can create a PTY.
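(You don't even need to write the C for a quick test - util-linux's script(1) will allocate a PTY for a command:)

Code:
# ls sees a terminal on stdout even though we capture the output
script -q -c 'ls' /dev/null | cat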

I'm not exceptionally concerned about running mounds of interactive things in parallel, really. Usually that doesn't make sense. Things you might want to do it to, like ssh, have anticipated this and provide noninteractive mechanisms to accommodate it. For really badly thought-out applications, or things which really, really demand a real human be there, we have the expect language -- a last resort, as it should be.

Synchronizing variables would be more of a problem. I'm not 100% sure how it'd work yet, but I can envision the shell processing a set of lines and compiling an order of operations that it shoves into a queue (okay, these four processes all set this variable once, and the thing below this loop reads it once, so let the loop finish before reading this variable...), then following it in lock-step, dispatching each operation in turn. It helps that these aren't arbitrary processes, but always shell syntax or program output fed into shell syntax - a program's never going to do an end-run around us and set a shell variable with a system call or anything. Maybe something like a semaphore, keeping a use-count that unlocks it once there's nothing behind you in line using it... Reading or setting a variable means waiting your turn, reading or writing to stdout means waiting your turn, anything else lets you scream on ahead.

Quote:
There are also various cases of commands that aren't full-on TTY apps, but which do use the TTY (progress bars in wget and so on)
wget can print progress to non-terminals; it just prints lines of dots.

ls is also one of these special commands. When it prints to a terminal, many implementations can print multiple columns, but they switch to single-column output when the output isn't a terminal.
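Easy to see from the shell:

Code:
$ ls          # to a terminal: multi-column (in many implementations)
$ ls | cat    # to a pipe: one name per line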

In both cases the output is useful either way when the thing reading them is a script instead of a human, so I don't think it's too important.

Last edited by Corona688; 03-29-2011 at 04:25 PM..