Speculative Shell Feature Brainstorming


 
# 8  
Old 03-29-2011
Dropping the file handle discussion - I'll read and carefully consider what you had to say on the subject. I do recognize this as an important problem: as I think about these things that I would want to put in a shell, not all of them necessarily fit, and in the end not all of them are worth the trouble.

For now I want to continue the discussion with the assumption that features we talk about would be implemented without too many fundamental changes to how the shell works.

Quote:
Originally Posted by Corona688
You don't need screen to do that, by the way. Any C program can create a PTY.
Well, yeah, any program can create a PTY - but if you want to actually display multiple TTY-using programs simultaneously, you need something like screen or xterm - a virtual terminal. I could write my own virtual terminal like "screen" but for a first pass it'd probably be more productive to bend "screen" to my needs.

Quote:
I'm not exceptionally concerned about running mounds of interactive things in parallel, really. Usually that doesn't make sense.
True... The main use cases I'm thinking about here would be cases where the program expects to display to a TTY. For the jobs to get input from the TTY - well, screen or xterm could handle that but if your loop is full of interactive prompts then maybe you need to redesign the code that's inside that loop, as you say. But if someone wanted to do that anyway, it could nevertheless be done via the screen mechanism I described...

Quote:
Synchronizing variables would be more of a problem. I'm not 100% sure how it'd work yet, but I can envision the shell processing a set of lines and compiling an order of operations that it shoves into a queue (okay, these four processes all set this variable once, and the thing below this loop reads it once, so let the loop finish before reading this variable...)
Another way to look at this would be to say it's part of an introduction of proper threading to the shell - meaning that multiple loop iterations running concurrently can modify the variable (and the shell would provide enough synchronization to ensure that the environment isn't corrupted, structurally speaking) - and if better synchronization is needed, then the user has to write critical regions into their shell code.
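In today's shells the closest you can get is rolling the lock yourself. A sketch of such a user-written critical region, using mkdir(1) as the atomic test-and-set primitive (the file names here are purely illustrative):

```shell
#!/bin/sh
# Sketch: a user-written critical region. mkdir is atomic -- it either
# creates the directory or fails -- so it serves as test-and-set.
lockdir=./shellvar.lock     # illustrative lock name
counter=./shared.counter    # stands in for a shared shell variable
echo 0 > "$counter"

increment() {
    until mkdir "$lockdir" 2>/dev/null; do :; done   # acquire (busy-wait)
    n=$(cat "$counter")              # ---- critical region ----
    echo $((n + 1)) > "$counter"     # read-modify-write, now safe
    rmdir "$lockdir"                 # release
}

# four concurrent "iterations" all updating the same value:
increment & increment & increment & increment &
wait
cat "$counter"
```

Without the lock, the four concurrent read-modify-write sequences could lose updates; with it, the final count is always 4.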

Introducing the concept of threads may complicate other areas of the shell, however: for instance at present if you pipe a bunch of different shell commands together, all but one of those is going to be run in a subshell with no ability to change the environment of the main shell process. Introducing threading would raise the question of whether all those built-in jobs (and even the parenthesis syntax in general) should now be threads. This would be an incompatible change (though behavior of which part of a pipeline is executed in the main shell process varies between shells already) but it might also make sense.

Quote:
wget can print progress to non-terminals, it just prints lines of dots.
My installed version of wget always writes its progress display to /dev/tty (using some kind of control characters to update the progress bar in place) whenever /dev/tty is available.

When there's no TTY (running inside emacs, in this case) the progress bar is redrawn every 50KB or so - on a new line. If two wgets (with no /dev/tty) are run concurrently with shared stdout, the display winds up corrupted as the two processes write to stdout simultaneously (not necessarily a full line at a time: a dot is written for each kilobyte received):
Code:
   700K .......... ......... 10%  513K 14s
   750K ................. .......... .......... .... .......... .......... .......... .......... 11%  559K 14s
   800K ............. 10%  434K 15s
   750K .......... ............ ........ .......... ......... .......... .......... ............. .......... 11%  488K 15s
   800K .......... ........ 12%  385K 14s
   850K .......... .......... ............. .......... ... ............... ........... .......... 12%  637K 13s
   900K ............ 12%  473K 14s

If you were running those two programs concurrently while you're at the console, watching, that's not a particularly useful way to display them. You don't really need to see all those lines of history, you already know that before you reached 12% you were at 11%, and before that 10%, and so on. (For that matter, that kind of information isn't useful for a log, either... That kind of output is mostly just useful if you want to read it into another program that implements another progress display method.) Really you just want to see the filenames of the files currently being fetched and the progress on each one. wget already has a good mechanism for displaying its own current status: using "screen" or similar makes it easy to take advantage of this.

Quote:
In both cases the output is useful either way when the thing reading them is a script instead of a human, so I don't think it's too important.
Still, regardless of whether the output is readable before you interleave it, it's not going to be readable after you interleave it unless you make the right kind of choice about how the values should be interleaved. (Or, in the case of programs that display to the TTY, some kind of TTY-sharing mechanism.)

A related problem that I didn't address is stderr. Synchronizing on newlines would probably be adequate for most cases, but it'd probably make sense to provide all the same multiplexing/redirection options that stdout gets, in case somebody needed it. (For instance, if you were running gcc in the loop, probably you'd want to synchronize per loop step rather than letting stdout and stderr get interleaved per-line or "whenever"...)
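E.g. one crude way to get that per-step synchronization with nothing but redirections (build_step here is a hypothetical stand-in for gcc or whatever runs inside the loop):

```shell
#!/bin/sh
# Sketch: buffer each iteration's stdout+stderr in its own file, then
# emit each step's output as one uninterleaved unit afterwards.
build_step() { echo "step $1: out"; echo "step $1: err" >&2; }

for i in 1 2 3; do
    build_step "$i" > "step.$i.log" 2>&1 &   # each step gets its own file
done
wait
cat step.1.log step.2.log step.3.log         # replay whole steps, in order
```

Within a step, stdout and stderr still interleave (they share one file), but steps never interleave with each other.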
# 9  
Old 03-29-2011
Quote:
The whole concept of storing "objects" in shell variables is a bit outside of the usual expectations for a Unix shell
Actually no, it is not. It is the direction a number of Unix and GNU/Linux shells have been going for quite a while. Think compound variables in ksh93 and prototypical inheritance in JavaScript for starters. And before anybody says JavaScript is not a shell, it is in fact a quite popular shell and becoming more popular by the day now that server side JavaScript has gone mainstream with node.js, etc.

You do not appear to have done a survey of the internals of modern Unix/Linux shells. I would recommend that you closely examine the internals of such shells and not just the Bash shell which is rather archaic (sorry Chet!) in many respects.
# 10  
Old 03-30-2011
Quote:
Originally Posted by fpmurphy
Actually no, it is not. It is the direction a number of Unix and GNU/Linux shells have been going for quite a while. Think compound variables in ksh93 and prototypical inheritance in JavaScript for starters.
Admittedly my experience in Korn Shell is rather limited. I don't know everything it's capable of. Compound variables are a new one on me. I had been using mksh for most of my Korn shell experimentation, but I guess I've got to stick with ksh93 if I want the compound vars...

From what I can tell, though, compound vars aren't "objects", rather they're "structures". "objects" would have methods and (most relevant to the whole file descriptor thing) destructors.
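From the ksh93 docs they look like this (sketch - the compound-variable lines are ksh93-only, so they're comments here; the runnable part uses a bash 4 associative array as the nearest analogue, since that's what I have installed):

```shell
#!/bin/bash
# ksh93 compound variable (ksh93-only, not run here):
#   point=( x=1 y=2 )
#   echo "${point.y}"
#
# Nearest bash >= 4 analogue: an associative array -- i.e. a plain
# structure with named fields, and no methods or destructors.
declare -A point=( [x]=1 [y]=2 )
echo "${point[y]}"
```

Either way it's named fields on one variable - which is exactly why they strike me as "structures" rather than "objects".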

I had never heard of Javascript being used as a shell. The only "Javascript shells" I've seen are in-browser debugging tools. (Quite helpful! But not shells for the Unix environment.) Are you talking strictly about it being used to interact with Javascript code that's running, or interactively to run other programs on the system as well?

Quote:
You do not appear to have done a survey of the internals of modern Unix/Linux shells. I would recommend that you closely examine the internals of such shells and not just the Bash shell which is rather archaic (sorry Chet!) in many respects.
Suggestions, then? What modern Unix/Linux shell should I have looked at that I (probably) haven't?
# 11  
Old 03-30-2011
Quote:
From what I can tell, though, compound vars aren't "objects", rather they're "structures". "objects" would have methods and (most relevant to the whole file descriptor thing) destructors.
See ksh93 discipline functions. These are equivalent to your methods. BTW, lots of OO-type languages do not use terms like classes or methods. For example, the UEFI specification uses the term "protocols" and does not even have an equivalent to C++ classes.

Quote:
"objects" would have methods and (most relevant to the whole file descriptor thing) destructors.
Err, objects are usually implemented internally as structures. Objects do not need explicit destructors as in C++. This can be handled automatically by a shell - reference count goes to zero, goes out of scope, or many other ways.

As regards JavaScript shells, a simple Google search will educate you on the numerous variety of such shells out there.
# 12  
Old 03-30-2011
Quote:
Originally Posted by fpmurphy
As regards JavaScript shells, a simple Google search will educate you on the numerous variety of such shells out there.
Yeah, I try web searching for "Javascript shell" and I get a bunch of in-browser debugging tools.

Quote:
Objects do not need explicit destructors as in C++. This can be handled automatically by a shell - reference count goes to zero, goes out of scope, or many other ways.
The relevant point of a destructor is that when one of those things happens - reference count goes to zero, variable goes out of scope, whatever - the destructor is run and something happens. That allows you to use object lifetime to control resource allocation - including allocation of resources the shell may not have been explicitly designed to manage.
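Today's shells do have one crude version of that lifetime hook: an EXIT trap set inside a subshell fires when the subshell dies. A sketch of using it as a poor man's destructor:

```shell
#!/bin/sh
# Sketch: a subshell EXIT trap as a poor man's destructor -- the temp
# file's lifetime is tied to the subshell's lifetime.
tmpname=$(
    tmp=$(mktemp)
    trap 'rm -f "$tmp"' EXIT    # "destructor": runs when this subshell exits
    echo "$tmp"                 # hand out the name, but not the ownership
)
# by the time the subshell has exited, the trap has already cleaned up:
[ -e "$tmpname" ] || echo "already reclaimed"
```

The limitation is exactly the point I'm making: the trap is tied to a process, not to a variable, so you can't hang cleanup off the lifetime of a value the way a real destructor lets you.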

Last edited by tetsujin; 03-30-2011 at 12:54 PM..
# 13  
Old 03-30-2011
Quote:
Originally Posted by tetsujin
Well, yeah, any program can create a PTY - but if you want to actually display multiple TTY-using programs simultaneously, you need something like screen or xterm - a virtual terminal.
Displaying multiple simultaneous jobs is a rather different process than running multiple simultaneous jobs. Maybe it should be separate. Either way though it's probably best left till last.
Quote:
True... The main use cases I'm thinking about here would be cases where the program expects to display to a TTY.
Hmm. You use 3/4 of the same machinery either way though. And as mentioned, a terminal may not even be necessary.
Quote:
Another way to look at this would be to say it's part of an introduction of proper threading to the shell - meaning that multiple loop iterations running concurrently can modify the variable
As in, no data dependencies or fixed order of operations? You'd need explicit means of synchronization, then.
Quote:
...and if better synchronization is needed, then the user has to write critical regions into their shell code.
Not just critical sections, but means to tell when a section of code has completed to put inside the critical sections.
Quote:
Introducing the concept of threads may complicate other areas of the shell, however: for instance at present if you pipe a bunch of different shell commands together, all but one of those is going to be run in a subshell with no ability to change the environment of the main shell process. Introducing threading would raise the question of whether all those built-in jobs (and even the parenthesis syntax in general) should now be threads.
That could solve a whole lot of problems if you could do it. But you can't avoid creating processes for external commands. It'd also mean you'd lose all the easy ways to scope files and variables; you'd have to manage them all brute-force instead of operating with the kernel's support.

I just realized another big problem with running external programs in a threaded shell: fork() in a multithreaded process is a minefield. The child gets only the calling thread, but it inherits every lock in whatever state it was in - possibly held by threads that no longer exist in the child. You need a mutex or semaphore you can stop all your threads with so you can stop the universe, fork(), redirect and exec in the child, then resume. Or a separate process you feed commands into with a pipe that creates children for you -- might actually be more efficient than forcing all your threads to run in lockstep.
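In shell terms that spawn-helper looks something like this (sketch only - in the real thing it'd be a single-threaded C helper fed over a pipe, plus an fd-passing socket for redirections):

```shell
#!/bin/sh
# Sketch of the "separate process that creates children for you" idea:
# a single-threaded spawner reads one command per line from a pipe and
# forks a child for each, so the multithreaded parent never forks.
spawner() {
    while IFS= read -r cmd; do
        sh -c "$cmd" &          # fork+exec happens only in this process
    done
    wait                        # reap every child before reporting done
}

printf '%s\n' 'echo one' 'echo two' | spawner > spawn.out
sort spawn.out                  # completion order is nondeterministic
```

The helper would also have to report child exits back, which is where it starts getting ugly.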
Quote:
My installed version of wget
AFAIK there's not a bunch of different wgets out there, it's all the GNU wget. They might be compiled with different options -- Windows versions might lack SSL and always print dots -- but they're still the same source code.
Quote:
When there's no TTY (running inside emacs, in this case) the progress bar is redrawn every 50KB or so - on a new line.
You can verify that it's printing dots one-at-a-time with strace:
Code:
$ strace wget http://website/ -O /dev/null > wget.log 2>wget.err 
$ grep "write(2" wget.err | tail
write(2, ".", 1)                        = 1
write(2, ".", 1)                        = 1
write(2, ".", 1)                        = 1
write(2, " ", 1)                        = 1
write(2, ".", 1)                        = 1
write(2, ".", 1)                        = 1
write(2, ".", 1)                        = 1
write(2, ".", 1)                        = 1
write(2, ".", 1)                        = 1
write(2, ".", 1)                        = 1
$

Something else is doing line-buffering for you, maybe the application you're watching the file with, or maybe a pipe.

So there are only two cases, terminal vs non-terminal - not three, terminal vs pipe vs file. Whew.
Quote:
If two wgets (with no /dev/tty) are run concurrently with shared stdout, the display winds up corrupted as the two processes write to stdout simultaneously:
Naturally. I'm not sure it's the shell's job to fix this, though. Anyway it's probably not worth worrying about at this point, when there's not even an implementation of the basic no-frills algorithm.
Quote:
If you were running those two programs concurrently while you're at the console, watching, that's not a particularly useful way to display them.
That's not the shell's fault and not the shell's problem, though. If you discovered something with even uglier output, would you add more special modes to accommodate it?

Taking over the terminal that way also means there's no longer one place to type input and one place to read output. That's fine in screen when you created the windows and know what they are, but when they create and destroy themselves in droves, the interface might seem just as bad a scramble as you were trying to fix.

Terminal-using programs like login systems might also end up printing normal prompts in bizarre places because your terminal isn't working the usual way -- or you could not give everything a terminal unless asked for explicitly, in which case they'd just fail to work at all... Or you could give each and every external command its own little shell window on the off-chance that it might need it.

I suppose you could reserve the lower half of the screen for normal console-like I/O but you'd have to emulate it with a pty, which'd turn your shell into a hacking tool -- people could spoof password logins by default. And your shell would look more like an IDE than a terminal.

Not that it wouldn't be useful, but maybe these things really do deserve to be separate. It should be kept out of the way and only used where you tell it to explicitly, like a debugging feature - and even then, only when a terminal's available, and with the debug windows output-only.

Another tidbit that could help simplify this a lot is how some terminals can set a scrolling region. It's like a window you can create inside the terminal itself just by printing certain escape sequences. You could set aside the upper half of the screen for some shell window things and use the lower half as a real terminal for things like ssh and passwd to pop up password prompts in.

Of course, only some terminals support this, so it's not portable. You'd also have to carefully time what things get written when.
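For the record, the sequence in question is DECSTBM. On a VT100-compatible terminal it's just a printf (redirected to /dev/null here so this sketch doesn't wedge the terminal it runs in):

```shell
#!/bin/sh
# DECSTBM: set the scrolling region to rows <top>..<bottom> (1-based).
# Only DEC/VT100-compatible terminals honor it -- hence not portable.
set_scroll_region()   { printf '\033[%d;%dr' "$1" "$2"; }
reset_scroll_region() { printf '\033[r'; }

# e.g. confine scrolling to the lower half of a 24-row terminal:
set_scroll_region 13 24 > /dev/null
reset_scroll_region     > /dev/null
```

Anything the shell prints outside the region while it's set has to position the cursor explicitly, which is the careful-timing problem mentioned above.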

Bottom line is that taking over the terminal like this means all the usual terminal behavior you take for granted and depend on can't be depended on unless you do it all yourself.
Quote:
(For that matter, that kind of information isn't useful for a log, either...
Funny you should say that, there's actually something on my system that does log wget's dot-lines. Whether it's useful is a matter of taste anyway. It does tell you more information than a single line of progress bar -- it tells you transfer rates over time, not just the average speed.
Quote:
That kind of output is mostly just useful if you want to read it into another program that implements another progress display method.) Really you just want to see the filenames of the files currently being fetched and the progress on each one.
Again remember that you're writing a programming language, not a GUI. People might put it to uses you don't expect, or might not be able to because you didn't give them means to do what they "usually" don't need to. Holding too strictly to things which prettify the terminal could make your language hard to use without one.
Quote:
Still, regardless of whether the output is readable before you interleave it, it's not going to be readable after you interleave it unless you make the right kind of choice about how the values should be interleaved. (Or, in the case of programs that display to the TTY, some kind of TTY-sharing mechanism.)
There's a lot of different ways programs could print output. Do you want to add special modes for all of them, or just give the programmer a way to get at what's there?

Last edited by Corona688; 03-30-2011 at 02:49 PM..
# 14  
Old 03-30-2011
Quote:
Originally Posted by Corona688
Quote:
If two wgets (with no /dev/tty) are run concurrently with shared stdout, the display winds up corrupted as the two processes write to stdout simultaneously:
Naturally. I'm not sure it's the shell's job to fix this, though.
In the design I've suggested here, the shell doesn't take responsibility for fixing that. Rather, it merely provides a mechanism that allows an external program to fix that... The shell creates the PTYs for each thread (if the user requests TTY sharing) and connects them appropriately, but the job of determining how those PTY masters are used is left up to the program specified by the user.

And if someone doesn't want that, they don't specify the TTY multiplexer option when they write their loop. They can just specify the "-j" option to thread their loop iterations and not sweat the details of how output is interleaved or how TTY access is managed. The end result there is that the user gets a jumbled display if their loop iterations write to the TTY at the same time - but that's their choice.

Quote:
Taking over the terminal that way also means there's no longer one place to type input and one place to read output. That's fine in screen when you created the windows and know what they are, but when they create and destroy themselves in droves, the interface might seem just as bad a scramble as you were trying to fix.
In the design I described, "screen" windows don't get created and destroyed during the loop.

Rather, when writing the loop, if the user has explicitly requested a terminal-sharing mechanism be used to synchronize display between the multiple loop iterations being run concurrently, then there will be one terminal created for each thread, not one for each iteration.

This doesn't create a perfect display, because as each loop iteration ends, a new one takes its place on the display. There's no display of history, basically. But it's a quick & dirty way for the user to get a display that's at least readable...

Quote:
I suppose you could reserve the lower half of the screen for normal console-like I/O but you'd have to emulate it with a pty, which'd turn your shell into a hacking tool -- people could spoof password logins by default.
Could you explain the password spoofing issue to me? Depending on the nature of the issue it could obviously be serious...

Quote:
Not that it wouldn't be useful but maybe these things really do deserve to be separate.
Well, they are separate... What I'm describing here is a mechanism that would allow someone to easily provide complicated behavior via external utilities. The added syntax and functionality just makes it a lot easier to hook up the needed pipes and PTYs.

Quote:
Again remember that you're writing a programming language, not a GUI.
I consider a shell to be a programming language and a UI. To me, it's pretty much the unique defining characteristic of a shell.

Quote:
People might put it to uses you don't expect, or might not be able to because you didn't give them means to do what they "usually" don't need to.
Considering the "usual" case is still useful, though, for deciding what things should be well-supported and convenient in the syntax.

Quote:
Holding too strictly to things which prettify the terminal could make your language hard to use without one. There's a lot of different ways programs could print output. Do you want to add special modes for all of them, or just give the programmer a way to get at what's there?
In this case I'm proposing only that the shell provides the mechanism that a programmer would need to easily "prettify" the output himself. Strictly speaking it's a capability that's mostly already present in the shell - it's just not something that's easy to do.

For instance, to do the equivalent of multi-threaded loops in a current version of bash: running each loop step as a background subshell job would be a pretty simple way to accomplish that.
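I.e. the no-frills version is just this (do_step standing in for the real loop body):

```shell
#!/bin/sh
# The no-frills version: every iteration becomes a background subshell,
# with no limit on how many run at once.
do_step() { echo "did $1"; }    # stand-in for the real loop body

: > steps.out
for f in a b c; do
    ( do_step "$f" ) >> steps.out &
done
wait        # don't fall off the end while iterations are still running
```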

If you wanted to limit how many of them run at once (after all, running substantially more jobs than you have CPU cores at best gets you around some I/O blocking, and at worst slows you down via VM thrashing) - that's more complicated. When the loop starts up you need to make sure the first four iterations run in the background, and then on each successive iteration you need to wait for an already-running step to terminate before launching the next one.

This could be done with the "wait" builtin (honestly I never got it to work right for subshells run in the background) - or, if that were unusable for whatever reason, one could create a pipe (mkfifo wouldn't work unless you could open both ends of the fifo before the loop starts) and have each loop iteration write a byte to the pipe when it's done, and attempt to read a byte from the pipe before it starts. ("wait"ing is probably the better approach.)
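Actually, the pipe-as-semaphore version can be made to work if the shell holds the fifo open read-write, which sidesteps the open-both-ends problem. A sketch (do_step is a stand-in; bash syntax):

```shell
#!/bin/bash
# Sketch of the pipe-as-semaphore idea: the fifo holds one token per
# job slot; each iteration takes a token before starting and puts one
# back when it finishes. Opening the fifo read-write on fd 3 keeps the
# open from blocking and keeps the fifo alive between jobs.
max=2
do_step() { sleep 0.1; echo "done $1"; }

mkfifo tokens.fifo
exec 3<>tokens.fifo
for _ in $(seq "$max"); do echo >&3; done    # seed one token per slot

: > pool.log
for i in 1 2 3 4 5; do
    read -r _ <&3                            # take a token (blocks if none)
    { do_step "$i" >> pool.log; echo >&3; } &  # return token when done
done
wait
exec 3>&-
rm -f tokens.fifo
```

At no point are more than $max iterations running; the read simply blocks until some finished job returns its token.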

Then if you wanted to distribute input to the stdin of the individual loop iterations, or combine output from the individual loop iterations to produce a single stdout stream (implementing some kind of useful synchronization method to produce the desired kind of output) - then first off individual loop steps need to know not only when one of those four "slots" is open, but it'd need to know which one - the separate "threads" have to have "identity" so they can attach to distinct points on the multiplexer/demultiplexer. Then for an output multiplexer, you need to create a pipe for the stdout of each "thread" and feed the "downstream" end of each of those pipes to the multiplexer program, and redirect each loop iteration to the proper "upstream" end when you run it...
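With bash process substitution you can at least see the shape of it: each "thread" writes into its own pipe, and one external program owns the merge. Here plain paste is a stand-in for the multiplexer (a real one might sync on newlines, or tag each line with the thread it came from):

```shell
#!/bin/bash
# Sketch: one pipe per "thread", merged by a single reader. paste is
# only a stand-in multiplexer -- it pairs up lines from the two pipes.
paste -d' ' <(printf '%s\n' a1 a2 a3) \
            <(printf '%s\n' b1 b2 b3) > merged.out
cat merged.out
```

The part a current shell makes hard is the "identity" bit: arranging for iteration N to reattach to slot N's pipe when a slot frees up.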

Then if you want to do TTY sharing, you need to create a set of PTYs (open /dev/ptmx, assuming we're not running on a system that lacks that - get the TTY name and use it to open the slave end - notably I think we're missing "ttyname" in the shell) and distribute them to the respective "threads" (one would need a means of setting that PTY as the controlling terminal for the job) as with the I/O pipes, and then feed the PTY master filenames or file descriptors to the command that takes responsibility for managing the display... And then, I guess, rely on the terminal sharing program (screen or whatever) to propagate signals to the individual jobs... (There's other ways you could do this, too, like run multiple instances of a program in screen, and send these instances instructions on how to perform each piece of work you want to do in your loop)

So not much of what I describe is beyond current shells' capabilities - it's just not an easy thing to do. The idea is to think about what kinds of facilities the shell can reasonably provide that will make it easier for people to do whatever they want to do.

Of course, even just implementing the "-j" option without any of the multiplexing stuff would be useful for a lot of cases.


Going back to the issue of shell threading:
I hadn't thought about the interaction between fork() and threads - But, then, "threads" in an interpreted language don't have to be interpreted as actual execution threads: Python (the C implementation) for instance, implements threads internally. From the perspective of the OS these threads don't exist (they're not separate entities in the scheduler) but within the context of the language itself they work as any other threads implementation would.

That could complicate the implementation of built-ins if those built-ins might block on input or output - so I guess I'd have to think about that one, consider how much complication it'd introduce to the implementation of built-ins vs. the impact of using real threads and having to synchronize access to the environment and deal with the fork() issue you describe...

If the implementation did use real threads, another option for dealing with forking would be to fork off a process that does nothing but listen to a pipe that tells it what to fork and run, and a Unix domain socket that feeds it file descriptors to attach to the new processes... Of course it'd also have to communicate back information about when jobs terminate... Apart from the fact that it solves the thread problem pretty handily, it seems like kind of an ugly solution, really.

As for the other impacts of threading - parens could still be used specifically to specify a subshell context (after all, both bash and ksh provide curly braces as a way to group commands without creating a subshell context) - the main impact then would be that things would not be implicitly shuffled off to subshell context as a result of their position in a pipeline. (Whether that's better than, say, ksh's approach of establishing the convention that only the last part of a pipeline is in the current job is debatable, I guess. I think it'd be preferable. "Don't subshell anything unless I say so" instead of "accept as a necessity that all but one of the commands on a pipeline must be run in a separate process and therefore a separate environment")
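The incompatibility is easy to demo with today's shells:

```shell
#!/bin/bash
# Where a pipeline element runs differs between shells already:
# in bash (by default) every pipeline element is a subshell, so the
# `read` below can't change x in the main shell. ksh93 runs the last
# element in the current shell, so there x would become "hi".
x=unset
echo hi | read x
echo "x=$x"          # bash prints x=unset
```

Under the "don't subshell anything unless I say so" rule, that read would just work - which is the behavior I'd want.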