Speculative Shell Feature Brainstorming
Posted by tetsujin on 03-29-2011, 01:49 PM
Time to move on perhaps? The whole concept of storing "objects" in shell variables is a bit outside of the usual expectations for a Unix shell, so it's bound to be a contentious idea - I don't think the argument about it is getting us anywhere.

Another feature I've been thinking about: parallelizable loops.

Code:
# This version of "for"/"do" can take a "-j" argument, similar to that of make, etc.
# The -j argument tells the loop how many parallel processes to spawn.
for f in ./*.avi; do -j4
    # Remux each .avi into an .mp4 container without re-encoding
    ffmpeg -i "$f" -vcodec copy -acodec copy "${f%.avi}.mp4"
done
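(For comparison, something close to this can already be approximated with existing tools: xargs -P caps the number of concurrent jobs, though you get one command invocation per item rather than a true shell loop body. A present-day sketch of the example above:)

Code:
# Rough present-day equivalent: -P4 limits concurrency to four jobs,
# -0/-n1 hand one NUL-delimited filename per invocation to sh as $1.
printf '%s\0' ./*.avi |
    xargs -0 -n1 -P4 sh -c 'ffmpeg -i "$1" -vcodec copy -acodec copy "${1%.avi}.mp4"' _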

That much is pretty simple (though, of course, one may want the "-j" argument on "for" instead of "do"...) but there are some complications.

The most apparent complication is that jobs run in parallel won't be synchronized in their use of stdout and the TTY - and sharing stdin presents a similar problem. There are different ways these issues could be addressed:

First, for stdout: the shell doesn't know anything about what kind of output a program generates, but given a hint, there are different strategies for stdout multiplexing that could work. For instance, if the command being run generates a list of values separated by newlines, and you're OK with values from the different sources being interleaved, the shell could line-buffer the output of each loop iteration and interleave output as full lines become ready.

Or, if the command's stdout isn't something you can interleave sensibly, the shell could let one iteration's output through at a time and buffer the others - though those "others" could wind up blocking on output if their buffers fill before the first iteration terminates.

More advanced methods for interleaving values could be accomplished by use of external tools:

Code:
for f in *; do --sync=mysyncfunction -j4
...

"mysyncfunction" could be an external command or script or a function defined within the shell: it takes as its input a copy of the output from a loop iteration, and as its output it provides numeric values specifying how many bytes of that output to take as a "single value". The shell spawns a copy of this command for each loop iteration, "tee"'s a copy of the loop output to it - buffers a copy of that output for itself, and then reads the numeric values that come out of the sync function to see how many bytes to take...

Another problem is that, if you parallelize the shell's loop construct, then (presumably) code inside the loop can no longer affect the shell's environment. A feature like this might be implemented by spawning a new shell process for each concurrently-running iteration, which would isolate those processes from the original shell's environment - but even if you didn't implement things that way, parallel evaluation introduces synchronization problems of its own...
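Shells already exhibit this limitation in miniature: a loop that runs in a subshell - a piped while loop in most shells, for instance - can't propagate variable changes back to its parent, and a parallelized for would behave the same way:

Code:
# Existing analogue: the while loop runs in a subshell because it's part
# of a pipeline, so its assignment to "count" is lost when it exits.
count=0
seq 3 | while read -r n; do
    count=$((count + 1))
done
echo "$count"    # prints 0, not 3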

An alternate approach would be to have the sync function itself perform the task of merging the output of the loop iterations. Here the sync function would be implemented as a command that takes an arbitrary number of filenames as arguments, reads them, implements the desired synchronization behavior, and produces the final, merged output. The shell would, of course, hold the read end of a pipe() for each loop iteration process: it would hand these file descriptors off to the sync function when running it, passing /dev/fd/ paths corresponding to the FDs as its arguments. (This approach puts more burden on the sync function - it must not only determine where synchronization should occur, but actually implement the merge for an arbitrary number of inputs...)
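For line-oriented output, a crude merging sync function of this sort can even be built from an existing tool: paste with a newline delimiter reads one line from each input per round, which amounts to round-robin interleaving. (The name merge_sync and the /dev/fd/ calling convention are hypothetical.)

Code:
# Hypothetical merging sync function: the shell would invoke it with one
# /dev/fd/N path per loop iteration, and it emits the merged stream itself.
merge_sync() {
    paste -d '\n' "$@"    # line 1 of each input, then line 2 of each, ...
}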

Synchronizing input could take a similar path: assuming the input is a big collection of "stuff" meant to be distributed across loop iterations, you could implement different ways of splitting it - simple things like splitting on newlines or on a delimiter character built in, and more complicated schemes as external filters. In some cases one might instead want the input "mirrored" to all loop processes (though this would be kind of odd - you'd be buffering up that input stream and dispatching copies of it either to each iteration or to each iteration process - dispatching to each iteration would probably make the most sense, but it'd require so much buffering that it'd be hard to justify...)
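GNU split can already do the simple round-robin case on a pipe, which gives a feel for what the shell would be doing internally when it distributes stdin across iterations:

Code:
# Round-robin distribution of a stream into four pieces with GNU split:
# each output file receives every fourth line.
seq 100 | split -n r/4 - chunk.    # writes chunk.aa .. chunk.ad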

Then there's the case where the jobs inside the loop use the TTY, or treat stdout like a TTY: in those cases, the best option might be to hand off TTY synchronization to a program like "screen" (or spawn xterms if in a GUI):
Code:
while whatever; do -j4 --tty-sync=screen-wrapper.sh
...

screen-wrapper.sh would have to accept either PTY master names or file descriptor identification (/dev/fd/ probably) and initiate some kind of display-sharing for the concurrent iterations of the loop. (I'm not actually sure whether screen can connect to an already-open PTY master the way xterm can... I know it can resume a disconnected session, which is kind of the same idea... but for the sake of this, imagine that it can.)
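Even without such a wrapper, screen can already host one window per concurrent job, which is roughly the effect a --tty-sync option would automate. A rough manual sketch:

Code:
# One screen window per iteration.  (Unlike -j4 this launches everything
# at once, and the session name "parjobs" is arbitrary.)
screen -dmS parjobs                 # start a detached session
for f in ./*.avi; do
    screen -S parjobs -X screen \
        ffmpeg -i "$f" -vcodec copy -acodec copy "${f%.avi}.mp4"
done
screen -r parjobs                   # attach and watch the windows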

Input/output synchronization and TTY sharing/instantiation would not necessarily be mutually exclusive, though I think the case where someone would use both at once would be relatively rare. So in cases where the shell's stdout is a TTY and the loop's input/output isn't being redirected (and, thus, stdin/stdout is still the TTY from the loop's perspective), there are different cases to consider:
  • Neither stdio nor TTY synchronization mechanisms are specified: loop iteration I/O is handled as if the loop iterations were regular background jobs.
  • stdio is unsynchronized, TTY is synchronized (via screen or whatever): stdin/stdout for each iteration job is attached to that job's respective TTY (a PTY slave created by the shell for the loop)
  • stdio is synchronized, TTY is unsynchronized: all jobs get the shell's TTY as their TTY, and the ends of the multiplexers (the pre-splitting stdin, the post-merge stdout) are also attached to the TTY.
  • both stdio and TTY are synchronized: this may be an invalid case, if the shell's stdout really is the TTY... The shell could handle it by creating an additional TTY and attaching the ends of the multiplexers to that. (So if the loop had both stdin and stdout multiplexers specified, and you were running four loop iterations in parallel, "screen" or whatever would show five windows: four would be connected to /dev/tty of one of the four loop jobs, while the fifth would be connected to the merged stdin and/or merged stdout.)


In cases where stdin/stdout aren't connected to the TTY in the first place, of course, it's much simpler: stdin/stdout and TTY can simply be treated as separate channels which don't interact.

Of course, for cases that don't require specialized behavior, a sensible default behavior would be to treat each loop iteration like a background job: meaning it can't get stdin or read from the TTY, and stdout/TTY output is an unsynchronized free-for-all...
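Incidentally, that free-for-all is less chaotic than it sounds for line-oriented output: POSIX guarantees that writes of up to PIPE_BUF bytes to a pipe are atomic, so jobs forced to flush whole lines at a time (stdbuf -oL, for instance) interleave line-by-line rather than mid-line. A minimal sketch with ordinary background jobs (the *.log files are just illustrative):

Code:
# Plain background jobs sharing one pipe: "stdbuf -oL" flushes each
# job's output a line at a time, and writes of <= PIPE_BUF bytes to a
# pipe are atomic - so lines (assumed shorter than PIPE_BUF) arrive
# intact, just in arbitrary order.
( for f in ./*.log; do
      stdbuf -oL grep ERROR "$f" &
  done
  wait ) | sort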

There are also various cases of commands that aren't full-on TTY apps but which do use the TTY (progress bars in wget and so on). More specialized TTY-sharing strategies could be developed for cases where each job just emits simple line-oriented display codes - but I don't think you can get around issuing each loop iteration its own PTY slave if you want to share the display: the program has to have a TTY as its output or it won't treat its output as a TTY, and if its output is the shared TTY directly, the shell never gets the opportunity to alter those display codes. So in most cases dispatching to "screen" or similar is probably the way to go.
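The existing script(1) utility demonstrates the PTY-slave half of this already: it allocates a PTY for the program it runs, so the program treats its output as a terminal even when the real destination is a pipe or file. (This is util-linux script; the BSD version has different options.)

Code:
# script(1) gives ls its own PTY, so ls emits color codes even though
# the outer stdout is a pipe.
script -qec "ls --color=auto" /dev/null | cat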

Dispatching to "screen" raises another issue, of course: $TERM has to be set properly... I don't have a great solution to that one, honestly.

Last edited by tetsujin; 03-29-2011 at 02:57 PM..
 
