How Would You Like Your Loops Served Today?


 
# 1  
Old 06-13-2012

Scrutinizer and i had a discussion about loops in shell scripts, and you might be interested in joining in and sharing your experiences:

i wrote an example script which basically employed the following logic:

Code:
cat /some/file | while read var ; do
     echo var = $var           # just do something with $var
done

Scrutinizer said this is a UUOC (useless use of cat). Well, in principle, he is of course right. We could write the same thing this way:

Code:
while read var ; do
     echo var = $var           # just do something with $var
done < /some/file

But still, i beg to differ. This is not a useless but a very sensible use of cat! Suppose the loop were not as short as the example here, but several screen pages long. To understand what goes into "$var" one would have to scroll down to its end, then, to find out what is done with "$var", scroll back up again.

Is it only me who hates having to scroll up and down repeatedly? I find it a lot easier to read if i "steer my loops from the top" instead of from the (maybe far-away) bottom.

Of course, there is this alluring GNU shellnik startup called bash. In bash, pipelines have some really weird side effects, like variables being local to them. The following runs in both ksh and bash:

Code:
cat /some/list | while read entry ; do
     line="$line $entry"
done
echo $line           # what is in there?

But while in ksh "$line" would hold all the list entries after the loop, in bash the variable would be empty! Is it only me, or do you think this is counter-intuitive too?
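A minimal sketch of the difference described above, assuming a throwaway list file created with mktemp (the file contents and the variable names piped and redirected are made up for illustration). It runs the same loop once behind a pipe and once with the redirect at the bottom:

```shell
#!/bin/sh
# Throwaway input file, purely for illustration.
list=$(mktemp)
printf '%s\n' eggs cheese ham > "$list"

# Variant 1: loop on the receiving end of a pipe.
piped=""
cat "$list" | while read -r entry; do
    piped="$piped $entry"
done

# Variant 2: same loop, input redirected at the bottom.
redirected=""
while read -r entry; do
    redirected="$redirected $entry"
done < "$list"

# What survives the pipe depends on the shell; the redirect does not.
echo "after pipe:     [$piped]"
echo "after redirect: [$redirected]"
rm -f "$list"
```

In ksh93 both variables should come out filled. In bash with default settings the piped one stays empty, because the loop ran in a subshell, while the redirected one is filled in every shell.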

So probably in bash one has to resort to this ugly style of meticulously telling the shell the recipe at length while being totally silent about the ingredients you want to use - until the very end. Could you imagine cook books being written that way? It would look like:

Quote:
First you take something *) and some other thing **) and mix it together, scramble it, then heat up something ***) in a something ****). Now put the mixed together something *) and something **) into the something ****) and cook for some minutes.

_______________

*) eggs
**) some cheese and ham
***) a drop of butter
****) a pan
But the question remains: do you think this - in strictest terms - UUOC should be avoided even if it has no negative side effects, or do you think the gain in clarity outweighs this?

Discuss!

bakunin
# 2  
Old 06-13-2012
Pipes are not free, so for the sake of efficiency I prefer to avoid the cat construct. Further, I prefer my shell loops this way:

Code:
#!/usr/bin/env ksh

while read -u $rfd buf
do
    echo "$buf"
done {rfd}<some-file

Having the shell open the input to the while on a file descriptor other than standard input prevents me, or some future maintainer, from adding an ssh command (or similar stdin gobbling binary) and forgetting to redirect stdin from /dev/null or using an alternate mechanism (-n in the case of ssh) to prevent the binary from causing odd problems with the loop. Maybe it's me, but it seems that there have been a fair few posts on this forum that were related to a while loop's stdin being 'eaten' by a process in the loop.

Letting bash try to run this results in several errors.

A slight twist on the code above allows for the input to be defined at the top of the loop without requiring the extra cat:

Code:
exec {rfd}<data
while read -u $rfd buf
do
    echo "$buf"
done

Again, it doesn't work in bash. The loop below does, but I don't like having to pick the file descriptor value, and rfd=3; exec ${rfd}<data, which would allow me to hard code the constant only once, seems not to work (parsing and expansion order, I believe). IMHO, having the shell automatically assign an available file descriptor value just seems the right thing to do.

Code:
exec 3<data
while read -u 3 buf
do
    echo "$buf"
done
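The "eaten stdin" problem described in this post can be sketched roughly as follows. This is only an illustration: gobble is a hypothetical stand-in for ssh or any other command that drains its standard input, and the data file is a throwaway created with mktemp. The second loop uses the portable read ... <&3 form rather than read -u, so that it is not tied to ksh or bash.

```shell
#!/bin/sh
data=$(mktemp)
printf '%s\n' one two three > "$data"

# Hypothetical stand-in for ssh: drains whatever is on its stdin.
gobble() { cat > /dev/null; }

# Loop fed on stdin: gobble swallows the lines the loop still needs.
seen_stdin=0
while read -r buf; do
    seen_stdin=$((seen_stdin + 1))
    gobble
done < "$data"

# Loop fed on fd 3: gobble only sees the outer stdin
# (pointed at /dev/null here so the sketch never waits on a terminal).
seen_fd3=0
while read -r buf <&3; do
    seen_fd3=$((seen_fd3 + 1))
    gobble
done 3< "$data" < /dev/null

echo "stdin loop: $seen_stdin iteration(s), fd 3 loop: $seen_fd3 iteration(s)"
rm -f "$data"
```

The first loop stops after a single iteration because the remaining lines are gone by the time read is called again; the second one sees all three lines.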

# 3  
Old 06-14-2012
I have a preference for whatever is simplest and the most intuitive to read.
I agree that specifying data at the end of the loop is a bit of an oddball, but:
  • To me it is the simplest and cleanest code, there is no need for a cat-and-pipe or an extra file descriptor
  • As noted above, unlike the cat-and-pipe version it does not send the loop into a subshell in shells other than ksh, so not only is it more efficient, it also ensures variables set inside the loop are available outside. I tend to go with what works in all shells. The way it is done in ksh is great, but it is not specified in POSIX.
  • Whenever I have a loop with more than 20 lines of code, I tend to start thinking about splitting it into functions with mnemonic names.
  • With regard to redirects, I have a preference for using them only in the context in which they are needed. Also, feeding them into the loop at the bottom is ideal, since the file descriptors get closed when the loop ends. With the exec examples you would need to use an explicit close afterwards.
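The explicit close mentioned in the last point might look like the sketch below. It assumes fd 3 and a throwaway data file; fd3_is_open is a hypothetical helper that merely probes the descriptor in a subshell.

```shell
#!/bin/sh
data=$(mktemp)
printf '%s\n' alpha beta > "$data"

# Hypothetical helper: succeeds only while fd 3 is open.
fd3_is_open() { (: <&3) 2>/dev/null; }

# Open the file on fd 3 for the rest of the script.
exec 3< "$data"
count=0
while read -r buf <&3; do
    count=$((count + 1))
done

# The descriptor outlives the loop...
if fd3_is_open; then after_loop=open; else after_loop=closed; fi

# ...until it is closed explicitly.
exec 3<&-
if fd3_is_open; then after_close=open; else after_close=closed; fi

echo "read $count lines; fd 3 after loop: $after_loop, after exec 3<&-: $after_close"
rm -f "$data"
```

With the redirect attached to done instead, the shell undoes the redirection itself once the loop finishes, so no exec 3<&- is needed.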

Last edited by Scrutinizer; 06-14-2012 at 03:35 AM..
# 4  
Old 06-14-2012
I prefer to redirect at the end of the loop too because I think it's the most portable construct. If the loop is several pages long I usually put a comment at the beginning that tells what is read from.
# 5  
Old 06-14-2012
I much prefer top-down flow in any programming language and adopt the modular approach with the main program logic as the simplest control flow possible.

Each time this debate comes up there is no proof that the shell's input redirect is faster than using cat. I can't see why the POSIX folks don't make cat a shell built-in rather than try to retire the command.

Have you read the "Useful uses of cat" collection from the excellent Mascheck site:
Useful use of cat(1)
That list includes a contribution from a certain Chris F.A. Johnson!
An enhanced version of the "convert file contents into arguments" contribution came up on unix.com yesterday.
# 6  
Old 06-14-2012
I too wish you could do <filename while read LINE ... like you can with commands, but you can't, and putting the redirection at the end is the most portable.

If I had a shell loop 3 pages long, I'd try and reduce it with functions.
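One way to combine the two wishes above (input named up front, long loops moved into functions) is to rely on the fact that a function call is a simple command and therefore accepts a leading redirect. A rough sketch, with collect as a made-up function name and a throwaway input file:

```shell
#!/bin/sh
data=$(mktemp)
printf '%s\n' eggs cheese ham > "$data"

# The loop body lives in a function (hypothetical name).
collect() {
    line=""
    while read -r entry; do
        line="$line $entry"
    done
}

# A function call is a simple command, so the redirect may come first.
< "$data" collect

# No pipe was involved, so $line is still set here.
echo "collected:$line"
rm -f "$data"
```

The ingredients are named before the recipe is invoked, there is no extra cat process, and the variable is set in the calling shell.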
# 7  
Old 06-14-2012
@methyl: cat is a very useful command and many of its uses are listed on that site; who could argue with that? But one will find that the particular case that is the subject of this thread is not listed there. Interestingly, speed had not been brought up as an argument here, but since you mentioned it, I thought I'd run a couple of simple tests on a 66 MB file:

test 1:
Code:
cat test.txt | while IFS= read -r line; do echo "$line"; done

test 2:
Code:
while IFS= read -r line; do echo "$line"; done < test.txt

test    shell   real        user        sys
test1   bash3   1m5.429s    0m43.494s   0m21.982s
test2   bash3   0m43.040s   0m38.658s   0m4.382s
test1   ksh93   0m35.22s    0m14.89s    0m20.36s
test2   ksh93   0m6.87s     0m6.84s     0m0.02s

So it seems there can also be a significant speed difference.

--
bash 3.2.48 / ksh93s+



---
I am not really into UUOC'ing BTW, it arose as part of a humorous exchange.

Last edited by Scrutinizer; 06-14-2012 at 06:18 PM..