How Would You Like Your Loops Served Today?

Login or Register to Ask a Question and Join Our Community

How Would You Like Your Loops Served Today?

Tags

Login to Discuss or Reply to this Discussion in Our Community

The Lounge What is on Your Mind? How Would You Like Your Loops Served Today?

06-13-2012

Registered User

6,384, 2,214

Join Date: May 2005

Last Activity: 28 October 2019, 4:59 PM EDT

Location: In the leftmost byte of /dev/kmem

Posts: 6,384

Thanks Given: 143

Thanked 2,214 Times in 1,548 Posts

How Would You Like Your Loops Served Today?

Scrutinizer and i had a discussion about loops in shell scripts and you might be interested in joining in and share your experiences:

i wrote an example script which basically employed the following logic:

Code:

cat /some/file | while read var ; do
     echo var = $var           # just do something with $var
done

Scrutinizer said, this is a UUOC. Well, in principle, he is of course right. We could write the same this way:

Code:

while read var ; do
     echo var = $var           # just do something with $var
done < /some/file

But still, i beg to differ. This is not a useless but a very sensible use of cat! Suppose the loop would not be as short as the example here, but several screenpages long. To understand what goes into "$var" one would have to scroll down to its end, then, to find out what is done with "$var", scroll back up again.

Is it only me that i hate to have to scroll up and down repeatedly? I find it a lot easier to read if i "steer my loops from the top" instead of from the (maybe far-away) bottom.

Of course, there is this alluring GNU shellnik startup called bash. In bash pipelines have some really weird side effects, like variables being local to them. The following works in both shells:

Code:

cat /some/list | while read entry ; do
     line="$line $entry"
done
echo $line           # what is in there?

But while in ksh "$line" would hold all the list entries after the loop in bash the variable would be empty! Is it only me or do you think this is counter-intuitive too?

So probably in bash one has to resort to this ugly style of meticulously telling the shell the recipe in length while being totally silent about the ingredients you want to use - until the very end. Could you imagine cook books to be written that way? It would look like:

Quote:

First you take something *) and some other thing **) and mix it together, scramble it, then heat up something ***) in a something ****). Now put the mixed together something *) and something **) into the something ****) and cook for some minutes.

_______________

*) eggs
**) some cheese and ham
***) a drop of butter
****) a pan

But the question stays: do you think this - in strictest terms - UUOC should be avoided even if it has no negatie side effects or do you think the gain in clarity outweighs this?

Discuss!

bakunin

bakunin

View Public Profile for bakunin

Find all posts by bakunin

06-13-2012

Registered User

1,466, 512

Join Date: Jul 2010

Last Activity: 7 April 2014, 3:02 PM EDT

Location: earth>US>UTC-5

Posts: 1,466

Thanks Given: 110

Thanked 512 Times in 491 Posts

Pipes are not free, so from the potential of efficiency I prefer to avoid the cat construct. Further, I prefer my shell loops this way:

Code:

#!/usr/bin/env ksh

while read -u $rfd buf
do
    echo "$buf"
done {rfd}<some-file

Having the shell open the input to the while on a file descriptor other than standard input prevents me, or some future maintainer, from adding an ssh command (or similar stdin gobbling binary) and forgetting to redirect stdin from /dev/null or using an alternate mechanism (-n in the case of ssh) to prevent the binary from causing odd problems with the loop. Maybe it's me, but it seems that there have been a fair few posts on this forum that were related to a while loop's stdin being 'eaten' by a process in the loop.

Letting bash try to run this results in several errors.

A slight twist on the code above allows for the input to be defined at the top of the loop without requiring the extra cat:

Code:

exec {rfd}<data
while read -u $rfd buf
do
    echo "$buf"
done

Again, it doesn't work in bash. The loop below does, but I don't like having to pick the file descriptor value, and rfd=3; exec ${rfd}<data, which would allow me to hard code the constant only once, seems not to work (parsing and expansion order I believe). IMHO, Having the shell automatically assign an available file descriptor value just seems the right thing to do.

Code:

exec 3<data
while read -u 3 buf
do
    echo "$buf"
done

agama

View Public Profile for agama

Find all posts by agama

06-14-2012

Moderator

12,296, 3,792

Join Date: Nov 2008

Last Activity: 1 January 2021, 1:47 AM EST

Location: Amsterdam

Posts: 12,296

Thanks Given: 679

Thanked 3,792 Times in 3,282 Posts

I have preference for whatever is simplest and the most intuitive to read..
I agree that specifying data at the end of the loop is a bit of an oddball, but:

To me it is the simplest and cleanest code, there is no need for a cat-and-pipe or an extra file descriptor
As noted above, in shells other than ksh it does not send the loop into a subshell, so not only is that more efficient, it ensures variables set inside the loop are available outside. I tend to go with what works in all shells. The way it is done in ksh is great, but it is not specified in POSIX.
Whenever I have a loop with more than 20 lines of code, I tend to start thinking about splitting it into functions with mnemonic names.
With regards to the use of redirects, I have a preference for using them only in the context in which they are used. Also, feeding them into the loop at the bottom is ideal, since the file descriptors get closed when the loop ends. With the exec examples you would need to use an explicit close afterwards.

Last edited by Scrutinizer; 06-14-2012 at 03:35 AM..

Scrutinizer

View Public Profile for Scrutinizer

Find all posts by Scrutinizer

06-14-2012

Registered User

472, 104

Join Date: Aug 2006

Last Activity: 19 October 2018, 6:30 AM EDT

Posts: 472

Thanks Given: 4

Thanked 104 Times in 95 Posts

I prefer to redirect at the end of the loop too because I think it's the most portable construct. If the loop is several pages long I usually put a comment at the beginning that tells what is read from.

cero

View Public Profile for cero

Find all posts by cero

06-14-2012

Registered User

6,402, 678

Join Date: Mar 2008

Last Activity: 8 June 2016, 9:58 PM EDT

Posts: 6,402

Thanks Given: 288

Thanked 678 Times in 647 Posts

I much prefer top-down flow in any programming language and adopt the modular approach with the main program logic as the simplest control flow possible.

Each time this debate comes up there is no proof that the Shell inward redirect is faster than using cat. I can't see why the Posix folks don't make cat a Shell built-in rather than try to retire the command.

Have you read the "Useful uses of cat" collection from the excellent Mascheck site:
Useful use of cat(1)
That list includes a contribution from a certain Chris F.A. Johnson !
An enhanced version of the "convert file contents into arguments" contribution came up on unix.com yesterday.

methyl

View Public Profile for methyl

Find all posts by methyl

06-14-2012

Registered User

23,310, 4,623

Join Date: Aug 2005

Last Activity: 7 July 2020, 11:47 AM EDT

Location: Saskatchewan

Posts: 23,310

Thanks Given: 1,331

Thanked 4,623 Times in 4,217 Posts

I too wish you could do <filename while read LINE ... like you can commands, but you can't, and putting the redirection at the end is the most portable.

If I had a shell loop 3 pages long, I'd try and reduce it with functions.

Corona688

View Public Profile for Corona688

Visit Corona688's homepage!

Find all posts by Corona688

06-14-2012

Moderator

12,296, 3,792

Join Date: Nov 2008

Last Activity: 1 January 2021, 1:47 AM EST

Location: Amsterdam

Posts: 12,296

Thanks Given: 679

Thanked 3,792 Times in 3,282 Posts

@methyl cat is a very useful command and many of its uses are listed on that site, who could argue with that? But one will find that the particular case that is the subject of this thread is not listed there. Interestingly, speed had not been brought up as an argument here, but since you mentioned it, I thought I'd run a couple of simple tests on a 66 MB file:

test 1:

Code:

cat test.txt | while IFS= read -r line; do echo "$line"; done

test 2

Code:

while IFS= read -r line; do echo "$line"; done < test.txt

test	shell	real	user	sys
test1	bash3	1m5.429s	0m43.494s	0m21.982s
test2	bash3	0m43.040s	0m38.658s	0m4.382s
test1	ksh93	0m35.22s	0m14.89s	0m20.36s
test2	ksh93	0m6.87s	0m6.84s	0m0.02s

So it seems there can also be a significant speed difference..

--
bash 3.2.48 / ksh93s+

---
I am not really into UUOC'ing BTW, it arose as part of a humorous exchange..

Last edited by Scrutinizer; 06-14-2012 at 06:18 PM..

Scrutinizer

View Public Profile for Scrutinizer

Find all posts by Scrutinizer

Login or Register to Ask a Question

Previous Thread | Next Thread

4 More Discussions You Might Find Interesting

1. AIX

VIOS backupios -mksysb - does it need to be served by a NIM server ?

working through VIOS backup options. Generally, we store mksysb's on a server and then NFS mount them from it to copy to a VIO optical library, etc. In the case of a VIO backup, I see the -mksysb option to backupios and understand that it doesn't include the NIM resources in the backup. ...

2. Shell Programming and Scripting

meaning of today=${1:-${today}}

what does today=${1:-${today}} mean??? I saw a script which has these two lines: today=`date '+%y%m%d'` today=${1:-${today}} but both gives the same value for $today user:/export/home/user>today=`date '+%y%m%d'` user:/export/home/user>echo $today 120326...

3. Web Development

Symbol Links amongst Apache's served files, is this a security-don't?

I read somewhere that you should make sure Apache is configured to not allow symbolic links to be followed outside the webroot, as this can compromise security. I can imagine how this could lead to a security risk: eg: Is my assumption correct? -- Is it nothing more than: "its just...

4. IP Networking

to serve or be served??

I have two machines on my network - one OSX mac and one linux box. The mac is my main workhorse, and the linux box does occasional chores and webserving. Currently the mac shares (via NFS) files with the Liinux box. Would it be less demanding on the mac if I made it a client, and moved my files...

Login or Register to Ask a Question