.sh file syntax checking script


 
# 8  
Old 12-20-2007
>> == me
> = reborg
(Only one level of quoting seems to be supported on this forum, so I am reverting to old usenet style attribution.)


>>Personally, I think that it should always be an error to use an uninitialized variable.
>You are entitled to think so.
>I disagree but that is a question of opinion, by all means do that.
>I just said it was unnecessary in this case.

I feel more strongly that -n should always be on, or at least be the default behavior, than I do about -u. I can live with -u not being on if every shell implementation has a standard behavior that it must adhere to (e.g. that an uninitialized variable always has an empty string value, will never cause a null pointer exception, etc.). Problems arise when different implementations subtly differ, which seems to be a major problem in the unix world (in contrast to, say, Java, which is properly stringent about implementation consistency). In cases like this, my experience is that it is better to be safe than sorry.
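
As an aside, a tiny demonstration of what set -u buys you (TYPO_VAR is just a made-up name for illustration):
Code:
#!/bin/sh
set -u
# TYPO_VAR was never assigned; with -u the shell aborts here with an
# "unbound variable" style error instead of silently expanding it to "".
echo "value is: $TYPO_VAR"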


>>If I had written sh, it would not even be an option.
>Sorry I didn't follow you here.

I would not have made set -n an option to sh; set -n would be the default behavior. Possibly I -might- make set +n available if someone could convince me that there was a real need (e.g. a performance increase) in some cases. As mentioned above, I would likewise have set -u be the default behavior, unless the uninitialized behavior is consistent and safe enough that this is not an issue. In which case I recant.


>>First, why would I ever want to use the full path to a command? That makes it totally unportable!...
>The purpose of PATH is to tell the shell in which directories and in which order to look for commands.
>A properly written script would not normally depend on login environment to know that.

Of course your script has to rely on the login environment to know where commands are--what else can it do, guess?

In the examples that you provide, you assumed that find (or a suitable version of find) is always found in /usr/bin/, but what if it isn't? Then your script is broken and the user will have to debug the script and fix it by hand.

If you instead rely on the PATH, it is a) far likelier that the user will have sorted out things like having find on his path long before he runs this script, and b) even if find is not there, he is much more likely to know how to do a generic unix task like fixing his path than to debug and edit a custom script.
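
To make that concrete, here is a rough sketch of how a script could fail fast, with a plain message, if a needed command is not on the PATH (note that command -v is POSIX, so a genuinely ancient bourne shell may not have it):
Code:
#!/bin/sh
# Check up front that every external command this script needs is on the PATH.
for cmd in find grep
do
	if ! command -v "$cmd" > /dev/null 2>&1
	then
		echo "error: required command '$cmd' was not found on your PATH" >&2
		exit 1
	fi
done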


>The script will currently break if the user has a different command
>(or different non-compatible version of a command) with the same name
>as one of the ones you use in their PATH before the one you want

Assuming that /usr/bin/find exists and is the suitable version likewise is open to failure. No win there.


>or has a command aliased to behave in a way you don't expect which would be worse.

Agreed: aliases are one (the only?) failure mode that hard coding the path to find inside the script will overcome.

Unless the user has done something amazing, like aliasing find to rm or something, the worst that should happen in this script if they have aliased find is that it does not work correctly.

Is there a good way to detect if a command is actually being aliased? And to unalias it, at least inside the script, and use the raw command?
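
My best guess so far, at least for bash/ksh-style shells (a true bourne shell has no aliases at all), is something along these lines:
Code:
alias find                    # prints the alias definition if find is aliased, complains otherwise
type find                     # reports "aliased to ...", "is a function", or the resolved path
unalias find 2> /dev/null     # drop the alias for this shell; ignore the error if there is none
command find . -name "*.sh"   # running it via 'command' sidesteps functions and alias expansion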

Actually, now that I think more on this, if you really think that aliases are a dangerous failure mode, then you have to go to extreme lengths to cope with it. For instance, with the cs.sh script that I have presented so far, you not only need to worry about the find command being aliased, but you need to worry about every single other command in that script which could also be aliased: set, echo, while, read... It's basically impossible! Larry Wall was right about it being easier to write a new shell than to port (or guarantee the correct working of) a shell script.


>I'm guessing you mean non-recursive, since it recurses by default:

Right: I mistyped.


>find <dir> \( ! -name <dir> -prune \)

I think that the complete command that I want (for dir = "./") is more like
Code:
find "./" -nowarn \( ! -name "./" -prune \) | grep \.sh

That -nowarn option is critical, else you always get this nasty warning message (at least with GNU find):
find: warning: Unix filenames usually don't contain slashes (though pathnames do). That means that '-name `./'' will probably evaluate to false all the time on this system. You might find the '-wholename' test more useful, or perhaps '-samefile'. Alternatively, if you are using GNU grep, you could use 'find ... -print0 | grep -FzZ `./''.
But it appears that -nowarn is not a POSIX option, so we are back to square one...


>It has little or nothing to do with laziness.
>It is more effort for the programmer to write the message

The "more effort for the programmer" is the laziness that I referred to.

And it's actually not all that much extra effort, assuming you are actually doing the right thing and argument-checking your code, because at that point you already have all the conditional logic in place; you simply need to notify the user too when things go wrong. But if you have done no arg checking and just blindly proceed, then yes, it is more work.
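
To illustrate the point (a sketch only, not code taken from the script below):
Code:
if [ $# -lt 1 ]
then
	echo "error: no input file given" >&2                     # the specific problem
	echo "Usage: sh checkSyntax.sh [-h] [-p path] [-R]" >&2   # the generic reminder
	exit 1
fi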


>...I am not saying don't print the message that says what is wrong,
>just display the generic form as a reminder too or tell the user how to do so, this kind of thing:

I agree: decent behavior is to print both the specific error and the proper usage. The version of the script file in my next post below does this. You were also correct in saying that the proper usage should be printed out somehow (I chose to write a function to do this) rather than merely stored as program comments.


>>Do people routinely write korn shell, c shell, etc script files
>>and use the same .sh file extension for all of them,
>>instead of using something sensible like .ksh, .csh, etc?
>Yes.

>>I do not even know if other shells support the -n syntax check option,
>>so I would just as soon ignore all script files except those purporting to be bourne shell scripts.
>bourne derived shells such as ksh and bash do.

The version of the script file presented in my next post below retains the search for .sh files and assumes that whatever shell is actually executing it (and hence will be doing the syntax checking) is also a shell that is compatible with all the other .sh files.

For instance, on both my cygwin and linux boxes, the shell is actually bash, and its syntax checker seems to have no problem with bash (but not bourne) constructs like [[ in .sh files.

At this point, I think that this is the most reasonable approach, even if it is not a cure all.


Writing a POSIX compatible find seems to be the hangup; my newbie scripting skills are not up to it.
# 9  
Old 12-20-2007
Quote:
Originally Posted by porter
Mechanism, not policy.
What is that supposed to mean? You only believe in tools supporting mechanisms and not rigid policies?

Assuming that that is the case, I half agree and half disagree with you.

The languages that best trade off among productivity, elegance, robustness, maintainability, and performance always make compromises between providing freedom of expression and implementing policies.

XML's runaway success, compared to SGML's languishing in obscurity (except for one child, HTML), came about precisely because the rigidity of its policies (a simple syntax that must nevertheless be correct) was wisely chosen. It gives enough freedom of expression to do anything that you want, but limits that expression to a form that is unambiguous and easily parsed.

Java has won out over C++ for many of the same reasons.

Back to shell scripts: I think that robust and safe behavior should be the default, and higher-performance options that conflict with this should not be, especially with today's computers. Maybe the choices made 30 years ago, with the slow machines of that time, were necessary, however.


Quote:
So what was that you said about portability?
I would like this script to be portable, and I think that I have been rationally considering all proposals (e.g. from reborg) to fix it, even if I do not agree with everything he has said so far. I am willing to be educated!


Quote:
Normally they

(a) print usage

(b) say what option was offending

(c) do not try and read your mind.
All of which I agree with. I was talking about experiences with unix that I have had where (b) was left out, and I was left staring for a long time at my input trying to determine why it did not match (a). That's what I have a problem with.
# 10  
Old 12-20-2007
Here is the latest revision of the script for anyone interested:

Code:
#!/bin/sh
set -e	# exit on first failed command
set -u	# exit if encounter never set variable


#--------------------------------------------------
# Programmer notes:
#
# See this forum for a discussion of this file: https://www.unix.com/shell-programming-scripting/46785-sh-file-syntax-checking.html
#
# Possible future enhancements:
#	--make this script file POSIX compatible (currently it is NOT due to the special find command options used below)
#	--there are many more options from the find command (e.g. how symbolic links are handled--currently they are not followed)
#	which one might want to expose as command line arguments of this script.
#	On the other hand, offering these options will make it even harder to be POSIX compatible...
#--------------------------------------------------


#----------environment variables


# Initialize option variables to preclude inheriting values from the environment:
opt_h="false"	# default is do not print help
opt_p="./"	# default is the current working directory
opt_R="-maxdepth 1"	# default limits the search to just path itself (i.e. do not drill down into subdirectories)


#----------functions


printHelp() {
	echo "Usage: sh checkSyntax.sh [-h] [-p path] [-R]"
	echo
	echo "DESCRIPTION"
	echo "Checks the syntax of bourne (or compatible) shell script files." \
		" The target may be either a single shell script file," \
		" or every shell script file in a directory (this is the default behavior, with the current directory the target)," \
		" or every shell script file in an entire directory tree." \
		" A shell script file is considered to be any file with the extension .sh REGARDLESS OF ITS CONTENTS." \
		" If the user has used this extension for other file types, then the syntax check may fail." \
		" The syntax checking will be done by the shell that is executing this file (e.g. bash on a typical Linux system)," \
		" so all the .sh files in the search path must be syntax compatible with this shell." \
		" A good description of differences between bash and bourne shells, for instance is found here: http://www.faqs.org/docs/bashman/bashref_122.html."
	echo
	echo "OPTIONS"
	echo "-h prints help and then exits; no value should be specified; all other options are ignored"
	echo
	echo "-p if supplied, then requires a value that is either the path to a single .sh file or the path to a directory;" \
		" if omitted, then the current working directory will be searched"
	echo
	echo "-R if present and if the path to be searched is a directory, then searches subdirectories too; no value should be specified"
	echo
	echo "DEPENDENCIES"
	echo "This script assumes that a suitable version of the find command will be the first one found in the user's PATH." \
		" This find command must support the -maxdepth, -type, and -iname options." \
		" GNU find works, but other variants may not."
	echo
	echo "+++ BUGS"
	echo "--the code is broken if any element in a path being searched contains leading whitespace"
}


#----------main


# Parse command-line options:
while getopts 'hp:R' option
do
	case "$option" in
	"h")
		opt_h="true"
		;;
	"p")
		opt_p="$OPTARG"
		;;
	"R")
		opt_R=""
		;;
	?)
		# Note: if an invalid option is supplied, getopts should have already printed an error line by this point, so there is no need to duplicate it
		echo
		printHelp
		exit 1
		;;
	esac
done


# Print help and then exit if -h is a command-line option:
if [ "$opt_h" = "true" ]; then
	printHelp
	exit
fi


# Find all the .sh files and check them:
#for shFile in `find $opt_p $opt_R -type f -iname "*.sh"`	# to understand this line, execute "man find"; was inspired by this script: http://www.debianhelp.org/node/1167
#	Dropped the above line because it fails on path elements which contain whitespace; the solution below is discussed here: https://www.unix.com/shell-programming-scripting/27487-how-read-filenames-space-between.html
find "$opt_p" $opt_R -type f -iname "*.sh" | while read shFile	# $opt_R is deliberately left unquoted so that "-maxdepth 1" splits into two arguments (and disappears entirely when it is empty)
do
	sh -n "$shFile" || :	# sh -n will merely syntax check (never execute) shFile, and will print any errors found to stderr; the "|| :" keeps the non-zero exit status of a failed check from stopping this script (via the set -e at the top of the file), so the loop goes on to check the remaining files
done
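
For anyone who wants to try it, a few sample invocations (the paths are just placeholders):
Code:
sh checkSyntax.sh                     # check every .sh file in the current directory only
sh checkSyntax.sh -p ./myscript.sh    # check a single file
sh checkSyntax.sh -p ~/scripts -R     # check an entire directory tree
sh checkSyntax.sh -h                  # print the help text and exit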

# 11  
Old 12-20-2007
OK, don't think I'm beating up on you here; I am not. I'm only trying to pass on what experience has taught me, and wouldn't bother if I thought I was wasting my time, and certainly wouldn't do it just to be confrontational.

Never make assumptions about user competence or about how a user likes to customize a shell; I have seen that flawed logic break more scripts than I care to remember. Experience will teach you this anyway.

In Solaris, for example, you have BSD compatibility versions of some binaries in /usr/ucb. I know a number of people who like the Berkeley versions of various commands and put these in the PATH before the standard ones; the output of some of these is different from the standard ones, for example 'ps'. Likewise in /usr/sfw/bin you have a number of GNU versions of commands. While these are less problematic, since they are generally a superset of the POSIX versions and are usually prefixed by 'g' (e.g. 'gawk' and 'gtar'), it is nonetheless not as predictable as would be desired.

You can never discount the possibility that someone has an alias or function in their environment with the same name as a standard command, which they either don't know about or don't use; I have seen that many times.

You can unalias each command you use before using it and redirect the error output to /dev/null, but be aware of shell functions in login shells as well, and also that a true bourne shell does not support aliases, so the set -e will kill the script if you do this.
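
A rough sketch of what that looks like (only find and grep are shown; a real script would do this for every command it uses):
Code:
# Drop any aliases for the commands the script relies on before using them.
# unalias is not a bourne shell builtin, so redirect its complaints and use
# "|| :" to keep set -e from killing the script when it fails.
unalias find 2> /dev/null || :
unalias grep 2> /dev/null || :
# Shell functions with the same names are a separate concern; 'type find'
# will at least tell you whether one is shadowing the real command.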

Quote:
Actually, now that I think more on this, if you really think that aliases are a dangerous failure mode, then you have to go to extreme lengths to cope with it. For instance, with the cs.sh script that I have presented so far, you not only need to worry about the find command being aliased, but you need to worry about every single other command in that script which could also be aliased: set, echo, while, read... It's basically impossible!
Yes, that's why you use full paths. It is much easier to port a path to a command than an entire script, especially if you define all the commands at the start of the script, and if you really want to you can use "uname -s" to determine which platform you are on and set the path for any OS you know about (a sketch of this follows below). In a true bourne shell you don't have to worry about aliases; in bash or korn or POSIX sh you do.
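
A minimal sketch of that idea (the paths here are illustrative and would be confirmed when porting, not guaranteed locations):
Code:
#!/bin/sh
# Define every external command once, at the top, per platform.
case "`uname -s`" in
	SunOS) FIND=/usr/xpg4/bin/find ;;   # the POSIX-behaving version on Solaris
	Linux) FIND=/usr/bin/find ;;
	*)     FIND=/usr/bin/find ;;        # fallback; adjust here when porting
esac

$FIND . -type f -name "*.sh"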

And that is not the full story. Don't forget security and abuse. By giving the full path you make sure you know what you are running; for that reason alone your script would fail a code review in many companies.

Quote:
I think that the complete command that I want (for dir = "./") is more like
Code:
find "./" -nowarn \( ! -name "./" -prune \) | grep \.sh

That -nowarn option is critical, else you always get this nasty warning message,
find: warning: Unix filenames usually don't contain slashes (though pathnames do). That means that '-name `./'' will probably evaluate to false all the time on this system. You might find the '-wholename' test more useful, or perhaps '-samefile'. Alternatively, if you are using GNU grep, you could use 'find ... -print0 | grep -FzZ `./''.
But it appears that -nowarn is not a POSIX option, so we are back to square one...
No; again, why bother with non-POSIX semantics when the POSIX version works fine? The nowarn option in GNU find is one of those GNU options that I personally think was pointless; the same thing is simply accomplished with stderr redirection.

Code:
 
find . \( ! -name "." -prune \) -type f -name "*.[sS][hH]" | ...

or if you really want the / for some reason:
Code:
find "./" \( ! -name "./" -prune \) -type f -name "*.[sS][hH]" 2> /dev/null | ...

Quote:
>It has little or nothing to do with laziness.
>It is more effort for the programmer to write the message

The "more effort for the programmer" is the laziness that I referred to.
That's a bit of an oxymoron. I really don't follow the logic, but no matter; it's a moot point.

Quote:
I agree: decent behavior is to print both the specific error and the proper usage. The version of the script file in my next post below does this. You were also correct in saying that the proper usage should be printed out somehow (I chose to write a function to do this) rather than merely stored as program comments.
Better!

Quote:
For instance, on both my cygwin and linux boxes, the shell is actually bash, and its syntax checker seems to have no problem with bash (but not bourne) constructs like [[ in .sh files.
That's bad, not good, in a lot of ways. It will blow up on a system with a true old bourne shell (Solaris). I always try to write for the lowest common denominator if I want compatibility; however, note on this point that a "true" bourne shell is not POSIX, and most of those features are in the POSIX definition, so I leave that to your judgement.
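
To illustrate the kind of construct at issue (a standalone sketch, not code from your script):
Code:
answer="yes"

# bash/ksh only; a true old bourne shell will choke on [[ ... ]]
if [[ $answer == y* ]]; then
	echo "matched with [["
fi

# portable equivalent that any bourne-derived shell accepts
case "$answer" in
	y*) echo "matched with case" ;;
esac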

Quote:
Writing a POSIX compatible find seems to be the hangup; my newbie scripting skills are not up to it.
It's not really that difficult; see above. And aside from making the script portable, it makes your skillset portable, which is more important. My number one piece of advice to anyone learning scripting would be to use the standard behaviour of a utility and learn how to do things without GNU extensions first, then add them once you are happy that you know the long way, so that you are capable of writing code for a more restrictive system.

On another note about writing portable code, if you are interested, it's always a good idea to have a few different OSes to try on. Cygwin will give you more or less Linux behavior, so I would generally count that as only one platform. A multiboot system or VMware with a few different OSes is always nice for testing scripts. As a general rule, if your script works on Solaris using the default implementations of commands, it will port fairly easily, because Solaris tends to be conservative in the functionality its commands implement, some even less than POSIX, with POSIX versions being available in /usr/xpg4/bin. So my recommendation would be to add FreeBSD and Solaris to the mix.
# 12  
Old 12-30-2007
Quote:
OK, don't think I'm beating up on you here; I am not. I'm only trying to pass on what experience has taught me, and wouldn't bother if I thought I was wasting my time, and certainly wouldn't do it just to be confrontational.
No worries: I have never felt that you are beating me up on anything. I solicited feedback, and you generously shared your knowledge with me. I appreciate that. My delay in responding is merely because I have been so busy for so long.

Speaking of overwork, I have no more time to invest in this shell script, but I have these final points to make:


1) I now think that aliases for this script (e.g. an alias for find) are no problem at all. Reason: aliases defined on the shell before executing this script are not inherited by the script when the user runs it (at least on cygwin and linux--is this true of all unixes?). Execute
Code:
alias find
alias find="lfsdllgfsdfnsdfjskjs"
alias find

on the command line; the first line should have the output
Code:
bash: alias: find: not found

while the third should have output
Code:
alias find='lfsdllgfsdfnsdfjskjs'

Then put this as the contents of the file t.sh:
Code:
#!/bin/sh
alias find

and on the command line execute
Code:
sh t.sh

I get as its output
Code:
bash: alias: find: not found

which proves that shell script files do not inherit aliases from their parent shell, at least on my systems.

Since my cs.sh file does not use any internal aliases, either explicitly or implicitly (e.g. by sourcing .bashrc or something), it should be immune to aliases.

If it were vulnerable to aliases, then the way that I would temporarily suppress them is by encasing all commands inside single quotes (e.g. 'find'), an idea that I got from the Wikipedia article Alias (Unix shell).
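
For the record, the trick looks like this (it only matters in shells that do alias expansion in the first place):
Code:
'find' . -name "*.sh"    # quoting any part of the command word suppresses alias expansion
\find . -name "*.sh"     # a leading backslash has the same effect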


2) I have been struggling for a while now to understand why you think that hardcoding full command paths inside a script--which will require a decent percentage of users to have to open the shell file and understand it and then modify it--is easier than having the user modify their path if there are issues with, say, the find command.

I stand by my approach, particularly given who the target audience for this shell script is (primarily script developers, not the general user).

But I now think that I can see maybe where you are coming from: are you a sys admin at a large company? Or at least a sys admin on a machine with many different user accounts but whose system level configuration (e.g. where the commands are installed) you control? Or maybe you simply always use the same unix variant so that you reliably know what paths to use. In this case, if you are responsible for writing a shell script that has to work for all your different users, then I can see why you use hard coded paths.


3) Security:
Quote:
And that is not the full story. Don't forget security and abuse. By giving the full path you make sure you know what you are running; for that reason alone your script would fail a code review in many companies.
That sounds crazy! I use relative paths, for example, in all kinds of programming applications and they are utterly invaluable (e.g. in having a configuration that always works regardless of where the user installs my package). I cannot imagine that too many companies have a policy that anything other than full paths is a security hazard.
# 13  
Old 12-30-2007
Quote:
Originally Posted by fabulous2
No worries: I have never felt that you are beating me up on anything. I solicited feedback, and you generously shared your knowledge with me. I appreciate that. My delay in responding is merely because I have been so busy for so long.

Speaking of overwork, I have no more time to invest in this shell script, but I have these final points to make:


1) I now think that aliases for this script (e.g. an alias for find) are no problem at all. Reason: aliases defined on the shell before executing this script are not inherited by the script when the user runs it
You are correct here.

Quote:
2) I have been struggling for a while now to understand why you think that hardcoding full command paths inside a script--which will require a decent percentage of users to have to open the shell file and understand it and then modify it--is easier than having the user modify their path if there are issues with, say, the find command.
Not easier, better. I am not suggesting that people randomly update the script, simply that it is ported by someone on a new platform.

Quote:
But I now think that I can see maybe where you are coming from: are you a sys admin at a large company? Or at least a sys admin on a machine with many different user accounts but whose system level configuration (e.g. where the commands are installed) you control? Or maybe you simply always use the same unix variant so that you reliably know what paths to use. In this case, if you are responsible for writing a shell script that has to work for all your different users, then I can see why you use hard coded paths.
Both a large company, and large user systems on multiple platforms. Though I have not been a true sys-admin for several years, I have been working as an enterprise product integration specialist and system architect for a number of years with responsibility for a shell code base of hundreds of scripts and countless thousands of lines of code. In theory your approach sounds easier, but in practice it does not scale well and makes code more difficult to port.


Quote:
That sounds crazy! I use relative paths, for example, in all kinds of programming applications and they are utterly invaluable (e.g. in having a configuration that always works regardless of where the user installs my package). I cannot imagine that too many companies have a policy that anything other than full paths is a security hazard.
Not at all crazy; in fact it's a pretty basic and sensible precaution, and very common practice. The package location is one of the cases for using a defined environment variable which the user sets.

eg.
Code:
MY_PACKAGE_HOME=/foo/bar

then in the script you do something like:
Code:
if [ -z "$MY_PACKAGE_HOME" -o ! -d "$MY_PACKAGE_HOME" ] ; then
    echo "MY_PACKAGE_HOME is not correctly set" >&2
    exit 1
fi

# run other_application
${MY_PACKAGE_HOME}/other_application


If you need relative behavior, you should anchor the script and work relative to the anchor point.

Let us assume that you have a simple layout like this:
Code:
mydir
  +--script.sh
  +--etc
      +--script.conf

Your script depends on script.conf for some information to do its job.

The relative approach:
Code:
#!/bin/sh
# script.sh
#
CONF_FILE="etc/script.conf"
...

Now you have to be in the directory and run the script as ./script.sh, otherwise the relative path will not work.

On the other hand if you do:
Code:
#!/bin/sh
# script.sh
#
THIS_DIR=`cd \`dirname "$0"\` && pwd`
CONF_FILE="${THIS_DIR}/etc/script.conf"
...

You can run the script from anywhere, and you have some degree of control over what is being run.
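
For example (the /home/me path is purely for illustration):
Code:
cd /tmp
/home/me/mydir/script.sh      # THIS_DIR resolves to /home/me/mydir, so etc/script.conf is found
cd /home/me/mydir
./script.sh                   # same result when run from inside the directory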