Problem with tr command


 
# 8  
Old 06-08-2012
Here we go. This shows the reason to use quotes round the range, and an example of tr misbehaving on HP-UX when there are single-character filenames in the current working directory.

Code:
echo "NEE"|tr [A-Z] [a-z]
nee
cd ..
echo "NEE"|tr [A-Z] [a-z]
tr: The combination of options and String parameters is not legal.
Usage: tr [ -c | -cds | -cs | -ds | -s ] [-A] String1 String2
       tr [ -cd | -cs | -d | -s ] [-A] String1
ls -lad ?
-rw-r--r--   1 root       sys              0 Jun  7 11:58 a
-rw-r--r--   1 root       sys              0 Jun  7 11:58 b
-rw-r--r--   1 root       sys              0 Jun  7 11:58 c
echo "NEE"|tr '[A-Z]' '[a-z]'
nee
echo "NEE"|tr 'A-Z' 'a-z'
nee
# And finally - protecting the square brackets from the Shell (which is the same as using quotes).
echo "NEE"|tr \[A-Z\] \[a-z\]
nee

P.s. My post yesterday was bad because I just happened to try the command in a directory containing single character filenames! However the HP-UX man tr does show ranges complete with square brackets and surrounded by double quote characters.
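
For anyone who wants to try this without touching a real directory, here is a minimal sketch of the same situation. It assumes mktemp -d is available; demo_quoted_tr is just an illustrative name. What the unquoted patterns expand to depends on your shell and locale, so only the quoted result is predictable.

```shell
# Sketch: run tr in a scratch directory that contains single-character filenames.
demo_quoted_tr() (
    scratch=$(mktemp -d) || exit 1
    cd "$scratch" || exit 1
    touch a b c

    # Unquoted patterns are subject to pathname expansion before tr ever runs.
    # printf prints one line per argument, so any expansion is visible.
    printf 'unquoted arg: %s\n' [A-Z] [a-z]

    # Quoted, tr receives the two strings exactly as written.
    echo "NEE" | tr 'A-Z' 'a-z'

    cd / && rm -rf "$scratch"
)

demo_quoted_tr
```

The last line printed is the tr output; the lines before it show what an unquoted tr would have received as arguments.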

Last edited by methyl; 06-08-2012 at 04:38 PM.. Reason: correct some ambiguity
# 9  
Old 06-08-2012
Quote:
Originally Posted by methyl
However the HP-UX man tr does show ranges complete with square brackets and surrounded by double quote characters.
I was looking at the opengroup's tr documentation and indeed it mentions that System V tr did use square-brackets to delimit ranges; BSD systems did not.

The standard chose the BSD syntax as the lesser of two evils (most SysV scripts would continue to work fine, as opposed to breaking every BSD script using ranges).

More detailed info in the RATIONALE @ http://pubs.opengroup.org/onlinepubs...lities/tr.html

While the range syntax just discussed is specified clearly, the source of your syntax error is the result of undefined behavior. To do its job, tr requires the second of two character strings to be at least as long as the first. When the second character string is shorter (after expanding ranges, character classes, and repetition operations), BSD-ish tr pads the second string by repeating its final character until the string lengths are equal. This padding behavior does not allow the situation which triggered the syntax error to occur. SysV-ish tr does not pad and errors out.

Padding is discussed in the APPLICATION USAGE section of the opengroup tr manual page linked above.
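
A script that wants a short second string can sidestep the padding question by spelling out the repetition with the [c*] form instead of relying on what tr does with a short string. This is only a sketch; mask_vowels is a hypothetical helper name, and it assumes a tr that implements the POSIX repetition expression.

```shell
# Map every vowel to '#'. The [#*] repetition extends string2 to the length
# of string1 explicitly, so the result does not depend on implicit padding.
mask_vowels() {
    tr 'aeiou' '[#*]'
}

echo "translate" | mask_vowels   # prints: tr#nsl#t#
```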

I have no experience with HP-UX. While the results of your tr invocations seem to indicate that your tr is POSIX-compliant, the HP-UX tr manual I consulted states that ranges can be specified with and without brackets.

From http://h20000.www2.hp.com/bc/docs/su.../c02273397.pdf
Quote:
Originally Posted by HP-UX 11i Version 3: September 2010 tr manual
c1-c2 or [c1-c2]
Stands for the range of collating elements c1 through c2, inclusive, as defined by the current setting of the LC_COLLATE locale category.
If that's accurate, then it cannot be compliant as those are not equivalent expressions per the standard.

Expected result:
Code:
$ echo '[abc]' | tr '[a-c]' '[.*]'
.....
$ echo '[abc]' | tr 'a-c' '[.*]'
[...]

I would be curious to know the result of those commands on your system.

Regards,
Alister
# 10  
Old 06-08-2012
@alister
Code:
echo '[abc]' | tr '[a-c]' '[.*]'
[...]
echo '[abc]' | tr 'a-c' '[.*]'
[...]

I can get the square brackets to translate with:
Code:
echo '[abc]' | tr '[a-c]\[\]' '[.*]'
.....

I think that this proves that square brackets are special in the syntax of tr range specifications. It is also the syntax I have used for umpteen years on assorted versions of unix. The slight variation on a Linux or BSD system is no surprise.

Finally this tr -d illustrates what I mean:
Code:
echo "[abc]" | tr -d '[a-c]'
[]

I use tr -d frequently in numeric or alphabetic data validation.
What do you get on a Linux system with that command?
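
As a rough illustration of the validation use, here is a sketch with the bracket-free range, which the earlier examples suggest HP-UX accepts as well. is_all_digits is a hypothetical helper name, and the sketch assumes a tr that treats 0-9 as a range.

```shell
# Succeeds only if the argument is non-empty and consists solely of digits.
is_all_digits() {
    # Delete every digit; anything left over means the input was not numeric.
    [ -n "$1" ] && [ -z "$(printf '%s' "$1" | tr -d '0-9')" ]
}

is_all_digits 12345 && echo "12345 is numeric"
is_all_digits 12a45 || echo "12a45 is not numeric"
```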

Last edited by methyl; 06-08-2012 at 12:33 PM..
# 11  
Old 06-08-2012
Based on your results, HP-UX tr is not POSIX-compliant in ways that have been part of the standard for at least 15 years now (perhaps 20). I do not say this pejoratively; it's merely an observation.

I could not find IEEE Std 1003.2-1992 online, which I believe is the first standard to include the utilities (IEEE Std 1003.1-1988 only covered core system services).

The Single UNIX ® Specification, Version 2
Copyright © 1997 The Open Group
tr manual page
http://pubs.opengroup.org/onlinepubs...lities/tr.html


As I said before, I have no experience with HP-UX nor do I know what it aspires to be.

Perhaps backwards compatibility is most important to HP and its userbase. If that's the case, then it was a mistake to add support for the POSIX/BSD range syntax. a-c in historical SysV tr means three characters, a, -, and c; it's equivalent to ac-.

If, however, HP endeavours to be POSIX-compliant, then your results are unexpected and erroneous; scripts that are compliant and work as expected on compliant systems can fail on HP-UX.


Quote:
Originally Posted by methyl
Code:
echo '[abc]' | tr '[a-c]' '[.*]'
[...]

That's the expected result for historical SysV behavior, but it's not POSIX-compliant. In a POSIX tr range expression, the brackets are not special at all; [a-c] is equivalent to ][a-c.

The POSIX-compliant result is .....



Quote:
Originally Posted by methyl
Code:
echo '[abc]' | tr '[a-c]\[\]' '[.*]'
.....

The \[ and \] escape sequences are undefined in POSIX. Their use is not portable.


Quote:
Originally Posted by methyl
Code:
echo "[abc]" | tr -d '[a-c]'
[]

I use tr -d frequently in numeric or alphabetic data validation.
What do you get on a Linux system with that command?
That gives me nothing (except for the untranslated newline emitted by echo), which is the POSIX-compliant result.


Quote:
Originally Posted by methyl
I think that this proves that square brackets are special in the syntax of tr range specifications. It is also the syntax I have used for umpteen years on assorted versions of unix. The slight variation on a Linux or BSD system is no surprise.
Linux or BSD is irrelevant for the purposes of this discussion. I'm simply playing POSIX lawyer at the moment.

It appears that HP-UX tr added support for the BSD range expression syntax that POSIX long ago adopted, a-c, but it continues to accept historical SysV syntax, [a-c], treating them identically even though according to POSIX they mean different things (the latter includes two brackets which the former does not).

It's understandable that you've been using this syntax for a very long time without any obvious problems. With a SysV tr, the range expression behaves as you intend. With a POSIX or BSD tr, in most instances where both strings consist of a range expression, the brackets are silently translated into identical characters. While the brackets were not intended to be members of the translation set, they are translated into themselves, so the result is correct (which is why the POSIX standard chose to go with the BSD syntax: less collateral damage). However, in other cases, for example when only the first string contains a range expression and the second is a repetition expression, as in tr '[a-z]' '[.*]', there is a potential for a silently erroneous result. And if the tr implementation pads the second string, then the repetition expression isn't even required for a silent error to occur: tr '[a-z]' '.' is enough.
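
For scripts that have to run on more than one system, a sketch of forms that avoid the bracket ambiguity is below. The helper names to_lower and mask_lower are made up for illustration. Character classes are specified by POSIX for case conversion, and the second helper assumes a tr that accepts the bracket-free range.

```shell
# Case conversion with character classes: no range brackets to be misread.
to_lower() {
    tr '[:upper:]' '[:lower:]'
}

# Bracket-free range plus an explicit repetition expression.
# Literal brackets in the input are not part of the set, so they pass through.
mask_lower() {
    tr 'a-z' '[.*]'
}

echo "NEE" | to_lower        # prints: nee
echo '[abc]' | mask_lower    # prints: [...]
```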

methyl, I greatly appreciate your responses to my questions. I realize that these are rarely encountered corner cases, but they pique my curiosity. I often learn more than I intend as I dig into them.

Regards,
Alister
# 12  
Old 06-08-2012
I can only explain the issue in post #1 if the O/P has a combination of file(s) in the current working directory which causes Shell to change the tr command to something wrong but syntactically correct. Not found a way to achieve this yet.

@alister
The new(ish) thing is the Shell reacting to unquoted square brackets which is why we need to quote them in a tr.
Code:
ls ?
A  B  C

ls [A-C]
A  B  C

echo [A-C]
A B C


Edit: Noted that @alister has found a way of generating the anomaly. See subsequent posts.

Last edited by methyl; 06-08-2012 at 04:40 PM.. Reason: Noted ...
# 13  
Old 06-08-2012
Quote:
Originally Posted by methyl
I can only explain the issue in post #1 if the O/P has a combination of file(s) in the current working directory which causes Shell to change the tr command to something wrong but syntactically correct. Not found a way to achieve this yet.
It's trivial, in certain environments. I haven't used Linux in a long while, but I still have a 2006 Debian install on an old laptop. It uses the en_US.UTF-8 locale by default. Here's one way to replicate OP's issue (lines beginning with + are the set -x trace; nee and NEE are the command output):

Code:
$ locale | grep COLLATE
LC_COLLATE="en_US.UTF-8"
$ mkdir test
$ cd test
$ set -x
$ echo NEE | tr [A-Z] [a-z]
+ tr '[A-Z]' '[a-z]'
+ echo NEE
nee
$ touch F
+ touch F
$ echo NEE | tr [A-Z] [a-z]
+ tr F F
+ echo NEE
NEE

Note that both [A-Z] and [a-z] matched F, because the locale's collation sequence interleaves uppercase and lowercase letters, including F in both sh glob patterns.

Since many implementations will simply ignore excess characters in the second string, the error can be reproduced even when using a locale which does not interleave upper and lower case letters. Continuing where we left off, in the same working directory and environment:
Code:
$ LC_COLLATE=C
+ LC_COLLATE=C
$ echo NEE | tr [A-Z] [a-z]
+ tr F '[a-z]'
+ echo NEE
NEE

If your locale's collation isn't interleaved and if your tr implementation will not accept a longer second set of characters, then you will not be able to reproduce either result.

Regards,
Alister

Last edited by alister; 06-08-2012 at 02:42 PM.. Reason: simplified command sequence
# 14  
Old 06-08-2012
Quote:
Originally Posted by ddreggors
Admittedly, I use tr less often than say grep, awk, or sed... but enough that this is valuable to know.
It has nothing to do with tr. It'd happen anywhere you use an unquoted [a-z].

Code:
$ touch a b c d e f g h i j k l m n o p q r s t u v w x y z
$ echo [a-z]
a b c d e f g h i j k l m n o p q r s t u v w x y z
$
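
A small sketch of the two usual ways to keep the shell's hands off such a pattern: quoting it, or switching off pathname expansion with set -f. It assumes mktemp -d is available, and show_pattern_handling is only an illustrative name.

```shell
# Sketch: keep a bracket pattern literal even when matching files exist.
show_pattern_handling() (
    scratch=$(mktemp -d) || exit 1
    cd "$scratch" || exit 1
    touch a b c

    echo '[a-z]'    # quoted: always passed through literally, prints [a-z]

    set -f          # noglob: disable pathname expansion in this subshell
    echo [a-z]      # unquoted, but no expansion happens now, prints [a-z]

    cd / && rm -rf "$scratch"
)

show_pattern_handling
```

Quoting is usually the better habit, since set -f changes behaviour for every later command in the same shell.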
