Equivalence classes don't work


# 1  

Hello:
I can't get equivalence classes to work in globs or when passing them to tr. If I understood correctly, [=e=] matches e, é, è, ê, etc. But when using them with utilities like tr they don't work. Here's an example found in the POSIX standard:
Quote:
This example uses an equivalence class to identify accented variants of the base character 'e' in file1, which are stripped of diacritical marks and written to file2.

Code:
tr "[=e=]" "[e*]" <file1 >file2
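A note on the second operand, since the notation is easy to misread: in tr's string2, `[c*]` means "repeat c as many times as needed to make string2 as long as string1". Here is a minimal sketch with plain ASCII, so that it does not depend on the locale or on equivalence-class support (the function name `mask_vowels` is made up for illustration):

```shell
# Hypothetical helper: map every lowercase vowel to 'x' using the [c*]
# padding syntax. '[x*]' expands to as many x's as string1 has characters.
mask_vowels() {
    tr 'aeiou' '[x*]'
}

printf 'equivalence\n' | mask_vowels
```

In the POSIX example, `[e*]` plays the same role: every member of the `[=e=]` class is mapped to a plain e.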

I decided to create the aforementioned files in order to show the results. Here are the contents of file1:
Code:
Estrés
Miraré

And these are the results on a GNU/Linux machine and on a Solaris machine:
Code:
$ uname -a
Linux sigma 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 GNU/Linux

$ locale
LANG=es_ES.UTF-8
LANGUAGE=
LC_CTYPE="es_ES.UTF-8"
LC_NUMERIC="es_ES.UTF-8"
LC_TIME="es_ES.UTF-8"
LC_COLLATE="es_ES.UTF-8"
LC_MONETARY="es_ES.UTF-8"
LC_MESSAGES="es_ES.UTF-8"
LC_PAPER="es_ES.UTF-8"
LC_NAME="es_ES.UTF-8"
LC_ADDRESS="es_ES.UTF-8"
LC_TELEPHONE="es_ES.UTF-8"
LC_MEASUREMENT="es_ES.UTF-8"
LC_IDENTIFICATION="es_ES.UTF-8"
LC_ALL=

$ tr "[=e=]" "[e*]" <file1 >file2

$ cat file2
Estrés
Miraré

Code:
$ uname -a
SunOS solaris 5.11 11.3 i86pc i386 i86pc

$ locale
LANG=es_ES.UTF-8
LC_CTYPE="es_ES.UTF-8"
LC_NUMERIC="es_ES.UTF-8"
LC_TIME="es_ES.UTF-8"
LC_COLLATE="es_ES.UTF-8"
LC_MONETARY="es_ES.UTF-8"
LC_MESSAGES="es_ES.UTF-8"
LC_ALL=

$ tr "[=e=]" "[e*]" <file1 >file2

$ cat file2
Estrés
Miraré

Why aren't the accented e's replaced?

GNU tr doesn't support multi-byte characters, but the Solaris implementation does:
Code:
$ printf 'Estrés\n' | tr '[:lower:]' '[:upper:]'
ESTRÉS

So I don't know why it's failing on Solaris. Am I using equivalence classes correctly?
Thanks in advance.
# 2  
Try the utils from /usr/xpg4/bin on Solaris...
# 3  
Quote:
Originally Posted by vgersh99
Try the utils from /usr/xpg4/bin on Solaris...
I've tried with /usr/xpg4/bin/tr, /usr/xpg6/bin/tr and even changing the shell to /usr/xpg4/bin/sh, but none of them worked.

Oddly, the example I provided appears in the examples section of the tr manpage in Solaris...

Could it be that whoever made the Spanish locale in these systems didn't define any equivalence class?

Last edited by Cacializ; 12-12-2019 at 06:46 PM..
# 4  
In Unicode, é is U+00E9, which UTF-8 encodes as the two bytes 0xC3 0xA9.
There should be a command called localedef.
There should also be a Spanish UTF-8 locale; you are calling it correctly.
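A quick way to confirm how é is actually encoded is to dump its bytes. This is only a sketch: the helper name `hex_bytes` is made up, and the character is written as octal escapes so the result does not depend on the encoding of the terminal or the script file.

```shell
# Hypothetical helper: print the bytes read from stdin as lowercase hex,
# with od's spacing and line breaks removed.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
    echo
}

# \303\251 is the octal spelling of the two UTF-8 bytes of é (U+00E9).
printf '\303\251' | hex_bytes
```

Running the same helper on a line of file1 shows whether the file really contains UTF-8 rather than a single-byte encoding such as ISO 8859-1, where é would be the single byte e9.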

Please post the output of this, which lists the character classes:
Code:
for class in $(
    locale -v LC_CTYPE |
    sed 's/combin.*//;s/;/\n/g;q'
); do
    printf "\n\t%s\n\n" "$class"
done

If the output looks correct, then the character classes are defined properly in your locale. You may need to set the environment variable POSIXLY_CORRECT on Linux.
# 5  
Here's the output shown in Debian:
Code:
$ for class in $(
>     locale -v LC_CTYPE | 
>     sed 's/combin.*//;s/;/\n/g;q'
> ) ; do 
>     printf "\n\t%s\n\n" $class
>  done

    upper


    lower


    alpha


    digit


    xdigit


    space


    print


    graph


    blank


    cntrl


    punct


    alnum

I don't see any equivalence classes, just character classes. So it means there are none defined in the locale, right?
# 6  
I was not clear. You thought your locale was messed up somehow, so I started at the beginning to debug it.
It looks okay. Next, tr has problems with equivalence classes.
Code:
[aªáàâãäå]

This is the long form of an equivalence class. Try it (use whatever letter is handy):
Code:
echo "aªáàâãäå" | sed 's/[aªáàâãäå...]/a/g'

On Linux this fails for me:
Code:
$ echo "aªáàâãäå" | sed 's/[=a=]/x/g'
xªáàâãäå
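One thing worth checking in this test: in a regular expression, an equivalence class is only recognized inside a bracket expression, so it has to be written `[[=a=]]`. Written as `[=a=]`, the pattern is an ordinary bracket expression that matches a literal `=` or `a`, which gives the output above whether or not equivalence classes work. Here is a small ASCII-only sketch of the difference (both function names are made up for illustration):

```shell
# Plain bracket expression: the set of characters '=' and 'a'.
single_brackets() {
    sed 's/[=a=]/x/g'
}

# Equivalence class inside a bracket expression: 'a' plus whatever the
# current locale's LC_COLLATE treats as equivalent to it.
double_brackets() {
    sed 's/[[=a=]]/x/g'
}

printf 'a=b\n' | single_brackets   # the '=' is replaced as well
printf 'a=b\n' | double_brackets   # the '=' is left alone
```

Note that tr uses `[=a=]` directly, without the extra brackets, because its operands are not regular expressions.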

The tr man page I have:
Quote:
Equivalence classes

The syntax [=C=] expands to all of the characters that are equivalent to C, in no particular order. Equivalence classes are a relatively recent invention intended to support non-English alphabets. But there seems to be no standard way to define them or determine their contents. Therefore, they are not fully implemented in GNU 'tr'; each character's equivalence class consists only of that character, which is of no particular use.
Try sed and spell out the full classes to get past the GNU problems. For Solaris I have no good answers: my home version is Solaris 9, and it is not POSIX compliant.
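As a rough sketch of that workaround for the original file1 example, the accented variants can be listed by hand. This version uses one literal substitution per character instead of a single bracket list, so it does not rely on sed handling multi-byte characters inside brackets. It assumes the script and the input are both UTF-8, the helper name `fold_e` is made up, and the list covers only four lowercase variants.

```shell
# Hypothetical helper: fold a few accented variants of 'e' to a plain 'e'.
# Each variant is a separate literal substitution, so nothing here depends
# on equivalence-class support in sed or tr.
fold_e() {
    sed -e 's/é/e/g' -e 's/è/e/g' -e 's/ê/e/g' -e 's/ë/e/g'
}

printf 'Estrés\nMiraré\n' | fold_e
```

With the files from the first post, this would be run as `fold_e <file1 >file2`.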
# 7  
On Solaris 10, I tried the following, using the POSIX compliant utilities which are in /usr/xpg[46]/bin:

Code:
$ export PATH=/usr/xpg6/bin:/usr/xpg4/bin:$PATH
$ printf "%s\n" Estrés Miraré http://ën.wikipedia.org | LC_CTYPE=es_MX.UTF-8 LC_COLLATE=es_MX.UTF-8 tr '[=ë=]' x
Estrés
Miraré
http://xn.wikipedia.org
$ printf "%s\n" Estrés Miraré http://ën.wikipedia.org | LC_CTYPE=es_MX.UTF-8 LC_COLLATE=es_MX.UTF-8 sed 's/[[=ë=]]/x/g'
xstrxs
Mirarx
http://xn.wikipxdia.org

So tr did not work, but sed did.

On Linux the outcome was the same, except that tr also printed an error message. It appears that GNU tr only handles single-byte characters and does not understand equivalence classes, whereas sed worked:

Code:
$ printf "%s\n" Estrés Miraré http://ën.wikipedia.org | LC_CTYPE=es_MX.UTF-8 LC_COLLATE=es_MX.UTF-8 tr '[=ë=]' x
tr: \303\253: equivalence class operand must be a single character
$ printf "%s\n" Estrés Miraré http://ën.wikipedia.org | LC_CTYPE=es_MX.UTF-8 LC_COLLATE=es_MX.UTF-8 sed 's/[[=ë=]]/x/g'
xstrxs
Mirarx
http://xn.wikipxdia.org


Last edited by Scrutinizer; 12-14-2019 at 07:33 AM..