regular expression foreign language


 
# 1  
Old 10-13-2009

Hello all,
I read somewhere that regular expressions work with the ASCII table, so when I type
Code:
grep "[a-z][a-z]*" file_name

it uses the values from ASCII decimal 97 (a) to decimal 122 (z), right?
But suppose I have a file containing diacritics (ordinary Slovak characters):
Code:
marek@cepi:~$ cat diakritika 
áôúéťľúľščťžýáíéôäúú
ÁôúÉŤĽÚĽŠČŤŽÝÁÍÉôäÚÚ

marek@cepi:~$ grep -o "[a-z][a-z]*" diakritika 
áôúéťľúľščť
ýáíéôäúú
ôú
ôä

Why does this regexp match diacritics? And why does it match only the lowercase ones, and not "ž"? This seems strange to me. A friend told me it could have something to do with $LANG. My $LANG is:
Code:
marek@cepi:~$ echo $LANG
en_US.UTF-8

I would also like to ask about converting a file with diacritics to uppercase. I type:
Code:
marek@cepi:~$ cat diakritika | tr "[:lower:]" "[:upper:]"
áôúéťľúľščťžýáíéôäúú
ÁôúÉŤĽÚĽŠČŤŽÝÁÍÉôäÚÚ

Why doesn't it change lowercase to uppercase?
Thanks a lot for any reply.
PS: I hope the characters display properly.
# 2  
Old 10-17-2009
Maybe I know why "ž" is different: it comes "after" z, so the regex [a-z] does not match "ž". But many things are still unclear.
# 3  
Old 10-17-2009
Quote:
Originally Posted by wakatana
why does it match only the lowercase ones
If you want uppercase as well, you have to specify it:
Code:
grep "[a-zA-Z][a-zA-Z]*" file_name


Last edited by Scrutinizer; 10-17-2009 at 11:09 AM..
# 4  
Old 10-17-2009
What you are discussing is called a collating sequence. Do a web search for "POSIX collating sequence" for further information.

To be language-neutral, your example would be written as
Code:
grep "[[:alpha:]][[:alpha:]]*"  filename

or if you only want lowercase characters
Code:
grep "[[:lower:]][[:lower:]]*"  filename
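
The difference between a range and a character class can be seen by comparing the C locale with whatever locale is currently set. This is only a rough sketch: the variable names `sample` and `c_range` are made up for illustration, the input is a short test string rather than the poster's file, and `grep -o` is assumed to be available as in the original post. Only the C-locale result is predictable; the last two commands depend on the installed locale and grep version, so no output is claimed for them.

```shell
# Sample line: ASCII letters mixed with Slovak letters
# (UTF-8 bytes written as octal escapes): abcážxyz
sample=$(printf 'abc\303\241\305\276xyz')

# In the C locale a bracket range follows plain byte order,
# so only the ASCII letters a-z can match: this yields "abc" and "xyz".
c_range=$(printf '%s\n' "$sample" | LC_ALL=C grep -o '[a-z][a-z]*')
printf 'C locale, [a-z]:\n%s\n' "$c_range"

# In the current locale the range follows that locale's rules,
# while the class asks for "lowercase letters" directly.
# The two results may differ from each other and from the C locale.
printf '%s\n' "$sample" | grep -o '[a-z][a-z]*'
printf '%s\n' "$sample" | grep -o '[[:lower:]][[:lower:]]*'
```

Prefixing a single command with `LC_ALL=C` is a common way to get the "ASCII table" behaviour the original question expected, without changing the locale for the rest of the session.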

# 5  
Old 10-18-2009
Quote:
Originally Posted by wakatana
Why does this regexp match diacritics? And why does it match only the lowercase ones, and not "ž"? This seems strange to me. A friend told me it could have something to do with $LANG.

ž comes after z, so it is not in the range you gave.
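
The ordering itself can be inspected with sort, which uses the same locale collation settings. A small sketch, with made-up variable names (`letters`, `c_order`, `cur_order`) and four test letters instead of the poster's file. In the C locale the order is by byte value, so both accented letters end up after z; in a locale such as en_US.UTF-8 one would typically expect á next to a and ž just after z, but that depends on the system's locale data.

```shell
# The letters z, ž, a, á, one per line (UTF-8 bytes written as octal escapes).
letters=$(printf 'z\n\305\276\na\n\303\241\n')

# C locale: plain byte order, so the result is "a z á ž".
c_order=$(printf '%s\n' "$letters" | LC_ALL=C sort | tr '\n' ' ')
echo "C locale order:       $c_order"

# Current locale: the order comes from its collation rules and may differ.
cur_order=$(printf '%s\n' "$letters" | sort | tr '\n' ' ')
echo "current locale order: $cur_order"
```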
Quote:
So my $LANG is:
Code:
marek@cepi:~$ echo $LANG
en_US.UTF-8

I would also like to ask about converting a file with diacritics to uppercase. I type:
Code:
marek@cepi:~$ cat diakritika | tr "[:lower:]" "[:upper:]"
áôúéťľúľščťžýáíéôäúú
ÁôúÉŤĽÚĽŠČŤŽÝÁÍÉôäÚÚ

Why doesn't it change lowercase to uppercase?

Probably because those characters are not part of the en_US.UTF-8 definition of [:lower:] and [:upper:].
# 6  
Old 10-18-2009
Thank you for the reply. Is there a way to convert lowercase letters with diacritics to uppercase?
# 7  
Old 10-18-2009

Use a locale in which they are defined. (I guess; I haven't tried it.)
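
One caveat with the locale suggestion: GNU tr, as far as I know, works byte by byte, and each of these letters is more than one byte in UTF-8, so it may leave them alone whatever the locale says. A minimal sketch of two alternatives follows. It assumes GNU sed (the `\U` replacement is a GNU extension) and an awk whose toupper() handles multibyte characters in a UTF-8 locale, such as gawk; other implementations may only change the ASCII letters. The variable names and the test word are made up for illustration.

```shell
# Test word with one Slovak letter (UTF-8 bytes written as octal escapes): žaba
word=$(printf '\305\276aba')

# Four letters, but five bytes: ž takes two bytes in UTF-8.
bytes=$(printf '%s' "$word" | wc -c)
echo "bytes: $((bytes))"

# tr: a byte-oriented tr is expected to change only the ASCII letters.
tr_out=$(printf '%s\n' "$word" | tr '[:lower:]' '[:upper:]')

# Alternatives that are usually multibyte-aware in a UTF-8 locale
# (assumption: GNU sed for \U, gawk for toupper).
sed_out=$(printf '%s\n' "$word" | sed 's/.*/\U&/')
awk_out=$(printf '%s\n' "$word" | awk '{ print toupper($0) }')

printf 'tr:  %s\nsed: %s\nawk: %s\n' "$tr_out" "$sed_out" "$awk_out"
```

If sed or awk also leave ž unchanged, check that a UTF-8 locale is actually in effect (for example with `echo $LANG`) before trying another tool.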