

regular expression foreign language


 
# 1  
10-13-2009

Hello all,
I read somewhere that regular expressions work with the ASCII table, so when I type
Code:
grep "[a-z][a-z]*" file_name

it uses the values from ASCII dec 97 (a) to dec 122 (z), right?
But if I have a file containing diacritics, let's say ordinary Slovak language characters:
Code:
marek@cepi:~$ cat diakritika 
ťľľščťž
ŤĽĽŠČŤŽ

marek@cepi:~$ grep -o "[a-z][a-z]*" diakritika 
ťľľščť



Why does this regexp match the diacritics? And why does it match only the lowercase ones, and not "ž"? This seems strange to me. A friend told me it could have something to do with $LANG. My $LANG is:
Code:
marek@cepi:~$ echo $LANG
en_US.UTF-8
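A quick way to test the friend's $LANG hint is to override the locale for a single command. In the C (POSIX) locale a bracket range is a plain byte range, so `[a-z]` really is bytes 97 to 122, and the UTF-8 bytes of the Slovak letters fall outside it. A minimal sketch; the `sample` string is a hypothetical stand-in for the diakritika file:

```shell
#!/bin/sh
# Hypothetical sample line: Slovak letters, a space, then plain ASCII.
sample='ťľľščťž abc'

# In the C locale [a-z] covers bytes 97..122 only. The UTF-8 bytes of
# ť, ľ, š, č and ž are all 0x80 or above, so only "abc" is printed.
printf '%s\n' "$sample" | LC_ALL=C grep -o '[a-z][a-z]*'
```

Without `LC_ALL=C`, the range is interpreted through the en_US.UTF-8 locale instead, which is where the surprising matches in the question come from; the GNU grep manual notes that outside the C locale the set of characters a range matches is not specified.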

Also, if I want to convert the file with diacritics to uppercase, I type:
Code:
marek@cepi:~$ cat diakritika | tr "[:lower:]" "[:upper:]"
ťľľščťž
ŤĽĽŠČŤŽ

Why does it not change lowercase to uppercase?
Thanks a lot for any reply.
PS: I hope the characters display properly.
# 2  
10-17-2009
Maybe I know why "ž" is different: it comes "after" z, so the regex [a-z] does not match "ž". But many things are still unclear.
# 3  
10-17-2009
Quote:
Originally Posted by wakatana
why does it match only lowercase
If you want uppercase as well, you have to specify it:
Code:
grep "[a-zA-Z][a-zA-Z]*" file_name
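In the C locale the combined expression behaves exactly as written: the union of two byte ranges, one per case. A minimal sketch with hypothetical ASCII input:

```shell
#!/bin/sh
# Two hypothetical lines, one lowercase and one uppercase.
# [a-zA-Z] is bytes 97..122 plus bytes 65..90 in the C locale,
# so both words are printed, one per line.
printf 'marek\nMAREK\n' | LC_ALL=C grep -o '[a-zA-Z][a-zA-Z]*'
```

For accented uppercase letters such as Ť or Ž, a range still depends on the locale's ordering, so the character classes from the next reply are the safer choice.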


Last edited by Scrutinizer; 10-17-2009 at 12:09 PM.
# 4  
10-17-2009
What you are discussing is called a collating sequence. Do a web search for "POSIX collating sequence" for further information.

To be language-neutral, your example would be written as
Code:
grep "[[:alpha:]][[:alpha:]]*"  filename

or, if you only want lowercase characters:
Code:
grep "[[:lower:]][[:lower:]]*"  filename
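Character classes are resolved through the locale's character-type data rather than through byte ranges. A minimal sketch, assuming GNU grep: in the C locale only ASCII letters count as lowercase, while in a UTF-8 locale the same class also covers letters like ť and ž. The locale name `C.UTF-8` is an assumption about what is installed; the poster's en_US.UTF-8 should behave the same way.

```shell
#!/bin/sh
sample='Marek ťľľščťž'   # hypothetical input

# C locale: only ASCII letters are in [:lower:], so this prints "arek".
printf '%s\n' "$sample" | LC_ALL=C grep -o '[[:lower:]][[:lower:]]*'

# UTF-8 locale (assumed installed as C.UTF-8): the Slovak letters are
# lowercase too, so "arek" and "ťľľščťž" are both printed.
printf '%s\n' "$sample" | LC_ALL=C.UTF-8 grep -o '[[:lower:]][[:lower:]]*'
```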

# 5  
10-18-2009
Quote:
Originally Posted by wakatana
Hello all,
I read somewhere that regular expressions work with the ASCII table, so when I type
Code:
grep "[a-z][a-z]*" file_name

it uses the values from ASCII dec 97 (a) to dec 122 (z), right?
But if I have a file containing diacritics, let's say ordinary Slovak language characters:
Code:
marek@cepi:~$ cat diakritika 
ťľľščťž
ŤĽĽŠČŤŽ

marek@cepi:~$ grep -o "[a-z][a-z]*" diakritika 
ťľľščť

Why does this regexp match the diacritics? And why does it match only the lowercase ones, and not "ž"? This seems strange to me. A friend told me it could have something to do with $LANG.

"ž" comes after z, so it is not in the range you gave.
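The ordering can be made visible with a byte-wise sort. In the C locale `sort` compares raw bytes, and the UTF-8 encoding of ž (c5 be) is larger than that of z (the single byte 7a), so ž lands after z and outside `[a-z]`. A minimal sketch, with `od` used only to show the bytes:

```shell
#!/bin/sh
# Byte-wise (C locale) ordering of a, z and ž: prints a, z, ž.
printf '%s\n' ž z a | LC_ALL=C sort

# The raw UTF-8 bytes of ž: c5 be.
printf 'ž' | od -An -tx1
```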
Quote:
My $LANG is:
Code:
marek@cepi:~$ echo $LANG
en_US.UTF-8

Also, if I want to convert the file with diacritics to uppercase, I type:
Code:
marek@cepi:~$ cat diakritika | tr "[:lower:]" "[:upper:]"
ťľľščťž
ŤĽĽŠČŤŽ

Why does it not change lowercase to uppercase?

Probably because those characters are not part of the en_US.UTF-8 definition of [:lower:] and [:upper:].
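Another factor worth checking: the GNU coreutils manual states that `tr` currently fully supports only single-byte characters. Each Slovak letter is two bytes in UTF-8, and neither byte is a lowercase letter on its own, so `tr` passes them through unchanged. A minimal sketch that makes the byte-wise behaviour explicit by forcing the C locale (hypothetical sample input):

```shell
#!/bin/sh
# tr examines one byte at a time. The UTF-8 bytes of these letters are
# not lowercase ASCII letters, so they are left as they are, while the
# ASCII "abc" becomes "ABC". Output: ťľľščťž ABC
printf 'ťľľščťž abc\n' | LC_ALL=C tr '[:lower:]' '[:upper:]'
```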
# 6  
10-18-2009
Thank you for the reply. Is there a way to convert lowercase diacritics to uppercase?
# 7  
10-18-2009

Use a locale in which they are defined. (I guess; I haven't tried it.)
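One way around the byte-at-a-time limitation of `tr`, assuming GNU sed and an installed UTF-8 locale, is the `\U` case-conversion escape in sed's `s` command. It is a GNU extension, and in a multibyte locale it converts whole characters rather than bytes. A minimal sketch; `C.UTF-8` is an assumed locale name, and the poster's en_US.UTF-8 should work the same way:

```shell
#!/bin/sh
# Hypothetical sample: the two lines of the diakritika file.
# \U uppercases the rest of the replacement; & is the whole matched
# line. Both output lines should read ŤĽĽŠČŤŽ.
printf 'ťľľščťž\nŤĽĽŠČŤŽ\n' | LC_ALL=C.UTF-8 sed 's/.*/\U&/'
```

On the original file the equivalent would be `sed 's/.*/\U&/' diakritika`, run in the poster's UTF-8 locale.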
