Hello all,
I read somewhere that regular expressions work with the ASCII table, so when I type
Code:
grep "[a-z][a-z]*" file_name
it uses the values from ASCII decimal 97 (a) to decimal 122 (z), right?
But if I have a file containing diacritics, let's say ordinary Slovak characters:
Code:
marek@cepi:~$ cat diakritika
áôúéťľúľščťžýáíéôäúú
ÁôúÉŤĽÚĽŠČŤŽÝÁÍÉôäÚÚ
marek@cepi:~$ grep -o "[a-z][a-z]*" diakritika
áôúéťľúľščť
ýáíéôäúú
ôú
ôä
Why does this regexp recognize diacritics? And why does it match only the lowercase letters, yet not "ž"? This seems strange to me. A friend told me it could have something to do with $LANG. My $LANG is:
Code:
marek@cepi:~$ echo $LANG
en_US.UTF-8
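To check whether the locale really is what makes the difference, one thing I could try is forcing the C locale for a single command. This is only a minimal sketch, assuming GNU grep; the sample strings and variable names are made up for illustration. My understanding is that in the C locale the range [a-z] covers only the bytes 97 to 122, so the line with diacritics should give no match at all.

```shell
# Hypothetical sample inputs; the real file above is "diakritika".
ascii_sample='abcXYZ'
slovak_sample='áôúéťľ'

# LC_ALL=C forces the C locale for this one command, so [a-z] is a plain
# byte range from 97 (a) to 122 (z).
ascii_result=$(printf '%s\n' "$ascii_sample" | LC_ALL=C grep -o '[a-z][a-z]*')
printf 'ASCII sample:  %s\n' "$ascii_result"

# The UTF-8 bytes of the accented letters are all above 127, so no match is
# expected here; "|| true" keeps grep's non-zero exit status (no match)
# from stopping a script that runs with "set -e".
slovak_result=$(printf '%s\n' "$slovak_sample" | LC_ALL=C grep -o '[a-z][a-z]*' || true)
printf 'Slovak sample: %s\n' "$slovak_result"
```

If the second line comes out empty, that would suggest the matches I see in my en_US.UTF-8 locale come from the locale's collation order and not from the ASCII table.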
I would also like to ask: if I want to convert the file with diacritics to uppercase, I type:
Code:
marek@cepi:~$ cat diakritika | tr "[:lower:]" "[:upper:]"
áôúéťľúľščťžýáíéôäúú
ÁôúÉŤĽÚĽŠČŤŽÝÁÍÉôäÚÚ
Why does it not change lowercase to uppercase?
Thanks a lot for any replies.
PS: I hope the characters display properly.
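An alternative I might try, in case tr only works byte by byte, is awk's toupper(). This is just a sketch under assumptions: to_upper is a name I made up, and as far as I know only a locale-aware awk (for example gawk running in a UTF-8 locale) would convert the accented letters. Other awk implementations may leave them unchanged.

```shell
# to_upper is a hypothetical helper name. toupper() is a standard awk
# function, but whether it converts multibyte letters depends on the awk
# implementation and the current locale.
to_upper() {
    awk '{ print toupper($0) }'
}

# Plain ASCII input works with any awk.
printf 'marek\n' | to_upper
```

For my file this would be used as: to_upper < diakritika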