Question about REGEX Patterns and Case Sensitivity?


 
# 1  
Old 10-15-2012

Hello All,

I'm in the middle of a script and I'm doing some checks with REGEX (i.e. using '[[' with the '=~' operator).

I'm wondering if this example is correct or if it's just a coincidence. I thought that unless I set "shopt -s nocasematch",
at least the first test should print "FALSE", but it prints "TRUE"..?

For Example:
Code:
#!/bin/bash

MY_VAR="HELLO"

### This prints "TRUE"
PATTERN_1="^[a-z]*"
if [[ $MY_VAR =~ $PATTERN_1 ]]
 then
    echo "TRUE"
else
    echo "FALSE"
fi

echo "-------------------------"

### This prints "FALSE"
PATTERN_2="^[A-z]*"
if [[ $MY_VAR =~ $PATTERN_2 ]]
 then
    echo "TRUE"
else
    echo "FALSE"
fi

echo "-------------------------"

### This prints "TRUE"
PATTERN_3="[a-Z]*"
if [[ $MY_VAR =~ $PATTERN_3 ]]
 then
    echo "TRUE"
else
    echo "FALSE"
fi



The OUTPUT:
Code:
TRUE
-------------------------
FALSE
-------------------------
TRUE

I remember being told before that the pattern "[A-z]" is NOT the same as doing "[A-Za-z]" like it would be in Perl...
So I'm wondering why the pattern "[a-Z]", which is the last if statement in the code above, returns "TRUE", when
the 2nd if statement above "[A-z]" returns "FALSE"...?

I tried changing the Variable "$MY_VAR" from all upper case to all lowercase, but I still get the same output...
And lastly, if I include the "shopt -s nocasematch" they all return "TRUE"...


If anyone has any thoughts/suggestions that would be great!

FYI:
Bash Version:
4.1.10


Thanks in Advance,
Matt
# 2  
Old 10-15-2012
I tested your code in bash version 4.1.10(4), and with the shell option nocasematch either set or unset (checked with shopt -p) it prints 'TRUE'. The reason, at least the way I understand it, is that the '*' means 0 or more matches, so the pattern can match the empty string even when none of the letters match.
Anyway, I would recommend using one of the POSIX Character Classes:
Code:
[[:alpha:]] matches alphabetic characters. In the C/POSIX locale this is equivalent to [A-Za-z].
[[:lower:]] matches lowercase alphabetic characters. In the C/POSIX locale this is equivalent to [a-z].
[[:upper:]] matches uppercase alphabetic characters. In the C/POSIX locale this is equivalent to [A-Z].
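
To see the "zero or more" effect directly, here is a minimal sketch that prints what the regex actually matched. The helper name show_match is made up for illustration, and the locale is pinned to C as an assumption so that [a-z] means only the 26 ASCII lowercase letters:

```shell
#!/bin/bash
# Assumption: pin the locale to C so [a-z] covers only ASCII lowercase letters.
LC_ALL=C

# Hypothetical helper (not from the thread): prints the text the regex matched.
show_match() {
    local text=$1 pattern=$2
    if [[ $text =~ $pattern ]]; then
        # BASH_REMATCH[0] holds the part of the string matched by the whole regex.
        printf 'match: "%s"\n' "${BASH_REMATCH[0]}"
    else
        printf 'no match\n'
    fi
}

show_match "HELLO" '^[a-z]*'    # match: ""       (zero lowercase letters is still a match)
show_match "HELLO" '^[a-z]+'    # no match        (+ requires at least one lowercase letter)
show_match "hello" '^[a-z]+'    # match: "hello"
```

The first call succeeds with an empty match, which is why the original test prints "TRUE" for "HELLO". Switching '*' to '+' is one way to require at least one lowercase letter.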

This User Gave Thanks to spacebar For This Post:
# 3  
Old 10-15-2012
Assuming you're running on a system with a code set based on ASCII (i.e., not an IBM or Amdahl [if you remember them] mainframe), then [a-z] is a range expression that matches the 26 lowercase alphabetic characters; [A-z] is a range expression that matches the 52 uppercase and lowercase alphabetic characters plus the [, \, ], ^, _, and ` characters that sit between Z and a; and [a-Z] is a range expression that is either treated as an error or as a request to match the empty set (depending on your implementation) because a follows Z in ASCII.
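
A small sketch of the [A-z] point, assuming the C locale so that ranges follow ASCII code order (other locales may order characters differently). The function name in_upper_to_lower_range is hypothetical:

```shell
#!/bin/bash
# Assumption: pin the locale to C so ranges follow ASCII code order.
LC_ALL=C

# Hypothetical helper: succeeds when the single character in $1 falls inside [A-z].
in_upper_to_lower_range() {
    [[ $1 =~ ^[A-z]$ ]]
}

# Try two letters, two of the punctuation characters between Z and a, and a digit.
for ch in A z _ '^' 5; do
    if in_upper_to_lower_range "$ch"; then
        echo "'$ch' is in [A-z]"
    else
        echo "'$ch' is NOT in [A-z]"
    fi
done
```

Under that assumption the underscore and caret are accepted along with the letters, while the digit is rejected. The reversed range [a-Z] is deliberately left out of the sketch because its behavior depends on the implementation.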
This User Gave Thanks to Don Cragun For This Post:
# 4  
Old 10-16-2012
Hey Spacebar, thanks for the reply...

Sorry, I probably should have mentioned what I'm trying to do. Duhh, sorry about that...
Basically, I'm trying to "verify" some user input in the script. The user should enter some text. Then I check that text in the script to
make sure that the user's input "BEGINS" with an ALL lowercase string. I'll give the "[:lower:]" Character Class a try.
Maybe that will work...



Hey Don Cragun, thanks for your reply.

Is this the info you're talking about, for what character encoding I'm using..? Also, for the second one below, I ran the "file" command
on one of my 'test' scripts to see what its encoding was...
Code:
# echo $LANG
en_US.utf8

# file -bi test_bashScript
text/x-shellscript; charset=us-ascii

Also, you're saying the "[A-z]" range should work? I thought that every time I tried using it, it would return the same result
no matter the input, either "true" or "false"; I forget exactly which. But I do remember that it always gave the same
result every time...



Basically, I just want to make sure that the entire "first" string that the user enters is in all lowercase...

And I'm just VERY confused why, when the input string is "HELLO" (all uppercase), the following test (below) returns TRUE...??
Code:
#!/bin/bash

MY_VAR="HELLO"

### This pattern SHOULD match a string that begins with ONLY "lowercase letters", zero or more times...
PATTERN_1="^[a-z]*"

### This prints "TRUE"
if [[ $MY_VAR =~ $PATTERN_1 ]]
 then
    echo "TRUE"
else
    echo "FALSE"
fi

Any idea why I'm getting "TRUE" when the input is ALL uppercase letters..?


Thanks Again,
Matt
# 5  
Old 10-16-2012
I think shell patterns (the ones used with ==, as opposed to regexes used with =~) are anchored by default.
Try with:
Code:
PATTERN_1="[[:lower:]]*"

and
Code:
if [[ $MY_VAR == $PATTERN_1 ]]
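
One thing to keep in mind with the == form: in a shell pattern '*' means "any string", so [[:lower:]]* only checks that the first character is lowercase. If the goal is that the whole first word is lowercase, a regex along these lines might be closer. This is only a sketch, and first_word_is_lower is a made-up name:

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the first whitespace-delimited word of $1
# consists only of lowercase letters.
first_word_is_lower() {
    local input=$1
    # One or more lowercase letters from the start, followed by whitespace or the end of the string.
    local re='^[[:lower:]]+([[:space:]]|$)'
    [[ $input =~ $re ]]
}

for input in "hello world" "HELLO" "helLo world"; do
    if first_word_is_lower "$input"; then
        echo "OK:  $input"
    else
        echo "BAD: $input"
    fi
done

# For comparison, the shell pattern accepts "hELLO" because only the first character is tested.
[[ hELLO == [[:lower:]]* ]] && echo "glob accepts hELLO"
```

Only "hello world" passes the regex check in this sketch. The regex is kept in a variable so the shell does not try to interpret the parentheses itself.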


Last edited by elixir_sinari; 10-16-2012 at 11:40 AM..
This User Gave Thanks to elixir_sinari For This Post:
# 6  
Old 10-16-2012
Hey elixir_sinari, thanks for the reply...

I think the reason I couldn't get that "[:lower:]" character class to work was because I didn't enclose it in another set of square
brackets... Seems to work to a degree..
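
For anyone else hitting the same thing, a quick sketch of why the extra brackets matter in a shell pattern: written as [:lower:], the brackets form an ordinary bracket expression listing the characters ':', 'l', 'o', 'w', 'e' and 'r'. The helper matches_glob is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: tests the string in $1 against the shell pattern in $2.
matches_glob() {
    # $2 is left unquoted on purpose so it is treated as a pattern, not a literal.
    [[ $1 == $2 ]]
}

matches_glob a '[[:lower:]]' && echo "a matches [[:lower:]]"
matches_glob a '[:lower:]'   || echo "a does not match [:lower:]"
matches_glob l '[:lower:]'   && echo "l matches [:lower:] (it is one of the listed characters)"
```

So the class name only takes effect inside an outer pair of brackets, as in [[:lower:]].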



I'm just still baffled why the pattern "[a-z]*" matches the string "HELLO" when the letters are ALL uppercase....


Anyway, thanks for the suggestion...

Thanks Again,
Matt