Regex issue with \s in character class.


# 1  
01-01-2020

Does anybody have an explanation for why \s doesn't match ' ' inside a character class? Here are three examples. The last one (using egrep -o) shows that \s inside a character class fails:

\s works outside a character class:

Code:
# echo " FOO " | egrep -o '\s[A-Z]+\s'
 FOO

Here is a sanity check using literal spaces:

Code:
# echo " FOO " | egrep -o '[  ][A-Z]+[  ]'
 FOO

Here I try \s inside a character class, and nothing matches:
Code:
# echo " FOO " | egrep -o '[\s][A-Z]+[\s]'
#

# 2  
01-01-2020
FWIW:

Code:
macos$  echo " FOO " | grep -o -E '[[:space:]][A-Z]+[[:space:]]'
 FOO

Code:
linux# echo " FOO " | grep -o -E '[[:space:]][A-Z]+[[:space:]]'
 FOO

Sorry, I know those examples don't explain why \s isn't supported inside a bracket expression.

It's a good question why. Maybe someone has the answer?
# 3  
01-02-2020
Also, another FWIW:

Code:
linux# php -a
Interactive mode enabled

php > $a = ' FOO ';
php > echo preg_match('/[\s][A-Z]+[\s]/', $a,$matches);
1
php > echo $matches[0];
 FOO 
php >

So it works in a PHP (PCRE) regex.
# 4  
01-02-2020
\s is a shorthand character class, like \d and others. In most flavours it expands to roughly [ \t\r\n\f\v]: space, tab, carriage return, newline, form feed and (usually) vertical tab.

It is explained in Regexp Tutorial - Shorthand Character Classes, which has this for \s:

Quote:
\s stands for “whitespace character”. Again, which characters this actually includes, depends on the regex flavor. In all flavors discussed in this tutorial, it includes [ \t\r\n\f]. That is: \s matches a space, a tab, a line break, or a form feed. Most flavors also include the vertical tab, with Perl (prior to version 5.18) and PCRE (prior to version 8.34) being notable exceptions. In flavors that support Unicode, \s normally includes all characters from the Unicode “separator” category. Java and PCRE are exceptions once again. But JavaScript does match all Unicode whitespace with \s.
This User Gave Thanks to jim mcnamara For This Post:
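The character list above can be compared with POSIX's [[:space:]] class, which in the C/POSIX locale contains space, tab, newline, vertical tab, form feed and carriage return. A minimal sketch, assuming GNU or BSD grep (the -o option is a common extension, not POSIX): a tab after FOO is matched by [[:space:]] but not by a bracket that only holds a space.

```shell
#!/bin/sh
# Input: a space, FOO, then a real TAB (printf turns \t into a tab).
input=$(printf ' FOO\t')

# [[:space:]] includes the tab, so this prints " FOO" followed by the tab.
printf '%s\n' "$input" | grep -o -E '[[:space:]][A-Z]+[[:space:]]'

# A bracket containing only a space does not match the tab:
# no output, and grep exits with status 1.
printf '%s\n' "$input" | grep -o -E '[ ][A-Z]+[ ]'
```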
# 5  
01-02-2020
Quote:
Originally Posted by blackrageous
Here I try with \s in a character class
Code:
# echo " FOO " | egrep -o '[\s][A-Z]+[\s]'
#

The backslash is not a metacharacter in a POSIX-compliant bracket expression. Therefore [\s] matches the character \ or the character s instead of whitespace.
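One way to confirm this is to put a literal s or backslash where the spaces were. A minimal sketch, assuming GNU or BSD grep with the -o extension; grep -E is used here as the equivalent of egrep.

```shell
#!/bin/sh
# In a POSIX ERE bracket expression, [\s] is the two-character set { '\', 's' }.
re='[\s][A-Z]+[\s]'

# Spaces are not in the set: no output, exit status 1.
printf '%s\n' ' FOO ' | grep -o -E "$re"

# A literal letter s satisfies the bracket expression: prints sFOOs
printf '%s\n' 'sFOOs' | grep -o -E "$re"

# So does a literal backslash: prints \FOO\
printf '%s\n' '\FOO\' | grep -o -E "$re"
```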
# 6  
01-02-2020
Quote:
Originally Posted by Neo
so, it works in a PHP REGEX .....

POSIX regular expressions (BRE/ERE) and the GNU and BSD derivatives are the exception, rather than the norm, in that \ is not special/meta within a character class (bracket) expression.

In (I think) all other regex flavours, \ is special inside a character class (bracket) expression.
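Because the backslash is ordinary inside a POSIX bracket expression, other escapes behave the same way, and a set such as "whitespace or comma" has to be written with a named class instead. A short sketch, again assuming grep -E with the -o extension; the PCRE spelling [\s,] appears only in a comment for comparison.

```shell
#!/bin/sh
# PCRE/PHP spelling: [\s,]+      POSIX ERE spelling: [[:space:],]+
# The named class [:space:] sits inside the outer brackets next to the comma.
printf '%s\n' 'a, b,c' | grep -o -E '[[:space:],]+'
# Prints two matches, one per line: ", " and ","

# [\t] is likewise the set { '\', 't' }, so a real tab is not matched: prints 0
printf 'a\tb\n' | grep -c -E 'a[\t]b'
```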


--
Shortcuts like \s are not part of POSIX BRE/ERE, but are supported as an extension in BSD and GNU implementations.

Last edited by Scrutinizer; 01-02-2020 at 10:04 AM.
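Since \s and similar shortcuts are GNU/BSD extensions, a script aimed at any POSIX grep can spell them as bracket expressions: \s as [[:space:]], \S as [^[:space:]], and \w as [[:alnum:]_] (the GNU grep manual describes \w as a synonym for [_[:alnum:]]). A minimal sketch of the last two; -o is still an extension rather than POSIX.

```shell
#!/bin/sh
# \S+ written portably: prints FOO and bar on separate lines
printf '%s\n' '  FOO   bar ' | grep -o -E '[^[:space:]]+'

# \w+ written portably: prints foo_1, bar and 2 on separate lines
# (the hyphen is not a word character, so it splits bar-2)
printf '%s\n' 'foo_1 bar-2' | grep -o -E '[[:alnum:]_]+'
```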
# 7  
01-02-2020
Interesting. I was using Linux egrep (POSIX ERE) and was just surprised when it didn't work.

--- Post updated at 19:21 ---

That seems to explain it. Thanks.