Regex issue with \s in character class.


# 1  

Does anybody have an explanation for why \s doesn't match ' ' inside a character class? Here are three examples. The last one shows that \s inside a character class fails (demonstrated with egrep -o):

\s works outside a character class:

Code:
# echo " FOO " | egrep -o '\s[A-Z]+\s'
 FOO

Here is a sanity check using a literal space:

Code:
# echo " FOO " | egrep -o '[  ][A-Z]+[  ]'
 FOO

And here is \s inside a character class, which prints nothing:
Code:
# echo " FOO " | egrep -o '[\s][A-Z]+[\s]'
#

# 2  
FWIW, the POSIX [[:space:]] class works inside a bracket expression on both systems:

Code:
macos$  echo " FOO " | grep -o -E '[[:space:]][A-Z]+[[:space:]]'
 FOO

Code:
linux# echo " FOO " | grep -o -E '[[:space:]][A-Z]+[[:space:]]'
 FOO

Sorry, I know those examples don't explain why \s isn't supported inside a bracket expression.

It's a good question why. Maybe someone has the answer?
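A minimal sketch of a portable rewrite of the original test: [[:space:]] is a POSIX class name, so it is valid inside a bracket expression for grep -E. The helper name caps_between_ws is hypothetical, and -o is a GNU/BSD grep option rather than POSIX. The second call shows that tabs count as whitespace too.

```shell
#!/bin/sh
# Portable form of the original test: use the POSIX class [[:space:]]
# inside the bracket expression instead of \s.
# Note: -o is a GNU/BSD grep extension, not required by POSIX.

# caps_between_ws INPUT (hypothetical helper): print each run of capital
# letters that has a whitespace character on both sides, match included.
caps_between_ws() {
    printf '%s\n' "$1" | grep -oE '[[:space:]][A-Z]+[[:space:]]'
}

caps_between_ws ' FOO '                  # prints " FOO " (spaces)
caps_between_ws "$(printf 'x\tBAR\ty')"  # prints "<tab>BAR<tab>"
```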
# 3  
Another FWIW:

Code:
linux# php -a
Interactive mode enabled

php > $a = ' FOO ';
php > echo preg_match('/[\s][A-Z]+[\s]/', $a,$matches);
1
php > echo $matches[0];
 FOO 
php >

So it works in a PHP regex (PHP's preg_* functions use PCRE).
# 4  
\s is a shorthand character class, like \d and others. In most flavors it expands to roughly [ \t\r\n\f\v], but the exact set depends on the flavor.

It is explained in Regexp Tutorial - Shorthand Character Classes, which has this for \s:

Quote:
\s stands for “whitespace character”. Again, which characters this actually includes, depends on the regex flavor. In all flavors discussed in this tutorial, it includes [ \t\r\n\f]. That is: \s matches a space, a tab, a line break, or a form feed. Most flavors also include the vertical tab, with Perl (prior to version 5.18) and PCRE (prior to version 8.34) being notable exceptions. In flavors that support Unicode, \s normally includes all characters from the Unicode “separator” category. Java and PCRE are exceptions once again. But JavaScript does match all Unicode whitespace with \s.
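A quick way to see which characters \s covers in grep, as a sketch that assumes GNU grep (its manual documents \s as a synonym for [[:space:]]; other greps are not checked here). It builds one "a<whitespace>b" line per character and counts how many lines each pattern matches. count_matches is a hypothetical helper, and newline is left out because grep matches line by line.

```shell
#!/bin/sh
# Assumes GNU grep, where \s outside brackets is an extension documented as a
# synonym for [[:space:]]. POSIX grep is not required to support \s at all.

# count_matches PATTERN (hypothetical helper): feed grep five lines of the form
# "a<ws>b", one per whitespace character, and print how many lines match.
count_matches() {
    for c in ' ' "$(printf '\t')" "$(printf '\v')" "$(printf '\f')" "$(printf '\r')"; do
        printf 'a%sb\n' "$c"
    done | grep -c -- "$1"
}

printf '%-12s %s\n' '\s' "$(count_matches 'a\sb')"
printf '%-12s %s\n' '[[:space:]]' "$(count_matches 'a[[:space:]]b')"
```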
This User Gave Thanks to jim mcnamara For This Post:
# 5  
Quote:
Originally Posted by blackrageous
Anybody have an explanation for why \s doesn't match ' ' in a character class? Here are 3 examples with the final example showing that \s in a character class (demonstrated by using egrep -o) fails:

Here I try with \s in a character class
Code:
# echo " FOO " | egrep -o '[\s][A-Z]+[\s]'
#

The backslash is not a metacharacter in a POSIX-compliant bracket expression. Therefore [\s] matches a literal \ or s instead of whitespace, and since " FOO " has neither character around FOO, egrep prints nothing.
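A minimal sketch confirming that [\s] is the two-character set {\, s} in a POSIX bracket expression. bracket_s is a hypothetical helper, -o is a GNU/BSD extension, and it uses grep -E rather than egrep, which newer GNU grep releases report as obsolescent.

```shell
#!/bin/sh
# In a POSIX bracket expression "\" is an ordinary character, so [\s] is the
# set {\, s}: it matches a backslash or the letter s, never a space.

# bracket_s INPUT (hypothetical helper): apply the original
# [\s][A-Z]+[\s] pattern to INPUT and print what matched.
bracket_s() {
    printf '%s\n' "$1" | grep -oE '[\s][A-Z]+[\s]'
}

bracket_s ' FOO ' || echo '(no match)'   # spaces are not in {\, s}
bracket_s 'sFOOs'                        # prints sFOOs
bracket_s '\BAR\'                        # prints \BAR\
```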
# 6  
Quote:
Originally Posted by Neo
Also, another FWIW:

Code:
linux# php -a
Interactive mode enabled

php > $a = ' FOO ';
php > echo preg_match('/[\s][A-Z]+[\s]/', $a,$matches);
1
php > echo $matches[0];
 FOO 
php >

so, it works in a PHP REGEX .....
POSIX regular expressions (BRE/ERE) and the GNU and BSD derivatives are the exception, rather than the norm, in that \ is not special/meta within a character class (bracket) expression.

In (I think) all other regex flavours, \ is special in a character class (bracket) expression.


--
Shortcuts like \s are not part of POSIX BRE/ERE, but are supported as an extension in BSD and GNU implementations.
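Because \ is not special inside a POSIX bracket expression, the way to combine whitespace with other characters in one set is to put the class name inside the brackets. As a sketch (the fields helper and the comma/semicolon separators are illustrative assumptions), a PCRE-style [^\s,;]+ becomes [^[:space:],;]+ for grep -E:

```shell
#!/bin/sh
# PCRE-style [^\s,;]+ rewritten for POSIX ERE: the class name [:space:] goes
# inside the bracket expression next to the literal characters.
# Note: -o is a GNU/BSD grep extension.

# fields INPUT (hypothetical helper): print each field of INPUT on its own
# line, treating any mix of whitespace, commas and semicolons as separators.
fields() {
    printf '%s\n' "$1" | grep -oE '[^[:space:],;]+'
}

fields "$(printf 'a, b;c\td')"   # prints a, b, c and d on separate lines
```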

Last edited by Scrutinizer; 01-02-2020 at 11:04 AM.
# 7  
Interesting. I was using Linux egrep (POSIX ERE) and was just surprised when it didn't work.

--- Post updated at 19:21 ---

That seems to explain it. Thanks.