Python re.search vs re.sub

Tags
perl, regular expressions, solved

    #1  
Old Unix and Linux 01-27-2016   -   Original Discussion by metallica1973
metallica1973

I am having trouble understanding why these two commands differ, with one producing the desired result and the other not. An example:

Code:
import re

capture_str = 'xserver-xorg-video-qxl-dbg (0.1.1-2+b2 [s390x], 0.1.1-2+b1 [amd64, armel, armhf, i386, mips, mipsel, powerpc], 0.1.1-2 [arm64, ppc64el]) X.Org X server -- QXL display driver (debugging symbols)'

re.search(r'(?<=\[s390x\]\, ).*', capture_str).group(0)
'0.1.1-2+b1 [amd64, armel, armhf, i386, mips, mipsel, powerpc], 0.1.1-2 [arm64, ppc64el]) X.Org X server -- QXL display driver (debugging symbols)'

re.sub(r'(?<=\[s390x\]\, ).*', '', capture_str)
'xserver-xorg-video-qxl-dbg (0.1.1-2+b2 [s390x], '

I am truly confused as to why "re.sub" doesn't perform the positive lookbehind that re.search does. It appears to do the opposite with the same regex. What is the difference?

    #2  
Old Unix and Linux 01-31-2016   -   Original Discussion by metallica1973
durden_tyler
Quote:
Originally Posted by metallica1973 View Post
I am having trouble understanding why these two commands differ, with one producing the desired result and the other not.
...
...
I am truly confused as to why "re.sub" doesn't perform the positive lookbehind that re.search does. It appears to do the opposite with the same regex. What is the difference?

They are working just as expected. For a moment, disregard the fact that your regex contains a look-behind assertion.

"re.search" looks for the first place where the pattern matches and returns a match object; calling .group(0) on it gives the matched text. Since your pattern ends in a greedy .*, the match runs to the end of the string.
"re.sub" replaces the part of the string that matches the pattern with the replacement, which in your case is "nothing" (a zero-length string). So what is left is the part of the string **before the match**, and that is returned.

Here's an example in my python REPL:


Code:
>>> import re
>>> 
>>> x = "The rain in Spain falls mainly in the plain"
>>> 
>>> re.search(r'in.*', x).group(0)
'in in Spain falls mainly in the plain'
>>>

I search for a string that starts with "in" and extends as far as it can, i.e. to the end.
The matched part of the string x is enclosed in square brackets below:


Code:
The ra[in in Spain falls mainly in the plain]

and that is returned.

Now, if I use the same regex in the "sub()" method, then it means that I want to substitute the matched part by something. Python does the substitution and returns the resultant string. Here's the test with the same string; I substitute the part that matches the regex by "#":


Code:
>>> 
>>> x
'The rain in Spain falls mainly in the plain'
>>> 
>>> re.sub(r'in.*', '#', x)
'The ra#'
>>> 
>>>

You used an empty string instead of '#', hence the matched part was simply removed.

The results of these two functions look like "opposites" of each other because of the specific regex that you used. It matches a part of the string and carries on to the end. If you replace that match with an empty string, then what remains is the part *before* the match.
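
To make that relationship concrete, here is a minimal sketch (not part of the original reply) that rebuilds the re.sub result by hand from the match object that re.search returns. The variable names are only illustrative, and count=1 is passed so that re.sub replaces just the first match, which is the one re.search reports.

```python
import re

x = "The rain in Spain falls mainly in the plain"
pattern = r'in.*'

# re.search returns a match object; start() and end() give the matched span.
m = re.search(pattern, x)
matched = m.group(0)

# Replace that span by hand: text before the match + replacement + text after it.
by_hand = x[:m.start()] + '#' + x[m.end():]

# re.sub does the same replacement (count=1 limits it to the first match).
by_sub = re.sub(pattern, '#', x, count=1)

print(matched)
print(by_hand)
print(by_sub)
```

Both functions find the same match; search hands it back, while sub returns everything around it with the replacement in its place.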

If you had used a regex that does not run on to the end of the string, then the results would not have looked like "opposites".
An example:


Code:
>>> x
'The rain in Spain falls mainly in the plain'
>>> 
>>> re.search(r"Spain", x).group(0)
'Spain'
>>> 
>>> re.sub(r"Spain", "USA", x)
'The rain in USA falls mainly in the plain'
>>>
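
As a side note on greediness itself, the following sketch (an addition, not from the original reply) compares the greedy pattern with a lazy variant, 'in.*?', on the same string. The lazy quantifier matches as little as possible, so each match is just "in", and re.sub, which replaces every non-overlapping match by default, then replaces each "in" separately.

```python
import re

x = "The rain in Spain falls mainly in the plain"

# Greedy: .* takes as much as it can, so the match runs to the end.
greedy_match = re.search(r'in.*', x).group(0)
greedy_sub = re.sub(r'in.*', '#', x)

# Lazy: .*? takes as little as it can, so the match is just "in".
lazy_match = re.search(r'in.*?', x).group(0)
lazy_sub = re.sub(r'in.*?', '#', x)  # every "in" is replaced on its own

print(greedy_match, '|', greedy_sub)
print(lazy_match, '|', lazy_sub)
```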

The look-behind assertion is zero-width: it requires that the text just before the match satisfies the assertion, but that text does not become part of the match itself.
So if my regex is "(?<=a)in.*" then it matches "in" and everything after it, provided there is an "a" immediately before it.
But "a" is not part of the match. Hence it is not returned.


Code:
>>> x
'The rain in Spain falls mainly in the plain'
>>> 
>>> re.search(r'(?<=a)in.*', x).group(0)
'in in Spain falls mainly in the plain'
>>>

And if I substitute the matched part above with an empty string, then the remainder will be "The ra" as seen below:


Code:
>>> x
'The rain in Spain falls mainly in the plain'
>>> 
>>> re.sub(r'(?<=a)in.*', '', x)
'The ra'
>>> 
>>>

Again, "a" is not part of the matched string, hence it is not replaced.
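
Applied to the string from the question, the same reasoning can be sketched as follows. This is an illustration rather than part of the original reply, and it assumes the goal was to get the tail of the string (everything after "[s390x], ") out of re.sub as well; the second pattern, which matches the head instead of the tail, is one possible way to do that.

```python
import re

capture_str = ('xserver-xorg-video-qxl-dbg (0.1.1-2+b2 [s390x], '
               '0.1.1-2+b1 [amd64, armel, armhf, i386, mips, mipsel, powerpc], '
               '0.1.1-2 [arm64, ppc64el]) X.Org X server -- QXL display driver '
               '(debugging symbols)')

# Original pattern: search returns the tail, sub deletes the tail.
tail = re.search(r'(?<=\[s390x\], ).*', capture_str).group(0)
head = re.sub(r'(?<=\[s390x\], ).*', '', capture_str)

# Assumed goal: obtain the tail via re.sub. Match (and delete) the head instead.
tail_via_sub = re.sub(r'^.*?\[s390x\], ', '', capture_str)

print(head)
print(tail)
print(tail_via_sub == tail)
```

The look-behind text "[s390x], " stays in the re.sub result because it was never part of the match; only the matched tail is replaced.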

Hope that helps.
    #3  
Old Unix and Linux 02-01-2016   -   Original Discussion by metallica1973
metallica1973
Simply awesome. Many thanks.