Python re.search vs re.sub


 
# 1  
Old 01-27-2016

I am having trouble understanding why these two calls behave differently, with one producing the desired result and the other not. An example:
Code:
capture_str = 'xserver-xorg-video-qxl-dbg (0.1.1-2+b2 [s390x], 0.1.1-2+b1 [amd64, armel, armhf, i386, mips, mipsel, powerpc], 0.1.1-2 [arm64, ppc64el]) X.Org X server -- QXL display driver (debugging symbols)'

re.search(r'(?<=\[s390x\]\, ).*', capture_str).group(0)
'0.1.1-2+b1 [amd64, armel, armhf, i386, mips, mipsel, powerpc], 0.1.1-2 [arm64, ppc64el]) X.Org X server -- QXL display driver (debugging symbols)'

re.sub(r'(?<=\[s390x\]\, ).*', '', capture_str)
'xserver-xorg-video-qxl-dbg (0.1.1-2+b2 [s390x], '

I am truly confused as to why "re.sub" doesn't perform the positive lookbehind that re.search does. It appears to do the opposite with the same regex. What is the difference?

Last edited by metallica1973; 01-27-2016 at 08:11 PM..
# 2  
Old 01-31-2016
Quote:
Originally Posted by metallica1973
I am having trouble understanding why these two calls behave differently, with one producing the desired result and the other not.
...
...
I am truly confused as to why "re.sub" doesn't perform the positive lookbehind that re.search does. It appears to do the opposite with the same regex. What is the difference?
They are working just as expected. For a moment, disregard the fact that your regex is a look-behind assertion.

The "search" function looks for the first place where the pattern matches and returns a match object; calling group(0) on it gives you the matched text. Since your pattern ends in a greedy ".*", the match runs to the end of the string.
The "sub" function replaces the part of the string that matches the pattern, here with "nothing" (a zero-length string). So what is left is the part of the string **before the matched pattern**, and that is returned.

Here's an example in my python REPL:

Code:
>>> import re
>>> 
>>> x = "The rain in Spain falls mainly in the plain"
>>> 
>>> re.search(r'in.*', x).group(0)
'in in Spain falls mainly in the plain'
>>>

I search for a string that starts with "in" and extends as far as it can, i.e. to the end of the string.
The matched part of the string x is marked with carets below:

Code:
The rain in Spain falls mainly in the plain
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

and that is returned.

Now, if I use the same regex in the "sub()" method, then it means that I want to substitute the matched part by something. Python does the substitution and returns the resultant string. Here's the test with the same string; I substitute the part that matches the regex by "#":

Code:
>>> 
>>> x
'The rain in Spain falls mainly in the plain'
>>> 
>>> re.sub(r'in.*', '#', x)
'The ra#'
>>> 
>>>

You used an empty string instead of '#', so the matched part was simply removed.

The results of these two functions look like "opposites" of each other because of the specific regex that you used. It matches from some point in the string all the way to the end. If you replace that match with an empty string, then what remains is exactly the part *before* the match.
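To make the "opposite" relationship concrete, here is a small sketch (my own illustration, using the same sample string) that rebuilds the re.sub result by hand from the match position reported by re.search. The variable names are just for this example.

```python
import re

x = "The rain in Spain falls mainly in the plain"

m = re.search(r'in.*', x)        # match object for the first (and only) match
matched = m.group(0)             # the text that matched
start, end = m.span()            # where the match sits inside x

# For a single match, re.sub returns: text before + replacement + text after
manual = x[:start] + '' + x[end:]
substituted = re.sub(r'in.*', '', x)

print(repr(matched))             # the part re.search reports
print(repr(substituted))         # the part re.sub leaves behind
print(manual == substituted)     # both ways of building the result agree
print(substituted + matched == x)  # the two pieces add up to the original
```

Because the match runs to the end of the string, there is no "text after", so the two results are complementary halves of the original string.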

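If you had used a regex that does not run on to the end of the string (for example, a plain literal instead of one ending in a greedy ".*"), then the results would not have looked like "opposites".
An example: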

Code:
>>> x
'The rain in Spain falls mainly in the plain'
>>> 
>>> re.search(r"Spain", x).group(0)
'Spain'
>>> 
>>> re.sub(r"Spain", "USA", x)
'The rain in USA falls mainly in the plain'
>>>
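One more difference worth keeping in mind: re.search stops at the first match, whereas re.sub by default replaces every non-overlapping match. A short sketch with the same string (the pattern 'ain' is my own choice for illustration):

```python
import re

x = "The rain in Spain falls mainly in the plain"

first = re.search(r'ain', x)                      # only the leftmost match
all_matches = re.findall(r'ain', x)               # every non-overlapping match
replaced_all = re.sub(r'ain', 'AIN', x)           # replaces every match
replaced_one = re.sub(r'ain', 'AIN', x, count=1)  # replaces only the first

print(first.start(), len(all_matches))
print(replaced_all)
print(replaced_one)
```

With a pattern that runs to the end of the string, as in the original question, there can only be one match, so this difference does not show up there.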

The look-behind assertion only checks what comes immediately before the match; the text that satisfies the assertion is not part of the match itself.
So if my regex is "(?<=a)in.*" then it matches "in" and everything after it, provided that "in" is immediately preceded by "a".
But "a" is not part of the match. Hence it is not returned by search, and it is not replaced by sub.
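A quick way to see this is to compare the look-behind pattern with a pattern that consumes the "a" as ordinary text. This is only an illustrative sketch; the pattern 'ain.*' is my own addition for contrast.

```python
import re

x = "The rain in Spain falls mainly in the plain"

consuming = re.search(r'ain.*', x)         # "a" is part of the match
lookbehind = re.search(r'(?<=a)in.*', x)   # "a" is only checked, not matched

print(consuming.start(), repr(consuming.group(0)))
print(lookbehind.start(), repr(lookbehind.group(0)))

# The same one-character difference shows up in what re.sub leaves behind
print(repr(re.sub(r'ain.*', '', x)))
print(repr(re.sub(r'(?<=a)in.*', '', x)))
```

The look-behind match starts one character later, so the "a" survives the substitution.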

Code:
>>> x
'The rain in Spain falls mainly in the plain'
>>> 
>>> re.search(r'(?<=a)in.*', x).group(0)
'in in Spain falls mainly in the plain'
>>>

And if I substitute the matched part above with an empty string, then the remainder will be "The ra" as seen below:

Code:
>>> x
'The rain in Spain falls mainly in the plain'
>>> 
>>> re.sub(r'(?<=a)in.*', '', x)
'The ra'
>>> 
>>>

Again, "a" is not part of the matched string, hence it is not replaced.
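Coming back to the string from the first post: assuming the goal was to end up with the text *after* "[s390x], " (that is my reading of the question, not something the poster stated), a sketch of doing that with re.sub is to match the prefix instead of the tail. The pattern r'^.*?\[s390x\], ' is my own suggestion; I also dropped the backslash before the comma, which is not needed.

```python
import re

capture_str = ('xserver-xorg-video-qxl-dbg (0.1.1-2+b2 [s390x], 0.1.1-2+b1 '
               '[amd64, armel, armhf, i386, mips, mipsel, powerpc], 0.1.1-2 '
               '[arm64, ppc64el]) X.Org X server -- QXL display driver '
               '(debugging symbols)')

tail_pattern = r'(?<=\[s390x\], ).*'

# re.search returns the tail that follows the marker
kept_by_search = re.search(tail_pattern, capture_str).group(0)

# re.sub with the same pattern removes that tail and keeps the head
head = re.sub(tail_pattern, '', capture_str)

# To keep the tail with re.sub, remove the head (up to and including the marker)
kept_by_sub = re.sub(r'^.*?\[s390x\], ', '', capture_str)

print(kept_by_search == kept_by_sub)
print(head + kept_by_search == capture_str)
```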

Hope that helps.
# 3  
Old 02-01-2016
Simply Awesome. Many thanks