> Python Newbie Question Regex | Unix Linux Forums | Shell Programming and Scripting

Tags
python, regex

#1  03-06-2013
metallica1973

I started teaching myself Python and am stuck trying to understand why I am not getting the output I want. Long story short, I am using pdb for debugging, and here is the function that is giving me trouble:

Code:
import re
...
...
...

def find_all_flvs(url):
    soup = BeautifulSoup(urllib2.urlopen(url))
    flvs = []
    for link in soup.findAll(onclick=re.compile("doShowCHys=1*")):
        link = str(link)
        vidnum   = re.search("\d{5,6}.*&amp", link)
        vidurl   = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s" % vidnum

        for hashval_url in BeautifulSoup(urllib2.urlopen(vidurl)).findAll("flv"):

            flvs.append(hashval_url.text)

    return flvs

I verified that my regex (\d{5,6}.*&amp) is correct. This input:

Code:
"/home/Player.aspx?lpk4=108148&playChapter=True\',960,540,94343);return false;"

produces:

Code:
108148

which is what I want. However, when I step through the code in pdb and reach:

Code:
vidnum   = re.search("\d{5,6}.*&amp", link)

this is what I end up with as the output:

Code:
<_sre.SRE_Match object at 0xaaf8de8>

whereas I should be seeing:

Code:
108148

so that it can simply be substituted into:

Code:
vidurl   = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s" % vidnum

producing:

Code:
(pdb)p vidurl


Code:
http://www.blahblah.com/home/GetPlay...px?lpk4=108148

I have been through several URLs, including Python Regular Expressions, and cannot figure out what I am doing wrong.

---------- Post updated at 04:37 PM ---------- Previous update was at 04:21 PM ----------

I made progress. The things you can find out by just reading:

re.search(pattern, string, flags=0)

    Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

and

re.findall(pattern, string, flags=0)

    Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.

I was simply using the wrong function. I replaced re.search with re.findall and it worked partially.

Code:
vidnum   = re.findall("\d{5,6}.*&amp", link)
(pdb)p vidnum
['108148&amp']
(pdb)p vidurl
http://www.blahblah.com/home/GetPlay...px?lpk4=108148['108148&amp']
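
The difference between the two functions described in the documentation excerpts can be sketched as follows. This is only an illustration: the sample link string is an assumption (the onclick text as it might look after str(link), with "&" serialized as "&amp;"), and search_text is a name introduced here, not part of the original script.

```python
import re

# Assumed sample input: the onclick text after str(link), where "&" appears as "&amp;".
link = "/home/Player.aspx?lpk4=108148&amp;playChapter=True',960,540,94343);return false;"

pattern = r"\d{5,6}.*&amp"

# re.search returns a match object (or None), not the matched text itself.
match = re.search(pattern, link)

# re.findall returns a list of matched strings (an empty list if nothing matches).
found = re.findall(pattern, link)

# The text covered by a match object is read with .group(0).
search_text = match.group(0) if match else None
```

Under these assumptions, re.search was not returning a wrong result: the match object wraps the same text that re.findall hands back inside a list.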

How do I remove the brackets and single quotes to produce only:

Code:
http://www.blahblah.com/home/GetPlay...px?lpk4=108148&amp

??

---------- Post updated at 04:53 PM ---------- Previous update was at 04:37 PM ----------

It turned out that vidnum is a list, so I needed to pick out its first element by index:

Code:
vidurl   = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s" % vidnum[0]
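
Note that with the pattern above, vidnum[0] still carries the trailing "&amp". If only the digits are wanted (as in the 108148 expected earlier in this post), one possible sketch is to put a capture group around the digits, since re.findall then returns just the grouped text. The sample link string and the name base are assumptions made for illustration.

```python
import re

# Assumed sample input, with "&" serialized as "&amp;" as in the pdb output above.
link = "/home/Player.aspx?lpk4=108148&amp;playChapter=True',960,540,94343);return false;"

# With exactly one capture group, findall returns the group's text, not the whole match.
vidnum = re.findall(r"(\d{5,6})&amp", link)

base = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s"

# Guard against an empty list so that indexing cannot raise IndexError.
vidurl = base % vidnum[0] if vidnum else None
```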


Last edited by metallica1973; 03-06-2013 at 05:39 PM..
#2  03-06-2013
Chubler_XL (Forum Staff, Moderator)
You could also try:


Code:
refound = re.search('\d{5,6}(?=&amp)', link)

if refound:
    vidurl   = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s" % refound.group(0)
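
A minimal sketch of how that suggestion might be wrapped up for reuse. The helper name build_vidurl and the constant BASE_URL are hypothetical, introduced only for this example; the lookahead (?=&amp) requires "&amp" to follow the digits without including it in the match, and the None check covers links that contain no video number.

```python
import re

BASE_URL = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s"


def build_vidurl(link):
    """Return the player XML URL for a link string, or None if no video number is found."""
    # (?=&amp) is a lookahead: "&amp" must follow the digits but is not part of the match.
    refound = re.search(r"\d{5,6}(?=&amp)", link)
    if refound is None:
        return None
    return BASE_URL % refound.group(0)
```

Returning None rather than raising leaves it to the caller to skip links that do not match, for example with a plain "if url:" check inside the loop.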


Last edited by Chubler_XL; 03-06-2013 at 07:11 PM..