Python Newbie Question Regex | Unix Linux Forums | Shell Programming and Scripting


Tags
python, regex

    #1  
03-06-2013
metallica1973

I started teaching myself Python and am stuck trying to understand why I am not getting the output I want. Long story short, I am using pdb for debugging, and here is the function in which I am having my issue:

Code:
import re
...
...
...

def find_all_flvs(url):
    soup = BeautifulSoup(urllib2.urlopen(url))
    flvs = []
    for link in soup.findAll(onclick=re.compile("doShowCHys=1*")):
        link = str(link)
    vidnum   = re.search(r"\d{5,6}.*&amp", link)
        vidurl   = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s" % vidnum

        for hashval_url in BeautifulSoup(urllib2.urlopen(vidurl)).findAll("flv"):

            flvs.append(hashval_url.text)

    return flvs

I verified that my regex (\d{5,6}.*&amp) is correct. Applied to:

Code:
"/home/Player.aspx?lpk4=108148&playChapter=True\',960,540,94343);return false;"

produces:

Code:
108148

which is what I want. But when I step through the function in pdb and get to:

Code:
vidnum   = re.search(r"\d{5,6}.*&amp", link)

this is what I end up with:

Code:
<_sre.SRE_Match object at 0xaaf8de8>

when I should be seeing:

Code:
108148

so that it can simply be substituted into:

Code:
vidurl   = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s" % vidnum

producing:

Code:
(Pdb) p vidurl
http://www.blahblah.com/home/GetPlay...px?lpk4=108148

I have been through several URLs and cannot figure out what I am doing wrong:

Python Regular Expressions

??
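For reference, a minimal Python 3 sketch of what is going on (the behaviour of re here is the same as in Python 2). The `link` value is a hypothetical stand-in for str(link), assuming BeautifulSoup renders the `&` in the onclick attribute as `&amp;`, and `BASE_URL` just reuses the placeholder URL from the question. re.search returns a match object (or None), and `.group(0)` pulls out the matched text:

```python
import re

# Hypothetical stand-in for str(link); assumes the "&" in the onclick
# attribute comes out HTML-escaped as "&amp;"
link = "/home/Player.aspx?lpk4=108148&amp;playChapter=True',960,540,94343);return false;"
BASE_URL = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s"

match = re.search(r"\d{5,6}.*&amp", link)

# Formatting the match object itself inserts its repr, not the matched text
wrong_url = BASE_URL % match

# .group(0) is the whole matched substring; re.search returns None on no match
vidurl = BASE_URL % match.group(0) if match is not None else None

print(vidurl)  # http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=108148&amp
```

Note that the `.*&amp` part makes the trailing `&amp` part of the match; a capture group or a lookahead keeps just the digits.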

---------- Post updated at 04:37 PM ---------- Previous update was at 04:21 PM ----------

I made progress. The things you can find out just by reading the documentation for the re module:

Code:
re.search(pattern, string, flags=0)

    Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

and

re.findall(pattern, string, flags=0)

    Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
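A small Python 3 sketch of the difference between the two quoted functions. The sample string is a hypothetical stand-in for str(link) (assuming the `&` is escaped as `&amp;`). It shows that re.search gives a match object or None, re.findall gives a list, and adding one capture group makes re.findall return only the group's text, as the documentation describes:

```python
import re

# Hypothetical stand-in for str(link), with "&" escaped as "&amp;"
link = "/home/Player.aspx?lpk4=108148&amp;playChapter=True',960,540,94343);return false;"

# re.search: the first match as a match object, or None
first = re.search(r"\d{5,6}.*&amp", link)

# re.findall without groups: a list of whole-match strings
whole_matches = re.findall(r"\d{5,6}.*&amp", link)

# re.findall with one capture group: a list of just the group's text
digits_only = re.findall(r"(\d{5,6})&amp", link)

# When nothing matches: search gives None, findall gives an empty list
no_search = re.search(r"\d{7}", link)
no_findall = re.findall(r"\d{7}", link)
```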
I was simply using the wrong function. I replaced re.search with re.findall, and it partially worked:

Code:
vidnum   = re.findall(r"\d{5,6}.*&amp", link)
(Pdb) p vidnum
['108148&amp']
(Pdb) p vidurl
http://www.blahblah.com/home/GetPlay...px?lpk4=108148['108148&amp']

How do I remove the brackets and single quotes to produce only:

Code:
http://www.blahblah.com/home/GetPlay...px?lpk4=108148&amp

??

---------- Post updated at 04:53 PM ---------- Previous update was at 04:37 PM ----------

It turned out that vidnum is a list (that is what re.findall returns), so I needed to index into it:

Code:
vidurl   = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s" % vidnum[0]
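The brackets and quotes came from formatting the whole list with %s: a list counts as a single %-argument, so its repr is inserted. A minimal sketch, using a hypothetical stand-in for str(link) and the question's placeholder URL, that also guards the indexing, since vidnum[0] raises IndexError when nothing matched:

```python
import re

# Hypothetical stand-in for str(link), with "&" escaped as "&amp;"
link = "/home/Player.aspx?lpk4=108148&amp;playChapter=True',960,540,94343);return false;"
BASE_URL = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s"

vidnum = re.findall(r"\d{5,6}.*&amp", link)  # a list of matched strings

# The list is one %-argument, so its repr (brackets and quotes) is inserted
bracketed = BASE_URL % vidnum

# Use the first element instead, only if the list is non-empty
vidurl = BASE_URL % vidnum[0] if vidnum else None
```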


Last edited by metallica1973; 03-06-2013 at 05:39 PM.
    #2  
03-06-2013
Chubler_XL (Forum Advisor)
You could also try a lookahead, (?=&amp), which requires &amp to follow the digits without including it in the match:


Code:
refound = re.search(r'\d{5,6}(?=&amp)', link)

if refound:
    vidurl   = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s" % refound.group(0)
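The same idea wrapped in a small function, as a sketch only: `player_xml_url` is a hypothetical helper name, the URL is the question's placeholder, and the sample inputs are made up. It returns None instead of building a URL when no ID is found:

```python
import re
from typing import Optional

BASE_URL = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s"


def player_xml_url(link: str) -> Optional[str]:
    """Hypothetical helper: build the GetPlayerXML URL from one link string.

    The lookahead (?=&amp) requires "&amp" right after the 5-6 digits but
    leaves it out of the match, so group(0) is just the digits.
    Returns None when no ID is found.
    """
    refound = re.search(r"\d{5,6}(?=&amp)", link)
    if refound:
        return BASE_URL % refound.group(0)
    return None


url = player_xml_url("/home/Player.aspx?lpk4=108148&amp;playChapter=True")
missing = player_xml_url("/home/Player.aspx?playChapter=True")
```

Here url ends in `lpk4=108148` with no trailing `&amp`, and missing is None.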


Last edited by Chubler_XL; 03-06-2013 at 07:11 PM.
All times are GMT -4.