Python Newbie Question Regex


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Python Newbie Question Regex
# 1  
Old 03-06-2013
Python Newbie Question Regex

I starting teaching myself python and am stuck on trying to understand why I am not getting the output that I want. Long story short, I am using PDB for debugging and here my function in which I am having my issue:
Code:
import re
...
...
...

def find_all_flvs(url):
    soup = BeautifulSoup(urllib2.urlopen(url))
    flvs = []
    for link in soup.findAll(onclick=re.compile("doShowCHys=1*")):
        link = str(link)
        vidnum   = re.search("\d{5,6}.*&amp", link)
        vidurl   = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s" % vidnum

        for hashval_url in BeautifulSoup(urllib2.urlopen(vidurl)).findAll("flv"):

            flvs.append(hashval_url.text)

    return flvs

I verified that my regex is correct(\d{5,6}.*&amp):
Code:
"/home/Player.aspx?lpk4=108148&playChapter=True\',960,540,94343);return false;"

produces:
Code:
108148

which is what I want, so when running pdb using steps and I get to:
Code:
vidnum   = re.search("\d{5,6}.*&amp", link)

and this is what I end up with as the output:
Code:
<_sre.SRE_Match object at 0xaaf8de8>

in which I should be seeing:
Code:
108148

so it can be simply appended to:
Code:
vidurl   = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s" % vidnum

producing:
Code:
(pdb)p vidurl

I have been through several urls and cannot seem to figure out what I am doing wrong:

Python Regular Expressions

??

---------- Post updated at 04:37 PM ---------- Previous update was at 04:21 PM ----------

I made progress. The things you can find out by just reading:\
PHP Code:
re.search(patternstringflags=0)

    
Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the patternnote that this is different from finding a zero-length match at some point in the string.

and 

 
re.findall(patternstringflags=0)

    Return 
all non-overlapping matches of pattern in string, as list of stringsThe string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return list of groupsthis will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match
I was simply using the wrong function. I replaced re.search with re.findall and it worked partially.
Code:
vidnum   = re.findall("\d{5,6}.*&amp", link)
(pdb)p vidum
['108148&amp']
(pdb)p vidurl
http://www.blahblah.com/home/GetPlay...px?lpk4=108148['108148&amp']

How do I remove the brackets and single quotes to produce only:
Code:
http://www.blahblah.com/home/GetPlay...px?lpk4=108148&amp

??

---------- Post updated at 04:53 PM ---------- Previous update was at 04:37 PM ----------

It turned out the vidnum is part of a list and I needed to specify its place in the list, so:
Code:
vidurl   = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s" % vidnum[0]


Last edited by metallica1973; 03-06-2013 at 06:39 PM..
# 2  
Old 03-06-2013
You could also try:

Code:
refound = re.search('\d{5,6}(?=&amp)', link)

if refound:
    vidurl   = "http://www.blahblah.com/home/GetPlayerXML.aspx?lpk4=%s" % refound.group(0)


Last edited by Chubler_XL; 03-06-2013 at 08:11 PM..
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Python with Regex and Excel

Hello I have a big excel file for Ticket Data Analysis. The idea is to make meaningful insight from Resolution Field. Now as people write whatever they feel like while resolving the ticket it makes quite a task. 1. They may or may not tag it with something like below within the resolution... (1 Reply)
Discussion started by: radioactive9
1 Replies

2. Programming

Python Regex List Creation

Here is a snippet of my code: blahblahblah... blah for link in goodies.soup.find_all('a'): blah.append(link.get('href')) blah=list(set(blah)) which gives my list of urls. So now I use a regex to search for the relevant urls which I want in a list: for r... (0 Replies)
Discussion started by: metallica1973
0 Replies

3. Shell Programming and Scripting

Python Regex Removing One Too Many...

Well, I'm a python noob and my last post here I was introduced to Regex. I thought this would be easy since I knew Regex with Bash. However, I've been banging my head a while to extract an ip address from ifconfig with this: #!/usr/bin/python import re import subprocess from subprocess... (5 Replies)
Discussion started by: Azrael
5 Replies

4. Programming

Python Reading Individual files and Regex through them

As a newbie to Python, I am trying to write a script in which is will add all the log files (*.log) from within a directory to a list, open the files and search for an ip using a regex and single it out (appending the ip's to the list). So far, I have: import re, os def list_files() content = ... (4 Replies)
Discussion started by: metallica1973
4 Replies

5. Shell Programming and Scripting

Newbie Python Url Scraper

I setup Zoneminder and have been playing around with setting up a couple of Wanscam PTZ ip cameras in which I have been running into road blocks with streaming and etc. I cant find much information on the camera and its webserver that sits on it and wanted to get a an absolute directory structure... (5 Replies)
Discussion started by: metallica1973
5 Replies

6. Shell Programming and Scripting

Perl newbie - regex replace all groups issue

Hello, Although I have found similar questions, I could not find advice that could help with my problem. The issue: I am trying to replace all occurrences of a regex, but I cannot make the regex groups work together. This is a simple input test file: The Vedanta Philosophy... (3 Replies)
Discussion started by: samask
3 Replies

7. Shell Programming and Scripting

Python Regex

I have the below string and regex. However I cant understand why it works the way it does. IP has been changed for safety ;) String = NowSMS Error Report. Error initializing SMSC Interface 'SMPP - 10.15.8.10:17600'. Interface is not available. Regex = (.+\.)\s(.+) I get two... (1 Reply)
Discussion started by: barney34
1 Replies

8. UNIX for Dummies Questions & Answers

UNIX newbie NEWBIE question!

Hello everyone, Just started UNIX today! In our school we use solaris. I just want to know how do I setup Solaris 10 not the GUI one, the one where you have to type the commands like ECHO, ls, pwd, etc... I have windows xp and I also have vmware. I hope I am not missing anything! :p (4 Replies)
Discussion started by: Hanamachi
4 Replies

9. Programming

NEWBIE QUESTION: python 3 or 2.6.x

I'm a newbie and want to learn a programming language, willy-nilly I picked python... Should I go with 2.6.x which at first glance seems extremely well documented, or should I go with 3.0, which is new and shiny?! I want...no...I'm going to NEED fantastic documentation or I'm going to fail... (2 Replies)
Discussion started by: guptaxpn
2 Replies

10. UNIX for Dummies Questions & Answers

Newbie Regex Question

Hello, I am trying to use the CDE File Manager in AIX to filter out files that I want to be hidden in the file manager. It gives me a script box that I can supposedly enter a regex into if I want to filter out additional file types. Example: "Also Hide:"_______________ I can put... (0 Replies)
Discussion started by: ciremg01
0 Replies
Login or Register to Ask a Question