Post 302816739 by metallica1973 on Tuesday 4th of June 2013 01:12:37 PM
Newbie Python Url Scraper

I set up ZoneMinder and have been playing around with a couple of Wanscam PTZ IP cameras, but I keep running into roadblocks with streaming and so on. I can't find much information on the camera or the web server that runs on it, so I wanted to get the full directory structure of the camera's web server. I tried using:
Code:
wget --spider -r 192.168.3.3:80
Spider mode enabled. Check if remote file exists.
--2013-06-04 13:00:49--  (try: 5)  http://192.168.3.3/
Connecting to 192.168.3.3:80... connected.
HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers.
Retrying.

Spider mode enabled. Check if remote file exists.
--2013-06-04 13:00:54--  (try: 6)  http://192.168.3.3/
Connecting to 192.168.3.3:80... connected.
HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers.
Retrying.

Spider mode enabled. Check if remote file exists.
--2013-06-04 13:01:00--  (try: 7)  http://192.168.3.3/
Connecting to 192.168.3.3:80... connected.
HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers.
Retrying.

Spider mode enabled. Check if remote file exists.
--2013-06-04 13:01:07--  (try: 8)  http://192.168.3.3/
Connecting to 192.168.3.3:80... connected.
HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers.
Retrying.

but it doesn't find a thing. I know the camera has a web server listening on TCP port 80 because I can view the camera through it. I have been attempting to use Python's "scrapy", but I can't understand how to tell it to crawl and discover the directory structure, as opposed to just telling it where to start looking. This is what I have so far:
Code:
#!/usr/bin/env python
# encoding=utf-8

from scrapy.spider import BaseSpider
from scrapy.http import Request
from scrapy.http import FormRequest
from scrapy.selector import HtmlXPathSelector
from scrapy import log
import sys

### Kludge to set default encoding to utf-8
reload(sys)
sys.setdefaultencoding('utf-8')


class PTZcamera(BaseSpider):
    name = "camera"
    # allowed_domains takes bare host names, not URLs with a scheme or port
    allowed_domains = ["192.168.3.3"]
    # the spider needs at least one start URL, otherwise it has nothing to crawl
    start_urls = ["http://192.168.3.3/"]

    def parse(self, response):
        # no links are extracted or followed yet, so nothing is printed
        pass

but it doesn't produce much. I would like output that displays the absolute URL of each path on the web server, like:
Code:
http://192.168.3.3/cgi-bin/blah
http://192.168.3.3/cgi-bin/blah2
http://192.168.3.3/video/blah1
http://192.168.3.3/video/blah2
...
...
...

Can someone point me in the correct direction?
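
For what it's worth, here is a minimal sketch of the crawl-and-list idea using only the Python 3 standard library rather than Scrapy. It is not a definitive answer: a crawler can only report URLs that some page actually links to, so CGI paths the camera never links will not show up. The names `LinkExtractor`, `extract_links`, `fetch`, `crawl` and `max_pages` are made up for this illustration, and the 1 MB read limit and the HTML-only check are assumptions meant to keep the crawler from hanging on a video stream.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the values of href, src and action attributes from any tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                self.links.append(value)


def extract_links(base_url, html):
    """Return the set of absolute URLs (fragments removed) found in html."""
    parser = LinkExtractor()
    parser.feed(html)
    found = set()
    for link in parser.links:
        absolute, _fragment = urldefrag(urljoin(base_url, link))
        found.add(absolute)
    return found


def fetch(url):
    """Download a page, returning '' for anything that is not HTML."""
    with urlopen(url, timeout=10) as resp:
        # Skip images and video streams, which may never finish downloading.
        if resp.headers.get_content_type() != "text/html":
            return ""
        return resp.read(1_000_000).decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=200):
    """Breadth-first crawl limited to the host of start_url."""
    host = urlparse(start_url).hostname
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        try:
            html = fetch_page(url)
        except Exception:
            # Connection resets, timeouts and HTTP errors: skip this URL.
            continue
        for link in extract_links(url, html):
            if urlparse(link).hostname == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)
```

Calling `crawl("http://192.168.3.3/")` and printing each item of the result would give one absolute URL per line, in the format shown above. The `fetch_page` parameter exists so the crawl logic can be tried against canned pages without touching the camera.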
 
