Sponsored Content
Full Discussion: Newbie Python Url Scraper
Top Forums Shell Programming and Scripting Newbie Python Url Scraper Post 302816739 by metallica1973 on Tuesday 4th of June 2013 01:12:37 PM
Old 06-04-2013
Newbie Python Url Scraper

I setup Zoneminder and have been playing around with setting up a couple of Wanscam PTZ ip cameras in which I have been running into road blocks with streaming and etc. I cant find much information on the camera and its webserver that sits on it and wanted to get a an absolute directory structure of the webserver on the camera. I tried using:
Code:
wget --spider -r 192.168.3.3:80
Spider mode enabled. Check if remote file exists.
--2013-06-04 13:00:49--  (try: 5)  http://192.168.3.3/
Connecting to 192.168.3.3:80... connected.
HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers.
Retrying.

Spider mode enabled. Check if remote file exists.
--2013-06-04 13:00:54--  (try: 6)  http://192.168.3.3/
Connecting to 192.168.3.3:80... connected.
HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers.
Retrying.

Spider mode enabled. Check if remote file exists.
--2013-06-04 13:01:00--  (try: 7)  http://192.168.3.3/
Connecting to 192.168.3.3:80... connected.
HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers.
Retrying.

Spider mode enabled. Check if remote file exists.
--2013-06-04 13:01:07--  (try: 8)  http://192.168.3.3/
Connecting to 192.168.3.3:80... connected.
HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers.
Retrying.

but doesnt find a thing. I know it has a webserver that reside on TCP:80 because I can view the camera through it. I have been attempting to use Pythons "scrapy" but can understand how to tell it to crawl and find the directory structure as opposed to where to start looking for it. This is what I have so far:
Code:
 #!/usr/bin/env python
# encoding=utf-8

from scrapy.spider import BaseSpider
from scrapy.http import Request
from scrapy.http import FormRequest
from scrapy.selector import HtmlXPathSelector
from scrapy import log
import sys
### Kludge to set default encoding to utf-8
reload(sys)
sys.setdefaultencoding('utf-8')

class PTZcamera(BaseSpider):
      name = "camera"
      allowed_domains = ["http://192.168.3.3:80"]
      #start_urls = [""]

      def parse(self, response):
          pass

but doesn't produce much. I would like an output in which is display on the absolute path of the directory on the webserver like:
Code:
http://192.168.3.3/cgi-bin/blah
http://192.168.3.3/cgi-bin/blah2
http://192.168.3.3/video/blah1
http://192.168.3.3/video/blah2
...
...
...

Can someone point me in the correct direction?
 

9 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

url calling and parameter passing to url in script

Hi all, I need to write a unix script in which need to call a url. Then need to pass parameters to that url. please help. Regards, gander_ss (1 Reply)
Discussion started by: gander_ss
1 Replies

2. Shell Programming and Scripting

url calling and parameter passing to url in script

Hi all, I need to write a unix script in which need to call a url. Then need to pass parameters to that url. please help. Regards, gander_ss (1 Reply)
Discussion started by: gander_ss
1 Replies

3. Programming

NEWBIE QUESTION: python 3 or 2.6.x

I'm a newbie and want to learn a programming language, willy-nilly I picked python... Should I go with 2.6.x which at first glance seems extremely well documented, or should I go with 3.0, which is new and shiny?! I want...no...I'm going to NEED fantastic documentation or I'm going to fail... (2 Replies)
Discussion started by: guptaxpn
2 Replies

4. UNIX for Dummies Questions & Answers

ReDirecting a URL to another URL - Linux

Hello, I need to redirect an existing URL, how can i do that? There's a current web address to a GUI that I have to redirect to another webaddress. Does anyone know how to do this? This is on Unix boxes Linux. example: https://m45.testing.address.net/host.php make it so the... (3 Replies)
Discussion started by: SkySmart
3 Replies

5. UNIX for Dummies Questions & Answers

UNIX newbie NEWBIE question!

Hello everyone, Just started UNIX today! In our school we use solaris. I just want to know how do I setup Solaris 10 not the GUI one, the one where you have to type the commands like ECHO, ls, pwd, etc... I have windows xp and I also have vmware. I hope I am not missing anything! :p (4 Replies)
Discussion started by: Hanamachi
4 Replies

6. Web Development

Regex to rewrite URL to another URL based on HTTP_HOST?

I am trying to find a way to test some code, but I need to rewrite a specific URL only from a specific HTTP_HOST The call goes out to http://SUB.DOMAIN.COM/showAssignment/7bde10b45efdd7a97629ef2fe01f7303/jsmodule/Nevow.Athena The ID in the middle is always random due to the cookie. I... (5 Replies)
Discussion started by: EXT3FSCK
5 Replies

7. UNIX for Dummies Questions & Answers

Awk: print all URL addresses between iframe tags without repeating an already printed URL

Here is what I have so far: find . -name "*php*" -or -name "*htm*" | xargs grep -i iframe | awk -F'"' '/<iframe*/{gsub(/.\*iframe>/,"\"");print $2}' Here is an example content of a PHP or HTM(HTML) file: <iframe src="http://ADDRESS_1/?click=5BBB08\" width=1 height=1... (18 Replies)
Discussion started by: striker4o
18 Replies

8. Shell Programming and Scripting

Python Newbie Question Regex

I starting teaching myself python and am stuck on trying to understand why I am not getting the output that I want. Long story short, I am using PDB for debugging and here my function in which I am having my issue: import re ... ... ... def find_all_flvs(url): soup =... (1 Reply)
Discussion started by: metallica1973
1 Replies

9. Shell Programming and Scripting

Reading URL using Mechanize and dump all the contents of the URL to a file

Hello, Am very new to perl , please help me here !! I need help in reading a URL from command line using PERL:: Mechanize and needs all the contents from the URL to get into a file. below is the script which i have written so far , #!/usr/bin/perl use LWP::UserAgent; use... (2 Replies)
Discussion started by: scott_cog
2 Replies
SHOREWALL-EXCLUSION(5)						  [FIXME: manual]					    SHOREWALL-EXCLUSION(5)

NAME
exclusion - Exclude a set of hosts from a definition in a shorewall configuration file. SYNOPSIS
!address-or-range[,address-or-range]... !zone-name[,zone-name]... DESCRIPTION
The first form of exclusion is used when you wish to exclude one or more addresses from a definition. An exclaimation point is followed by a comma-separated list of addresses. The addresses may be single host addresses (e.g., 192.168.1.4) or they may be network addresses in CIDR format (e.g., 192.168.1.0/24). If your kernel and iptables include iprange support, you may also specify ranges of ip addresses of the form lowaddress-highaddress No embedded whitespace is allowed. Exclusion can appear after a list of addresses and/or address ranges. In that case, the final list of address is formed by taking the first list and then removing the addresses defined in the exclusion. Beginning in Shorewall 4.4.13, the second form of exclusion is allowed after all and any in the SOURCE and DEST columns of /etc/shorewall/rules. It allows you to omit arbitrary zones from the list generated by those key words. Warning If you omit a sub-zone and there is an explicit or explicit CONTINUE policy, a connection to/from that zone can still be matched by the rule generated for a parent zone. For example: /etc/shorewall/zones: #ZONE TYPE z1 ip z2:z1 ip ... /etc/shorewall/policy: #SOURCE DEST POLICY z1 net CONTINUE z2 net REJECT /etc/shorewall/rules: #ACTION SOURCE DEST PROTO DEST # PORT(S) ACCEPT all!z2 net tcp 22 In this case, SSH connections from z2 to net will be accepted by the generated z1 to net ACCEPT rule. In most contexts, ipset names can be used as an address-or-range. Beginning with Shorewall 4.4.14, ipset lists enclosed in +[...] may also be included (see shorewall-ipsets[1] (5)). The semantics of these lists when used in an exclusion are as follows: o !+[set1,set2,...setN] produces a packet match if the packet does not match at least one of the sets. In other words, it is like NOT match set1 OR NOT match set2 ... OR NOT match setN. o +[!set1,!set2,...!setN] produces a packet match if the packet does not match any of the sets. In other words, it is like NOT match set1 AND NOT match set2 ... AND NOT match setN. EXAMPLES
Example 1 - All IPv4 addresses except 192.168.3.4 !192.168.3.4 Example 2 - All IPv4 addresses except the network 192.168.1.0/24 and the host 10.2.3.4 !192.168.1.0/24,10.1.3.4 Example 3 - All IPv4 addresses except the range 192.168.1.3-192.168.1.12 and the network 10.0.0.0/8 !192.168.1.3-192.168.1.12,10.0.0.0/8 Example 4 - The network 192.168.1.0/24 except hosts 192.168.1.3 and 192.168.1.9 192.168.1.0/24!192.168.1.3,192.168.1.9 Example 5 - All parent zones except loc any!loc FILES
/etc/shorewall/hosts /etc/shorewall/masq /etc/shorewall/rules /etc/shorewall/tcrules SEE ALSO
shorewall(8), shorewall-accounting(5), shorewall-actions(5), shorewall-blacklist(5), shorewall-hosts(5), shorewall_interfaces(5), shorewall-ipsets(5), shorewall-maclist(5), shorewall-masq(5), shorewall-nat(5), shorewall-netmap(5), shorewall-params(5), shorewall-policy(5), shorewall-providers(5), shorewall-proxyarp(5), shorewall-rtrules(5), shorewall-routestopped(5), shorewall-rules(5), shorewall.conf(5), shorewall-secmarks(5), shorewall-tcclasses(5), shorewall-tcdevices(5), shorewall-tcrules(5), shorewall-tos(5), shorewall-tunnels(5), shorewall-zones(5) NOTES
1. shorewall-ipsets http://www.shorewall.net/manpages/shorewall-ipsets.html [FIXME: source] 06/28/2012 SHOREWALL-EXCLUSION(5)
All times are GMT -4. The time now is 05:02 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy