Website crawler


 
# 1  
Old 10-11-2011

Hi,

I want to build a crawler that searches for a keyword on certain websites.


This is what the URLs look like:

website.com/xxxxAA11xxxx

I want the crawler to step through those letters alphanumerically, and whenever a certain keyword is found on a page, the URL should be logged.

But I have no idea how to do that, so please help me out!

Thank you.
# 2  
Old 10-11-2011
change what letters, to what?

What's your system? What's your shell?

Last edited by Corona688; 10-11-2011 at 06:12 PM..
# 3  
Old 10-11-2011
I use OS X and bash.

Change the characters from 0000 to 99ZZ: the first two should run through all numbers from 00 to 99, and the last two through every combination of letters and digits.
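For reference, an ID scheme like this can be enumerated with bash brace expansion. A minimal sketch; the website.com URL pattern is a placeholder, and lowercase letters are an assumption (the original post shows uppercase):

```shell
#!/bin/bash
# Enumerate every ID from 0000 to 99zz: first two characters are digits
# 00-99, last two are any digit or lowercase letter, giving
# 100 * 36 * 36 = 129,600 combinations in total.
gen_urls() {
    for id in {0..9}{0..9}{{0..9},{a..z}}{{0..9},{a..z}}; do
        printf 'http://website.com/xxxx%sxxxx\n' "$id"
    done
}

gen_urls | head -3   # first few candidate URLs
gen_urls | wc -l     # 129600
```

Note that nested brace expansion (`{{0..9},{a..z}}`) is a bash feature and will not work in a plain POSIX sh.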

Last edited by yaylol; 10-11-2011 at 07:07 PM..
# 4  
Old 10-11-2011
If you have a Mac, you'll have to use curl. It's actually fairly good at this: you can give it whole lists of things to fetch, and when you pipe its output into something else you can split on --_curl_-- to tell where each page begins and ends. Unfortunately, it'll take [a-z] and [0-9], but not [a-z0-9] or [0123456789abcdefghijklmnopqrstuvwxyz], so you have to give it four blocks of URLs to fetch:

Something like:

Code:
BASE="http://website/xxxxx[00-99]"
TAIL="yyyyy"

# Fetch all pages with curl and feed them through awk, printing the
# URL of every page that contains 'searchstr'
curl "${BASE}[0-9][0-9]${TAIL}" "${BASE}[0-9][a-z]${TAIL}" "${BASE}[a-z][0-9]${TAIL}" "${BASE}[a-z][a-z]${TAIL}" 2>/dev/null |
        # Split records on curl's --_curl_-- header; $1 is the URL following it
        awk -v RS="--_curl_--" -v FS="\n" '/searchstr/ { print $1 }'

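To see what the awk stage is doing, here is the same filter run on synthetic input that mimics curl's --_curl_-- separators (the URLs and the searchstr keyword are made up for the demo):

```shell
# Simulate curl's output for two pages: a --_curl_-- header line carrying
# the URL, then the page body. The awk filter prints the URL ($1, the
# first line) of any record whose body contains 'searchstr'.
printf '%s\n' \
    '--_curl_--http://site/aa11' \
    'page without the keyword' \
    '--_curl_--http://site/bb22' \
    'this page has searchstr in it' |
awk -v RS="--_curl_--" -v FS="\n" '/searchstr/ { print $1 }'
# prints: http://site/bb22
```

One caveat: a multi-character RS is treated as a regular expression by gawk and mawk, but the traditional one-true-awk only honours the first character of RS, so results on a stock OS X awk may differ.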

Last edited by Corona688; 10-11-2011 at 07:57 PM..
# 5  
Old 10-11-2011
This might be a foolish question, but how do I install curl? I downloaded it (curl-7.19.7), but the instructions from curl itself don't work: it says it can't find make, and configuration also fails. And then, how do I run your script? Is this all the code I need? Thank you!
# 6  
Old 10-12-2011
I was nearly certain that Macs came with curl; make sure you don't already have it.

If you don't, it'd probably be much easier to install it through fink than to build it yourself by hand.
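As a quick sanity check before building anything from source, a sketch like this reports whether curl is already on the PATH:

```shell
# Check for an existing curl binary before compiling or reaching for Fink
if command -v curl >/dev/null 2>&1; then
    echo "curl found at: $(command -v curl)"
    curl --version | head -1
else
    echo "curl not found; try Fink or a source build"
fi
```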
# 7  
Old 10-14-2011
Thank you so much for your help, but it still doesn't work. I can't install it, and even the helper program is useless.