The UNIX and Linux Forums  
06-26-2009
Neo
Administrator

Join Date: Sep 2000
Location: Asia Pacific
Posts: 6,728
Quote:
Originally Posted by rickhlwong
Dear all,

I want to use robots.txt to control spiders. Can I specify an IP address in it to allow a particular spider to access the website?
Thank you.

Rick
We block unwanted spiders by IP address using ipchains. The robots.txt file cannot do that: its rules match a spider by the User-Agent it sends in the HTTP request, not by its IP address.
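
To illustrate the User-Agent point, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt content, the bot names (`GoodBot`, `OtherBot`), and the example.com URL are made up for the example; the only thing it is meant to show is that the allow/deny decision is keyed on the User-Agent string and an IP address never enters into it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: rules are grouped by User-agent, never by IP.
# An empty "Disallow:" means "nothing is disallowed" for that agent.
ROBOTS_TXT = """\
User-agent: GoodBot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits this User-Agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://example.com/page.html"
    for agent in ("GoodBot", "OtherBot"):
        print(agent, allowed(agent, url))
```

Keep in mind this only describes what a well-behaved spider would decide for itself after reading the file. Nothing here is enforced by the server.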

Many aggressive spiders ignore robots.txt entirely, so they have to be blocked with something like ipchains or your favorite firewall tool.
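
If you want to see what an IP-based decision looks like, as opposed to a User-Agent one, here is a rough Python sketch using the standard `ipaddress` module. The blocked networks are placeholder documentation ranges and `is_blocked` is a hypothetical helper, not part of any firewall tool. A real block belongs in the firewall itself; this just shows the kind of check being made.

```python
import ipaddress

# Hypothetical block list (documentation address ranges, for illustration only).
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),      # a whole subnet
    ipaddress.ip_network("198.51.100.17/32"),  # a single host
]


def is_blocked(client_ip: str) -> bool:
    """Return True if the client address falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.44", "198.51.100.17", "203.0.113.5"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

Unlike the User-Agent string, the source address is not something the spider gets to choose in its request headers, which is why blocking at this level works against spiders that ignore robots.txt.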