The UNIX and Linux Forums  
Forum: Web Programming, Web 2.0 and Mashups
#1 | 06-25-2009 | rickhlwong (Registered User)
robots.txt usage

Dear all,

I want to use robots.txt to control the "spider". Can I specify an IP address in robots.txt so that only a spider coming from that address is allowed to access the website?
Thank you.

Rick
#2 | 06-25-2009 | figaro (Registered User)
Almost everything there is to know about robots.txt can be found on The Web Robots Pages.
Hope this helps.
#3 | 06-25-2009 | fpmurphy (Moderator)
Short answer - no.
#4 | 06-26-2009 | Neo (Administrator)
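We block unwanted spiders by IP address using ipchains. robots.txt, on the other hand, has no way to match an IP address: its rules are selected by the User-Agent name that the spider sends in its HTTP request.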
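To illustrate the point, here is a minimal sketch using Python's standard `urllib.robotparser` module (my choice for the demo, not something from the thread). The robots.txt content and the crawler names `ExampleBot` and `SomeOtherBot` are hypothetical. The file can single out one spider by its User-agent name and shut everyone else out, but there is no field where an IP address could go. Because the User-agent string is supplied by the client, it is only a request that well-behaved spiders choose to honor.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: let one named crawler in, keep every other one out.
# Records are selected by User-agent only; the format has no IP-address field.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "http://www.example.com/index.html"
    # can_fetch() takes a user-agent name and a URL; nothing about the client's address.
    for agent in ("ExampleBot", "SomeOtherBot"):
        print(agent, rp.can_fetch(agent, url))
```

Running it should print `ExampleBot True` followed by `SomeOtherBot False`. An empty `Disallow:` line means "nothing is disallowed" for that record, while `Disallow: /` under `*` covers every other crawler.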

Many aggressive spiders do not follow robots.txt at all, so they have to be blocked using something like ipchains or your favorite firewall tool.
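Neo's approach is a packet filter: ipchains on older Linux kernels, iptables or nftables on current ones. Where firewall access is not available, a rough application-level stand-in can be sketched in Python as WSGI middleware built on the standard `ipaddress` module. This is a swapped-in technique, not what Neo describes: the request still reaches the web server, and behind a reverse proxy `REMOTE_ADDR` holds the proxy's address rather than the spider's. The names `BLOCKED_NETWORKS`, `is_blocked`, `IPBlockMiddleware` and `demo_app` are hypothetical, and the listed networks are RFC 5737 documentation ranges standing in for real crawler addresses.

```python
import ipaddress
from typing import Callable, Iterable

# Hypothetical blocklist: RFC 5737 documentation ranges stand in for the
# address ranges of real unwanted spiders.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(addr: str, networks: Iterable = BLOCKED_NETWORKS) -> bool:
    """Return True if addr (IPv4 or IPv6 text) falls inside any listed network."""
    ip = ipaddress.ip_address(addr)  # raises ValueError on malformed input
    # Mixing versions (IPv6 address, IPv4 network) simply tests as not contained.
    return any(ip in net for net in networks)


class IPBlockMiddleware:
    """Hypothetical WSGI middleware that answers 403 to blocked client addresses."""

    def __init__(self, app: Callable, networks: Iterable = BLOCKED_NETWORKS):
        self.app = app
        self.networks = list(networks)

    def __call__(self, environ, start_response):
        addr = environ.get("REMOTE_ADDR", "")
        try:
            blocked = is_blocked(addr, self.networks)
        except ValueError:
            blocked = False  # missing or unparsable address: let it through (a policy choice)
        if blocked:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Trivial stand-in for the real web application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


application = IPBlockMiddleware(demo_app)

if __name__ == "__main__":
    # Simulate two requests without starting a server.
    for addr in ("192.0.2.10", "203.0.113.5"):
        application({"REMOTE_ADDR": addr}, lambda status, headers, a=addr: print(a, status))
```

Reversing the test (serving only addresses inside the list) would give the IP allowlist asked about in the first post, though keeping such a list in step with a crawler's actual addresses is up to the site operator.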
#5 | 06-28-2009 | rickhlwong (Registered User)
thank you~~
The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.