Preventing Web Site Downloading Using robots.txt
The first step is to disallow the downloading programs in your robots.txt file. To do this, you will need to know which bad robots you wish to disallow and list each one by the user agent name it announces.
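For example, the following robots.txt entries ask two common downloading programs, HTTrack and WebCopier, to stay out of the entire site (the agent names shown are illustrative; use the names the programs you want to block actually send):

User-agent: HTTrack
Disallow: /

User-agent: WebCopier
Disallow: /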
Disallowing bad programs in robots.txt does not prevent all web site downloading, because many bad programs simply ignore the contents of robots.txt and download the site anyway.
Preventing Web Site Downloading Using User Agent Blocking in httpd.conf
Another method is to block the downloading programs' user agents in httpd.conf.
Add a SetEnvIfNoCase line to httpd.conf for every agent you wish to exclude:
SetEnvIfNoCase User-Agent ^Httrack keep_away
SetEnvIfNoCase User-Agent "^Offline Explorer" keep_away
SetEnvIfNoCase User-Agent ^psbot keep_away
SetEnvIfNoCase User-Agent ^Teleport keep_away
SetEnvIfNoCase User-Agent ^WebCopier keep_away
SetEnvIfNoCase User-Agent ^WebReaper keep_away
SetEnvIfNoCase User-Agent ^Webstripper keep_away

Then deny the marked requests (these directives belong in the relevant <Directory> or <Location> block, or in .htaccess):

Order Allow,Deny
Allow from all
Deny from env=keep_away
User agent blocking also does not prevent all web site downloading, because the user can blank out the user agent string or spoof it so the program appears to be Internet Explorer or another common browser.
Preventing Web Site Downloading Using User Agent Blocking in PHP
If the content you are attempting to protect is in PHP, you may be interested in the user agent blocking technique described in Deny Spambots and Prevent Email Harvesting.
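As a rough sketch of the same idea (not the exact code from that article), a PHP page can inspect the User-Agent header itself and refuse to serve known downloaders. The agent names below are examples only:

<?php
// Minimal sketch: refuse requests whose User-Agent matches a known downloading program.
// The agent list is illustrative; extend it to suit your site.
$badAgents = array('HTTrack', 'Offline Explorer', 'Teleport', 'WebCopier', 'WebReaper', 'WebStripper');
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
foreach ($badAgents as $agent) {
    // Case-insensitive substring match against the reported user agent
    if (stripos($userAgent, $agent) !== false) {
        header('HTTP/1.1 403 Forbidden');
        exit('Access denied.');
    }
}
?>

Place a check like this at the top of each PHP page you want to protect, before any other output is sent; like the httpd.conf method, it only stops programs that report their real user agent.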