 |
| Our robots access your page too often? |
To avoid overloading web servers, our crawlers are designed not to visit the same web server more than
once every few seconds. Nevertheless, because
several independent web crawlers run simultaneously at btbot, two different crawlers may access the same
web server at the same time. If our robots visit your web server too often, please send an email to
btbot@btbot.com to report the problem so that we can fix it.
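
The behavior described above can be illustrated with a minimal sketch of a per-host delay check. This is not btbot's actual implementation: the class name `HostRateLimiter`, the 5-second delay, and the injectable `clock` parameter are assumptions made for illustration only.

```python
import time
from urllib.parse import urlsplit


class HostRateLimiter:
    """Remembers the last visit per host and reports how long to wait.

    Hypothetical helper; the 5-second default is an assumed value standing
    in for "a few seconds".
    """

    def __init__(self, min_delay_seconds=5.0, clock=time.monotonic):
        self.min_delay_seconds = min_delay_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._last_visit = {}        # host -> time of the most recent visit

    def seconds_until_allowed(self, url):
        """Return 0.0 if the host may be visited now, else the remaining wait."""
        host = urlsplit(url).netloc.lower()
        last = self._last_visit.get(host)
        if last is None:
            return 0.0
        elapsed = self._clock() - last
        return max(0.0, self.min_delay_seconds - elapsed)

    def record_visit(self, url):
        """Note that the host of `url` was visited just now."""
        host = urlsplit(url).netloc.lower()
        self._last_visit[host] = self._clock()
```

Each limiter in this sketch only knows about its own visits. Two crawlers that each hold a separate instance would not see each other's requests, which is how simultaneous access to one server can still occur.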
|
| |
 |
| Why does the robot try to download a file called robots.txt? |
The robots.txt file contains rules defined by the robot exclusion standard. Each web crawler should
check for this file, which tells the robot which parts of the web server may be visited
and which may not. Each crawler maintains a cache of all robots.txt files that it has
downloaded, and this cache is updated periodically.
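
As a rough sketch of such a cache, the following keeps one parsed robots.txt per host and consults it before a URL is fetched. It uses Python's standard `urllib.robotparser`; the class name `RobotsCache` and the `download` callable (host name in, robots.txt text out) are hypothetical, and periodic updating is left out to keep the example short.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class RobotsCache:
    """Keeps one parsed robots.txt per host (hypothetical helper)."""

    def __init__(self, download):
        # `download` is an assumed callable: host name -> robots.txt text.
        self._download = download
        self._parsers = {}  # host -> RobotFileParser

    def can_fetch(self, user_agent, url):
        """Return True if the cached rules allow `user_agent` to fetch `url`."""
        host = urlsplit(url).netloc.lower()
        parser = self._parsers.get(host)
        if parser is None:
            # Cache miss: download and parse the file once for this host.
            parser = RobotFileParser()
            parser.parse(self._download(host).splitlines())
            self._parsers[host] = parser
        return parser.can_fetch(user_agent, url)
```

Passing the download step in as a function keeps the sketch free of network access, so the rule checking can be tried out with any robots.txt text.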
|
| |
 |
| How can I prevent the robots from crawling my web site or parts of my web site? |
To keep our crawlers from crawling your web site, use the robot exclusion standard. Create a
robots.txt file and place it in the root directory of your web site. The file may contain the
following lines:
User-agent: btbot     # use * to address all robots
Disallow: /path_1     # use / to exclude the complete web site
Disallow: /path_2
For more details about the robot exclusion standard, see
The Robots Exclusion Protocol.
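
Before publishing a robots.txt file, the rules can be checked locally. The sketch below uses Python's standard `urllib.robotparser` to test the example rules against a few URLs; the helper name `is_allowed` and the host `www.example.com` are placeholders, and the parser only approximates how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, without the explanatory notes.
ROBOTS_TXT = """\
User-agent: btbot
Disallow: /path_1
Disallow: /path_2
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/path_1/page.html", "/path_2/", "/index.html"):
        url = "http://www.example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "btbot", url))
```

Because the rules name only btbot, other user agents are not restricted by this file. Using `User-agent: *` instead applies the rules to all robots.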
|
| |
 |
| Why does the robot download the robots.txt so often? |
Each crawler maintains its own cache of downloaded robots.txt files, which
has to be refreshed from time to time (usually once within 24 hours). In the worst case, the number of
downloads in that period therefore equals the number of running crawlers.
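
The worst case can be made concrete with a small simulation. It assumes that every crawler refreshes a cached robots.txt once it is 24 hours old and that all crawlers visit the same host once per hour; the names `CrawlerRobotsCache` and `downloads_in_one_day` are hypothetical.

```python
REFRESH_SECONDS = 24 * 60 * 60  # "usually once within 24 hours"


class CrawlerRobotsCache:
    """Per-crawler record of when each host's robots.txt was last fetched."""

    def __init__(self):
        self._fetched_at = {}  # host -> time of last download, in seconds
        self.downloads = 0     # how often this crawler downloaded robots.txt

    def ensure_fresh(self, host, now):
        """Count a download if the host is unknown or its entry has expired."""
        fetched_at = self._fetched_at.get(host)
        if fetched_at is None or now - fetched_at >= REFRESH_SECONDS:
            self._fetched_at[host] = now
            self.downloads += 1


def downloads_in_one_day(crawler_count, host="www.example.com"):
    """Total robots.txt downloads when each crawler visits `host` hourly."""
    caches = [CrawlerRobotsCache() for _ in range(crawler_count)]
    for second in range(0, REFRESH_SECONDS, 3600):
        for cache in caches:
            cache.ensure_fresh(host, now=second)
    return sum(cache.downloads for cache in caches)


if __name__ == "__main__":
    print(downloads_in_one_day(4))  # one download per crawler
```

Since the caches are not shared, each crawler downloads the file once on its own, so four crawlers produce four downloads within the 24-hour window.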
|
| |
 |
| What parts of my web site will be analyzed? |
All parts of a web site are analyzed, except blocks that are commented out with
<!-- .. -->. Scripts within <script>...</script> and the sources of
frames are also not evaluated.
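
To illustrate these exclusions, here is a minimal sketch built on Python's standard `html.parser`. It is not btbot's actual analyzer: the names `VisibleTextExtractor` and `analyzable_text` are hypothetical, and "sources for frames" is read here as meaning that the `src` of a frame is not followed.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects page text while skipping comments, scripts, and frame sources."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        # <frame> and <iframe> tags are ignored on purpose: their src
        # attribute is not read, so framed documents are never evaluated.

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies arrive here as data, so drop them explicitly.
        if not self._in_script and data.strip():
            self.text_parts.append(data.strip())

    # handle_comment is not overridden; the inherited method does nothing,
    # so <!-- ... --> blocks never reach text_parts.


def analyzable_text(html):
    """Return the text fragments that would be analyzed under these rules."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text_parts
```

For a page such as `<p>Hello</p><!-- hidden --><script>var x = 1;</script><p>World</p>`, only the two paragraph texts are collected.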
|
| |