Firstly, I would like to extend my thanks to Daniel, the author of the largest bot list on the net. You can click on the preceding link to visit his site and view his bot lists, email extractor lists, spam bot lists, and 'other lists' (which he describes as link checkers, verifiers, etc.). The included Default_Bot_List.txt is a modified version of his original bot list.
In the IPB ACP, you can go to Tools & Settings -> Search Engine Spiders, where one of the settings (Spider Bot User-Agent) lets you enter spider mappings. Each mapping takes the following form:
(user agent string)=Displayed Bot Name
The "(user agent string)" is a string that is matched against the visitor's HTTP User-Agent. The Displayed Bot Name is what a matching visitor will be shown as on the site and in the spider logs in the ACP. By default, IPB supplies only 6 spider mappings. They are (arguably) the most common and largest spiders, but the list is far from comprehensive.
googlebot=Google.com
slurp@inktomi=Hot Bot
ask jeeves=Ask Jeeves
lycos=Lycos.com
whatuseek=What You Seek
ia_archiver=Archive.org
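To make the mapping format concrete, here is a minimal Python sketch of how such a list could be parsed and checked against a User-Agent. It is not IPB's actual code. The function names parse_mappings and identify_spider are made up for illustration, and the case-insensitive substring match is an assumption (suggested by the lowercase "googlebot" entry, but not confirmed here).

```python
# The six default mappings, one "user agent string=Displayed Bot Name" per line.
DEFAULT_MAPPINGS = """\
googlebot=Google.com
slurp@inktomi=Hot Bot
ask jeeves=Ask Jeeves
lycos=Lycos.com
whatuseek=What You Seek
ia_archiver=Archive.org
"""


def parse_mappings(text):
    """Turn mapping text into a list of (pattern, display_name) pairs."""
    mappings = []
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        # Split on the first "=" only, so a display name may contain "=".
        pattern, name = line.split("=", 1)
        mappings.append((pattern.strip(), name.strip()))
    return mappings


def identify_spider(user_agent, mappings):
    """Return the display name of the first matching mapping, or None.

    Assumption: matching is a case-insensitive substring test.
    """
    ua = user_agent.lower()
    for pattern, name in mappings:
        if pattern.lower() in ua:
            return name
    return None
```

A visitor whose User-Agent contains "Googlebot" would be reported as Google.com, while an ordinary browser matches nothing and is treated as a normal guest.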
What does this mean for you?
Not much, specifically, but the more mappings you have, the more spiders you can recognize, log, and treat specially. This is why we want to enter a more comprehensive list of mappings: so we can monitor different kinds of spiders.
Why not just link to Daniel's original list?
The problem I found with Daniel's list was that many of its entries were effectively duplicates. Here is an example:
AbachoBOT (Mozilla compatible)=Crawler.de
AbachoBOT=Crawler.de
Remember that the string on the left side is matched against the user agent, and if a match is found, the name on the right side is displayed. What happens here is that the forums first try to match against "AbachoBOT (Mozilla compatible)"; if that matches, the name "Crawler.de" is used. If not, the next string, "AbachoBOT", is tested, and its name, also "Crawler.de", is used. The problem is that we can achieve the same end result by matching ONLY against the second entry, "AbachoBOT". Any bot that matches the first entry also matches the second, and both entries display the same name, so there is no benefit to testing them separately. Additionally, the larger this list is, the more overhead IPB has to contend with in checking for the bots.
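The pruning rule described above can be sketched in Python. This is only an illustration of the idea, not the process actually used to produce the included list. The function name prune_redundant is hypothetical, and the case-insensitive comparison is an assumption.

```python
def prune_redundant(mappings):
    """Drop entries that a shorter entry with the same display name already covers.

    mappings is a list of (pattern, display_name) pairs. An entry is redundant
    when another entry with the same display name has a pattern contained in
    this one: anything matching the longer pattern also matches the shorter.
    Assumption: patterns are compared case-insensitively.
    """
    kept = []
    for i, (pattern, name) in enumerate(mappings):
        p = pattern.lower()
        redundant = False
        for j, (other, other_name) in enumerate(mappings):
            if i == j or other_name != name:
                continue
            o = other.lower()
            # A strictly shorter contained pattern makes this entry redundant.
            # For exact duplicates (same length), keep only the first one.
            if o in p and (len(o) < len(p) or j < i):
                redundant = True
                break
        if not redundant:
            kept.append((pattern, name))
    return kept


example = [
    ("AbachoBOT (Mozilla compatible)", "Crawler.de"),
    ("AbachoBOT", "Crawler.de"),
]
pruned = prune_redundant(example)  # only the ("AbachoBOT", "Crawler.de") entry remains
```

Entries are only dropped when the display names are identical, so a longer pattern that maps to a different name is left alone.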
By going through Daniel's list, to which I was originally just going to link, and removing duplicates like the one above, I've trimmed the list by 10 KB. This is a huge resource savings when you factor in that the list has to be parsed and loaded into a regular expression on every single page load.
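To show why the size of the list matters, here is a rough Python sketch of loading a mapping list into a single regular expression. IPB's real implementation is not shown here, so the details are assumptions: the names build_spider_regex and lookup are hypothetical, and escaping each pattern and ignoring case are illustrative choices.

```python
import re


def build_spider_regex(mappings):
    """Combine every pattern into one alternation, e.g. (googlebot|ia_archiver).

    Each extra mapping makes this expression longer, and in the scenario
    described above it would be rebuilt on every page load.
    """
    alternation = "|".join(re.escape(pattern) for pattern, _ in mappings)
    return re.compile("(" + alternation + ")", re.IGNORECASE)


def lookup(user_agent, mappings, regex):
    """Return the display name for the first pattern found in user_agent."""
    match = regex.search(user_agent)
    if match is None:
        return None
    matched = match.group(1).lower()
    for pattern, name in mappings:
        if pattern.lower() == matched:
            return name
    return None


mappings = [("googlebot", "Google.com"), ("ia_archiver", "Archive.org")]
regex = build_spider_regex(mappings)
result = lookup("Mozilla/5.0 (compatible; Googlebot/2.1)", mappings, regex)
```

With thousands of entries the alternation becomes very long, which is why removing entries that can never change the result is worthwhile.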
Summary
The bot list is used to recognize spiders, with the option to treat them specially (for example, by putting them in a special member group or forcing them to use a specific skin) and to log spider activity on your site. We've included a more comprehensive list than the one that ships with IPB. You can see the original list that our included list is based on here; however, at the time of this writing, there is no benefit to using the original list over our trimmed list.
