- Treating search engines as part of a group (e.g. 'Spiders') but not actually letting the group access the board
- Skin parser errors on a skin only spiders use
- Modifications to the session authentication routine that only cause errors for spiders
The easiest way I know to do this is with the "Live HTTP Headers" Firefox plugin, which is freely available. After you install this plugin, you will need to restart your browser. When you open it back up, the last option under the Tools menu will now say "Live HTTP headers".
Load up your site in a Firefox tab, then go to Tools and select this new option. Click a link on your forum installation (a simple one, such as the first entry in your navigation bar), and then look at the Live HTTP Headers log. There will likely be several entries, as images, CSS, and JavaScript are all downloaded when you request the web page as well. The first entry, however, is the most important in this case.
You will see something like this:
http://localhost/forums/

GET /forums/ HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: anonlogin=-1; forum_read=a%3A3%3A%7Bi%3A2%3Bi%3A1171022792%3Bi%3A3%3Bi%3A1170814046%3Bi%3A5%3Bi%3A1170597992%3B%7D; rte-sidepanel=open; pass_hash=0; member_id=0; ipb_stronghold=736dd35a22b3eb4c2cead48c00041a09; ipb-myass-div=440,200; collapseprefs=; topicmode=linear; session_id=fde47d5090df3a342a997f1d8b8029f3; ipb_admin_session_id=3f7d293ed750305788db443d12b79a97

HTTP/1.x 200 OK
Date: Mon, 12 Feb 2007 10:23:38 GMT
Server: Apache/2.0.59 (Win32) PHP/5.2.0
X-Powered-By: PHP/5.2.0
Set-Cookie: member_id=0; path=/; httponly
Set-Cookie: pass_hash=0; path=/; httponly
Set-Cookie: session_id=fff0544cc6eedc062b2b83d5deef3015; path=/; httponly
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 5263
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Content-Type: text/html
These are the headers sent to your server (the first block) and received from your server (the second block). These headers dictate how the webserver is going to see and respond to you throughout its session.
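If you want to inspect a captured exchange outside the browser, each block splits cleanly into a start line followed by "Name: value" headers. The Python sketch below is only an illustration under that assumption (one header per line, as Live HTTP Headers displays them); the function name parse_header_block is made up for this example and is not part of the plugin or of IPB.

```python
def parse_header_block(block: str):
    """Split one header block into its first line and a name -> values map.

    HTTP header names are case-insensitive, so they are lower-cased here.
    Repeated headers (such as several Set-Cookie lines) keep every value.
    """
    lines = [line.strip() for line in block.strip().splitlines() if line.strip()]
    first_line, headers = lines[0], {}
    for line in lines[1:]:
        # Split on the first colon only, so values like dates keep their colons.
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return first_line, headers


# Trimmed copies of the request and response blocks captured above.
request_block = """GET /forums/ HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"""

response_block = """HTTP/1.x 200 OK
Set-Cookie: member_id=0; path=/; httponly
Set-Cookie: pass_hash=0; path=/; httponly
Content-Type: text/html"""

request_line, request_headers = parse_header_block(request_block)
status_line, response_headers = parse_header_block(response_block)
print(request_line)                          # GET /forums/ HTTP/1.1
print(len(response_headers["set-cookie"]))   # 2
```

The lower-casing is a deliberate choice: a server may write "Set-Cookie" or "set-cookie", and both name the same header.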
Click the first line that says
http://localhost/forums/
and then click the button at the bottom left that says "Replay". A new window will open with the original request headers in a text box at the top half of the screen. One of those request headers will be the User-Agent header, and this is the header we must modify to be seen as a search engine spider.
In my case it looks like this:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9
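To see why this single header is enough, here is a sketch of how a board might classify a visitor. It assumes the board keeps a list of keywords and does a case-insensitive substring match against the User-Agent string; that is consistent with the "Google.com" label mentioned below, but it is an illustration, not IPB's actual code. The SPIDER_KEYWORDS table and identify_spider function are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical keyword -> display-name table. A real board keeps its own,
# configurable list; these two entries are only illustrative.
SPIDER_KEYWORDS: Dict[str, str] = {
    "googlebot": "Google.com",
    "msnbot": "MSN",
}


def identify_spider(user_agent: str,
                    keywords: Dict[str, str] = SPIDER_KEYWORDS) -> Optional[str]:
    """Return the spider's display name if any keyword occurs in the User-Agent."""
    ua = user_agent.lower()
    for keyword, name in keywords.items():
        if keyword in ua:
            return name
    return None  # no keyword matched: treated as an ordinary visitor


firefox = ("Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.0.9) "
           "Gecko/20061206 Firefox/1.5.0.9")
print(identify_spider(firefox))      # None
print(identify_spider("googlebot"))  # Google.com
```

Under this assumption, the bare string "googlebot" matches just as well as Google's full crawler User-Agent, which is why the short value used below works.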
What we should do is edit this header so that we will be seen as Googlebot or some other search engine spider. It does not matter, effectively, which one we mimic. The only side effect to consider is that this visit will be logged in the ACP. If you wish to delete the log entry later, you will want to use a search engine spider that has not already been logged on your site. If the log entry does not matter to you, then it's easiest to just mimic Google. Change your user agent line like so:
User-Agent: googlebot
Then click "Replay". When the page loads, IPB will have recognized you as a spider: you should see whatever privileges are given to the search engine spider group, and the skin forced on search engine spiders. You may or may not see the spider listed in the active users on the first page load. I've noticed that with IPB you typically have to click "Replay" a second time to see "Google.com" listed in the active users at the bottom of the page, but you are still recognized as a spider regardless.
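If you would rather script the replay than click through it, the same request can be sent with Python's standard urllib. This is a minimal sketch, assuming the board lives at http://localhost/forums/ as in the example; the names SPIDER_UA, build_spider_request and fetch_as_spider are made up here. One difference from the Replay window: Replay resends your Firefox cookies, while this sends no Cookie header at all, so every call is a fresh guest visit.

```python
import urllib.request

SPIDER_UA = "googlebot"  # the same value typed into the Replay window


def build_spider_request(url: str, user_agent: str = SPIDER_UA) -> urllib.request.Request:
    # Without an explicit header, urllib would send "Python-urllib/x.y" instead.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_as_spider(url: str, user_agent: str = SPIDER_UA) -> str:
    # No Cookie or Accept-Encoding header is sent, so there is no session to
    # reuse and the body normally arrives uncompressed.
    request = build_spider_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=15) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling fetch_as_spider("http://localhost/forums/") twice and searching the second result for "Google.com" should roughly correspond to clicking "Replay" twice; to visit another page, just pass a different URL.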
If you wish to visit another page, just change the URL in the top line of the Live HTTP Replay window.
[Screenshot: change_url.jpg]
Remember: you can only view pages you have allowed (and IPB allows) search engines to see. And don't forget to take a look at non-IPB pages, such as your home page or any other pages you serve to visitors, to make sure search engines see them the way you expect them to be seen.
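Checking a handful of pages by hand gets tedious, so here is a sketch of a small checklist run. It assumes you can name, for each URL, a few phrases a spider ought to see there (for example a heading on your home page); the function names and the phrase lists you pass in are hypothetical. An empty list for a URL means every expected phrase was found.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List

SPIDER_UA = "googlebot"


def fetch_text(url: str, user_agent: str) -> str:
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=15) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def check_spider_view(
    pages: Dict[str, List[str]],
    fetcher: Callable[[str, str], str] = fetch_text,
) -> Dict[str, List[str]]:
    """Map each URL to the problems a spider would hit there (empty list = OK)."""
    problems: Dict[str, List[str]] = {}
    for url, expected_phrases in pages.items():
        try:
            body = fetcher(url, SPIDER_UA)
        except urllib.error.URLError as exc:  # includes HTTP errors such as 403/500
            problems[url] = [f"request failed: {exc}"]
            continue
        problems[url] = [f"missing: {p!r}" for p in expected_phrases if p not in body]
    return problems
```

The fetcher parameter exists so the check can be exercised without a live server; in normal use you would call check_spider_view with a dictionary of your own URLs and phrases and let it use fetch_text.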
