I made a custom 404 error page this morning. It’s really easy with mod_rewrite turned on, because you can catch every URL request with PHP. For broken or bad requests you may want to tell your visitors that they simply missed… It looks something like this:
if ($request == false) {
    // Send the 404 status line before any output
    header("HTTP/1.0 404 Not Found");
    // Show the custom error page, then stop
    include('#error.php');
    exit();
}
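The snippet does not show where $request gets its value. Here is a minimal sketch of one way it might work, assuming an .htaccess rule that sends every request to a single PHP script and a whitelist of known pages. The function name resolveRequest() and the $pages array are made up for illustration and are not taken from the site's real code.

```php
<?php
// Hypothetical helper: map a request URI to a known page file, or false.
// Assumes an .htaccess rule along these lines, so every request for a
// non-existing file reaches this script:
//   RewriteEngine On
//   RewriteCond %{REQUEST_FILENAME} !-f
//   RewriteRule ^ index.php [L]
function resolveRequest(string $uri, array $pages)
{
    // Keep only the path part and trim the slashes around it.
    $path = trim((string) parse_url($uri, PHP_URL_PATH), '/');

    // The empty path stands for the home page; anything unknown is a miss.
    $key = ($path === '') ? 'home' : $path;
    return $pages[$key] ?? false;
}

// Illustrative whitelist only; the real site's pages are not known here.
$pages = ['home' => 'home.php', 'about' => 'about.php'];

// In a live script the URI would come from $_SERVER['REQUEST_URI'].
$request = resolveRequest('/ups', $pages); // false, so the 404 branch runs
```

With a whitelist like this, anything that is not listed falls through to the 404 branch, which is the safe default.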
My #error.php file is a simple XHTML file with a custom design and content. I put the # character at the beginning of the file’s name, so you can’t access it directly through a URL. Now, if you visit http://www.hladnik.net/ups, you should get my custom 404 error response.
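The reason the # prefix works: in a URL, everything after # is a fragment, which the browser keeps to itself and never sends to the server. A quick check with PHP's parse_url() shows the split. One caveat, which is an assumption rather than something tested on this site: a request that percent-encodes the character does name the file, so whether that form is served depends on the server's rewrite rules.

```php
<?php
// parse_url() splits the URL into the same parts a browser works with.
$parts = parse_url('http://www.hladnik.net/#error.php');

echo $parts['path'], "\n";     // "/" : the only path the server is asked for
echo $parts['fragment'], "\n"; // "error.php" : stays in the browser

// The percent-encoded form is a real path segment, not a fragment.
$encoded = rawurlencode('#error.php');
echo $encoded, "\n";           // "%23error.php"
```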
I also posted this link on my Twitter account at 11:06 am (CET). In the first 12 minutes I got 18 visits from these bots:
2010-03-07 11:06:23 85.114.136.243 Mozilla/5.0 (compatible; Windows NT 6.0) Gecko/20090624 Firefox/3.5 NjuiceBot
2010-03-07 11:06:23 204.236.249.194 JS-Kit URL Resolver, http://js-kit.com/
2010-03-07 11:06:24 64.13.147.188 Mozilla/5.0 (compatible; abby/1.0; +http://www.ellerdale.com/crawler.html)
2010-03-07 11:06:25 66.249.71.179 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
2010-03-07 11:06:25 174.129.90.99 Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
2010-03-07 11:06:26 65.52.26.149 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
2010-03-07 11:06:27 216.24.142.47 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (http://www.oneriot.com)
2010-03-07 11:06:28 89.151.116.52 Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot
2010-03-07 11:06:32 70.37.70.230 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
2010-03-07 11:06:35 67.207.201.153 Mozilla/5.0 (compatible; mxbot/1.0; +http://www.chainn.com/mxbot.html)
2010-03-07 11:06:53 67.202.7.134 PycURL/7.18.2
2010-03-07 11:06:54 72.13.91.40 Java/1.6.0_18
2010-03-07 11:07:06 79.99.6.106 Twingly Recon
2010-03-07 11:07:46 208.74.66.39 Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly.html) Gecko/2009032608 Firefox/3.0.8
2010-03-07 11:08:34 142.166.170.104 radian6_linkcheck_(www.radian6.com/crawler)
2010-03-07 11:09:13 142.166.170.103 R6_FeedFetcher(www.radian6.com/crawler)
2010-03-07 11:12:38 204.236.203.128 Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)
2010-03-07 11:18:43 184.73.20.47 Python-urllib/2.5
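To check numbers like these without counting by hand, the log lines can be split into fields. This is a minimal sketch, assuming every line has the form "date time ip user-agent" as above. The function name parseLogLine() is made up for illustration, and only three of the lines are copied in to keep it short.

```php
<?php
// Split one log line of the form "date time ip user-agent" into its parts.
function parseLogLine(string $line): array
{
    // Limit of 4 keeps the user agent, which contains spaces, in one piece.
    [$date, $time, $ip, $agent] = explode(' ', trim($line), 4);
    return ['when' => strtotime("$date $time"), 'ip' => $ip, 'agent' => $agent];
}

// Three lines copied from the log above: the first, one from the middle, the last.
$lines = [
    '2010-03-07 11:06:23 85.114.136.243 Mozilla/5.0 (compatible; Windows NT 6.0) Gecko/20090624 Firefox/3.5 NjuiceBot',
    '2010-03-07 11:06:53 67.202.7.134 PycURL/7.18.2',
    '2010-03-07 11:18:43 184.73.20.47 Python-urllib/2.5',
];

$visits = array_map('parseLogLine', $lines);

// Seconds between the first and the last visit in the sample.
$span = end($visits)['when'] - $visits[0]['when'];
printf("%d visits in %d min %d s\n", count($visits), intdiv($span, 60), $span % 60);
```

Fed with all 18 lines, the same loop gives the visit count, and the span between the first and last line comes out at 12 minutes 20 seconds.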
Four of them also looked for my robots.txt file:
2010-03-07 11:06:24 64.13.147.188 Mozilla/5.0 (compatible; abby/1.0; +http://www.ellerdale.com/crawler.html)
2010-03-07 11:06:36 67.207.201.153 Mozilla/5.0 (compatible; mxbot/1.0; +http://www.chainn.com/mxbot.html)
2010-03-07 11:07:45 208.74.66.36 Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly.html) Gecko/2009032608 Firefox/3.0.8
2010-03-07 11:08:47 142.166.170.103 R6_FeedFetcher(www.radian6.com/crawler)
And that was it. No more visits after 11:18 am. Is that good, bad, or does it mean nothing? These robots didn’t try to visit any other content on my website, although there are 5 links on my 404 error page! So I can only say: “Much Ado About Nothing!”

