When you post a link on Twitter…
Sunday, March 07th, 2010

I made a custom 404 error page this morning. It’s really easy with mod_rewrite enabled, because every URL request can then be routed through PHP. In the case of broken or bad requests you may want to tell your visitors that they simply missed… It looks something like this:

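// No matching page was found for this URL: answer with a 404
if ($request == false) {
    header("HTTP/1.0 404 Not Found");  // send the 404 status line
    include('#error.php');             // output the custom error page
    exit();                            // stop before anything else is sent
}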

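My #error.php file is a simple XHTML file with a custom layout and content. I put the # character at the beginning of the file’s name so that you can’t open it by simply typing its name into the address bar: a browser treats everything after # as a fragment and never sends it to the server (a request with the # encoded as %23 may still reach the file). Now, if you visit http://www.hladnik.net/ups, you should get my custom 404 error response.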
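
The snippet above takes for granted that $request has already been set. As a rough sketch only, here is one way that could be done, assuming a rewrite rule hands every request to a single PHP script. The function name resolve_request, the page list, and the name 'home' for the front page are all made up for illustration:

```php
<?php
// Hypothetical helper: map the requested URL to a known page name, or false.
function resolve_request(string $uri, array $pages)
{
    // Drop the query string and surrounding slashes: "/about/?x=1" -> "about"
    $path = trim((string) parse_url($uri, PHP_URL_PATH), '/');
    if ($path === '') {
        $path = 'home'; // assumed name for the front page
    }
    // Strict comparison, so only exact page names match
    return in_array($path, $pages, true) ? $path : false;
}

// Assumed list of valid pages; a real site would build this from its content.
$pages = ['home', 'about', 'contact'];
$request = resolve_request($_SERVER['REQUEST_URI'] ?? '/', $pages);
```

With a whitelist like this, a request for /ups matches nothing, $request becomes false, and the 404 branch runs.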

I also posted this link on my Twitter account at 11:06 am (CET). In the first 12 minutes I got 18 visits from these bots:

2010-03-07 11:06:23	85.114.136.243		Mozilla/5.0 (compatible; Windows NT 6.0) Gecko/20090624 Firefox/3.5 NjuiceBot
2010-03-07 11:06:23	204.236.249.194		JS-Kit URL Resolver, http://js-kit.com/
2010-03-07 11:06:24	64.13.147.188		Mozilla/5.0 (compatible; abby/1.0; +http://www.ellerdale.com/crawler.html)
2010-03-07 11:06:25	66.249.71.179		Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
2010-03-07 11:06:25	174.129.90.99		Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
2010-03-07 11:06:26	65.52.26.149		Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
2010-03-07 11:06:27	216.24.142.47		Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (http://www.oneriot.com)
2010-03-07 11:06:28	89.151.116.52		Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot
2010-03-07 11:06:32	70.37.70.230		Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
2010-03-07 11:06:35	67.207.201.153		Mozilla/5.0 (compatible; mxbot/1.0; +http://www.chainn.com/mxbot.html)
2010-03-07 11:06:53	67.202.7.134		PycURL/7.18.2
2010-03-07 11:06:54	72.13.91.40		Java/1.6.0_18
2010-03-07 11:07:06	79.99.6.106		Twingly Recon
2010-03-07 11:07:46	208.74.66.39		Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly.html) Gecko/2009032608 Firefox/3.0.8
2010-03-07 11:08:34	142.166.170.104		radian6_linkcheck_(www.radian6.com/crawler)
2010-03-07 11:09:13	142.166.170.103		R6_FeedFetcher(www.radian6.com/crawler)
2010-03-07 11:12:38	204.236.203.128		Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)
2010-03-07 11:18:43	184.73.20.47		Python-urllib/2.5

Four of them also looked for my robots.txt file:

2010-03-07 11:06:24	64.13.147.188		Mozilla/5.0 (compatible; abby/1.0; +http://www.ellerdale.com/crawler.html)
2010-03-07 11:06:36	67.207.201.153		Mozilla/5.0 (compatible; mxbot/1.0; +http://www.chainn.com/mxbot.html)
2010-03-07 11:07:45	208.74.66.36		Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly.html) Gecko/2009032608 Firefox/3.0.8
2010-03-07 11:08:47	142.166.170.103		R6_FeedFetcher(www.radian6.com/crawler)

And that was it. No more visits after 11:18 am. Is that good, is it bad, or does it mean nothing? These robots didn’t try to visit any other content on my website, although there are 5 links on my 404 error page! So all I can say is: “Much Ado About Nothing!”
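
If you want to run the same kind of check on your own log, here is a minimal sketch. It assumes tab-separated lines in the format shown above (timestamp, IP address, user agent); the function name count_user_agents is hypothetical, and the three sample lines are copied from my log:

```php
<?php
// Count hits per user agent in a tab-separated log (timestamp, IP, user agent).
function count_user_agents(array $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        // Split on runs of tabs so that empty columns are ignored
        $fields = preg_split('/\t+/', trim($line));
        if (count($fields) < 3) {
            continue; // skip lines that do not match the assumed format
        }
        $agent = $fields[2];
        $counts[$agent] = ($counts[$agent] ?? 0) + 1;
    }
    arsort($counts); // most frequent agents first
    return $counts;
}

// Three sample lines taken from the log above
$log = [
    "2010-03-07 11:06:26\t65.52.26.149\t\tMozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
    "2010-03-07 11:06:32\t70.37.70.230\t\tMozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
    "2010-03-07 11:07:06\t79.99.6.106\t\tTwingly Recon",
];
print_r(count_user_agents($log));
```

Grouping by user agent rather than by IP address is a deliberate choice here: some of these services come from more than one address, as the two Butterfly and radian6 entries show.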

Category: Web development