Archive for March 7th, 2010

Get IMDB ID (tt number) from movie title
Sunday, March 07th, 2010 | Author:

I searched the internet for a way to get the IMDB movie ID from a movie’s title. As you know, the Internet Movie Database has data about nearly every movie ever made. Every movie has its own ID, which begins with the letters tt followed by 7 digits. The exact URL of a movie looks something like http://www.imdb.com/title/tt2345678/.
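To make that format concrete, here is a minimal sketch that checks a string against it and builds the title URL. The function name imdbTitleUrl() is made up for this example, and the pattern accepts seven or more digits as an assumption, in case some IDs are longer than the seven digits described above.

```php
<?php
// Hypothetical helper: validate an IMDB ID and build its title URL.
// Assumption: an ID is "tt" followed by at least seven digits.
function imdbTitleUrl(string $tt): ?string
{
    if (!preg_match('/\Att[0-9]{7,}\z/', $tt)) {
        return null; // not a well-formed ID
    }
    return 'http://www.imdb.com/title/' . $tt . '/';
}

var_dump(imdbTitleUrl('tt0499549')); // the title URL as a string
var_dump(imdbTitleUrl('Avatar'));    // NULL, because a title is not an ID
```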

IMDB has some RSS channels, but none of them offers this. Of course, you can use their search, but what if you need this function in your PHP script? Imagine parsing some TV guide RSS channel… wouldn’t it be nice to add an IMDB link to every movie you could watch that day?

Here is my class:

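class IMDBtt{

	// Returns the first IMDB ID (e.g. tt0499549) found for $title, or false if none is found.
	public function tt($title){
		$t = urlencode($title);
		$content = $this->getSite('http://www.imdb.com/find?s=tt&q=' . $t);
		if($content === false){
			return false; // the request failed
		}
		// Skip the page header and search from the main content block onwards.
		$content = substr($content, (int)strpos($content, 'div id="main"'));
		preg_match('/tt[0-9]{7}/',$content,$matches);
		return reset($matches); // false when nothing matched
	}

	// Fetches $url with cURL and returns the response body, or false on failure.
	private function getSite($url){
		$ch = curl_init($url);

		curl_setopt($ch,CURLOPT_RETURNTRANSFER,1); // return the body instead of printing it
		curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,5); // give up connecting after 5 seconds
		curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1); // follow the redirect to a single match
		curl_setopt($ch,CURLOPT_MAXREDIRS,1);      // but only once

		$data = curl_exec($ch);
		curl_close($ch);
		return $data;
	}

}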

Usage:

$IMDB = new IMDBtt();
$tt = $IMDB->tt('Avatar');
//returns tt0499549

You can create your <a> link like this:

<a href="http://www.imdb.com/title/<?=$tt;?>">Avatar</a>
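Applied to the TV guide idea, a loop over the feed items might look roughly like the sketch below. The function name linkTitles() and the sample feed are invented for illustration, and the lookup is passed in as a callable (for example [new IMDBtt(), 'tt']) so the example can run with a stub instead of a live request. It assumes the SimpleXML extension is available.

```php
<?php
// Sketch: turn RSS item titles into IMDB links.
// $lookup is any callable that maps a title to a tt ID, or false when nothing is found.
function linkTitles(string $rssXml, callable $lookup): array
{
    $links = [];
    $feed = simplexml_load_string($rssXml);
    if ($feed === false) {
        return $links; // not valid XML
    }
    foreach ($feed->channel->item as $item) {
        $title = (string) $item->title;
        $tt = $lookup($title);
        // Escape the title before putting it into HTML.
        $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
        // Fall back to plain text when no ID was found.
        $links[] = $tt
            ? '<a href="http://www.imdb.com/title/' . $tt . '/">' . $safeTitle . '</a>'
            : $safeTitle;
    }
    return $links;
}

// Made-up feed and a stub lookup, so no network access is needed here.
$rss = '<rss version="2.0"><channel>'
     . '<item><title>Avatar</title></item>'
     . '<item><title>Evening News</title></item>'
     . '</channel></rss>';
$stub = function ($title) {
    return $title === 'Avatar' ? 'tt0499549' : false;
};
print_r(linkTitles($rss, $stub));
```

Each title costs one request to IMDB when the real class is used as the lookup, so caching the results would probably be worthwhile for a daily feed.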

There is not much more to say about that. Maybe just a little note: if IMDB finds only one match for a movie title, it immediately redirects to that movie’s page. Therefore I had to use these options:

curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch,CURLOPT_MAXREDIRS,1);

to follow the redirect (only once).
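When the redirect is followed, the ID is also part of the URL that cURL ends up on, so it could be read from there instead of from the page body. The sketch below assumes the redirect target has the /title/tt…/ form shown earlier. ttFromUrl() is a hypothetical helper, and the seven-or-more-digits pattern is an assumption of this example (the class above uses exactly seven).

```php
<?php
// Sketch: pull the tt ID out of a URL, such as the one cURL lands on
// after following the redirect.
function ttFromUrl(string $url)
{
    $path = (string) parse_url($url, PHP_URL_PATH);
    if (preg_match('#^/title/(tt[0-9]{7,})#', $path, $m)) {
        return $m[1];
    }
    return false; // not a title page
}

// Inside getSite(), before curl_close($ch), the final URL would be available as:
//   $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

var_dump(ttFromUrl('http://www.imdb.com/title/tt0499549/'));   // string "tt0499549"
var_dump(ttFromUrl('http://www.imdb.com/find?s=tt&q=Avatar')); // bool(false)
```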

If you are working on localhost (XAMPP on Windows, for example), you might get this error: Call to undefined function curl_init(). To fix it:

  1. Open C:\xampp\apache\bin\php.ini
  2. Remove the semi-colon in front of this line: extension=php_curl.dll
  3. Restart Apache
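To avoid the fatal error in the first place, a script can check for the extension before using it. This is only a sketch, and curlAvailable() is a name made up for the example.

```php
<?php
// Sketch: report a readable message instead of a fatal error
// when the cURL extension is not loaded.
function curlAvailable(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (!curlAvailable()) {
    echo "cURL is not enabled: check the extension=php_curl.dll line in php.ini\n";
}
```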

Demo here.

Category: Web development  | Tags: ,  | 7 Comments
When you post a link on Twitter…
Sunday, March 07th, 2010 | Author:

I made a custom 404 error page this morning. It’s really easy with mod_rewrite enabled, because you can catch every URL request with PHP. For broken or bad requests you may want to tell your visitors that they simply missed the page they were looking for. It looks something like this:

if($request == false){
    header("HTTP/1.0 404 Not Found");
    include('#error.php');
    exit();
}

My #error.php file is a simple XHTML file with a custom layout and content. I put the # character at the beginning of the file’s name so you can’t open it directly by typing its URL: a browser treats everything after # as a fragment and doesn’t send it to the server. Now, if you visit http://www.hladnik.net/ups, you should get my custom 404 error response.
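The effect of the # character can be seen by splitting such a URL with parse_url(). The URL below is just the site address with the file name appended, used here for illustration.

```php
<?php
// Sketch: show how a URL containing "#error.php" is split.
$parts = parse_url('http://www.hladnik.net/#error.php');

echo $parts['path'], "\n";     // "/" is the path a browser would request
echo $parts['fragment'], "\n"; // "error.php" is the fragment, kept by the browser
```

A request for the percent-encoded name (%23error.php) could presumably still reach the file, so this trick is a convenience rather than real access control.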

I also posted this link on my Twitter account at 11:06 am (CET). In the first 12 minutes I got 18 visits from these bots:

2010-03-07 11:06:23	85.114.136.243		Mozilla/5.0 (compatible; Windows NT 6.0) Gecko/20090624 Firefox/3.5 NjuiceBot
2010-03-07 11:06:23	204.236.249.194		JS-Kit URL Resolver, http://js-kit.com/
2010-03-07 11:06:24	64.13.147.188		Mozilla/5.0 (compatible; abby/1.0; +http://www.ellerdale.com/crawler.html)
2010-03-07 11:06:25	66.249.71.179		Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
2010-03-07 11:06:25	174.129.90.99		Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
2010-03-07 11:06:26	65.52.26.149		Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
2010-03-07 11:06:27	216.24.142.47		Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (http://www.oneriot.com)
2010-03-07 11:06:28	89.151.116.52		Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot
2010-03-07 11:06:32	70.37.70.230		Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
2010-03-07 11:06:35	67.207.201.153		Mozilla/5.0 (compatible; mxbot/1.0; +http://www.chainn.com/mxbot.html)
2010-03-07 11:06:53	67.202.7.134		PycURL/7.18.2
2010-03-07 11:06:54	72.13.91.40		Java/1.6.0_18
2010-03-07 11:07:06	79.99.6.106		Twingly Recon
2010-03-07 11:07:46	208.74.66.39		Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly.html) Gecko/2009032608 Firefox/3.0.8
2010-03-07 11:08:34	142.166.170.104		radian6_linkcheck_(www.radian6.com/crawler)
2010-03-07 11:09:13	142.166.170.103		R6_FeedFetcher(www.radian6.com/crawler)
2010-03-07 11:12:38	204.236.203.128		Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)
2010-03-07 11:18:43	184.73.20.47		Python-urllib/2.5

Four of them also looked for my robots.txt file:

2010-03-07 11:06:24	64.13.147.188		Mozilla/5.0 (compatible; abby/1.0; +http://www.ellerdale.com/crawler.html)
2010-03-07 11:06:36	67.207.201.153		Mozilla/5.0 (compatible; mxbot/1.0; +http://www.chainn.com/mxbot.html)
2010-03-07 11:07:45	208.74.66.36		Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly.html) Gecko/2009032608 Firefox/3.0.8
2010-03-07 11:08:47	142.166.170.103		R6_FeedFetcher(www.radian6.com/crawler)
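Log lines like these can be summarised with a few lines of PHP. The sketch below assumes tab-separated fields in the order timestamp, IP, user agent, as in the listings above. The function name summariseHits() and the shape of its result are made up for this example, and only four of the logged lines are used as sample input.

```php
<?php
// Sketch: summarise tab-separated log lines of the form
//   timestamp <TAB> IP <TAB(s)> user agent
function summariseHits(array $lines): array
{
    $times = [];
    $ips = [];
    $agents = [];
    foreach ($lines as $line) {
        $fields = preg_split('/\t+/', trim($line), 3);
        if (count($fields) < 3) {
            continue; // skip malformed lines
        }
        [$time, $ip, $agent] = $fields;
        $times[] = strtotime($time);
        $ips[] = $ip;
        $agents[$agent] = ($agents[$agent] ?? 0) + 1;
    }
    return [
        'hits'    => count($times),
        'ips'     => count(array_unique($ips)),
        'seconds' => $times ? max($times) - min($times) : 0, // first to last hit
        'agents'  => $agents,                                // hits per user agent
    ];
}

$log = [
    "2010-03-07 11:06:23\t85.114.136.243\t\tMozilla/5.0 (compatible; Windows NT 6.0) Gecko/20090624 Firefox/3.5 NjuiceBot",
    "2010-03-07 11:06:26\t65.52.26.149\t\tMozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
    "2010-03-07 11:06:32\t70.37.70.230\t\tMozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
    "2010-03-07 11:18:43\t184.73.20.47\t\tPython-urllib/2.5",
];
print_r(summariseHits($log));
```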

And that was it. No more visits after 11:18 am. Is that good, bad, or does it mean nothing? These robots didn’t try to visit any other content on my website, although there are 5 links on my 404 error page! So I can only say: “Much Ado About Nothing!”

Category: Web development  | Tags: , ,  | 6 Comments