Tag-Archive for ◊ php ◊

Storing images from internet to mysql
Sunday, September 05th, 2010 | Author:

There are a lot of articles about storing images in a MySQL database, but they mostly deal with uploaded images. What about images that are already available on the internet? I must say: it is very simple.

First you have to get the contents of the image file:

$url = 'http://www.editor.si/template/images/logo.gif';
 
$image = file_get_contents($url);

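That’s it. To store $image in a MySQL database you also have to escape it, so the binary data does not break the query. mysql_real_escape_string() is a safer choice than addslashes(), and binding the value in a prepared statement avoids the problem altogether.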

You may also need to save the image’s size (length) and type, so you can retrieve it from your database and display it later.
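Here is a minimal sketch of how the size and type could be collected and stored together with the image. The helper names describeImage() and storeImage(), the images table with its size, type and data columns, and the $pdo connection are all assumptions made for illustration, and getimagesizefromstring() needs PHP 5.4 or later.

```php
<?php
// Hypothetical helper: gather what is needed to store the image and serve it later.
// Returns null when the bytes are not a recognised image.
function describeImage($bytes) {
    $info = getimagesizefromstring($bytes);
    if ($info === false) {
        return null;
    }
    return array(
        'size' => strlen($bytes),  // length in bytes
        'type' => $info['mime'],   // e.g. image/gif
        'data' => $bytes,
    );
}

// Hypothetical storage step: assumes an `images` table with
// size (INT), type (VARCHAR) and data (BLOB) columns.
function storeImage(PDO $pdo, array $image) {
    $stmt = $pdo->prepare('INSERT INTO images (size, type, data) VALUES (?, ?, ?)');
    $stmt->bindValue(1, $image['size'], PDO::PARAM_INT);
    $stmt->bindValue(2, $image['type'], PDO::PARAM_STR);
    $stmt->bindValue(3, $image['data'], PDO::PARAM_LOB);
    return $stmt->execute();
}
```

With $image = file_get_contents($url) from above, storeImage($pdo, describeImage($image)) would write one row, provided describeImage() did not return null. Binding the bytes as a parameter means no manual escaping is needed.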

Category: Web development  | Tags: ,  | One Comment
Parsing HTML tables with simpleXML
Sunday, July 25th, 2010 | Author:

Sometimes you still have to parse entire HTML tables from other websites, and often that is the only way to get the data. Here is a little tip on how to make it simple with PHP’s SimpleXML.

First you have to fetch the page with file_get_contents($url) and extract the table (with preg_match or substr).

When you have the entire table in the $table variable, put <?xml version="1.0"?> in front of it and pass it to the simplexml_load_string($table) function. The table must be well-formed XHTML, otherwise SimpleXML raises errors.

The last step is the foreach($xml->children() as $tr){} loop, where you can access the cells row by row and get the data out of them. Thanks to SimpleXML the data is already separated from the HTML tags and ready for use.

Example:

//get page
$url = 'http://www.apache.org/server-status';
$content = file_get_contents($url);
//get the first table on the page
$start = strpos($content, '<table');
$end = strpos($content, '</table>', $start) + 8; //8 = length of </table>
$table = substr($content, $start, $end - $start);
//make it usable: a bare nowrap attribute is not valid XML, so remove it
$table = '<?xml version="1.0"?>' . str_replace('nowrap', '', $table);
$xml = simplexml_load_string($table);
//go through the rows, I need just the 13th cell
foreach($xml->children() as $tr){
     if(isset($tr->td[12])) echo $tr->td[12].'<br/>';
}
Generate a unique ID with PHP
Wednesday, June 30th, 2010 | Author:

While creating a webshop I had to generate unique IDs for orders. An auto-increment numeric ID is simply not good enough, because someone (for example your competition) could track the number of your orders. So I had to create a simple system that gives every order a unique ID without revealing how many orders we have had so far.

I quickly realised that an md5 hash is too long, because the order code has to be short enough to use on documents.

The only unique value I could think of is time. A PHP time() value is already unique in the sense that it will never repeat. Of course you could have two or more orders in the same second, so it has to be more specific.

Here is my function:

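function uniqueID(){

// seconds and fraction as a string with exactly 4 decimals, e.g. 1277900000.1234
$time = sprintf('%.4F', microtime(true));
$parts = explode('.', $time);
return strtoupper(strrev(dechex((int)$parts[0]) . dechex((int)$parts[1]))) . dechex(rand());

}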

//Returns for example 75020CB2C417f

I tested it with this loop:

for($i = 0; $i < 100; $i++){
$array[] = uniqueID();
}

and the values in $array were unique every time I ran it! Of course all 100 IDs were generated within the same millisecond or so, but the rand() suffix makes duplicates very unlikely, although it cannot strictly guarantee that there are none. I used dechex() to get shorter results and strrev() to make the result less obvious.

The rest is up to you. You could also insert some dashes to give the number more of a “document look”, for example 75-020CB2C4-17f. And even if somebody cracked this unique ID, all they would learn is the exact time of the order and nothing more!
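Inserting the dashes could look something like the sketch below. The function name formatOrderID() and the group sizes (2 characters, 8 characters, then the rest) are only assumptions, chosen to reproduce the example above.

```php
<?php
// Hypothetical helper: split an ID into groups of 2, 8 and "the rest",
// joined with dashes.
function formatOrderID($id) {
    return substr($id, 0, 2) . '-' . substr($id, 2, 8) . '-' . substr($id, 10);
}

echo formatOrderID('75020CB2C417f') . "\n"; // prints 75-020CB2C4-17f
```

The last group simply takes whatever is left, because the length of the rand() part varies.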

Login and remember me on many computers
Sunday, May 30th, 2010 | Author:

Everything has already been said and written about login and “remember me”. I’ve tried several techniques and the best one seems to be the client-side cookie. Of course you must not reveal the user’s password, so you have to store something else in the cookie.

So far I have used a special cookie field for each user, which holds a random md5 hash. If your visitor has the ‘remember’ cookie, you just have to check: SELECT * FROM users WHERE cookie = '$_COOKIE[remember]'.

When a user logs in with the ‘remember me’ checkbox checked, you write a random md5 hash into the cookie field and set a cookie with that value in the client’s browser.

What happens if a user wants to be remembered on several computers at the same time?

  1. You can ignore it. Just write a new md5 hash every time and users won’t be remembered anywhere else.
  2. Check whether the cookie field is already filled and, if it is, send its existing value in the cookie. If the field is empty, fill it first. You can also fill it with a random hash at the very creation of a new user…
  3. You don’t need the cookie field in the database at all: just derive a value from the other data, for example md5($id . $username . $email . $datefield), and put it into the cookie! Keep in mind that such a value is not random, so anyone who can guess those fields can rebuild it. The database query changes slightly to: SELECT * FROM users WHERE MD5(CONCAT(id, username, email, datefield)) = '$_COOKIE[remember]'.

Be sure to validate and escape the $_COOKIE variable before inserting it into the query, to avoid an SQL injection attack! I wrote it straight into the query just to keep this post simple.
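A rough sketch of the safer version follows. It assumes PHP 7 or later (for random_bytes()), an existing PDO connection in $pdo and the cookie field described above. The function names are made up for this example.

```php
<?php
// Hypothetical helper: a 32-character hex token, the same shape as an md5 hash,
// but taken from a cryptographically secure source.
function newRememberToken() {
    return bin2hex(random_bytes(16));
}

// Hypothetical helper: accept only values that look like such a token.
function isValidRememberToken($value) {
    return is_string($value) && preg_match('/^[0-9a-f]{32}$/', $value) === 1;
}

// Hypothetical lookup: the cookie value is bound as a parameter,
// never pasted into the SQL string.
function findRememberedUser(PDO $pdo, $cookieValue) {
    if (!isValidRememberToken($cookieValue)) {
        return null;
    }
    $stmt = $pdo->prepare('SELECT * FROM users WHERE cookie = ?');
    $stmt->execute(array($cookieValue));
    $user = $stmt->fetch(PDO::FETCH_ASSOC);
    return $user === false ? null : $user;
}
```

At login you would store newRememberToken() in the cookie field and send the same value with setcookie(). On later visits, findRememberedUser($pdo, $_COOKIE['remember']) returns the matching row or null.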

Simple chart from <div>s
Sunday, March 21st, 2010 | Author:

Here is a simple PHP function for generating simple charts from div elements.

function chart($array, $size){
	$max = max($array);
	// avoid a division by zero when every value is 0
	$ratio = $max > 0 ? $size / $max : 0;
	$out = '';

	foreach($array as $el){
		$width = ceil($el * $ratio);
		$out .= "<div class=\"chart\" style=\"width: ".$width."px;\">$el</div>\n";
	}

	return $out;
}

There is a lot of room for improvement, but I deliberately kept it as simple as possible. Many of the PHP scripts you can find on the internet are so complex that they aren’t useful any more.

Usage:

$test = array (35070, 24440, 4730, 35700, 29380, 22860, 28870, 22730, 26270);

echo chart($test, 400);

You can style your chart elements as you like, of course!

That’s it, you can check demo here!

Get IMDB ID (tt number) from movie title
Sunday, March 07th, 2010 | Author:

I searched the internet for a way to get the IMDB movie ID from a movie’s title. As you know, the Internet Movie Database has data about practically every movie ever made. Every movie has its own ID, which begins with the letters tt followed by 7 digits (newer titles can have more). The exact URL of a movie looks something like http://www.imdb.com/title/tt2345678/.

The IMDB has some RSS channels but nothing about that. Of course, you can use their search, but what if you need this function in your php script? Imagine parsing some TV guide RSS channel… wouldn’t it be nice to create the IMDB link on every movie you could watch that day?

Here is my class:

class IMDBtt{

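	function tt($title){
		$t = urlencode($title);
		$content = $this->getSite('http://www.imdb.com/find?s=tt&q=' . $t);
		//skip the page header and search only the main content
		$content = substr($content, strpos($content, 'div id="main"'));
		//first tt number found (7 or more digits)
		preg_match('/tt[0-9]{7,}/',$content,$matches);
		//reset() returns false when nothing matched
		return reset($matches);
	}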

	private function getSite($url){
		$ch = curl_init($url);

		curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
		curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,5);
		curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
		curl_setopt($ch,CURLOPT_MAXREDIRS,1);

		$data = curl_exec($ch);
		curl_close($ch);
		return $data;
	}

}

Usage:

$IMDB = new IMDBtt();
$tt = $IMDB->tt('Avatar');
//returns tt0499549

You can create your <a> link like this:

<a href="http://www.imdb.com/title/<?=$tt;?>">Avatar</a>
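When no tt number is found, reset() on the empty $matches array returns false, and the link above would point to a broken URL. A small sketch of a guard follows. The helper name imdbLink() and the fallback to plain text are assumptions for illustration.

```php
<?php
// Hypothetical helper: build the IMDB link, or fall back to the plain title
// when no valid tt number is available.
function imdbLink($tt, $title) {
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    if (!is_string($tt) || !preg_match('/^tt[0-9]{7,}$/', $tt)) {
        return $safeTitle;
    }
    return '<a href="http://www.imdb.com/title/' . $tt . '/">' . $safeTitle . '</a>';
}

echo imdbLink('tt0499549', 'Avatar') . "\n";
// prints <a href="http://www.imdb.com/title/tt0499549/">Avatar</a>
```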

There is not much more to say about that. Maybe just a little note: if IMDB finds only one match for a movie title, it immediately redirects to that movie’s page. Therefore I had to use the options:

curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch,CURLOPT_MAXREDIRS,1);

to follow the redirect (only once).

If you are working on your localhost you may get the error: Call to undefined function curl_init(). To fix it:

  1. Open C:\xampp\apache\bin\php.ini
  2. Remove the semi-colon in front of this line: extension=php_curl.dll
  3. Restart Apache

Demo here.

When you post a link on Twitter…
Sunday, March 07th, 2010 | Author:

I made a custom 404 error page this morning. It’s really easy with mod_rewrite on, because you can catch every URL request with PHP. In the case of broken or bad requests you may want to tell your visitors that they simply missed… It looks something like this:

if($request == false){
    header("HTTP/1.0 404 Not Found");
    include('#error.php');
    exit();
 }

My #error.php file is a simple XHTML file with a custom design and content. I put the # character at the beginning of the file’s name so you can’t access it directly through the URL (a browser treats everything after # as a fragment and does not send it to the server). Now, if you visit http://www.hladnik.net/ups, you should get my custom 404 error response.

I posted this link also on my Twitter account at 11:06 am (CET). In the first 12 minutes I got 18 visits from these bots:

2010-03-07 11:06:23	85.114.136.243		Mozilla/5.0 (compatible; Windows NT 6.0) Gecko/20090624 Firefox/3.5 NjuiceBot
2010-03-07 11:06:23	204.236.249.194		JS-Kit URL Resolver, http://js-kit.com/
2010-03-07 11:06:24	64.13.147.188		Mozilla/5.0 (compatible; abby/1.0; +http://www.ellerdale.com/crawler.html)
2010-03-07 11:06:25	66.249.71.179		Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
2010-03-07 11:06:25	174.129.90.99		Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
2010-03-07 11:06:26	65.52.26.149		Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
2010-03-07 11:06:27	216.24.142.47		Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (http://www.oneriot.com)
2010-03-07 11:06:28	89.151.116.52		Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot
2010-03-07 11:06:32	70.37.70.230		Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
2010-03-07 11:06:35	67.207.201.153		Mozilla/5.0 (compatible; mxbot/1.0; +http://www.chainn.com/mxbot.html)
2010-03-07 11:06:53	67.202.7.134		PycURL/7.18.2
2010-03-07 11:06:54	72.13.91.40		Java/1.6.0_18
2010-03-07 11:07:06	79.99.6.106		Twingly Recon
2010-03-07 11:07:46	208.74.66.39		Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly.html) Gecko/2009032608 Firefox/3.0.8
2010-03-07 11:08:34	142.166.170.104		radian6_linkcheck_(www.radian6.com/crawler)
2010-03-07 11:09:13	142.166.170.103		R6_FeedFetcher(www.radian6.com/crawler)
2010-03-07 11:12:38	204.236.203.128		Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)
2010-03-07 11:18:43	184.73.20.47		Python-urllib/2.5

Four of them also looked for my robots.txt file:

2010-03-07 11:06:24	64.13.147.188		Mozilla/5.0 (compatible; abby/1.0; +http://www.ellerdale.com/crawler.html)
2010-03-07 11:06:36	67.207.201.153		Mozilla/5.0 (compatible; mxbot/1.0; +http://www.chainn.com/mxbot.html)
2010-03-07 11:07:45	208.74.66.36		Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly.html) Gecko/2009032608 Firefox/3.0.8
2010-03-07 11:08:47	142.166.170.103		R6_FeedFetcher(www.radian6.com/crawler)

And that was it. No more visits after 11:18 am. Is that good, is it bad, or does it mean nothing? These robots didn’t try to visit any other content on my website, although there are 5 links on my 404 error page! So all I can say is: “Much Ado About Nothing!”

My approach to detecting and trapping bots
Saturday, February 27th, 2010 | Author:

Hi, here is another post from me. You should know something about PHP, MySQL and the HTTP protocol to understand it well. It’s not my intention to describe how to manage statistics on your website; I only illustrate it in order to explain how I detect and trap bots.

I have a website with mod_rewrite on, because I redirect everything to index.php no matter what you type into the address bar. To be more specific, I redirect everything to index.php?q=*, so I can use the $_GET['q'] variable to handle different URLs.

The next step is logging every single visit into a MySQL database. To do that I have a visits table which looks something like this:

CREATE TABLE `visits` (
    `id` int(20) NOT NULL AUTO_INCREMENT,
    `datefield` datetime NOT NULL,
    `ip` varchar(100) NOT NULL,
    `useragent` varchar(255) NOT NULL,
    `uri` varchar(255) NOT NULL,
    `referer` varchar(255) NOT NULL,
    `session` varchar(32) NOT NULL,
    PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;

When somebody visits my website I can insert some data into that DB table:

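$ip = mysql_real_escape_string($_SERVER['REMOTE_ADDR']);
// user agent, URI and referer come from the client, so escape them before use
$agent = mysql_real_escape_string(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');
$uri = mysql_real_escape_string($_SERVER['REQUEST_URI']);
// the Referer header is optional and often missing
$referer = mysql_real_escape_string(isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '');
$session = mysql_real_escape_string(session_id());
mysql_query ("
    INSERT INTO visits(datefield, ip, useragent, uri, referer, session)
    VALUES (NOW(), '$ip', '$agent', '$uri', '$referer', '$session')
");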

OK, so that’s really easy. The visits table holds raw data about every single visit, and this is only the beginning. If you want to get some real benefit out of your statistics, you should build summaries that turn the raw data into useful information: daily hits, unique hits, referrers, bot visits, users’ browsers, users’ operating systems and so on.

There are many ways to get that done right, but as I said before, it is not my intention to talk about that. Let’s just concentrate on bot visits. As you know, there are many robots crawling the web and collecting data from websites. It’s always good to know who they are and what they are doing on your website. It’s also useful to trap bad robots and redirect them away.

Look again at the visits table and check the useragent column. It holds data about users’ browsers and looks like Mozilla/5.0 (Windows; U; Windows NT 6.1; sl,en:us; rv:1.9.2) Gecko/20100115 Firefox/3.6 (.NET CLR 3.5.30729) when the user is human and something like Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) when the user is a robot. I could look for the word ‘bot’ in my useragent column and find most of them really easily.
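The ‘bot’ check could be sketched like this. The function name looksLikeBot() is an assumption, and the test is deliberately naive: a bot that sends a browser-like user agent slips through.

```php
<?php
// Hypothetical helper: true when the user agent contains the word "bot"
// (case-insensitive).
function looksLikeBot($userAgent) {
    return stripos($userAgent, 'bot') !== false;
}

var_dump(looksLikeBot('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)')); // bool(true)
var_dump(looksLikeBot('Mozilla/5.0 (Windows; U; Windows NT 6.1; sl,en:us; rv:1.9.2) Gecko/20100115 Firefox/3.6 (.NET CLR 3.5.30729)')); // bool(false)
```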

But I found an even easier way to do that. It’s true, robots are not very smart: they can’t resist trying to open robots.txt. When a robot comes around, there is a good chance that it will ask for the robots.txt file. Of course, I don’t have one. If there were such a file, the robot would open it directly and slip past my statistics collector. But in my case I just see in my $_GET['q'] variable that it wanted the robots.txt file (because my .htaccess file redirects it to the index.php script).

That’s the first step in detecting bots, because humans don’t ask for the robots.txt file very often. Combined with the word ‘bot’ in the useragent column, I can be pretty sure whether you are a human or just another bot. Of course I don’t want bots to get index.php when they request the robots.txt file. So right after I insert the visit into my database, I serve a fake robots.txt file with PHP code:

if($_GET['q']=='robots.txt'){
    $text = "User-agent: *\r\nDisallow: /email-list/";
    header("Content-Type: text/plain");
    echo $text;
    exit();
}

There is a little trap for bad robots included. As you can see, a robot that requests the robots.txt file gets:

User-agent: *
Disallow: /email-list/

Good robots obey and don’t try to access the email-list folder. But bad robots do just that! They immediately try to get into my email-list folder… which doesn’t exist, of course! It’s a simple trap which helps me separate good robots from bad ones. I have a separate table in my database just for robots, where I record whether a robot is good or bad.
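The decision itself could be sketched as below. The function name classifyRequest() and the three labels are assumptions for illustration, and the input is the q value that the rewrite rule passes to index.php.

```php
<?php
// Hypothetical helper: classify one request by its q parameter.
// 'robot'     - asked for robots.txt, so almost certainly a bot
// 'bad robot' - went into the folder that robots.txt disallows
// 'unknown'   - anything else
function classifyRequest($q) {
    $q = ltrim($q, '/');
    if ($q === 'robots.txt') {
        return 'robot';
    }
    if ($q === 'email-list' || strpos($q, 'email-list/') === 0) {
        return 'bad robot';
    }
    return 'unknown';
}
```

The result could then be written to the robots table next to the visitor’s IP address and user agent.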

So, that’s it. What to do with bad robots is up to your imagination. You can simply answer them with a die('spammer'); command, you can trap them in some PHP script and have fun with them, or you can immediately redirect them to www.google.com! You can do whatever you want and live happily ever after!
