Saturday, October 17, 2009

How to make your PHP Website UTF8 ready

After many days of frustration I think, I hope, I have finally figured out how to configure my server and PHP code the right way to get bug-free UTF-8 on my website. Let's begin with the...

Browser
From what I have seen, at least Firefox and IE listen only to the line 'Content-Type: text/html; charset=utf-8' in the HTTP header (a charset given there takes precedence over the one in the markup) and simply ignore the meta tag on my pages:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
So the first thing I did was to add this line of PHP code before any output is sent:
header('Content-type: text/html; charset=utf-8');
But it's also possible to achieve the same thing with the AddDefaultCharset directive in the Apache config.
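
As a minimal sketch of the header call above, wrapped so it fails quietly when output has already started: the function names utf8ContentTypeHeader() and sendUtf8Header() are made up for this example, only header() and headers_sent() are PHP built-ins.

```php
<?php
// Hypothetical helper: builds the Content-Type header line for a UTF-8 response.
function utf8ContentTypeHeader(string $mimeType = 'text/html'): string
{
    return 'Content-Type: ' . $mimeType . '; charset=utf-8';
}

// Hypothetical helper: sends the header only while that is still possible.
function sendUtf8Header(string $mimeType = 'text/html'): bool
{
    if (headers_sent()) {
        // Output has already started, so the header can no longer be changed.
        return false;
    }
    header(utf8ContentTypeHeader($mimeType));
    return true;
}

// Call this at the very top of the script, before any echo or HTML.
sendUtf8Header();
```

On the Apache side, the equivalent would be a line such as AddDefaultCharset utf-8 in the server config or a .htaccess file, assuming your host lets you override that setting.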

MySQL Database
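MySQL allows a collation to be set on each table and on each text/char field. As the MySQL manual explains, the collation only decides how strings are compared and sorted, so this setting won't concern you until you start searching or sorting with Unicode characters. Until then you can leave the collation at whatever it is. Note that this is about the collation only; the character set of the column still decides which characters can be stored.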

PHP Code
I ran into several difficulties when writing user input from a form into my database.
  • The first thing I had to change was to add the attribute accept-charset="UTF-8" to all my <form> elements; otherwise you can end up with ISO-encoded characters in your DB.
  • Secondly, I ran into a bug where htmlentities() converted umlauts wrongly. I realized you also need to tell htmlentities() which charset the input is in. I solved it by using htmlentities($mytext,ENT_COMPAT,'UTF-8');
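
The two points above can be sketched together like this. The helper names escapeUtf8() and openUtf8Form() and the /save.php target are assumptions for illustration; the htmlentities() call is the one from the list.

```php
<?php
// Hypothetical helper: escapes user text for HTML output, treating it as UTF-8.
function escapeUtf8(string $text): string
{
    return htmlentities($text, ENT_COMPAT, 'UTF-8');
}

// Hypothetical helper: opening tag of a form that asks the browser to submit UTF-8.
function openUtf8Form(string $action): string
{
    return '<form method="post" action="' . escapeUtf8($action) . '" accept-charset="UTF-8">';
}

// "Müller" written as explicit UTF-8 bytes, so the example does not depend
// on the encoding this file happens to be saved in.
$name = "M\xC3\xBCller";

echo openUtf8Form('/save.php'), "\n";
echo escapeUtf8($name), "\n"; // prints M&uuml;ller
```

If the third argument names a single-byte charset such as ISO-8859-1 instead, each byte of the two-byte "ü" is converted on its own, which is the kind of broken umlaut described above.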

AJAX
While I was getting UTF-8 to work I always suspected that my AJAX code / XMLHttpRequest() was part of the problem. But the only place where you can do something about the charset is the request header (myRequest.setRequestHeader), where you set the right Content-Type (just like with the page's HTTP header) before sending your HTTP request. And I guess that when you don't explicitly specify a charset there, the browser uses the Content-Type charset given in the website's HTTP header.
So no tweaks needed here.
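
If you want to check this guess on the server rather than trust it, a rough sketch like the following can help while debugging. Both function names are hypothetical; the idea is simply to look at the charset the request announces and to test whether the received bytes are valid UTF-8.

```php
<?php
// Hypothetical helper: extracts the charset parameter from a Content-Type value,
// e.g. "application/x-www-form-urlencoded; charset=UTF-8".
function charsetFromContentType(string $contentType): ?string
{
    if (preg_match('/;\s*charset\s*=\s*"?([^";\s]+)"?/i', $contentType, $m)) {
        return $m[1];
    }
    return null; // no charset parameter present
}

// Hypothetical helper: true when the bytes form valid UTF-8.
function isValidUtf8(string $bytes): bool
{
    // The /u modifier makes PCRE reject subjects that are not valid UTF-8.
    return preg_match('//u', $bytes) === 1;
}

// In a request handler the header value is usually found in $_SERVER['CONTENT_TYPE'].
$announced = charsetFromContentType($_SERVER['CONTENT_TYPE'] ?? '');
```

Logging $announced together with isValidUtf8() for each posted field shows quickly whether the AJAX request or some other step is mangling the text.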

If you've got all this, you're done! Your site should work for 99.5% of your users.