PHP is a great language for developing dynamic web sites. Some people use it for fun, others for
business. It is true that a great part of the web is in English. However, if you are
targeting a worldwide audience, then neither English nor Esperanto alone is an option.
If you need to deliver content in several languages, it is a good idea to explore several
alternatives. Some of them may not be suitable for dynamic websites, and each carries its own
maintenance overhead. To complicate things further, your needs may not match
the resources you have at your disposal. It is therefore advisable to choose the alternative
that suits your situation best.
I have worked on projects that required me to deliver content in both English and Spanish, and on one
project in a third language as well. Here are the possibilities I explored:
- Explicit links for each language
- Use Apache's mod_negotiation
- Use GNU Gettext support in PHP
- Write your own
This article gives a brief introduction to the first three possibilities, and then concentrates on the fourth
solution, which suited my requirements best given the set of constraints. I am assuming that the reader is at least
familiar with PHP programming and the use of PHP classes.
Principles of content negotiation
Before we go into exploring the various options, we should understand the basics of content
negotiation and how that applies to the development framework. Then, you will be able to
develop a web application that can deliver its content in the language of choice of your
visitor.
By simply configuring the web browser, the user can make sure that his or her
preferred language is used when available. Several languages can be specified in a prioritized
list in the preferences or options of the browser.
The browser then sends this list of preferred languages on every request made to the site. This
action is totally transparent to the user, as the information gets sent in the Accept-Language
header, for example:
Accept-Language: bg, es, en-US, fr
Here our visitor has chosen Bulgarian, Spanish, US English and French, in that order. Notice
that you can even specify regional variants. The first two characters are a language code as
specified in the ISO 639 standard. This language code may be followed by a dash and a region
code.
As an example, if the request arrives at a website whose content is entirely in Russian, then
the list is exhausted and the visitor will get Russian text whether he or she likes it or not. Now,
assuming the website has both Spanish and English content (the 2nd and 3rd options), then
the visitor will receive pages in Spanish. Why? Simply because Spanish has a higher
priority than English in this list.
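The selection described above can be sketched in a few lines of PHP. This is only an illustration written for current PHP versions (7.1 or later), not code from any library: the function names preferred_languages() and first_match() are made up for this example, and the sketch relies purely on the order of the list, ignoring any optional parameters attached to an entry.

```php
<?php
// Hypothetical helper: turn an Accept-Language value into a list of
// two-letter language codes, highest priority first.
function preferred_languages(string $header): array
{
    $codes = [];
    foreach (explode(',', $header) as $entry) {
        // Drop optional parameters after ";" and surrounding spaces.
        $tag = trim(explode(';', $entry)[0]);
        // Keep the language code only: "en-US" becomes "en".
        $code = strtolower(explode('-', $tag)[0]);
        if ($code !== '' && !in_array($code, $codes, true)) {
            $codes[] = $code;
        }
    }
    return $codes;
}

// Hypothetical helper: first preferred language the site offers, or null.
function first_match(array $preferred, array $available): ?string
{
    foreach ($preferred as $code) {
        if (in_array($code, $available, true)) {
            return $code;
        }
    }
    return null;
}

$preferred = preferred_languages('bg, es, en-US, fr');
echo implode(' ', $preferred), "\n";               // bg es en fr
echo first_match($preferred, ['en', 'es']), "\n";  // es
```

For the Russian-only site, first_match() returns null, and the site simply serves the only language it has.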
Sometimes, the web server itself can manage the content negotiation, if configured to do so.
Otherwise, the request for a particular language is ignored. Alternatively, the application that
delivers the content decides which language to use. This is exactly what we
will do later.
Before going further, I would like to point out that content negotiation does not deal
only with human languages. For example, it also negotiates the kind of information the
client can accept by means of MIME types, but that is beyond the scope of this article.
Explicit links
Many multi-lingual websites present the content in various languages, and do so by placing a
link on the document. There would be one link for each of the supported languages. This is a
very simplistic approach and should only be used if you need to have multi-lingual
content, but do not have the resources of a scripting language or dynamic content.
If a document is moved, or a language is added to or removed from the repertoire, then
the webmaster has to edit, add or remove links in each of the affected documents.
This can be quite tedious.
Apache's content negotiation
The Apache web server can manage language-sensitive content delivery by using the
information from the content negotiation headers. For this, the webmaster must provide the
static pages for each language and name them properly. For example, if the welcome page is
available in Spanish and English, the webmaster would have these two files:
welcome.html.es
welcome.html.en
When the web server is well configured, it will deliver the appropriate web page based on
the language code according to the priority list.
This works perfectly for static pages. However, on a dynamic website where many of the
pages are generated from queries, this approach will not work. Another
disadvantage is that you need to know how to configure Apache for it, and you may not have access to
the configuration files. In my experience it was a bit tricky and did not offer enough
flexibility for my purposes.
An advantage of this method is that the negotiation is between the browser and the Apache
server. You need only to provide the static content.
GNU Gettext with PHP
GNU Gettext is an internationalization tool that has been around for some time for C programmers. There is
also a variant used on other Unix systems, such as HP-UX. Both are very good and are easy to use.
The Gettext extension has been available in PHP since version 3.0.6 and also in 4.0. It
is easy to use, and works well if you generate your web pages dynamically. The
only thing left here would be the PHP code that generates the content and a set of message
catalogs. Supporting a new language is as easy as generating a new catalog with the
translations and dropping the file in the appropriate directory. Therefore, assuming you have a PHP
application named "myphp" and that the appropriate message catalogs exist and are installed,
then the application would have something like this:
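<?php
/* Initialization of Gettext in myphp */
/* $language must hold the locale chosen for this visitor;
   "es_ES" is only an example value. */
$language = "es_ES";
putenv("LANG=$language");
/* Many installations also need the locale to be set explicitly */
setlocale(LC_ALL, $language);
bindtextdomain("myphp", "./locale");
textdomain("myphp");
/* Print some messages in the native language */
echo gettext("Hello new user");
echo _("You have no new messages");
?>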
My provider had recently upgraded from PHP 3.0RC5 to a PHP4.0 Beta 2 installation.
While PHP4 does have support for the Gettext extension, my provider did not compile the
PHP4 module with Gettext support. However, even if they
had, moving to another provider without Gettext would become a major headache.
CtNls - National Language Support
Having considered various alternatives, I wrote down a set of requirements for the NLS
module:
- It had to be simple, and based on PHP
- It had to be easy to use
- It had to allow the user the freedom of choosing or mixing NLS methods
As a result, I developed a PHP class that would allow the user to set up a
multi-lingual website. With this class, the developer can emulate the Apache approach of one file
per language, without reconfiguring Apache. Additionally, it is possible to use a
PHP script that generates the output in the appropriate language by means of message
catalogs.
A real-life application of this class is on my website at www.coralys.com. The
main page is available in English (en), Spanish (es) and Dutch (nl).
Using CtNls
This class is very easy to use. The application would have something like this in its
initialization code:
<?php
include('ctnls.class.php');
$nls = new CtNls;
?>
At that point, the class has detected the choice of languages suggested by the user behind
the browser. What happens next depends on which method the application uses to
provide multi-lingual content.
For static web pages, the approach is similar to the Apache convention. The name of the
web page comprises three parts: the base name, the language code (the two-letter ISO 639
code) and the file extension, which would typically be ".html". For example, a welcome
page would be:
welcome.en.html
welcome.es.html
Notice that each page contains a different language. At this point, your PHP application
would serve the file in the appropriate language, by specifying the whole name without the
language ID/code. The CtNls class inserts the appropriate ID/code:
<?php
$nls->LoadPage("welcome.html");
?>
Here you can specify any filename that is valid for PHP's virtual() function, which is how
LoadPage() delivers the content to the browser.
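To make the naming convention concrete, here is a minimal sketch of how a language code could be inserted into a page name. It is written for current PHP versions and is not the CtNls implementation: localized_page_name() is a hypothetical helper introduced only for this illustration.

```php
<?php
// Hypothetical helper: insert a language code before the final extension,
// so that "welcome.html" becomes "welcome.es.html".
function localized_page_name(string $page, string $lang): string
{
    $dot   = strrpos($page, '.');
    $slash = strrpos($page, '/');
    // No dot in the file name itself (a dot in a directory does not count):
    // simply append the language code.
    if ($dot === false || ($slash !== false && $dot < $slash)) {
        return $page . '.' . $lang;
    }
    return substr($page, 0, $dot) . '.' . $lang . substr($page, $dot);
}

echo localized_page_name('welcome.html', 'es'), "\n"; // welcome.es.html
```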
Another way to deliver the content in various languages is by using message catalogs. In this
case, there is one PHP file for each language. It must be a valid PHP file because the
message catalog is nothing more than a collection of PHP constants, each referring to a
particular message. We use constants rather than variables for two reasons. First, the
message does not change, so by definition it is a constant. Second, and most important,
PHP constants are accessible anywhere, even within functions. If we used
variables for the messages, we would have to resort to a "global" declaration for each
message used within a function.
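The difference in scope is easy to see in a small script. The names MSG_GREETING and $msgGreeting below are made up for this illustration, and the script is assumed to run at the top level of a file:

```php
<?php
define('MSG_GREETING', 'Hello');   // constant: visible in every scope
$msgGreeting = 'Hello';            // variable: lives in the global scope only

function greet_with_constant(): string
{
    return MSG_GREETING;           // no declaration needed
}

function greet_with_variable(): string
{
    global $msgGreeting;           // without this line the variable is not visible here
    return $msgGreeting;
}

echo greet_with_constant(), "\n";
echo greet_with_variable(), "\n";
```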
The catalogdir class variable specifies the location of all message catalogs. It is an absolute
path within the server, where each application will have its message catalog. The name of
each catalog file has three parts: the application ID, the language code and a valid PHP
extension so that it can be include()ed by CtNls.
Now, assuming we have an application called "myapp," we could use the following code at the
initialization part:
<?php
$nls->LoadMessages("myapp");
?>
This method, like LoadPage(), can only be used after you have created an instance of the
CtNls object. It expects at least one file, the one for the default language, in the
message catalog: in this case, myapp.en.php. Inside, the file must have a valid constant
declaration for each message. For example:
<?php
// Language: English
// File: path/to/nls/myapp.en.php
define("MSG_WELCOME", "Welcome to our site");
define("MSG_REDIRECTED", "You are being redirected to our new site");
?>
<?php
// Language: Spanish
// File: path/to/nls/myapp.es.php
define("MSG_WELCOME", "Bienvenido a nuestras paginas");
define("MSG_REDIRECTED", "Esta siendo redireccionado a nuestro " .
"nuevo sitio de web");
?>
Only one of these will be loaded, depending on the language priority assigned by the user.
Then, somewhere in the PHP application, you can generate output like this:
<?php
echo MSG_WELCOME . "<br>";
echo MSG_REDIRECTED;
?>
Auxiliary functions
The CtNls class has several auxiliary member functions. These are not normally used, but can
come in handy.
The SetCatalogPath function overrides the path to the catalog directory given in the
configuration part of the CtNls class.
SetDefaultLanguage does just that. It sets the code of the language to be tried in case all else
fails. This is normally English.
SetLanguageId and GetLanguageId set and return the ID/code of the first language to check.
This language is checked before any of those found in the user preferences. Upon
initialization, it is the same as the first language in the user's list. Once LoadPage() or
LoadMessages() has been invoked, GetLanguageId() returns the ID of the language that was
actually used to deliver the content to the browser. SetLanguageId is also useful if the user has stored
his or her preference in a cookie.
How it actually works
We have already explained what happens during the content negotiation phase. The browser
sends the user's choice of languages in the Accept-Language header. This information is
available to any PHP script in the $HTTP_ACCEPT_LANGUAGE global variable (in current PHP
versions, $_SERVER['HTTP_ACCEPT_LANGUAGE']).
When you create a new instance of the CtNls object, the constructor function CtNls() takes
control. It initializes some internal settings and calls setLanguage(). This internal
member function goes through the comma-delimited list of languages in $HTTP_ACCEPT_LANGUAGE,
keeping only the language code of each entry and stripping the region code, because only the
language code is used to select the appropriate file. In this process, the list is converted to an
array that preserves the language priority.
When an application invokes either LoadPage() or LoadMessages(), these methods will call
the internal search() function.
The search() member function walks the language preference array from start to finish and if
all of them fail to produce results, it checks for the default language. This function keeps
track of which languages it tried so that two attempts are not done on the same language.
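The walk described above can be sketched as follows. This is an assumption about the general shape of such a search, written for current PHP versions, not the actual CtNls code: search_language() and the $exists callback are hypothetical names, and the callback stands in for whatever test decides that content is available for a language.

```php
<?php
// Hypothetical stand-in for the search step: $exists is any callable that
// reports whether content is available for a given language code.
function search_language(array $preferred, string $default, callable $exists): ?string
{
    $tried = [];
    // The default language goes last, as the fallback.
    foreach (array_merge($preferred, [$default]) as $lang) {
        if (isset($tried[$lang])) {
            continue;               // never probe the same language twice
        }
        $tried[$lang] = true;
        if ($exists($lang)) {
            return $lang;
        }
    }
    return null;                    // nothing found, not even the default
}

$available = ['en', 'nl'];
$exists = function (string $lang) use ($available): bool {
    return in_array($lang, $available, true);
};
echo search_language(['bg', 'es', 'en', 'fr'], 'en', $exists), "\n"; // en
```

Passing the availability test as a callback keeps the sketch independent of whether the content is a static page or a message catalog.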
The search() function calls the getFile() member function every time it generates a new
language candidate, passing it a base name, the language candidate and,
optionally, a file extension.
From the base name, the language code and the file extension, in that order, the
getFile() function generates a list of candidate file names. If we are using LoadMessages()
and pass it an empty extension, it builds one candidate for each known PHP script
extension, currently ".php3" and ".php", and prefixes each with the path to the catalog
directory. If, on the other hand, we pass a file extension to the
function, the list of candidate file names is reduced to one.
getFile() then continues by walking through the list of candidate file names until it finds one
that exists. Based on this, we determine the current language and the object reads in that file.
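As a rough illustration of this last step, the candidate list and the existence check might look like the sketch below. The function names, the parameter order and the directory handling are assumptions made for this example (written for current PHP versions); they are not taken from the CtNls source.

```php
<?php
// Hypothetical helper mirroring the description above: build the list of
// candidate file names for one language.
function candidate_files(string $dir, string $base, string $lang, string $ext = ''): array
{
    // With no extension given, try every known PHP script extension.
    $extensions = ($ext === '') ? ['.php3', '.php'] : [$ext];
    $candidates = [];
    foreach ($extensions as $e) {
        $candidates[] = rtrim($dir, '/') . '/' . $base . '.' . $lang . $e;
    }
    return $candidates;
}

// Hypothetical helper: return the first candidate that exists on disk, or null.
function first_existing(array $candidates): ?string
{
    foreach ($candidates as $file) {
        if (is_file($file)) {
            return $file;
        }
    }
    return null;
}

print_r(candidate_files('/path/to/nls', 'myapp', 'es'));
```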
-- Didimo