Internationalization using gettext

Internationalization refers to the general process of accommodating computer-based services to many different "locales" including different languages.

i18n
The word "internationalization" has 18 characters between the first 'i' and the last 'n'. People who don't like to type a lot of characters shorten it to i18n. Nothing more is implied.
localization
l10n to the poor typists. The process of defining the adaptations of something that has been internationalized to a particular locale.

There exists a uniform way to provide translated text, using the "gettext" function. The computer user sets a "locale" in an environment variable, and each internationalized program supplies the correct language, date and number formatting, etc.
The manual says: "Beside marking the translatable string in the source code and generating the translations the developers do not have anything to do themselves."

However, for a web page, the locale of the server is not what the client wants, so a PHP program must determine the client's preference, and set the appropriate locale temporarily.

So you want to make your PHP page in two -- or more -- languages...

You need to take the following steps:
  1. Set up your own locale directory structure, with subdirectories for each language,
  2. Mark the "messages" that you want to be translated,
  3. Extract the messages from the page, getting a ".po" file,
  4. Find a translator for each language, have them fill in the translations in a copy of the .po file, and put the result in that language's subdirectory. Use msgfmt to create a .mo binary file ("machine object"),
  5. Insert code in your page to "bind", or point to, the .mo files (where they are and what they are named), and also set the environment variable LANG.

1. Locale directories

First of all, take note: any LANG setting must exist as a "locale" on your server. This can be determined by the terminal command:  locale -a
For example, Osiris has the locales fr_CA and fr_CA.utf8. The pattern is language_country.charset, and gettext will gracefully drop the charset and country to find, for example, fr. So you can use fr for your subdirectory, but beware: this will not do for the LANG setting.

  1. Make a subdirectory of public_html; I called mine translations/.
  2. In translations/, make a subdirectory for each language; I made fr/ for French.
  3. In each language directory (such as fr/), make another subdirectory, LC_MESSAGES. That is where your .mo files must go.
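
Once the .mo files are in place, the layout above can be checked with a small sketch like the one below. The helper name moPath, the language list, and the domain name hitme (the example file used later on this page) are assumptions for illustration; none of them is part of the gettext API.

```php
<?php
// Hypothetical helper: build the path where gettext is expected to look,
// i.e. <base>/<language>/LC_MESSAGES/<domain>.mo
function moPath(string $base, string $lang, string $domain): string
{
    return rtrim($base, '/') . "/$lang/LC_MESSAGES/$domain.mo";
}

// Report which catalogs exist (the language list here is an assumption).
foreach (['fr'] as $lang) {
    $path = moPath('./translations', $lang, 'hitme');
    echo $path, file_exists($path) ? " found\n" : " missing\n";
}
```

Run it from public_html, so that the relative path ./translations resolves the same way it will for the page itself.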

2. Mark your messages

This is easy. You just put them in double quotes, as the argument to gettext().
Also, to save typing, "gettext" has an alias, "_" (underscore). Before and after examples:
<h1>Hello World!</h1>
<h1><?php echo _("Hello World"); ?>!</h1>

echo "Book Title: $title<br>\n";
echo _("Book Title").": $title<br>\n";
// note the dot (.) to concatenate strings
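
As an alternative to concatenation (a suggestion, not something this page relies on), the whole phrase can be kept in one message with a placeholder, so that a translator is free to move the variable part. This is a minimal sketch; the sample title is made up, and it assumes the PHP gettext extension is loaded.

```php
<?php
$title = "Dune"; // sample value, assumed for illustration

// Keep the phrase in a single msgid and leave a %s placeholder, so the
// translator can put the title wherever the target language needs it.
$line = sprintf(_("Book Title: %s"), $title);
echo $line . "<br>\n";
```

The translator then sees "Book Title: %s" as the msgid and must keep the %s somewhere in the msgstr.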

3. Extract the messages

The utility xgettext goes through a file, recognizing double-quoted strings within calls to gettext, and produces a file with all the messages, and a place to fill in translation of each one.

xgettext hitme.php
This will produce a file messages.po in the current directory.
xgettext -d hitme -p translations --from-code=UTF-8 hitme.php
This produced for me a file named hitme.po in the path /home/jensen/public_html/translations
The specification of UTF-8 is only needed if your messages contain non-ASCII characters.

4. Translation

The po file has some headings (fill in the information) and pairs like this:

#: hitme.php:15
msgid "your vote has been counted, from "
msgstr ""

#: hitme.php:17
msgid "BOO, HISS, Get lost!"
msgstr ""
Consider this a template file if you are doing several translations, and move copies into the subdirectories. I did that. Or just translate it here. Let's assume we just add the French here, in translations/.
#: hitme.php:15
msgid "your vote has been counted, from "
msgstr "Votre vote est enregistre, de "

#: hitme.php:17
msgid "BOO, HISS, Get lost!"
msgstr "Desole, ca marche pas!"
Now we need to convert it to .mo format, and put it in the subdirectory, with this command:
msgfmt -v -o fr/LC_MESSAGES/hitme.mo hitme.po
And we are good to go!

5. Set up PHP

To find your translations, gettext needs to know the TEXTDOMAIN (the file name of the .mo files) and TEXTDOMAINDIR, where to look for the various subdirectories.
It also needs to know what language the user wants. The simplest way is to put the "locale" in the query string; you can also remember it using a cookie. It needs to be "put" into the environment, and PHP must be informed. Note that in many languages the charset is important, as there are non-ASCII characters.

<?php
header("Content-type: text/html; charset=UTF-8"); // tell the browser

$domain="hitme";
bind_textdomain_codeset($domain, 'UTF-8'); // tell php
bindtextdomain($domain,"./translations"); // ..where to find..
textdomain($domain); // hitme.mo

$locale = $_GET['lang']; // must be fr_CA, not just fr

putenv('LANG='.$locale); // set locale into 3 environment vars.
putenv('LC_ALL='.$locale);
putenv('LC_MESSAGES='.$locale);
setlocale(LC_ALL,$locale); // and tell php about it too
// (this may vary from system to system)
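
Since the locale comes straight from the query string, it may be worth mapping it onto a short list of locales known to exist on the server before passing it to putenv and setlocale. The following is a sketch only: the helper name pickLocale, the table of locales, and the en_US default are assumptions, to be replaced with whatever locale -a reports on your own server.

```php
<?php
// Hypothetical helper: map a requested language code to an installed locale.
// The table is an assumption; fill it in from the output of `locale -a`.
function pickLocale(?string $requested, string $default = 'en_US'): string
{
    $allowed = [
        'fr'    => 'fr_CA',
        'fr_CA' => 'fr_CA',
        'en'    => 'en_US',
        'en_US' => 'en_US',
    ];
    // Anything not in the table (or missing altogether) gets the default.
    return $allowed[$requested ?? ''] ?? $default;
}

// Ignore a missing parameter and non-string input such as ?lang[]=x
$requested = $_GET['lang'] ?? null;
$locale = pickLocale(is_string($requested) ? $requested : null);
```

With this in place, a request for plain fr still ends up with the full fr_CA that the LANG setting needs.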

Surely you are not going to make any changes?

Well, what about 1 alumnus, 3 alumni? There is a plural form, ngettext. And dcgettext, in case you want to look up something in a different "domain" (a differently named .mo file). And, well, dcngettext. But that's it.
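
A minimal sketch of the plural form, assuming the PHP gettext extension is loaded. The function name alumniLabel is made up for illustration; ngettext itself takes the singular message, the plural message, and the count.

```php
<?php
// ngettext() returns whichever form the current catalog says fits the count.
// With no catalog loaded it falls back to the msgids given here:
// the singular for a count of 1, the plural otherwise.
function alumniLabel(int $n): string
{
    return sprintf(ngettext("%d alumnus", "%d alumni", $n), $n);
}

echo alumniLabel(1), "\n";
echo alumniLabel(3), "\n";
```

In the .po file such a message appears with a msgid_plural line and numbered msgstr[0], msgstr[1], ... entries for the translator to fill in.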
 
Now, if you change your messages, or add new ones, you will need to merge the new stuff with your old translations. There is an "app" for this too, of course. I have yet to use it, so proceed carefully.
We suppose that hitme.php has been changed, and that we want to keep the French translations already in translations/hitme.po. So we use msgmerge in Update mode. It "merges" two .po files (producing a third, unless in Update mode).
In Update mode, the "def" file is updated, keeping the translations of old messages and adding the new messages that need translation. So, back to your translator with the result.
Then recreate the mo file as before.
xgettext -p translations hitme.php   #giving translations/messages.po
cd translations
msgmerge -U hitme.po messages.po #merges new strings into hitme.po

##-- now go translate..., then:

msgfmt -v -o fr/LC_MESSAGES/hitme.mo hitme.po #as before

Questions? Lin Jensen
