Benchmarking PHP Localization – Is gettext fast enough?

Last year, I wrote a post about using gettext to localize PHP web pages. Gettext makes it easy to maintain the translations and always provides a fallback locale. But is it fast?

I created a simple web page to compare the performance of various localization methods for PHP. It contains only three localized strings and does not use advanced gettext features (e.g. plurals). I wrote three versions: one using the gettext PHP extension (“gettext Ext.”), one using PHP-gettext (“gettext PHP”, a gettext implementation written in pure PHP), and one that does not use gettext at all and instead looks up every string in an array containing all the translations (“String ID”).
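To make the “String ID” variant concrete, here is a minimal sketch of an array-based lookup. The string IDs, the texts and the helper name t() are made up for illustration and are not taken from the test files. The fallback to a default locale is my own addition, and the type declarations assume a current PHP version (7 or later) rather than the PHP 5.1 used in the benchmark.

```php
<?php
// Hypothetical translation tables for the "String ID" approach.
// IDs and texts are illustrative only.
$translations = [
    'en' => [
        'greeting' => 'Hello',
        'farewell' => 'Goodbye',
        'thanks'   => 'Thank you',
    ],
    'de' => [
        'greeting' => 'Hallo',
        'farewell' => 'Auf Wiedersehen',
        // 'thanks' is intentionally missing to show the fallback.
    ],
];

/**
 * Look up a string ID in the requested locale. If it is missing,
 * try the default locale, and finally return the ID itself.
 */
function t(array $translations, string $locale, string $id, string $default = 'en'): string
{
    if (isset($translations[$locale][$id])) {
        return $translations[$locale][$id];
    }
    if (isset($translations[$default][$id])) {
        return $translations[$default][$id];
    }
    return $id;
}

echo t($translations, 'de', 'greeting'), "\n"; // found in the German table
echo t($translations, 'de', 'thanks'), "\n";   // falls back to English
```

Note that every string goes through the array lookup, even for the default language, which is why this variant does the same amount of work for both locales.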

I put all three pages on a Debian machine with the latest Apache 2.0.55 and PHP 5.1.2 and used the Apache HTTP server benchmarking tool (ab) to measure the performance of the different methods. I ran every test twice, once with the default locale (English) and once with a translation (German), because gettext does not need a locale file for the default language (the default texts are embedded in the page).

The results

Here are the results in requests per second (more is better), measured with “ab -n 5000 URL”:

[Chart: Requests per Second]

As you can see, the version using the PHP gettext extension is the fastest. It is only marginally slower when using a language file, because the extension caches the translations (the downside is that you have to restart the web server when you change a locale file). The String ID version is equally fast for either locale, because it always has to look up the text in the locale array. The pure PHP gettext implementation is the slowest, and it is even slower when it needs to use a locale file. This is understandable, because it has to read the whole file on every request.

You can download the test files if you’re interested.

The gettext extension

When using the gettext extension on Linux, make sure all the locales you use are installed on the system. For example, in Debian (and probably other distributions, too), add the required locales to /etc/locale.gen and run locale-gen. For this test, I added “de_DE.UTF-8 UTF-8” for the German UTF-8 version.
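For readers who have not used the extension, the following sketch shows roughly how a page could select the German locale and load a catalog. It is not the code from the test files: the helper names initGettext() and catalogPath(), the domain “messages” and the ./locale directory are assumptions for illustration. The calls themselves (putenv, setlocale, bindtextdomain, bind_textdomain_codeset, textdomain, _) are standard PHP functions, and the type declarations assume PHP 7 or later.

```php
<?php
/**
 * Path where gettext typically expects a compiled catalog:
 * <dir>/<locale>/LC_MESSAGES/<domain>.mo
 */
function catalogPath(string $dir, string $locale, string $domain): string
{
    return rtrim($dir, '/') . "/$locale/LC_MESSAGES/$domain.mo";
}

/**
 * Select a locale and bind a text domain.
 * Returns false if the gettext extension is missing or if the system
 * does not know the locale (setlocale() fails in that case).
 */
function initGettext(string $locale, string $domain, string $dir): bool
{
    if (!function_exists('gettext')) {
        return false;
    }
    putenv("LC_ALL=$locale");
    if (setlocale(LC_ALL, $locale) === false) {
        // Locale not installed, e.g. not listed in /etc/locale.gen.
        return false;
    }
    bindtextdomain($domain, $dir);
    bind_textdomain_codeset($domain, 'UTF-8');
    textdomain($domain);
    return true;
}

// Assumed layout: ./locale/de_DE.UTF-8/LC_MESSAGES/messages.mo
if (initGettext('de_DE.UTF-8', 'messages', './locale')) {
    echo _('Hello'), "\n"; // translated if the catalog has an entry
} else {
    echo "Hello\n";        // extension or locale unavailable
}
```

If setlocale() returns false here, a locale missing from the system is the first thing to check.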

Conclusion

The native gettext extension for PHP performed best in the benchmarks. Using the gettext extension lets you write clean code: you only wrap strings in _() and include a file that loads the localizations. As I wrote in my previous post, gettext also allows painless updates of the localizations by automatically finding new or changed strings. The only downside is that the web server must be restarted to load new language files. This can be a problem for users on shared hosts. In these cases, a pure PHP implementation can be used (which is not very fast, though).

If you can restart the web server, I recommend using gettext. It is especially useful for larger projects or projects where not all translations can be updated simultaneously, because it always provides a fallback language. It also makes the source code much easier to read, because the default texts are directly in the source code.

14 Responses to “Benchmarking PHP Localization – Is gettext fast enough?”

  1. Localizing PHP web sites using gettext » Pablo’s Development Blog Says:

    [...] See my follow-up post “Benchmarking PHP Localization – Is gettext fast enough?” for Benchmarks. In general, the gettext Extension is faster than using a String-Array. The pure PHP implementation of gettext is slower and not recommended if you can use the PHP Extension. [...]

  2. Brice Burgess Says:

    In case anyone was wondering, it appears that this benchmark used PHP Gettext with cache support included (enabled by default). This was added by Nico Kaiser in version 1.0.3+ of PHP gettext released March 2005.

    It would be frightful to see the results w/o cache enabled.

  3. Pablo Hoch Says:

    Brice: Yes, I used the default PHP gettext options, so the cache was enabled. However, I have just compared the performance of PHP gettext (1.0.7) with cache on vs. cache off and I didn’t see any significant performance difference.

    Here are my results:
    PHP gettext, cache on: default locale = 175 rps, other locale = 148 rps
    PHP gettext, cache off: default locale = 175 rps, other locale = 147 rps

    I assume this is because there are only three localized strings. It could make a difference when you have more strings.

  4. Jim Plush Says:

    I also did some benchmark tests when my last company needed an i18n-ready site. Might be worth a read:

    http://www.litfuel.net/plush/?postid=84

  5. Richard Thomas Says:

    I am also working with gettext and have managed to combine it with the Savant template system.

    Using a Savant plugin, I can translate on the fly or, with a script, pretranslate all the templates so gettext doesn’t need to be used on the live site.

    http://www.cyberlot.net/Savantgettext

  6. David Says:

    We have used both methods and find gettext not only easier in the long run for updates, but faster also.

  7. Zhitao Ma Says:

    My test results are as follows:

    Requests per Second
    gettext-ext 395.63
    gettext-php 221.12
    stringids 568.66

    I tested them on a Windows XP SP2 computer with Apache 2.0.55 and PHP 5.1.3-dev. It seems that the results are quite different on Windows and Debian.

  8. Hm-ohj2006 : blogi » Blog Archive » PALAUTUS: Gettextillä Railsia lokalisoimaan. Says:

    [...] Gettext support is also available in PHP, but in Rails the groundwork has already been done. This makes adoption more straightforward and, with a few additions (e.g. extra Rakefile tasks), truly effortless. In terms of performance, gettext is also quite fast when used properly. For PHP, this has been tested in at least one case ( [1] ). [...]

  9. Jan Schneider Says:

    On most shared hosts a web server restart is not required, because they often use CGI PHP SAPIs. These don’t cache the locale (because it’s actually the web server process, not PHP, that is doing the caching), so locale updates work instantly.

  10. links for 2006-06-26 at 59ideas Says:

    [...] Benchmarking PHP Localization – Is gettext fast enough? » Pablo’s Development Blog (tags: php i18n) [...]

  11. links for Jun 19-26 at 59ideas Says:

    [...] Benchmarking PHP Localization – Is gettext fast enough? » Pablo’s Development Blog (tags: php i18n) [...]

  12. Quellen zu Internationalisierung und Lokalisierung » contactsheet.de Says:

    [...] Benchmarking PHP Localization – Is gettext fast enough? In a by now somewhat older article, Pablo Hoch examines three different techniques: native PHP arrays, PHP-gettext, and the combination of the PHP gettext extension and locale. [...]

  13. Kruxik Says:

    If you use mod_php (not CGI) and have a problem with a cached dictionary after updating it, there is a solution that doesn’t need a web server restart. You can create the domain name dynamically and change it whenever you change the dictionary.
    Example: the standard name of the dictionary is messages.mo, which means the domain is “messages”. You can rename the file to “messages1.mo” and tell PHP to change the domain from “messages” to “messages1”.

    This solution is 100% functional (we use it for our CMS gettext editor).
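The renaming trick from comment 13 could be sketched as follows. This is not Kruxik’s code: the helper currentDomain(), the numeric suffix convention and the ./locale directory layout are assumptions for illustration. The idea is to pick the catalog with the highest version suffix and use its file name as the gettext domain, so that a renamed catalog is loaded under a name the process has not cached yet. The type declarations assume PHP 7 or later.

```php
<?php
/**
 * Return the domain name of the newest catalog "<base><n>.mo" in the
 * given LC_MESSAGES directory. "<base>.mo" counts as version 0.
 * Falls back to $base if no matching file exists.
 */
function currentDomain(string $lcMessagesDir, string $base = 'messages'): string
{
    $best = $base;
    $bestVersion = -1;
    foreach (glob($lcMessagesDir . '/' . $base . '*.mo') ?: [] as $file) {
        $name = basename($file, '.mo');
        $suffix = substr($name, strlen($base));
        if ($suffix === '') {
            $version = 0;
        } elseif (preg_match('/^\d+$/', $suffix)) {
            $version = (int) $suffix;
        } else {
            continue; // not a versioned copy of this domain
        }
        if ($version > $bestVersion) {
            $bestVersion = $version;
            $best = $name;
        }
    }
    return $best;
}

// Assumed layout: ./locale/de_DE.UTF-8/LC_MESSAGES/messages<n>.mo
$dir = './locale';
$domain = currentDomain("$dir/de_DE.UTF-8/LC_MESSAGES");
if (function_exists('bindtextdomain')) {
    bindtextdomain($domain, $dir);
    textdomain($domain);
}
```

Old catalogs stay cached in the server process until it is restarted, so a long-running server will accumulate them if dictionaries change often.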

  14. khaleej clubc Says:

    I don’t think 3 strings are enough for a benchmark.
    How many strings do you usually show on one web page?

    The numbers are so close that they may indicate the choice is insignificant.

    I would love to read an updated version of this benchmark that takes into consideration how Symfony, CakePHP, Akelos and other frameworks handle i18n and L10n.

    great job
    thanks.