As we migrate to gettext for Chamilo LMS v10, we are also looking for a platform to host our translation system (considering our current translation system does not support gettext).
We will migrate most existing translations, but we were looking for the right platform to manage the translations as a community. One tool that attracted our attention was Pootle, a Python-based open source translation system that seems to be led by the right, passionate people.
In this first phase though, we really need to avoid being distracted by anything other than the development of the core code of Chamilo LMS v10. This is why we looked for a hosted solution, with an existing community, preferably with special plans for open source projects (like GitHub has).
So we found Crowdin.net, which apparently has all the features we will need at first. We also looked at its terms and conditions and its privacy terms. The terms and conditions indicate that all translated content remains the property of the user, which is good, but we will have to set up some kind of agreement with all translators stating that their translation work will be licensed as Creative Commons BY-SA. Finally, the privacy terms are pretty reasonable and respectful of our privacy, but they *do* indicate that Crowdin.net can send promotional e-mails about its services to all members, which means that translators will get a (hopefully reasonable) amount of spam (I call it spam anyway), although it is limited to Crowdin.net services only. This being said, a reduced core of developers will try it for a while to see whether the flow of mails is reasonable before we generalize its use.
So if you want to try it, you should be able to take a look within a few days at https://crowdin.com/project/chamilo-lms
If you haven’t read lesson 1 on string identifiers for web applications translations, then I strongly recommend you do so first.
When you start translating a web application, one question will always pop to mind fast enough: do I include punctuation inside the translation or not?
Not all languages end sentences with dots
Well, the first element of an answer here is that not all languages end their sentences with dots. This might seem pretty weird to most Western Europeans or North Americans, but Japanese is a great example here: sentences end with 。, not just a dot. Questions are also marked a bit differently: traditionally they end with the particle か (ka) rather than with “?”. From there, including punctuation inside the translatable string comes to mind as the natural behaviour. This way, each translator can manage punctuation in the way their language dictates.
As an example, the sentence
Welcome to this website. Are you ready?
would transform into its translated form
As you can see, translating the initial sentences without considering the punctuation marks would obviously have generated duplication and confusion. Something like:
And there is another case where this might be useful…
Right to left languages end sentences with dots, but on the other side
Most European, American and even Asian languages are written from left to right. That’s how I’m writing right now. Some languages, though, most importantly Arabic and the other languages written in the Arabic script, are written from right to left. Now imagine you did not include the dot in the following sentence to be translated.
Welcome to our website.
The translation of this would be
مرحبا بكم في هذا الموقع..
Now the problem here is, because this sentence is written from right to left, the dot is not ending the sentence anymore. It’s starting it!
As you can see, at least two reasons exist to always include punctuation in your translatable sentences.
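To make the point concrete, here is a minimal sketch in Python (used here only for illustration). The two catalogs are hypothetical in-memory dictionaries standing in for real gettext files, and the Japanese strings are my own illustrative translations. The first function lets the translator own the punctuation, the second one glues a dot on in code.

```python
# Hypothetical in-memory catalogs standing in for real gettext .po/.mo files.
# The Japanese strings are illustrative translations, not project data.
WITH_PUNCTUATION = {
    "ja": {"Welcome to this website.": "このウェブサイトへようこそ。"},
}
WITHOUT_PUNCTUATION = {
    "ja": {"Welcome to this website": "このウェブサイトへようこそ"},
}


def translate(catalog, lang, msgid):
    """Return the translation, or the source string when none exists."""
    return catalog.get(lang, {}).get(msgid, msgid)


def greeting_translator_owns_punctuation(lang):
    # The dot is part of the translatable string, so each language
    # can end the sentence the way it needs to.
    return translate(WITH_PUNCTUATION, lang, "Welcome to this website.")


def greeting_code_appends_dot(lang):
    # The dot is appended by the program, whatever the language is.
    return translate(WITHOUT_PUNCTUATION, lang, "Welcome to this website") + "."


if __name__ == "__main__":
    print(greeting_translator_owns_punctuation("ja"))
    print(greeting_code_appends_dot("ja"))
```

With the second approach the Japanese sentence ends with a Western dot, and a right-to-left sentence would have the same kind of problem, which is exactly what we want to avoid.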
Translating a web application is not an easy task (although it might seem so). Or rather, translating it well is not easy.
Using tools like gettext will help you, of course, but past the tools, there are a few things that do not seem to be well understood by web developers with little experience in foreign languages.
In this series of articles, I’ll try to give one example at a time of how to make a perfect translation.
In this first lesson, we’ll talk about the string identifiers, or the name of the translated item.
For example, let’s say you want to translate the string “Title” to many languages, so that when your users come to your page, they will see “Title” if they use your English version, and “Titre” if they use your French version.
Of course, you want to make sure most of the translators understand, from the identifier of the string itself, what this string refers to. For example, you could name it $title. Pretty clear, pretty obvious.
To avoid having lots of different ways to represent translatable strings, you should define a clear convention from the start. Something about using UpperCamelCaseToRepresentYourString or lower_case_with_underscores_to_represent_your_string (often called snake_case), or even ‘a pure text identifier always considered as an array index’. I have read once that, if you have to choose between the two, non-native English speakers have more difficulty understanding UpperCamelCase because it is harder to split the words visually.
This being said, read the following before establishing your conventions…
Clashing with local variables
Oh but… wait… if you use “$title” kind-of-identifiers, then what will you do when you need a string identifier that you are already using for the computational elements of your script? Surely, there must be something to do about it…
Well yes, you have 2 options here, both of which imply creating the notion of a namespace, that is, creating identifiers that will be easily recognized as being part of a group, somehow:
* use one (or several) array(s), of which each index is the identifier (like $t['title'] = 'Title';)
* use prefixes to your variables (like $translate_title or $t_title = ‘Title’ to make it short)
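As a rough sketch of the first option, written in Python only for illustration (the dictionary `t` and the function `render_heading` are hypothetical names, not taken from any real code base): keeping every translated string inside one dictionary means a local variable called `title` can live next to the translated label without any clash.

```python
# Hypothetical catalog: every translatable string lives in one dictionary,
# mirroring the $t['title'] idea from the list above.
t = {"title": "Title", "author": "Author"}


def render_heading(title):
    # `title` is the page's own data; t["title"] is the translated label.
    # The two never collide because the label sits inside the `t` namespace.
    return f'{t["title"]}: {title}'


if __name__ == "__main__":
    print(render_heading("Chamilo basics"))
```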
Now whether you use namespaces or not, and whether you use upper camel case or not, you will have a problem when it comes to differentiating ‘Title’ from ‘title’. You know, it so happens that sometimes you have to put a specific text element between parentheses, and in this case you will need to put it in lowercase.
A quick solution to this is to use the strtoupper() and strtolower() functions in PHP, which allow you to put something in upper case or in lower case. There’s even a function to uppercase only the first letter, ucfirst(). But that will not help once you start to have more than European languages, or when you end up in a tricky situation.
In terms of translations, it is generally accepted that you should try to define the different cases in different terms, and not try to programmatically convert strings to what they are not because, in some language, it will not be possible or logical to do so.
As such, try to be specific about your strings. If you really need to make a difference (for the meaning or correctness of the term), then add a suffix to your identifier, insisting on the fact it should be lowercase, uppercase or capitalized (in the last case, only the first letter is uppercase).
For example: $t_TitleItem_lower. This doesn’t break your convention, because the term is still identified by ‘TitleItem’, but you are adding a bit more precision.
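Here is a small sketch of why the suffix helps, in Python for illustration only. The `STRINGS` catalog and the helper names are hypothetical. German is a handy example because nouns keep their capital letter even between parentheses, so the translator must be able to say that the “lowercase” variant is still ‘Titel’.

```python
# Hypothetical catalog with an explicit "_lower" variant per term.
STRINGS = {
    "en": {"TitleItem": "Title", "TitleItem_lower": "title"},
    # German nouns stay capitalized, so both variants are identical.
    "de": {"TitleItem": "Titel", "TitleItem_lower": "Titel"},
}


def tr(lang, key):
    """Look up a term in the hypothetical catalog."""
    return STRINGS[lang][key]


def in_parentheses(lang):
    # The translator decided what the in-sentence form looks like.
    return f"({tr(lang, 'TitleItem_lower')})"


def in_parentheses_converted(lang):
    # The program forces lowercase, which is wrong for German.
    return f"({tr(lang, 'TitleItem').lower()})"


if __name__ == "__main__":
    print(in_parentheses("de"))
    print(in_parentheses_converted("de"))
```

The same kind of surprise shows up in the other direction: uppercasing the German word “Straße” in Python gives “STRASSE”, one character longer than the original.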
In just three weeks since the announcement of the split between Dokeos and Chamilo, Chamilo has already received support from many organizations and independents that seem to have thought alike for a long time. We also received about 10 successful patches in this period, which is more than I can remember receiving in the whole year of 2009 for Dokeos.
Finally, we’ve been working a lot this week (kudos to scaramanga and svennie) to get you a brand new translation system that we hope will help you get more productive, faster. For example, you now have a “next untranslated term” icon which lets you do the translation in one straight line, and the possibility to translate to various languages (for those of you who are professional translators, this will probably help you help us a lot more! ;-)
Importantly, you can also download, modify and upload translation files which, in combination with the phpLangEditor plugin for Firefox (from one of the developers of Claroline, by the way), will help you get über-efficient.
We are also gathering new translators and people that will want to get involved deeply into the translations by taking the role of translation coordinators.
Last but not least, Chamilo 1.8.7 will be fully UTF-8 compliant, which will trigger a massive opening of our community to the East of Europe, with known interest from China, Japan, Russia and Arabic-writing countries!
If you want to know more, just make sure you watch http://www.chamilo.org next week!
This one is tricky… (and not so practical just yet)
So let’s say you have a 120-page document, written using MS-Word, and which is understood pretty well by OpenOffice.org Writer (v2). Now let’s say your document is a technical document and, as such, it is half-full of screenshots taken from a software interface (in this case Dokeos) directly in the language of the document. See what I mean?
Now, let’s say this document is in French and you want to translate it to English (or Spanish, the problem is exactly the same). You’re a good translator from French to English, but you want to try and reduce translation time (or “enhance translation speed”) by making use of the tools at hand.
What can you do? I’ll try to explain, step by step, how I don’t do it and why (and then we’ll see how I do it and how long it takes).
First, let’s abandon the idea of translating the text inside the image. Even if it worked with some kind of OCR-magic, it would still be impractical because the result would be an image and the manual correction would be a nightmare (you would still need to open it inside an image editor). Let’s consider this will be patiently done by capturing the same screenshots manually.
1. Translating directly inside OpenOffice.org
Well, that would be practical, wouldn’t it? OpenOffice.org understands its own format, so it could very probably reduce damage in formatting. There are a few tools on the web that claim they can do that. OmegaT is one of them. I downloaded it and tried to make it run, but apparently they don’t know what explicit error messages mean, so basically I try to read a .odt document and it tells me it can’t read it. OK… fine. Other tools that are supposed to be able to do this seem to be non open-source, which I tend to avoid by moral principle (I know, this isn’t an argument).
I keep thinking that adding this option inside OpenOffice.org would be great. I’ll see with Sergio Infante what can be done about that. I guess developing a plugin could be done easily…
2. Use Google Translator with a PDF
Google Translator is alright to translate web pages, but it transforms everything to HTML, removing the images, which means:
- I lose the format
- I lose the images
- I have an ugly HTML document which I can’t redistribute
If your PDF is too big, you can also convert it to HTML using Zoho (which really does an amazing job of converting it to HTML).
3. Use Google Translator Toolkit
OK, so if I turn directly to Google Translator Toolkit and read the documentation, it seems to be good for me:
- it reads .odt files
- it translates them
- it allows me to get the translated file back in .odt
The problems are:
- it doesn’t translate from anything other than English
- it doesn’t manage files larger than 1MB (mine is 16MB)
OK well… There’s only one solution left…
4. Use Google Translator and a lot of clicking
Although this is highly impractical, it still does the job of cutting pre-translation time. The technique is as follows:
- open a browser on the Google Translator page
- open your document in OpenOffice.org Writer
- make them use half of the screen width each
- triple click one paragraph in Writer, CTRL-C it
- triple click the current text to be translated in the Google Translator page and CTRL-V the new text
- click “Translate”
- triple click the translation + CTRL-C
- click the Writer window (the text to be translated is still selected) and CTRL-V to replace it with the translation
- move on to the next paragraph
This technique is a bit boring but it has the merit of not breaking your formatting much and pre-translating your document (you can probably get to something like 30 pages an hour, for pre-translation).
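The manual loop above boils down to “take one paragraph, get a draft translation, put it back in the same place”. As a minimal sketch in Python, for illustration only: `pre_translate` is a hypothetical helper, and the `translate` argument is a stand-in for whatever service or person produces the draft. It is not a real Google API call.

```python
from typing import Callable, List, Tuple


def pre_translate(
    paragraphs: List[str],
    translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each paragraph with a draft translation, keeping the order.

    `translate` is an assumption: any callable that takes source text
    and returns a draft. Empty paragraphs are kept as they are so the
    document layout is not disturbed.
    """
    drafts = []
    for paragraph in paragraphs:
        if not paragraph.strip():
            drafts.append((paragraph, paragraph))
            continue
        drafts.append((paragraph, translate(paragraph)))
    return drafts


if __name__ == "__main__":
    # A fake "translator" so the sketch runs without any network access.
    fake = lambda text: f"[EN] {text}"
    for source, draft in pre_translate(["Bonjour.", "", "Merci."], fake):
        print(repr(source), "->", repr(draft))
```

Working paragraph by paragraph is what keeps the formatting mostly intact: only the text of each paragraph is replaced, never its position or style.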
I reckon the real translation will then take a bit longer (15 pages an hour?), but you will still have gained a lot of translation-thinking-time. It is now more the work of a proofreader than the work of a translator. By the way, most of the time, this review would still need to be done by a technician and a second translator to make sure it is accurate and idiomatically correct.
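As a back-of-the-envelope check, here is the arithmetic for the 120-page document, using the two rough speeds given above (30 pages an hour for pre-translation, 15 pages an hour for the real translation). Both speeds are guesses, so the result is only an order of magnitude. The function name is mine.

```python
def hours_needed(pages: int, pages_per_hour: float) -> float:
    """Time in hours to process `pages` at a given speed."""
    return pages / pages_per_hour


PAGES = 120  # size of the document discussed in this post

pre_translation = hours_needed(PAGES, 30)   # copy/paste through the translator
review = hours_needed(PAGES, 15)            # correcting the draft
total = pre_translation + review

if __name__ == "__main__":
    print(f"pre-translation: {pre_translation} h")
    print(f"review: {review} h")
    print(f"total: {total} h")
```

That gives about 4 hours of clicking plus about 8 hours of correction, so roughly 12 hours in total, not counting the screenshots or the second review.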