Pre-translating a large rich-text document
This one is tricky… (and not so practical just yet)
So let’s say you have a 120-page document, written in MS-Word, which OpenOffice.org Writer (v2) reads pretty well. Now let’s say it is a technical document and, as such, it is half-full of screenshots of a software interface (in this case Dokeos), taken in the language of the document. See what I mean?
Now, let’s say this document is in French and you want to translate it into English (or Spanish, the problem is exactly the same). You’re a good translator from French to English, but you want to try and reduce translation time (or “enhance translation speed”) by making use of the tools at hand.
What can you do? I’ll try to explain, step by step, how I don’t do it and why (and then we’ll see how I do it and how long it takes).
First, let’s abandon the idea of translating the text inside the images. Even if it worked with some kind of OCR-magic, it would still be impractical because the result would be an image, and correcting it by hand would be a nightmare (you would still need to open it in an image editor). Let’s assume the screenshots will be patiently recaptured by hand in the target language.
1. Translating directly inside OpenOffice.org
Well, that would be practical, wouldn’t it? OpenOffice.org understands its own format, so it would very probably limit the damage to the formatting. There are a few tools on the web that claim they can do that. OmegaT is one of them. I downloaded it and tried to make it run, but apparently they don’t know what explicit error messages mean: I tried to read a .odt document and it just told me it couldn’t read it. OK… fine. Other tools that are supposed to be able to do this don’t seem to be open source, which I tend to avoid on moral principle (I know, this isn’t an argument).
I keep thinking that adding this option inside OpenOffice.org would be great. I’ll check with Sergio Infante what can be done about that. I guess developing a plugin could be done easily…
2. Use Google Translator with a PDF
Google Translator is fine for translating web pages, but it converts everything to HTML and drops the images, which means:
- I lose the formatting
- I lose the images
- I have an ugly HTML document which I can’t redistribute
If your PDF is too big, you can also convert it to HTML first using Zoho (which does an amazingly good job of the conversion).
3. Use Google Translator Toolkit
OK, so if I turn directly to Google Translator Toolkit and read the documentation, it seems to be a good fit for me:
- it reads .odt files
- it translates them
- it allows me to get the translated file back in .odt
The problems are:
- it doesn’t translate from anything other than English
- it doesn’t handle files larger than 1MB (mine is 16MB)
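A side note on that 16MB: an .odt file is a ZIP archive, and Writer keeps embedded images in a Pictures/ folder next to content.xml, so in a document that is half screenshots the images are most likely what makes it so heavy. As a minimal sketch (the function name is made up for illustration, and it assumes the images are embedded rather than linked), this Python snippet adds up the compressed size of the images versus everything else, to see how far the text alone is from the 1MB limit:

```python
import zipfile


def odt_size_breakdown(path):
    """Return (image_bytes, other_bytes) for the entries of an .odt archive."""
    image_bytes = 0
    other_bytes = 0
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            # Embedded pictures live under the Pictures/ folder of the archive.
            if info.filename.startswith("Pictures/"):
                image_bytes += info.compress_size
            else:
                other_bytes += info.compress_size
    return image_bytes, other_bytes
```

Calling `odt_size_breakdown("manual.odt")` (a hypothetical file name) returns two byte counts. If the second one is well under 1MB, a copy of the document without its images might fit the limit, although I have not tried that route.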
OK well… There’s only one solution left…
4. Use Google Translator and a lot of clicking
Although this is highly impractical, it still does the job of cutting pre-translation time. The technique is as follows:
- open a browser on the Google Translator page
- open your document in OpenOffice.org Writer
- resize them so that each takes half of the screen width
- triple-click one paragraph in Writer, CTRL-C it
- triple-click the current text to be translated in the Google Translator page and CTRL-V the new text
- click “Translate”
- triple-click the translation + CTRL-C
- click the Writer window (the text to be translated is still selected) and CTRL-V the translation over it
- move on to the next paragraph
This technique is a bit boring but it has the merit of not breaking your formatting much and pre-translating your document (you can probably get to something like 30 pages an hour, for pre-translation).
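The copy half of that loop could in principle be scripted. The following is only a sketch under assumptions, not something I have run on the real document: it reads content.xml out of the .odt (which is a ZIP archive), collects the text of each paragraph and heading, and pairs it with whatever a `translate` function returns. That `translate` argument is hypothetical: it stands for any function taking a string and returning a string, and no real translation service is called here. The sketch does not write anything back into the document, which is the hard part, since inline formatting sits inside the paragraphs.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by OpenDocument for text elements (text:p, text:h).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARAGRAPH_TAGS = (f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h")


def read_paragraphs(odt_path):
    """Return the text of every non-empty paragraph and heading, in order."""
    with zipfile.ZipFile(odt_path) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    paragraphs = []
    for element in root.iter():
        if element.tag in PARAGRAPH_TAGS:
            # itertext() flattens inline spans; tabs, line breaks and repeated
            # spaces stored as separate elements are lost in this sketch, and
            # paragraphs nested in text boxes may appear twice.
            text = "".join(element.itertext()).strip()
            if text:
                paragraphs.append(text)
    return paragraphs


def pretranslate(odt_path, translate):
    """Pair each source paragraph with its machine translation.

    `translate` is a hypothetical callable (str -> str) supplied by the caller.
    """
    return [(source, translate(source)) for source in read_paragraphs(odt_path)]
```

The list of (source, translation) pairs can then be pasted back by hand, which keeps the formatting under your control the same way the manual technique does.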
I reckon the real translation will then still take a bit longer (15 pages an hour?), but you will still have saved a lot of translation-thinking-time. It is now more a proofreader’s job than a translator’s. By the way, most of the time this review would still need to be done by a technician and a second translator to make sure the result is accurate and idiomatic.
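To put numbers on it for the 120-page document, here is the arithmetic as a tiny Python sketch. The two rates are the rough guesses from this post, not measurements, and the function name is made up for illustration:

```python
def estimate_hours(pages, pretranslate_rate=30, review_rate=15):
    """Return (pre-translation hours, review hours, total hours).

    Both rates are in pages per hour and are rough guesses, not measurements.
    """
    pretranslate_hours = pages / pretranslate_rate
    review_hours = pages / review_rate
    return pretranslate_hours, review_hours, pretranslate_hours + review_hours


print(estimate_hours(120))  # (4.0, 8.0, 12.0)
```

So that is about 4 hours of clicking plus 8 hours of review, 12 hours in total, before the technician and the second translator have their turn.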