2013 Sep 12

L10n POC: how to quickly find what you should translate?

If you are a Fedora translator, you can jump to the result and then come back here.

Life of a Fedora Translator

Being a Fedora Project translator means you may translate software, documentation and websites. They are not handled the same way: some are updated weekly, others get updates from time to time, and a few never get updated. On top of that, the priority of our translations changes according to the release schedule. Before translating, we should:

  1. Check the L10n schedule to see whether we should translate docs, websites or software;
  2. Select a project. If it is Fedora documentation, check in the table whether it is up to date;
  3. And finally do our lovely job.

It's not easy. It takes time. It takes effort. Luckily we are using Transifex, which helps gather the projects. However, I always have a hard time checking the health of our translations (stats per project, per release...). We don't have a global view of our translations.

Note that this could change shortly with the new Organization feature. I just got bored with the usual interface, which is why I coded my own page.

In more detail, Fedora translations are spread across Transifex releases. A release gathers projects around a common goal. We have 4 releases:

  • Fedora Documentation: gathers the official Fedora Documentation available at docs.fedoraproject.org.
  • Fedora Websites: our websites!
  • Fedora Main: these projects follow the Fedora release cycle, so they used to have a string freeze and could be re-packaged for test and release days.
  • Fedora Upstream Projects: a collection of projects that are not part of the Fedora Project itself, but which are important for Fedora as a distribution. Examples include Yum, RPM and PackageKit. These projects may have their own upstream translation teams or may use the Fedora Translation Project ones.

As the Transifex releases are updated manually from time to time, they can miss some resources or list outdated ones. And we can only add individual resources to a release, not a whole project (unless we add its resources one by one).

Many projects reuse the Fedora Project L10n teams by outsourcing to the Fedora hub, which means that we have another way to gather (part of) our projects.

From the above you can see that we have many entry points to find what needs to be translated. In short, visit fedora.transifex.com, then either choose your language (and you'll get a huge ugly list) or scroll down and select a resource under a Transifex release (but remember that the list could be outdated).

How to improve our life

We moved from git to Transifex online, I moved back to git!

It started with a local mirror of our translations. There was a time when any Fedora translator could upload a translation to the wrong language team... The mirror was useful to restore a translation (because some projects don't even keep PO files in their repository).

I created the French mirror at wip.fedora-fr.org/p/traduction. Code and how-to will come later. Basically, I use the Transifex client to pull the French translations daily. You can see in the history that I hide the email addresses. The mirror is also useful for fast proofreading, and I use the repo to quickly fix typos.

For example, if a new translator pushed a huge translation containing an incorrect term, you can check it quickly:

  • You find an incorrect string, for example "Fedora Project" translated as "projet Fedora" instead of "Projet Fedora" (capitalization matters);
  • you grep the repo to find the errors, then correct them (with sed or whatever you like):
$ git grep -E "^\s?msgstr.*projet Fedora"
fedora-docs/translations/fedora-docsite-publican.Labels/fr.po:msgstr "Documentation pour les contributeurs au projet Fedora"
fedora-docs/translations/fedora-readme-burning-isos.Introduction/fr.po:msgstr "Le projet Fedora ne fait du support que pour les logiciels faisant partie de la distribution Fedora"
  • then you push your changes with tx, as the repo is already set up! (For example, use tx push -t -l fr -r fas.faspot once you are in the fedora-website subfolder.)
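The correction step above can also be scripted instead of using sed. Here is a minimal sketch in Python, assuming the mirror layout shown in the grep output (one fr.po per resource folder). The function names fix_msgstr and fix_repo are made up for this example and are not part of my scripts. Like the grep pattern, it only looks at lines that start with msgstr, so continuation lines of a multi-line msgstr are not covered.

```python
import re
from pathlib import Path


def fix_msgstr(text, wrong, right):
    """Replace `wrong` with `right` on msgstr lines only; msgid lines stay untouched."""
    fixed = []
    for line in text.splitlines(keepends=True):
        # Same idea as the grep pattern: optional leading whitespace, then msgstr
        if re.match(r"\s?msgstr", line):
            line = line.replace(wrong, right)
        fixed.append(line)
    return "".join(fixed)


def fix_repo(root, wrong, right, lang="fr"):
    """Rewrite every <lang>.po file under root and return the paths that changed."""
    changed = []
    for po in sorted(Path(root).rglob(f"{lang}.po")):
        old = po.read_text(encoding="utf-8")
        new = fix_msgstr(old, wrong, right)
        if new != old:
            po.write_text(new, encoding="utf-8")
            changed.append(po)
    return changed
```

Calling fix_repo(".", "projet Fedora", "Projet Fedora") from the root of the mirror would rewrite the matching files. Review the result with git diff before pushing anything with tx.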

Finally, I get a better overview!

Because I set up the mirror repository the way I think is best (parsing all releases to get all the projects, instead of sticking to the selected resources only), I thought I would generate a descriptive file listing all projects and their translation stats. This file would be the basis of the home view... And here is the result: shaiton.org/trad/traduction.cgi

You can quickly find what you should translate because:

  • you can sort the list the way you want
  • projects are gathered by releases
  • the list is complete (if you find missing projects it's a bug, tell me!)
  • you can see priority projects highlighted by different colors (it's my own priority list, don't forget, it's a (working) Proof Of Concept.)

Now that you have seen this clean list, compare it with the Transifex one. Can you see the original problem? :)

How to set up your own page

There are two steps, unless the trans team decides we should build a global Fedora app for that (I am not sure this would be useful to many teams... I would prefer the Indifex team to work on a better home page for organizations): mirror the translations, then update the stats and publish them.

All the scripts (and more) are under my gitorious repo: gitorious.org/tiny-scripts/transifex/

Mirror your translations

You need to initialize the Transifex repositories and your git repo. This is done with the l10n_stats.py script (under the l10n_stats folder).

./l10n_stats.py --init --lang=<your language code>

You need a working ~/.transifexrc file with your credentials, as the Transifex API does not support anonymous GET requests yet. See the documentation.

Now, you need to init your git repo and add all the files.

Then, set up a cron job to pull the translations into the repo. I use the script pull_translations.sh to pull translations and get translator credits (it also displays maintainer changes). You won't get credits per line, only per file, as several translators could have updated a file before your next pull.
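Before running the scripts from cron, it can help to check that the credentials file is readable. Here is a minimal sketch, assuming the usual ~/.transifexrc layout: an INI file with one section per host, each holding a username and a password. The function name transifexrc_problems is made up for this example. It only checks that the fields are present, not that Transifex accepts them.

```python
import configparser
from pathlib import Path


def transifexrc_problems(path=Path.home() / ".transifexrc"):
    """Return a list of problems found; an empty list means the file looks usable."""
    path = Path(path)
    if not path.is_file():
        return [f"{path} does not exist"]
    # interpolation=None so that a '%' in a password is read literally
    config = configparser.ConfigParser(interpolation=None)
    try:
        config.read(path, encoding="utf-8")
    except configparser.Error as exc:
        return [f"{path} is not valid INI: {exc}"]
    if not config.sections():
        return [f"{path} has no host section"]
    problems = []
    for section in config.sections():
        for key in ("username", "password"):
            if not config.get(section, key, fallback="").strip():
                problems.append(f"[{section}] is missing {key}")
    return problems


if __name__ == "__main__":
    for problem in transifexrc_problems():
        print(problem)
```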
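To give an idea of what the cron job does, here is a minimal sketch of one update run in Python. This is not pull_translations.sh (it does no credit handling). It assumes a single directory that is both a git repo and a tx-initialized project, while the real mirror has several subfolders. The names pull_commands and update_mirror are made up for this example.

```python
import subprocess
from datetime import date


def pull_commands(lang):
    """Commands for one mirror update: pull from Transifex, then stage everything."""
    return [
        ["tx", "pull", "-l", lang],  # fetch the translations for one language
        ["git", "add", "-A"],        # stage new and changed files
    ]


def update_mirror(repo_dir, lang, run=subprocess.run):
    """Pull and commit; return True when a commit was made."""
    for cmd in pull_commands(lang):
        run(cmd, cwd=repo_dir, check=True)
    # Commit only when something changed, so quiet days leave no failed commit
    status = run(["git", "status", "--porcelain"], cwd=repo_dir,
                 check=True, capture_output=True, text=True)
    if not status.stdout.strip():
        return False
    message = f"Translations update {date.today().isoformat()}"
    run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
    return True
```

The run parameter exists only so that the function can be tried without touching Transifex or git. A crontab line would then call a small script that runs update_mirror on the mirror directory once a day.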

Set up your main webpage!

The idea is to parse the metadata file (generated by ./l10n_stats.py --lang=<your language code> in a second cron job) to display stats. My code uses a CGI script, only because I was limited by my host and because I had always avoided CGI... I had to try it one day ;) It's all under the l10n_stats/web/ folder. The priority.json file is my own priority list.
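As an illustration of the parsing step, here is a minimal sketch of how stats could be ordered for display. The data layout is an assumption made for this example: each project is a dict with project, release, translated and total keys, and the priority list maps a project name to a rank. The real metadata file and priority.json may be structured differently, so check the files under l10n_stats/web/ for the actual format.

```python
def completion(entry):
    """Percentage of translated strings, 0.0 for a project without strings."""
    total = entry["total"]
    return 100.0 * entry["translated"] / total if total else 0.0


def build_rows(projects, priorities):
    """Order by release, then priority rank (lower first), then least complete first."""
    def sort_key(entry):
        # Projects absent from the priority list sort after all the listed ones
        rank = priorities.get(entry["project"], float("inf"))
        return (entry["release"], rank, completion(entry))

    return [
        (entry["release"], entry["project"], round(completion(entry), 1))
        for entry in sorted(projects, key=sort_key)
    ]


if __name__ == "__main__":
    # Hypothetical sample data, only to show the shape of the input
    sample = [
        {"project": "anaconda", "release": "Fedora Main", "translated": 90, "total": 100},
        {"project": "fas", "release": "Fedora Websites", "translated": 10, "total": 40},
    ]
    for release, project, percent in build_rows(sample, {"anaconda": 1}):
        print(f"{release:20} {project:15} {percent:5.1f}%")
```

Both inputs could be loaded with json.load if the files are JSON. The page itself then only has to loop over the rows and pick a color from the rank.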

Happy L10n to all!