Softcatala / translation-memory-tools

A set of tools to build, maintain and use translation memories

Home Page: https://www.softcatala.org/recursos/memories/

Selective processing of projects in builder.py

julen opened this issue

Since fc11bb9, the configuration is managed on a per-project basis, i.e. each project is defined in its own JSON file. When the builder runs without further arguments, it processes every project in the cfg/projects/ folder.

Unless I missed some detail, processing can currently be restricted in one of these ways:

  • Passing the --project=<project1, project2, ..., projectN> CLI flag.
  • Placing selected project configuration files in a separate directory and passing --json=<json_dir>.
  • Using --softcatala to process only the projects that contain the softcatala: true flag.

All of the existing approaches are quite tedious if someone wants to run the tool outside the scope of Softcatalà (SC). I believe the ideal setup would be:

  • The CLI still accepts passing a list of projects to process.
  • Project configurations are generic and locale-independent. The challenge here is handling language codes properly, since they tend to be project-specific.
  • Locale-specific definitions live in a separate configuration file. This would allow each locale to define the language code to use, as well as the projects it needs.

With this, the softcatala: true flag would become redundant, as would the --json CLI option. The project configuration stays generic, so it is available to any locale, and each locale can then choose the list of projects it wants to pull in for TM generation.
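
A minimal sketch of how such a split could be resolved at build time. Everything here is hypothetical and not the repository's actual schema: the locale file with "locale" and "projects" keys, the "url" template with a {lang} placeholder, the optional per-project "language_codes" override, the resolve_projects function, and the example.org URLs. The point is only that a generic project file plus a small locale file is enough to produce concrete, locale-specific download URLs, while a CLI list of projects can still narrow the selection.

```python
import json
from dataclasses import dataclass


@dataclass
class ResolvedProject:
    name: str
    language_code: str
    url: str


def resolve_projects(locale_cfg, project_cfgs, requested=None):
    """Combine one locale definition with generic, locale-independent project configs."""
    locale = locale_cfg["locale"]
    names = locale_cfg["projects"]
    if requested:
        # Restrict to projects passed on the CLI, keeping the locale's own order.
        names = [name for name in names if name in requested]
    resolved = []
    for name in names:
        cfg = project_cfgs[name]
        # A project may spell the locale differently; fall back to the canonical code.
        code = cfg.get("language_codes", {}).get(locale, locale)
        resolved.append(ResolvedProject(name, code, cfg["url"].format(lang=code)))
    return resolved


# Hypothetical file contents: one per-locale file plus generic per-project files.
locale_cfg = json.loads('{"locale": "eu", "projects": ["gnome", "kde"]}')
project_cfgs = {
    "gnome": json.loads('{"url": "https://example.org/gnome/{lang}.po"}'),
    "kde": json.loads('{"url": "https://example.org/kde/{lang}/messages.po", '
                      '"language_codes": {"eu": "eu_ES"}}'),
}

for project in resolve_projects(locale_cfg, project_cfgs, requested=["kde"]):
    print(project.name, project.url)  # kde https://example.org/kde/eu_ES/messages.po
```

In this sketch the language-code overrides live in the project file, so a locale only has to list project names; putting the overrides in the locale file instead would work just as well.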

Yes, this system was designed to work with only one locale.

The change you are proposing is a large one: it requires changes to the application, but also a review of all the configuration files.

An additional core problem is that the tool uses PO as its file format, which only holds one translation (a single target language). We should consider switching to the TMX format or even using a database backend.

I would like to understand how you plan to use the system.

commented

My main goal would be to reuse your system for Basque and deploy it on our own server. We like the idea of having a common source of truth in TM form that can easily be searched.

As already pointed out, this is not possible with the current state of the code, and I know it involves quite a large change from the current approach. The alternative would be to write this from scratch, but I'd like to avoid that as much as possible.

I'm all in for making the necessary changes, but it will take no less than 3 months.

There are 123 active configuration files in /cfg/projects. Around 15% of the projects (KDE, GNOME, LibreOffice, GIMP, etc.) provide 80% of the content.

My advice is to start now by creating your own separate configuration files for Basque for the top 20 projects. This should not take more than 2 hours, since in most cases you will just need to change the URL. This will allow you to:

  • Get started now and offer your users something useful
  • Learn how the system works, so that we can then discuss all the refactors needed
  • Start contributing now
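
A minimal sketch of that bootstrapping step, assuming each project lives in its own <name>.json file under cfg/projects and that switching locale mostly means replacing a language segment such as /ca/ in the URLs. Both are assumptions, since the real JSON schema is not described here; the function names, the project names and the cfg/projects-eu output directory are hypothetical.

```python
import json
from pathlib import Path


def replace_in_strings(value, old, new):
    """Recursively replace `old` with `new` in every string value of parsed JSON."""
    if isinstance(value, str):
        return value.replace(old, new)
    if isinstance(value, list):
        return [replace_in_strings(item, old, new) for item in value]
    if isinstance(value, dict):
        # Only values are rewritten; keys are kept as they are.
        return {key: replace_in_strings(item, old, new) for key, item in value.items()}
    return value  # numbers, booleans and null are left untouched


def make_locale_configs(src_dir, dst_dir, projects, old, new):
    """Copy the selected project files into dst_dir with locale substrings swapped."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for name in projects:
        # Assumption: one <name>.json file per project in src_dir.
        cfg = json.loads((src / f"{name}.json").read_text(encoding="utf-8"))
        out = dst / f"{name}.json"
        out.write_text(
            json.dumps(replace_in_strings(cfg, old, new), indent=2, ensure_ascii=False),
            encoding="utf-8",
        )
        written.append(out)
    return written
```

For example, make_locale_configs("cfg/projects", "cfg/projects-eu", ["gnome", "kde"], "/ca/", "/eu/") would write Basque copies of two hypothetical project files. Blind substring replacement can also hit unrelated parts of a URL, so each generated file should still be checked by hand before pointing builder.py at the directory with --json=<json_dir>.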

Let me know how this sounds.