Softcatala / translation-memory-tools

A set of tools to build, maintain and use translation memories

Home Page: https://www.softcatala.org/recursos/memories/

Rearchitect the tool for version 2.0

jordimas opened this issue · comments

Background

This ticket collects all the architecture improvements needed to fully meet the new set of requirements and to address the limitations that we have learned about up to September 2014.

Downloading, converting and building translation memories

  • Decouple the download process #63
  • Create a configuration subdirectory where every project has its own json file #62
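
A minimal sketch of how a per-project configuration directory could be read, assuming one `*.json` file per project. The function name `load_projects` and the use of the file name as the project key are assumptions made for illustration, not the tool's actual layout.

```python
import json
from pathlib import Path


def load_projects(config_dir):
    """Load one project definition per *.json file found in config_dir."""
    projects = {}
    for path in sorted(Path(config_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            # Assumption: the file name without its extension is the project key.
            projects[path.stem] = json.load(handle)
    return projects
```

With this layout, adding or removing a project means adding or removing a single file, and a broken file only affects its own project.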

Solution: Consider moving to TMX as the native format for the tool. Some considerations:

  • The limitation here is that there are no out-of-the-box tools for merging TMX catalogs (what msgcat does for PO files).
  • We will probably need to build something ourselves and consider contributing it to Translate Toolkit.
  • Many tools that convert from other formats such as TS, strings, etc. convert to PO (ts2po), not to TMX. We need to decide whether we are OK converting from these formats to PO and then to TMX, or whether we need native converters.
  • This will require rewriting the index creator, the terminology analysis and other tools, since all of them rely on PO files as the source format.
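
To give an idea of the size of the missing merge tool, here is a rough sketch of a msgcat-like merge for TMX files using only the Python standard library. It assumes plain TMX 1.4 files where each `<tu>` holds `<tuv xml:lang="..."><seg>` children, and it treats two units as duplicates only when every language and segment text match. The names `tu_key` and `merge_tmx` are hypothetical; this is not an existing Translate Toolkit API.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tu_key(tu):
    """Build a hashable key from every (language, segment text) pair of a <tu>."""
    pairs = []
    for tuv in tu.findall("tuv"):
        lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
        seg = tuv.find("seg")
        text = "".join(seg.itertext()) if seg is not None else ""
        pairs.append((lang.lower(), text))
    return tuple(sorted(pairs))


def merge_tmx(sources, target):
    """Merge TMX files into target, keeping the first copy of each unit."""
    merged = None
    merged_body = None
    seen = set()
    for source in sources:
        root = ET.parse(source).getroot()
        body = root.find("body")
        if body is None:
            continue
        units = body.findall("tu")
        if merged is None:
            # The first file provides the <tmx> element and its <header>.
            merged, merged_body = root, body
            for tu in units:
                body.remove(tu)
        for tu in units:
            key = tu_key(tu)
            if key not in seen:
                seen.add(key)
                merged_body.append(tu)
    if merged is None:
        raise ValueError("no TMX input with a <body> element")
    ET.ElementTree(merged).write(target, encoding="utf-8", xml_declaration=True)
    return len(seen)
```

A real tool would also need a policy for conflicting translations of the same source string, which this sketch simply keeps side by side.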

Limitation: The conversion from any format to PO format is limited. The problems observed are:

  • Currently we use file extensions to identify the formats. For INI or strings files you sometimes need to be more specific, since these formats have several variations.

Solution: By default, as today, converters are associated with file extensions. In addition, projects.json should support some kind of pattern matching so that each project can specify which converters to use.
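
One way this could work is sketched below: per-project glob patterns are tried first, and the extension table is the fallback. The `converters` key, the rule fields `pattern` and `converter`, and the converter labels are all assumptions made for illustration; the labels are opaque strings here, nothing is executed.

```python
import fnmatch
import os

# Fallback table keyed by extension. The values are illustrative labels only.
DEFAULT_CONVERTERS = {".ts": "ts2po", ".ini": "ini2po"}


def pick_converter(filename, project):
    """Return the converter label for filename, preferring project rules."""
    # Hypothetical schema: project["converters"] is an ordered list of
    # {"pattern": <glob>, "converter": <label>} rules; the first match wins.
    for rule in project.get("converters", []):
        if fnmatch.fnmatchcase(filename, rule["pattern"]):
            return rule["converter"]
    extension = os.path.splitext(filename)[1].lower()
    return DEFAULT_CONVERTERS.get(extension)
```

For example, a project could map `*/mozilla/*.ini` to a Mozilla-specific INI converter while every other `.ini` file keeps using the default.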

Web application

Limitation: Currently the whole Softcatalà application is tightly coupled with the backends.
Solution: The Softcatalà application, and any other front ends, should be independent applications, maintained by different teams, that use APIs to interact with the system. On GitHub, we should have a simple, agnostic web application that shows how the APIs work (instead of the Softcatalà one). We should provide 3 APIs:

  • API to search the text index (#28)
  • API to access the translation memory downloads created (date, file, etc)
  • API to access the terminology items created (glossaries)
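
As a rough illustration of the decoupling, the three APIs could be exposed by a backend object that returns plain records, with front ends only ever seeing JSON. Every name below (`SearchHit`, `Download`, `Term`, `InMemoryBackend`, `to_json`) and every field is hypothetical; the real payloads are still to be defined in the linked issues.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SearchHit:
    project: str
    source: str
    target: str


@dataclass
class Download:
    project: str
    file: str
    date: str


@dataclass
class Term:
    source: str
    target: str


class InMemoryBackend:
    """Stand-in for the real text index, download list and glossary."""

    def __init__(self, hits, downloads, terms):
        self._hits = hits
        self._downloads = downloads
        self._terms = terms

    def search(self, query):
        # Naive case-insensitive substring match on the source text.
        needle = query.lower()
        return [hit for hit in self._hits if needle in hit.source.lower()]

    def downloads(self):
        return list(self._downloads)

    def terminology(self):
        return list(self._terms)


def to_json(items):
    """Serialize a list of records the way an HTTP endpoint might."""
    return json.dumps([asdict(item) for item in items], ensure_ascii=False)
```

Any web framework could wrap these three methods in endpoints, and a front end would depend only on the JSON shape, not on how the index is built.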

Limitation: The web application is written using CGI
Solution: Write the application using MVC (#24)

Text Search engine

Potential limitation: We are currently using Whoosh as the full-text search engine. We are not sure how it will scale if we add, for example, 50 more languages and 50 more projects.
Solution: See (#24)

Integration with Translation Memory servers

The vision here is not to implement a Translation Memory server. Instead, implement #23 to integrate with amaGama (https://github.com/translate/amagama).

Great write-up!

As usual, lack of time prevents me from getting more deeply involved, but I will try to help as much as I can.

@jordimas As a side note I am not 100% sure you have referenced the right issues.

Quick review:

  • We need to split it into 3 separate processes
    • Completely agree.
  • Move to TMX as native format for the tool
    • Agreed, but keep in mind the lack of conversion tools.
      In TTK there is a long-term plan to rewrite the internals so that we have just one converter capable of converting from any format to any format, so maybe we can join efforts here. Anyway, I see this as a long-term effort.
  • Create a configuration subdirectory where every project has its own JSON file
    • Completely agree.
  • Have pattern matching in projects.json to specify per project which converters to use.
    • Not sure I understand the problem.
  • Integration with Translation Memory servers
    • I don't think you understand what amaGama is. amaGama uses its own database to serve TM results, so it doesn't feed from other tools.
      So if you want to feed CAT tools using this project (I still don't think translation-memory-tools is a good name), then the aim is to replace tools like amaGama.