fititnt / hxltm-action

[non-production-ready] Multilingual Terminology in Humanitarian Language Exchange. TBX, TMX, XLIFF, UTX, XML, CSV, Excel XLSX, Google Sheets, (...)

Home Page: https://hxltm.etica.ai/

Implement and document HXLTM configurable ontologia (use case: create non ad hoc templated files for data standards not shipped with the default hxltm ontologia, like custom XMLs, JSONs, etc.)

fititnt opened this issue

  • Implement and document HXLTM ad hoc templated file generation based on source multilingual dataset (use case: generate monolingual templates, like translated documentation) #3
    • Note: one difference from #3 is that here you are likely to create something like a custom XML (such as an XLIFF version or some TBX dialect) or some new JSON data standard that is not added for everyone's use.
    • #3 is likely to be more focused on "generate files like READMEs or JSON with specific concept translations", while this issue is about a full dump (to export, and maybe to import if you can explain it in the ontologia) that is generic for any dataset, without hardcoded concepts.

The reference public domain CLI tools of HXLTM have an option to specify a different ontologia file (which basically gives full control not only over how they export, but also over how they import back from documented data standards). But this is not documented here in hxltm-action.

In addition, the special option --archivum-configurationem-appendicem, which we have not yet had the chance to test to see how it would be implemented, would in principle allow merging/replacing only specific additions. The advantage is that the end user (or a GitHub Action's documentation, trying to do some data transformation) could both keep the reference ontologia AND specify customizations.
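
One possible reading of "merge/replace specific additions" is a recursive dictionary merge, where the appendix file only needs to contain the keys it adds or overrides. The sketch below is an assumption about how that could behave, not how hxltmcli does it (the option is still marked as not implemented). The function name `merge_ontologia` and the keys in the sample data are hypothetical and do not reproduce the real cor.hxltm.yml structure.

```python
from copy import deepcopy


def merge_ontologia(reference: dict, appendix: dict) -> dict:
    """Return a copy of `reference` with `appendix` merged on top of it.

    Nested dictionaries are merged key by key; any other value in
    `appendix` replaces the value found in `reference`.
    """
    merged = deepcopy(reference)
    for key, value in appendix.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_ontologia(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical keys, for illustration only (not the real ontologia layout)
reference = {"normam": {"TMX": {"extension": "tmx"}, "TBX": {"extension": "tbx"}}}
appendix = {"normam": {"MyXML": {"extension": "xml"}}}

merged = merge_ontologia(reference, appendix)
```

With this behaviour the user keeps every standard from the reference ontologia and only adds (or replaces) the entries named in the appendix.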

Some objectives of this issue

The hxltm-action should explain both how to override the whole ontologia and how to override it partially.

Since this may be a very common scenario, the action could use some custom paths if they exist. The HXLTM CLI tooling already tries to use the user's files instead of what ships with the program, but instead of searching the user's home directory for the YAML/JSON, the action would search the folder .github/hxltm.
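
A minimal sketch of that lookup, assuming the action runs from the repository checkout and that the custom file keeps the name cor.hxltm.yml mentioned in the --help output. The function name `find_ontologia` and the `workspace` parameter are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_ontologia(workspace: str, filename: str = "cor.hxltm.yml") -> Optional[Path]:
    """Return the custom ontologia under .github/hxltm, if it exists.

    Returning None means "no repository-level override found", so the
    caller can let the HXLTM tool fall back to its own defaults.
    """
    candidate = Path(workspace) / ".github" / "hxltm" / filename
    return candidate if candidate.is_file() else None
```

Returning `None` instead of raising keeps the override optional: repositories without a `.github/hxltm` folder behave exactly as before.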

Anyway, both cases will require customizing the upstream programs. So this issue is here to keep track of the reasoning behind this.

Implicit objective: also explain how to replace the programs

This point actually already has some references for the next release (this is explained in the input parameter bin):

### Inputs
#### `bin`
**Required** The executable to run.

**Parameter examples**:
- `hxltmcli` _(or `.github/hxltm/hxltmcli.py`)_ (*)
- `hxltmdexml` _(or `.github/hxltm/hxltmdexml.py`)_ (*)

> <sub>(*): If necessary, a local customized fork of the reference HXLTM tools
  can be stored near where the data is processed. The suggested location is
  .github/hxltm/(file).py. This can be useful both for testing purposes and for
  immediate hotfixes during an urgent response, when you as implementer cannot
  wait.</sub>
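
The two forms of the `bin` value could be turned into a command line roughly as follows. This is a sketch under the assumption that a value ending in `.py` is a local fork to be run with the current Python interpreter, and that anything else is an executable already on the PATH; the helper name `bin_to_argv` is hypothetical and not part of hxltm-action.

```python
import sys
from typing import List


def bin_to_argv(bin_value: str) -> List[str]:
    """Translate the `bin` input into the start of an argv list."""
    if bin_value.endswith(".py"):
        # Local customized fork, e.g. .github/hxltm/hxltmcli.py
        return [sys.executable, bin_value]
    # Reference tool installed on the PATH, e.g. hxltmcli
    return [bin_value]
```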

The "implicit objective" is to give some guidance on when to override the ontologia and when to override the program. Some good use cases for overriding the program are:

  • very advanced customization (where the ontologia alone is not enough)
  • a lazy way to achieve a scheduled runner that keeps running for years without caring about updates
  • security validation (e.g. someone using HXLTM as a data format who needs to have the code evaluated by experts and cannot trust a community effort)

Current --help output

hxltmcli --help
# hxltmcli v0.8.8
# (...)
  --archivum-configurationem
                        Path to custom configuration file (The cor.hxltm.yml)
  --archivum-configurationem-appendicem
                        (Not implemented yet)Path to custom configuration file (The cor.hxltm.yml)

# (...)
hxltmdexml --help
# hxltmdexml v0.7.1
# (...)
  --archivum-configurationem
                        Path to custom configuration file (The cor.hxltm.yml)
# (...)
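
To illustrate a full override, the `--archivum-configurationem` flag shown in the help output above can be placed on the command line before any other arguments. The sketch below only assembles the argument list; the helper name `build_command`, the `extra_args` parameter and the example path are hypothetical, and the remaining arguments each tool expects are deliberately left to the caller.

```python
from typing import List, Optional, Sequence


def build_command(
    bin_name: str,
    ontologia: Optional[str] = None,
    extra_args: Sequence[str] = (),
) -> List[str]:
    """Assemble an argv list for an HXLTM tool.

    `ontologia`, when given, is passed via --archivum-configurationem
    (the only configuration flag both tools list in --help).
    """
    command = [bin_name]
    if ontologia:
        command += ["--archivum-configurationem", ontologia]
    command += list(extra_args)
    return command


# Hypothetical path, for illustration only
command = build_command("hxltmcli", ".github/hxltm/cor.hxltm.yml")
# The list could then be run with: subprocess.run(command, check=True)
```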