OpenEnergyPlatform / open-MaStR

A collaborative software to download the energy database Marktstammdatenregister (MaStR)

Home Page: https://open-mastr.readthedocs.io/en/latest/

Create a `Mastr.translate` method

FlorianK13 opened this issue

Description of the issue

We could now implement a translation method for the MaStR database that translates all column names to English. Thanks to LLMs like ChatGPT, we would not need to translate them ourselves.

Ideas of solution

  1. Create a list of all distinct column names across all tables.
  2. Pass this list to ChatGPT and ask for a translation of every item.
  3. Create a dictionary from the translations. Columns that are added later and are missing from the dictionary are left untranslated.
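A minimal sketch of the dictionary lookup described in step 3, assuming a plain Python dict is used. The name `COLUMN_TRANSLATIONS`, the helper `translate_column`, and the English names are hypothetical and only illustrate the fallback for columns that have no entry:

```python
# Hypothetical excerpt of a translation dictionary; the entries and the
# English names are illustrative, not an agreed naming scheme.
COLUMN_TRANSLATIONS = {
    "Bruttoleistung": "gross_capacity",
    "Inbetriebnahmedatum": "commissioning_date",
    "Postleitzahl": "postcode",
}


def translate_column(name, translations=COLUMN_TRANSLATIONS):
    """Return the English name, or the original name if no entry exists."""
    # dict.get with the original name as default leaves unknown columns as-is.
    return translations.get(name, name)
```

With this fallback, a column that is added to MaStR later simply keeps its German name until the dictionary is extended.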

Workflow checklist

I think this could be solved as follows:

  1. Get a list of all column names from all tables, either by connecting to an existing database or by using the orm.py file and the corresponding SQLAlchemy methods.
  2. Convert this list to a set to remove duplicates.
  3. Go to your favourite LLM and create a translation dictionary from this set of column names.
  4. Implement a `Mastr.translate` method that takes the downloaded database, iterates over all tables and columns, and renames each column to its translation. The database should then be renamed to open_mastr_translated.db so that the open_mastr module does not try to work with it again when writing new data.
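The workflow above could be sketched roughly as follows. This is not the open_mastr implementation: it assumes a SQLite database file and uses the standard-library `sqlite3` module instead of SQLAlchemy, the function names `get_column_names` and `translate_database` are hypothetical, and it copies the file to the new name rather than renaming it in place. `ALTER TABLE ... RENAME COLUMN` requires SQLite 3.25 or newer.

```python
import shutil
import sqlite3


def _user_tables(con):
    """Return the names of all user tables (SQLite internal tables excluded)."""
    rows = con.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )
    return [row[0] for row in rows]


def get_column_names(db_path):
    """Steps 1 and 2: collect the distinct column names of all tables as a set."""
    con = sqlite3.connect(db_path)
    try:
        columns = set()
        for table in _user_tables(con):
            # PRAGMA table_info yields one row per column; index 1 is the name.
            for row in con.execute(f'PRAGMA table_info("{table}")'):
                columns.add(row[1])
        return columns
    finally:
        con.close()


def translate_database(src_path, dst_path, translations):
    """Step 4: copy the database and rename every column that has a translation."""
    # Assumption: work on a copy so the original database stays untouched.
    shutil.copyfile(src_path, dst_path)
    con = sqlite3.connect(dst_path)
    try:
        for table in _user_tables(con):
            existing = {
                row[1] for row in con.execute(f'PRAGMA table_info("{table}")')
            }
            for old in sorted(existing):
                new = translations.get(old, old)
                # Skip untranslated columns and avoid name collisions.
                if new == old or new in existing:
                    continue
                con.execute(
                    f'ALTER TABLE "{table}" RENAME COLUMN "{old}" TO "{new}"'
                )
                existing.discard(old)
                existing.add(new)
        con.commit()
    finally:
        con.close()
```

The set returned by `get_column_names("open_mastr.db")` is what would be handed to the LLM in step 3. The resulting dictionary is then passed to `translate_database("open_mastr.db", "open_mastr_translated.db", translations)`.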