bigscience-workshop / promptsource

Toolkit for creating, sharing and using natural language prompts.

machine-translate structured text | prompt templates → multilingual data

tianjianjiang opened this issue

Tricks that I've used before (as far as I can recall):

  • Google Translate, DeepL, or Microsoft Translator, with the slots wrapped in <span id=...> tags.
    • DeepL and Microsoft Translator previously didn't handle tags well, but I can try them again.
  • English-oriented tricks with models on this list https://github.com/UKPLab/EasyNMT#available-models
    (BTW, Kenneth Heafield is an expert on OPUS-MT; see https://huggingface.slack.com/archives/C01L2DYQ7QX/p1634736849388300)
    • Designated survivors: if a template needs a slot for a word, that slot's designated survivor should be a special token that is almost universal across languages, such as a digit.
      • Designated survivors with quotation marks: to properly translate a slot that holds a phrase rather than a single word.
    • Punctuation (e.g., quotation marks) instead of Jinja markup: this may actually be the easiest.
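A minimal sketch of the <span id=...> trick, assuming the template's slots are plain Jinja `{{ ... }}` / `{% ... %}` markup and that the MT service is run in its HTML/tag-handling mode (as far as I know, Google, DeepL, and Microsoft each offer one). The helpers `mask_with_spans` and `unmask_spans` and the `sN` id scheme are hypothetical names for illustration; no MT call is made here, and the Spanish string is a hand-written stand-in for a service's output.

```python
import re

# Jinja expressions ({{ ... }}) and statements ({% ... %}); non-greedy so
# adjacent markups are matched separately. A simplification: it ignores
# {# comments #} and braces inside string literals.
JINJA_PATTERN = re.compile(r"\{\{.*?\}\}|\{%.*?%\}", re.DOTALL)
# An empty placeholder span, tolerating whitespace the MT service may insert.
SPAN_PATTERN = re.compile(r'<span id="(s\d+)">\s*</span>')


def mask_with_spans(template: str) -> tuple[str, dict[str, str]]:
    """Replace each Jinja markup with an empty <span id="sN"></span> tag."""
    slots: dict[str, str] = {}

    def _replace(match: re.Match) -> str:
        slot_id = f"s{len(slots)}"
        slots[slot_id] = match.group(0)
        return f'<span id="{slot_id}"></span>'

    return JINJA_PATTERN.sub(_replace, template), slots


def unmask_spans(translated: str, slots: dict[str, str]) -> str:
    """Put the Jinja markup back; fail if any span was lost or duplicated."""
    found = SPAN_PATTERN.findall(translated)
    if sorted(found) != sorted(slots):
        raise ValueError(f"expected spans {sorted(slots)}, got {found}")
    return SPAN_PATTERN.sub(lambda m: slots[m.group(1)], translated)


template = "Question: {{question}}\nAnswer: {{ answer_choices[label] }}"
masked, slots = mask_with_spans(template)
# `masked` is what would be sent to the MT service; the string below is a
# hand-written stand-in for its output, not a real translation result.
translated = 'Pregunta: <span id="s0"></span>\nRespuesta: <span id="s1"></span>'
print(unmask_spans(translated, slots))
```

Raising on a lost or duplicated span, instead of silently returning a partial template, makes it easy to detect the tag-mangling behaviour noted above and to drop or retry that translation.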
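A rough sketch of the designated-survivor tricks for the EasyNMT-style models, under these assumptions: each slot becomes a single digit 1 to 9, slots listed in `phrasal` (a hypothetical parameter holding the exact markup strings) are additionally wrapped in quotation marks, and the template's literal text contains no digits of its own. The function names are hypothetical, and the French output is hand-written rather than produced by a model. A real model may still reorder, reformat, or drop a survivor, so the restore step checks that every one came back exactly once.

```python
import re

JINJA_PATTERN = re.compile(r"\{\{.*?\}\}|\{%.*?%\}", re.DOTALL)
# Straight and typographic quotation marks an MT model may put around a phrase.
QUOTE = "[\"'“”„«»「」]"
# A lone digit, optionally wrapped in quotes (with at most one space inside).
# \d also matches non-ASCII decimal digits, which int() accepts.
SURVIVOR = re.compile(rf"({QUOTE}\s?)?(?<!\d)(\d)(?!\d)(\s?{QUOTE})?")


def mask_with_survivors(template, phrasal=frozenset()):
    """Swap each Jinja markup for a digit; quote the digit if the slot is phrasal."""
    if re.search(r"\d", JINJA_PATTERN.sub("", template)):
        raise ValueError("template text already contains digits")
    slots = JINJA_PATTERN.findall(template)
    if len(slots) > 9:
        raise ValueError("this sketch supports at most 9 single-digit survivors")
    numbers = iter(range(1, len(slots) + 1))

    def _replace(match):
        n = next(numbers)
        return f'"{n}"' if match.group(0) in phrasal else str(n)

    return JINJA_PATTERN.sub(_replace, template), slots


def unmask_survivors(translated, slots, phrasal=frozenset()):
    """Restore Jinja markup from surviving digits; drop quotes added for phrasal slots."""
    seen = []

    def _restore(match):
        open_q, digit, close_q = match.groups()
        n = int(digit)
        if not 1 <= n <= len(slots):
            return match.group(0)  # not one of our survivors; leave it alone
        seen.append(n)
        markup = slots[n - 1]
        if markup in phrasal:
            return markup  # these quotes were added by the mask step
        return (open_q or "") + markup + (close_q or "")  # keep text's own quotes

    restored = SURVIVOR.sub(_restore, translated)
    if sorted(seen) != list(range(1, len(slots) + 1)):
        raise ValueError(f"survivors lost or duplicated: {sorted(seen)}")
    return restored


template = "Translate {{word}} in the sentence {{sentence}}."
phrasal = {"{{sentence}}"}
masked, slots = mask_with_survivors(template, phrasal)
french = "Traduisez 1 dans la phrase « 2 »."  # hand-written stand-in for MT output
print(unmask_survivors(french, slots, phrasal))
```

Rejecting templates whose own text has digits keeps survivors from being confused with real numbers, and capping at nine slots keeps each survivor a single digit, as the trick suggests. Quotes are stripped only for phrasal slots, so quotation marks that belong to the template text itself are left in place.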

A side note: I've created https://github.com/orgs/bigscience-workshop/projects/7 and invited @stephenbach, @srush, @craffel, @awebson, and @VictorSanh to be admins. Feel free to ignore it if you're not interested in trying GitHub project boards to make project management more readable and usable. Assigning a ticket to a project requires an admin of this repo, so the board is almost blank for now. You may already know how other projects use it anyway.

Closing due to inactivity. Feel free to reopen if you want to revisit this!