Feature proposal: get pronunciation from Lingua Libre

gerritholl opened this issue 3 years ago

Currently, users can either add audio files to their mnemosyne flash cards manually or download a machine pronunciation using Google Translate text-to-speech (gTTS). The latter is not very stable: it relies on programmatic access to a service designed for human use, which at times leads to issues such as pndurette/gTTS#232 (which I still get today with the latest gTTS and mnemosyne). Machine pronunciation is not always accurate, and relying on Google does not fit the FOSS spirit of mnemosyne.

Wikimedia France runs a project called Lingua Libre, which aims to collect free recordings, made by human speakers, of words in every language. (It is unfortunately offline at the moment because of the OVHcloud fire, but apparently no audio files were lost.) The collection is largest for French (195,000 recordings) and Bengali (50,000 recordings); Marathi, German, English, Esperanto, Occitan, and Ukrainian each have more than 10,000 recordings, and 17 more languages have more than 1,000.

I think it would be a nice feature for mnemosyne to download one or more Lingua Libre recordings of a word when they are available. Unlike non-free Google text-to-speech output, these recordings should always be free and available (barring data centre fires…), and they are spoken by native speakers. The feature could also give Lingua Libre more visibility, a win-win for language preservation and language learning.
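A plugin along these lines might work roughly as sketched below. This is a minimal illustration under stated assumptions, not an official Lingua Libre client: it assumes that Lingua Libre recordings live on Wikimedia Commons with file titles of the form `LL-Q<Wikidata id> (<ISO 639-3 code>)-<speaker>-<word>.wav`, queries the standard MediaWiki search API (`action=query&list=search`) for File: pages, and then filters the hits locally against that pattern. All function names are hypothetical, and the exact search string is a guess that may over-match, which is why the local filter does the real checking.

```python
from __future__ import annotations

import json
import re
import urllib.parse
import urllib.request

# MediaWiki API endpoint for Wikimedia Commons.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"
# Special:FilePath redirects to the actual media file.
FILE_PATH_BASE = "https://commons.wikimedia.org/wiki/Special:FilePath/"


def build_search_url(word: str, iso639_3: str, limit: int = 10) -> str:
    """Build a full-text search query restricted to File: pages (namespace 6).

    The search string is an assumption; results are filtered strictly afterwards.
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f'intitle:"{word}" intitle:"{iso639_3}" LL',
        "srnamespace": "6",
        "srlimit": str(limit),
        "format": "json",
    }
    return COMMONS_API + "?" + urllib.parse.urlencode(params)


def is_lingua_libre_recording(title: str, word: str, iso639_3: str) -> bool:
    """Check a title against the assumed pattern
    'File:LL-Q<id> (<iso639-3>)-<speaker>-<word>.wav'."""
    prefix = re.compile(r"File:LL-Q\d+ \(" + re.escape(iso639_3) + r"\)-")
    return prefix.match(title) is not None and title.endswith(f"-{word}.wav")


def extract_titles(response: dict) -> list[str]:
    """Pull page titles out of a list=search API response."""
    return [hit["title"] for hit in response.get("query", {}).get("search", [])]


def file_path_url(title: str) -> str:
    """Turn 'File:Name.wav' into a Special:FilePath URL for downloading."""
    name = title[len("File:"):] if title.startswith("File:") else title
    return FILE_PATH_BASE + urllib.parse.quote(name.replace(" ", "_"))


def find_recordings(word: str, iso639_3: str, limit: int = 10) -> list[str]:
    """Query Commons (needs network access) and keep only matching titles."""
    request = urllib.request.Request(
        build_search_url(word, iso639_3, limit),
        # Wikimedia asks API clients to send a descriptive User-Agent.
        headers={"User-Agent": "mnemosyne-lingualibre-sketch/0.1"},
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        data = json.load(resp)
    titles = extract_titles(data)
    return [t for t in titles if is_lingua_libre_recording(t, word, iso639_3)]
```

For example, `find_recordings("bonjour", "fra")` would return the matching Commons file titles, and `file_path_url` turns each one into a download URL; the plugin could then attach the downloaded file to the card like any other audio file. Keeping the network call separate from the pure parsing and filtering helpers makes the latter easy to test offline.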

I understand this is an open-source project, and I do not expect anybody to implement this. I may be able to work on it myself, but since I'm new to mnemosyne, I would first like to ask the core developers whether such a feature would be welcome or whether the effort would be better spent elsewhere.

What do you think?

pbienst · Wed Apr 14 2021 12:44:43 GMT+0800 (China Standard Time)

If you want to create a plugin for this, go for it!

(Personally, I use TTS for whole sentences, not single words.)