maniac0r / our-voices-model-competition

Our Voices Competition :: the DigiEduBerlin fork

Home Page:https://foundation.mozilla.org/en/blog/our-voices-taking-part/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Our voices model competition

About

We want to see - and incentivise! - great diversity, equity and inclusion-conscious work being done with the Common Voice dataset. We are running a model and methods competition with three broad themes, plus an open category.

Register your interest

Start today by registering your interest with this form, and you'll receive a participant pack with guidance, resources, advice and more to help you.

Please read the full rules if you are considering applying.

Participant packs

Localised versions of the participant packs can be found on our Google Drive.

What are we looking for?

Your entry must be a diversity, equity and inclusion-conscious Model or Method under one of the following categories. It must primarily make use of Mozilla Common Voice data from the 11th release (September 2022). Outside of this, we are being deliberately open-ended. However, here are some illustrative examples;

Categories About
Gender An STT model for an under-resourced language that performs equally well for women
Variant, Dialect or Accent 1) Proof of concept for an under-served language variant delivered with a small ‘toy’ corpus 2) Accent classifiers by, and for, a community
Methods and Measures 1) A benchmark bias corpus 2) Dataset audit methodology
Open Exciting DEI work primarily using Common Voice that doesn't fit into the categories above

Judging will be done by Sponsor or its designees, who shall have sole discretion in determining winners based on the following equally weighted criteria:

  • Word Error Rate (“WER”) - how many word recognition mistakes your model makes when used on a fresh dataset
  • WER Score when balanced by gender or Accent demographics (as per competition theme)
  • Utility - this is a judgment scoring by panelists evaluating how effective, original and useful your method or measure would be
  • Social need / ecosystem value - whether this model adds value to the universe of other models for the same language. We do not disqualify submissions that are not open source, but when considering ecosystem value-add within the wider rubric, we will consider the license under which your work is available.
  • Deployability rating - this is a judgment scoring by panelists evaluating how easy would this be to install in an application
  • Environmental impact rating (via GPU usage) - this has two components - expert panel rating plus required provision of their processing stats - aka how 'hungry' is your model? is it written to be efficient?

How are you making sure it's easy for all languages to participate?

  • We are actively encouraging submissions at proof of concept stage that use a small or 'toy' corpus
  • Our methodology and methods category enables teams to submit outlines for tools that they do not yet have the resources to build out further
  • We have allowed a month of development time to accomodate those relying on CPU / slower compute
  • Languages will be judged within 'Bands' - high resource, medium resource and low resource - to ensure a fairer competition between languages that exist in different contexts
  • We are creating a flexible, holistic rubric that makes it possible for judges to look at ecosystem value-add factors beyond performance metrics like Word Error Rate
  • For marginalised communities who have governance concerns about releasing their model under an open source license, they are welcome to submit with an explanation to that effect, and this will be considered accordingly

Our judges

Who are the judging panel?

Submission process

In order to submit your code for the competition, you should do the following:

  • Fork this repository in GitHub.
  • Create a new directory in a subdirectory of submit/ that corresponds to the category you want to submit in.
  • Commit and push your code to that subdirectory in your fork.
  • Open a pull request to this repository.
  • You must then make your final submission using this form. Submissions which are not submitted via the Typeform will not be able to be considered.

About

Our Voices Competition :: the DigiEduBerlin fork

https://foundation.mozilla.org/en/blog/our-voices-taking-part/


Languages

Language:Python 66.0%Language:Shell 34.0%