RoboTutorLLC / RoboTutor_2019

Main code for RoboTutor. Uploaded 11/20/2018 to XPRIZE from RoboTutorLLC/RoboTutor.

New prompt narrations

JackMostow opened this issue · comments

@judithodili - I have a script that checks for missing narrations of prompts in Swahili translations and makes English-named copies of Swahili prompt narrations. I stopped using it weeks before code drop 1 because I lacked the local disk space to cache Google Drive files. That left the dev team the extra work of creating the copies by hand, which I hope they did rather than make the implementation unnecessarily language-dependent by using the Swahili filenames in animator graphs.
The script stored prompts for each tab of Swahili translations under AUDIO ASSETS > PROMPTS in GDrive folders with the name of that tab, e.g. ARITHMETIC. I hear that you streamlined the folder structure (at least in GitHub) by pooling many narrations into the same folder for simplicity, which is fine assuming that it didn't eliminate any distinctions we needed to preserve between multiple narrations of the same prompt with different inflections or speakers.

Instead of putting new prompt narrations for each activity into folders named for that activity, I've simply been uploading the prompt folders as named by Project LISTEN's Reading Tutor, e.g. RT11-POSEIDON MORE PROMPTS 160726.491 07202018, to e.g. PROMPTS NARRATED by Leonora Kivuva.

To resurrect and update the script, I need to know:

  1. Would it be useful to do so, in terms of saving work for the dev team?

  2. Where is the script to copy prompts from GDrive into GitHub, and who runs it?

  3. Where are prompts for different tabs of Swahili translations:

    a. in Google Drive?

    b. in GitHub?

  4. Where is the script to translate .wav files into .mp3 files:

    a. for prompts?

    b. for words?

    c. for story narrations?

Thanks. - Jack

@judithodili -

  1. Please answer ASAP so I can update the script properly.
  2. I'm scheduled to record Leonora this week, so ASAP please send or point me to all the new prompts you need, and for which activities. We'll need to:
    a. Add them to the appropriate tabs in Swahili translations.
    b. Wordsmith them in English for me to narrate.
    c. Put them through Google Translate.
    d. Leonora will need to vet and fix the translations.
    e. I'll need to record her narrations of them.
    f. I'll need to run my script to generate English-named copies of them.
    g. Someone or something will need to convert the .wav to .mp3 and copy them to GitHub -- to where?
    Thanks. - Jack

@judithodili - The prompts in the Number Discrimination tab of Swahili translations (https://docs.google.com/spreadsheets/d/11feAhhQqrpJC2waSpOkReMiG3SwmNOofrkir3JljmVM/edit#gid=265160028) came from our meeting with Emily and (remotely) David. I already narrated them in English (indicated by boldface) and Google-translated them into Swahili for Leonora to fix and narrate.
Based on the demo version, we'll need to segment them where necessary to synchronize with graphical gestures such as highlighting and RoboFinger.

Prompts are (or at least should be) part of activity design, not afterthoughts to tack on during QA. They need to be wordsmithed carefully, preferably before translating and narrating them.

@kevindeland - Can you answer the questions in my 8/21/2018 post above regarding whether and how to re-automate the generation of English-named Swahili prompts?
Thanks. - Jack

  1. I don't know enough about what the script does

  2. There's no script, this is done manually.

  3. Google Drive: I don't know, I just find them by searching
    GitHub: in either this repo or this repo

  4. I don't know.

@judithodili, @amogh112, @kevindeland, @uhq1 -
I just uploaded more Swahili prompts narrated by Leonora for WRITING, NUMBER COMPARISON, ARITHMETIC, and QUESTIONS to new subfolders of PROMPTS NARRATED by Leonora Kivuva.
I'll narrate and upload their English versions ASAP.

Then I'd like to revive and update my script that creates English-named copies of Swahili prompts to spare you the work and risk of making them yourselves. But I'm still awaiting @judithodili's answers to my questions about where to put them.

@judithodili - All Swahili prompt narrations are in various subfolders of AUDIO ASSETS > PROMPTS > SWAHILI PROMPT NARRATIONS, mostly under PROMPTS NARRATED by Leonora Kivuva.
I used to put prompts for each activity in Swahili translations in a folder with the same name, e.g. ARITHMETIC, under PROMPTS NARRATED by Leonora Kivuva.
Then my script would go through Swahili translations, creating an English-named copy of each Swahili prompt narration, and checking for missing narrations. This script relied on Google Drive for Windows and stopped working when the main disk drive of my PC ran out of the space it needed.
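
The behaviour described above can be sketched roughly as follows. This is not Jack's actual script, which is not shown in this thread. The function name `make_english_named_copies`, the `(english, swahili)` pair format, and the `.wav` default extension are assumptions for illustration, and both strings in each pair are assumed to already be valid filename stems.

```python
import shutil
from pathlib import Path


def make_english_named_copies(pairs, source_dir, dest_dir, ext=".wav"):
    """Copy each Swahili narration to an English-named file.

    pairs: iterable of (english_stem, swahili_stem) tuples (hypothetical
    format; the real data lives in the Swahili translations spreadsheet).
    Returns the list of Swahili stems whose narration file was not found.
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    for english, swahili in pairs:
        src = source_dir / (swahili + ext)
        if not src.is_file():
            # Report the gap instead of failing, so one run lists them all.
            missing.append(swahili)
            continue
        shutil.copyfile(src, dest_dir / (english + ext))
    return missing
```

Returning the missing stems rather than raising on the first one keeps the "check for missing narrations" step and the copy step in a single pass.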

To revive the script (now that I've figured out how to change where Google Drive caches local files), I need to know how the prompts are organized now in GitHub and RoboTutor itself. Do you still keep the prompts for each activity in a separate folder, or do you keep them all in the same folder? Or do you use a common folder only for individual words? That's the structure I was asking about.

How do your developers currently locate the Swahili narrated translation of a given English prompt?

Are all prompts now in the same folder (and sound package)? If so, we can just move them into it from the various folders containing them.

If they're still in separate folders, I'll need to move the new prompts for each activity into its own folder. Currently each installment of new narrations for an activity is in a separate folder, some of which contain prompts from more than one activity that I'd need to sort into the folders for their respective activities.

Does RoboTutor now use table lookup to find the narrated Swahili translation of a given English prompt?
If so, there's no need to revive my script, though the table lookup may need to be modified to take account of how punctuation in a prompt is sanitized in naming the file containing the prompt so as to avoid violating OS restrictions on special characters in filenames.

Anyway, these are the questions whose answers I need in order to decide whether and how to revive my script. Let me know if they need further clarification, possibly in person.

Request for the future: If I ask you questions that you don't understand or whose answers you don't know, PLEASE tell me so that I can clarify the questions or figure out how else to find their answers.
Simply ignoring them drives me up the wall.

And thanks again for all your hard work running the dev group and QA process.

@judithodili - @kevindeland replied above yesterday that he didn't know.
Let me try asking both of you in a different way:

  1. How do your activities refer to prompts -- in English or in Swahili?
  2. Where do activities look for prompts? In a separate sound package for each activity, or a shared one?
  3. How do they find the Swahili prompts -- or do they?
    Thanks. - Jack

@judithodili, @kevindeland, @amogh112 - I just uploaded my narrations of more English prompts for NUMBER DISCRIMINATION and WRITING to ENGLISH PROMPT NARRATIONS > 2018-08-20 more prompts -- to move into folders?.

@kevindeland - Here's what's at stake with question 2:

  • If essentially all tutor prompts are in the "default" soundpackage, then we can simply pool all the GDrive folders containing them, without worrying about which prompts are in which folder, or the fact that some folders contain prompts for more than one activity.
  • In contrast, if each activity has its own soundpackage, we need to put its prompts in a separate folder to copy into its sound package in GitHub.

The set of prompts for an activity grows over time, sometimes in dribs and drabs, so their narrations are split among multiple folders containing incremental installments of new prompts. If we pool these folders into a single folder for each activity, we may wind up copying the same prompts into GitHub more than once, which is slow, and possibly even multiple copies of them, which would waste space. Also, pooling the folders would take additional work. So it would be great not to have to do so.
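
One way to pool installment folders without copying the same narration twice is to compare file contents before copying. The sketch below is only an illustration of that idea, not an existing RoboTutor tool. The names `pool_folders` and `file_digest` are hypothetical, and it assumes each installment folder is a flat directory of audio files.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def pool_folders(installment_dirs, pooled_dir):
    """Copy files from several installment folders into one pooled folder.

    Returns (copied, skipped, conflicts):
      copied    - names copied into the pooled folder
      skipped   - names already present with identical content
      conflicts - source paths whose name is already taken by different
                  content (e.g. a second narration of the same prompt)
    """
    pooled_dir = Path(pooled_dir)
    pooled_dir.mkdir(parents=True, exist_ok=True)
    copied, skipped, conflicts = [], [], []
    for folder in installment_dirs:
        for src in sorted(Path(folder).iterdir()):
            if not src.is_file():
                continue
            dest = pooled_dir / src.name
            if not dest.exists():
                shutil.copyfile(src, dest)
                copied.append(src.name)
            elif file_digest(dest) == file_digest(src):
                skipped.append(src.name)
            else:
                # Never overwrite: a human should pick the narration to keep.
                conflicts.append(src)
    return copied, skipped, conflicts
```

Conflicts are reported rather than overwritten so that distinct narrations of the same prompt, for example by different speakers, are not silently lost.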

How do you prefer to:

  1. Map English prompts to their Swahili translations
    a. Runtime table lookup in RoboTutor?
      +: easy to use
      -: vulnerable to missing/incorrect entries
    b. Offline table lookup
      ?: in what process, by whom, using what script(s)?
    c. English-named copies of Swahili prompts
      +: easy to use
      +: can check for missing files
      -: requires reviving the script to make those copies

  2. Map prompts to their narrations
    a. Use prompts as filenames
    b. Generate lookup table of prompts and filenames

Here's what I've been doing so far for new assets

  1. Either
    (a) Search for needed (Swahili) prompts on Google Drive. I use the search function to find the files, so to me, it doesn't matter what location they're in.
    Or (b) receive the folder of audio files from a student via email

  2. Create a new local directory that matches the desired directory structure (i.e. assets/audio/sw/cmu/xprize/....etc)

  3. Download audio assets (from (a) Drive or (b) student) and put them into that new folder.

  4. For personal testing, push the audio files onto my tablet.

  5. After personal testing, zip the files and push them onto Google Drive, for QA.

  6. Put new assets into GitHub repo. (This is yet to be done, but it's what I presume would be the next step.)
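
Steps 2, 3, and 5 above could be scripted along these lines. This is a sketch under assumptions, not the workflow's actual tooling: `stage_and_zip` and its parameters are hypothetical names, and the directory passed as `relative_dir` must be supplied by the caller because the full asset path is not spelled out in this thread.

```python
import shutil
from pathlib import Path


def stage_and_zip(audio_files, staging_root, relative_dir, zip_stem):
    """Copy audio files into staging_root/relative_dir, then zip staging_root.

    zip_stem is the archive path without the ".zip" suffix. Keep it outside
    staging_root so the archive does not try to include itself.
    Returns the path of the created .zip file.
    """
    target = Path(staging_root) / relative_dir
    target.mkdir(parents=True, exist_ok=True)
    for audio in audio_files:
        shutil.copy2(audio, target / Path(audio).name)
    archive = shutil.make_archive(str(zip_stem), "zip", root_dir=str(staging_root))
    return Path(archive)
```

The resulting zip preserves the directory layout under `staging_root`, so unpacking it recreates the desired asset structure for QA.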

Answering Jack's questions:

  1. How do your activities refer to prompts -- in English or in Swahili?
    The same way the RoboTutor code has always been written to refer to prompts, via the tutor_descriptor code.

  2. Where do activities look for prompts? In a separate sound package for each activity, or a shared one?
    Each activity refers to a multitude of sound packages. Some sound packages are shared among many activities, while others are activity-specific. Refer to bubble pop.

  3. How do they find the Swahili prompts -- or do they?
    I don't know whether you mean the student or the code.

I think I understand your question @JackMostow...
The way we have been doing it is option c, using English-named copies of Swahili prompts.

So for example, Tunaweza kuondoa would be called "We can take away.mp3", for use in the math tutor.

So I guess the answer would be that yes, the script would be useful if you can find it.

@judithodili - I can’t record Leonora narrating more prompts before October 3. When do you need them? We don’t need the time alignments for spoken-only stories, but using my current setup gets good audio quality and ensures that each narration filename matches the narrated text. Thanks. - Jack

@judithodili, @msalim, @amogh112 -

Realistically I won't revive my script in time to generate English-named Swahili prompt narrations, nor move them into activity-specific folders.

In order to locate the Swahili narration of a given prompt in Swahili translations, you'll need to transform it to the filename that Project LISTEN's Reading Tutor stores it as, so that you can search GDrive for that filename.
The transformation from prompt to filename:

  1. Strips out some characters, but not word-internal hyphens or apostrophes:
    a. Leading and trailing ,.:;!/_?-
    b. /:*?"<>()|_
  2. Condenses multiple spaces to one
  3. Truncates long filenames to the first 100 characters followed by _ and the number of truncated characters, e.g.
    Wazazi wake Tasneem walisikia juu ya jambo hilo na wakataka Tasneem kurudi kuishi nao katika nyumba _3.

For a definitive specification of this transformation, please see its implementation in 2017-04-27 Jack_swahili_to_english.py.
Sorry for any inconvenience. - Jack
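
For readers who cannot get at that script, the three rules above can be approximated as follows. This is a minimal sketch reconstructed only from the description in this post, so the real implementation may differ. The function name `prompt_to_filename`, the order in which the rules are applied, and the choice to delete (rather than replace) the characters in rule 1b are all assumptions.

```python
import re

EDGE_CHARS = ",.:;!/_?-"          # rule 1a: stripped from both ends only
FORBIDDEN_CHARS = '/:*?"<>()|_'   # rule 1b: assumed to be deleted anywhere
MAX_LEN = 100                     # rule 3: characters kept before truncation


def prompt_to_filename(prompt: str) -> str:
    """Approximate the prompt-to-filename transformation described above."""
    # Rule 1a: drop leading/trailing punctuation (and surrounding spaces).
    text = prompt.strip(EDGE_CHARS + " ")
    # Rule 1b: delete characters that are unsafe in filenames.
    # Word-internal hyphens and apostrophes are not in either set, so they stay.
    text = text.translate(str.maketrans("", "", FORBIDDEN_CHARS))
    # Rule 2: condense runs of spaces to a single space.
    text = re.sub(r" {2,}", " ", text).strip()
    # Rule 3: keep the first 100 characters, then "_" and the count dropped.
    if len(text) > MAX_LEN:
        dropped = len(text) - MAX_LEN
        text = f"{text[:MAX_LEN]}_{dropped}"
    return text


print(prompt_to_filename("Tunaweza kuondoa."))
```

The result is a filename stem to search GDrive for; the audio extension is not part of the transformation as described.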