articulateinstruments / DeepLabCut-for-Speech-Production

Trained deep neural-net models for estimating articulatory keypoints from midsagittal ultrasound tongue videos and front-view lip camera videos using DeepLabCut. This research is by Wrench, A. and Balch-Tomes, J. (2022) (https://www.mdpi.com/1424-8220/22/3/1133) (https://doi.org/10.3390/s22031133).

Markerless pose estimation of speech articulators from ultrasound tongue images and lip video

Speaker 20fs from the test set included with this project

These videos show the performance of the model on speakers who were not included in the training set. The video below also shows performance on an ultrasound system, probe geometry and frame rate that were not represented in the training set.

The ultrasound model estimates the positions of 11 keypoints along the tongue surface, plus a further 3 keypoints on the hyoid, the base of the mandible, and the mental spine, where the short tendon attaches to the mandible.
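For downstream analysis, DeepLabCut writes the estimated keypoint coordinates for each analysed video to an .h5 file (and optionally a .csv), which can be loaded with pandas. Below is a minimal sketch of reading such a file; the file name and the keypoint label are placeholders, so check the project's config.yaml for the actual keypoint names.

    import pandas as pd

    # Placeholder output file produced by deeplabcut.analyze_videos();
    # the real file name depends on the video and the trained model.
    df = pd.read_hdf("speaker_DF_ultrasound.h5")

    # DeepLabCut output columns form a MultiIndex of
    # (scorer, keypoint, coordinate), where coordinate is "x", "y" or "likelihood".
    scorer = df.columns.get_level_values(0)[0]

    # Extract the per-frame trajectory of one keypoint.
    # "Tongue_01" is an illustrative label, not necessarily one used by this project.
    x = df[(scorer, "Tongue_01", "x")]
    y = df[(scorer, "Tongue_01", "y")]
    confidence = df[(scorer, "Tongue_01", "likelihood")]

    print(x.head(), y.head(), confidence.head())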

(The video above was made in AAA, software for speech articulatory analysis and recording by Articulate Instruments, using the pose-estimation models in this project, which were trained using DeepLabCut (Mathis, A., Mamidanna, P., Cury, K.M. et al.). The video below was created using DeepLabCut's built-in video export.)

Speaker DF from the test set included with this project

How to use this project

  1. To download all the files needed to run this project, you can clone this repository:

    git clone https://github.com/articulateinstruments/DeepLabCut-for-Speech-Production.git

    or click this link to download the project as a .zip file. (737 MB download / 1.48 GB on disk)

  2. Click here for instructions on how to install DeepLabCut and run this project. (DeepLabCut will be 2.97 GB on disk)

  3. Click here for instructions on how to use this project to analyse data. Note: the Shuffle2 Lip and Ultrasound models were trained using revised labelling and significantly more images from new recordings; results are best with these models (a minimal API sketch is included at the end of this section).

Both guides contain detailed walk-throughs for people who are new to using DeepLabCut.

You do not need a GPU in your computer to use these models: you should be able to run this project on most PCs. If you have a powerful GPU then you can use it with this project to analyse data significantly faster.
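For orientation, here is a minimal sketch of running the trained models on new videos through DeepLabCut's Python API. The paths are placeholders, and the exact keyword values (for example shuffle=2 to select the Shuffle2 models mentioned above, or gputouse, which varies with your DeepLabCut version and hardware) should be checked against the usage guide linked in step 3.

    import deeplabcut

    # Placeholder path to the config.yaml of one of the trained projects
    # in this repository; see the usage guide for the actual location.
    config_path = "DeepLabCut-for-Speech-Production/ultrasound/config.yaml"

    # Placeholder list of videos to analyse.
    videos = ["recordings/speaker01_ultrasound.avi"]

    # shuffle=2 selects the revised Shuffle2 models; gputouse=None runs on the
    # CPU, or pass a GPU index (e.g. 0) to analyse data significantly faster.
    deeplabcut.analyze_videos(config_path, videos, shuffle=2,
                              gputouse=None, save_as_csv=True)

    # Optionally render a labelled video to inspect the estimated keypoints.
    deeplabcut.create_labeled_video(config_path, videos, shuffle=2)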

What this project contains

This repository contains:

Authors

This research using DeepLabCut for speech production is by Wrench, A. and Balch-Tomes, J. (2022) (https://doi.org/10.3390/s22031133).

DeepLabCut software was developed by Mathis, A., Mamidanna, P., Cury, K.M. et al. (2018) (https://doi.org/10.1038/s41593-018-0209-y), with additional software by Nath, T., Mathis, A. et al. (2019) (https://doi.org/10.1038/s41596-019-0176-0) and Mathis, A., Biasi, T. et al. (2021).


License: GNU General Public License v3.0


Languages

Batchfile 76.9%, Python 23.1%