anantmittal / teachable-dino

Boilerplate code for Teachable Machine

Home Page: https://googlecreativelab.github.io/teachable-machine-boilerplate/


Teachable Machine Dinosaur

Teachable Dino was created by two master's students for MHacks 2018.

Inspired by the idleness of grad-student life, spent staring into laptops for hours, we decided to explore ways of incorporating light physical activity into the daily schedule. Teachable Dino hopes to save you from office syndrome and encourages you to take a few short breaks to train him.

We have combined Chrome's popular dino with Google's Teachable Machine experiment. Using your webcam, you can make gestures to train three actions: JUMP, DUCK, and RUN. You can then play the game using those gestures instead of the traditional keystrokes.

How to Play:

  1. Open How to Train Your Dino in your web browser.
  2. Allow access to the webcam. (Note: none of the images are stored; everything happens locally in your browser.)
  3. Click on JUMP to record an image of the corresponding gesture. Each click records one training example; for better results, please train at least 10 times.
  4. Click on JUST RUN to record an image of the corresponding gesture. Each click records one training example; for better results, please train at least 10 times.
  5. Now, play! You can control the dino with your trained gestures.

Note: The training order must be JUMP first, then JUST RUN. (This is a constraint we were not able to solve during the roughly 36-hour hackathon.)
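Conceptually, each prediction from the classifier is mapped to a game input. The sketch below illustrates that wiring; `gestureToAction` and the class-index mapping are hypothetical names for illustration, not the project's actual code.

```javascript
// Hypothetical sketch: map the classifier's predicted class index to a
// game action. Class 0 = JUMP, class 1 = JUST RUN (the training order above).
function gestureToAction(classIndex) {
  const actions = { 0: 'jump', 1: 'run' };
  return actions[classIndex] ?? null;
}

// In the browser, a 'jump' could then be forwarded to the dino game as a
// spacebar keydown event (assumed wiring, not the actual project code):
// if (gestureToAction(prediction.classIndex) === 'jump') {
//   document.dispatchEvent(new KeyboardEvent('keydown', { code: 'Space' }));
// }
```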

Things to do, future iterations:

  1. Delay the start of the game until training is done.
  2. Add other versions of the Chrome dino, like vanilla cake, Christmas, etc.
  3. Add music to the game.

Watch a demo here. (We initially created three options but have currently removed the DUCK option to make the game easier.)

Technical Explorations:

We referred to tensorflow.js, which was used to create projects like Teachable Machine. The code shows how to create a KNN classifier that can be trained live in the browser on webcam images. It is intentionally kept very simple so it can serve as a starting point for new projects like this one.

Other References:

Behind the scenes, the image from the webcam is processed through an activation of MobileNet. This network is trained to recognize many classes from the ImageNet dataset and is optimized to be very small, making it usable in the browser. Instead of reading the prediction values from the MobileNet network, we take the second-to-last layer of the neural network and feed it into a KNN (k-nearest neighbors) classifier, which allows you to train your own classes.

The benefit of using the MobileNet model, instead of feeding the pixel values directly into the KNN classifier, is that we reuse the high-level abstractions the neural network has learned while recognizing the ImageNet classes. This allows us to train a classifier that can recognize things like smiles vs. frowns, or small movements of your body, with very few samples. This technique is called transfer learning.
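The core idea can be sketched in plain JavaScript (not the tensorflow.js code): each training example is an embedding vector taken from MobileNet's second-to-last layer, stored together with a class label, and prediction simply finds the nearest stored vector.

```javascript
// Euclidean distance between two embedding vectors of equal length.
function euclidean(a, b) {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

// Return the label of the stored example whose embedding is closest
// to the query embedding (1-nearest-neighbor).
function nearestNeighbor(examples, query) {
  // examples: [{ embedding: number[], label: string }, ...]
  let best = null;
  for (const ex of examples) {
    const d = euclidean(ex.embedding, query);
    if (best === null || d < best.distance) {
      best = { label: ex.label, distance: d };
    }
  }
  return best.label;
}
```

Because the embeddings already separate high-level concepts, even a couple of stored examples per gesture are often enough for this nearest-neighbor lookup to work.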

TensorFlow.js has a built-in model for doing this. It's called the KNN Classifier, and this boilerplate code shows how to easily use it.

Using the code

To use the code, first install the Javascript dependencies by running

npm install

Then start the local budo web server by running

npm start

This will start a web server on localhost:9966. Open it, allow access to your webcam, and add some examples by holding down the buttons.

Quick Reference - KNN Classifier

A quick overview of the most important function calls in the tensorflow.js KNN Classifier.

  • knnClassifier.create(): Returns a KNNClassifier.

  • .addExample(example, classIndex): Adds an example to the training set of the specified class.

  • .clearClass(classIndex): Clears the training data for a specific class.

  • .predictClass(image): Runs a prediction on the image and returns an object with the top class index and confidence score.

See the full implementation here
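Conceptually, the confidence score returned by a prediction is the share of votes each class receives among the k nearest neighbors. The plain-JS sketch below illustrates that voting step; it is an illustration of the idea, not the library's internals.

```javascript
// Given the labels of the k nearest neighbors, compute the winning label
// and per-class confidences (vote share among the k neighbors).
function voteConfidences(nearestLabels) {
  const counts = {};
  for (const label of nearestLabels) {
    counts[label] = (counts[label] || 0) + 1;
  }
  const confidences = {};
  let top = null;
  for (const [label, count] of Object.entries(counts)) {
    confidences[label] = count / nearestLabels.length;
    if (top === null || count > counts[top]) top = label;
  }
  return { label: top, confidences };
}
```

For example, if two of the three nearest neighbors are labeled JUMP, the prediction is JUMP with a confidence of 2/3.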

Quick Reference - MobileNet

A quick overview of the most important function calls we use from the tensorflow.js model MobileNet.

  • .load(): Loads and returns a model object.

  • .infer(image, endpoint): Gets an intermediate activation or logit as TensorFlow.js tensors. Takes an image and an optional endpoint to predict through.

See the full implementation here



Languages

JavaScript 56.5% · CSS 24.9% · HTML 18.7%