Surgical Gesture Recognition Using Multi-Encoder Based Architecture

Table of contents

  • General info
  • Background
  • Repository Description
  • Architecture
  • Requirements
  • Notes

General info

In this project, we propose a multi-encoder, single-decoder architecture based on LSTMs to solve the surgical gesture recognition task, using combined kinematic and video input data.
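For concreteness, a toy view of what one trial's inputs and targets might look like (all shapes and the class count below are hypothetical, not the actual APAS dimensions):

```python
import torch

T = 300                               # frames in one trial (example value)
video = torch.randn(T, 3, 224, 224)   # raw RGB frames (or precomputed visual features)
kinematics = torch.randn(T, 76)       # per-frame kinematic measurements (example dim)
labels = torch.randint(0, 6, (T,))    # one gesture class per frame (6 classes assumed)
```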

Background

Gesture recognition is a type of perceptual computing user interface that allows computers to capture and interpret human gestures as commands. Automatically recognizing surgical gestures is a crucial step towards a thorough understanding of surgical skill. Possible areas of application include automatic skill assessment, intra-operative monitoring of critical surgical steps, and semi-automation of surgical tasks.

Solutions that rely only on raw video and do not require additional sensor hardware are especially attractive as they can be implemented at a low cost in many scenarios. However, surgical gesture recognition based only on video is a challenging problem that requires effective means to extract both visual and temporal information from the video.

Repository Description

Filename description:

  • analysis.py: the analysis code.
  • batch_gen.py: transforms the data into batches.
  • metrics.py: the metric computations used in the project (see the sketch after this list).
  • preprocess.py: the preprocessing phase.
  • model.py: the implementation of the model.
  • train_experiment.py: the main file.
  • Trainer.py: the Trainer implementation.
  • visualization.py: the visualization tools.
  • figures: folder containing all the images from the project.
  • models: folder containing all the models and optimizers for each split.
  • Gesture Recognition report.pdf: the project report.
  • config.yaml: the configuration file.
  • requirement.txt: the packages used in this project.
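metrics.py holds the actual metric code; for orientation, frame-wise accuracy and the segmental edit score are the measures most commonly reported for surgical gesture recognition. A rough sketch of how they can be computed (function names and normalization here are illustrative, not necessarily what metrics.py implements):

```python
import numpy as np

def frame_accuracy(pred, gt):
    """Fraction of frames whose predicted gesture matches the ground truth."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    return float((pred == gt).mean())

def _segments(seq):
    """Collapse a frame-wise label sequence into its run-length segment labels."""
    return [seq[i] for i in range(len(seq)) if i == 0 or seq[i] != seq[i - 1]]

def edit_score(pred, gt):
    """Segmental edit score: 1 minus the normalized Levenshtein distance
    between the predicted and ground-truth segment sequences."""
    p, g = _segments(list(pred)), _segments(list(gt))
    m, n = len(p), len(g)
    d = np.zeros((m + 1, n + 1))
    d[:, 0], d[0, :] = np.arange(m + 1), np.arange(n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1, d[i - 1, j - 1] + cost)
    return 1.0 - d[m, n] / max(m, n, 1)
```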

Architecture

In this project, we wanted to leverage the temporal dependencies present in videos, in addition to the multiple types of data we had (raw frames and kinematic data). Accordingly, each input modality is processed by its own LSTM encoder, and a single decoder predicts the gesture for every frame.

(Figure: overview of the multi-encoder, single-decoder architecture.)
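A minimal sketch of this design in PyTorch, reconstructed from the description above rather than taken from model.py: one LSTM encoder per modality, per-frame encodings concatenated, and a single LSTM decoder that classifies every frame. All dimensions are illustrative.

```python
import torch
import torch.nn as nn

class MultiEncoderGestureNet(nn.Module):
    """Two modality-specific LSTM encoders feeding one shared LSTM decoder.

    Hypothetical sketch: dimensions and layer counts are illustrative only.
    """

    def __init__(self, video_dim=512, kin_dim=76, hidden=128, n_classes=6):
        super().__init__()
        # One encoder per input modality.
        self.video_enc = nn.LSTM(video_dim, hidden, batch_first=True, bidirectional=True)
        self.kin_enc = nn.LSTM(kin_dim, hidden, batch_first=True, bidirectional=True)
        # Single decoder over the concatenated per-frame encodings.
        self.decoder = nn.LSTM(4 * hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, video_feats, kinematics):
        # video_feats: (B, T, video_dim) frame features, e.g. from a pretrained CNN
        # kinematics:  (B, T, kin_dim) per-frame kinematic measurements
        v, _ = self.video_enc(video_feats)   # (B, T, 2*hidden)
        k, _ = self.kin_enc(kinematics)      # (B, T, 2*hidden)
        fused = torch.cat([v, k], dim=-1)    # (B, T, 4*hidden)
        dec, _ = self.decoder(fused)         # (B, T, hidden)
        return self.classifier(dec)          # (B, T, n_classes) per-frame logits

# Example forward pass with dummy data:
model = MultiEncoderGestureNet()
logits = model(torch.randn(2, 300, 512), torch.randn(2, 300, 76))
print(logits.shape)  # torch.Size([2, 300, 6])
```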

Requirements

To set up the environment and install the dependencies, run the following commands:

  • conda create --name venv
  • conda activate venv
  • conda install pip
  • pip install -r requirements.txt

To prepare the data, go to the directory containing the code and run:

  • python preprocess.py
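preprocess.py defines the actual pipeline; one step such preprocessing commonly includes is per-feature standardization of the kinematic signals. A sketch under that assumption (not necessarily what the script does):

```python
import numpy as np

def zscore_normalize(kinematics, eps=1e-8):
    """Standardize each kinematic feature of one trial to zero mean, unit variance.

    kinematics: array of shape (T, D); T frames, D kinematic features.
    Illustrative only; the repository's preprocess.py may differ.
    """
    mean = kinematics.mean(axis=0, keepdims=True)
    std = kinematics.std(axis=0, keepdims=True)
    return (kinematics - mean) / (std + eps)
```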

To run the experiment, execute the following from the same directory:

  • python train_experiment.py -c config.yaml

To change the experiment parameters, edit config.yaml.
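Presumably, the -c flag points train_experiment.py at this YAML file. A minimal sketch of how such a configuration could be parsed (the keys in the comment are made up for illustration):

```python
import argparse

import yaml  # PyYAML (assumed installed)

parser = argparse.ArgumentParser()
parser.add_argument("-c", "--config", default="config.yaml",
                    help="path to the YAML configuration file")
args = parser.parse_args()

with open(args.config) as f:
    cfg = yaml.safe_load(f)  # e.g. {"lr": 1e-3, "epochs": 50} -- hypothetical keys

print(cfg)
```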

Notes

  • The APAS dataset was provided by the course TA.
  • Most of the code was adapted from material provided by the course TA.
