maxboels/video-action-recognition-datasets

Description

This repository lists video datasets that can be used for training coarse- to fine-grained (phase, step and action) temporal classification tasks.

Thank you to my colleague Luis C. Garcia-Peraza-Herrera for initiating the content and repo structure.

Surgical video datasets

Dataset	Task	Annotations	Procedures	Paper
CholecT50	Every frame is annotated with triplet labels (instrument, verb, target) for recognizing instrument-tissue interactions in laparoscopic cholecystectomies. The associated challenge investigates the state of the art in fine-grained surgical activity recognition.	action, tools, tissue	50	N/A
Hei-Chole	33 laparoscopic cholecystectomy videos from three surgical centers, with a total operation time of 22 hours. Labels include annotations of seven surgical phases with 250 phase transitions, 5514 occurrences of four surgical actions, 6980 occurrences of 21 surgical instruments from seven instrument categories, and 495 skill classifications in five skill dimensions. The dataset was used in the surgical workflow and skill analysis sub-challenge of the 2019 Endoscopic Vision challenge.	phase, action, tools, skills	24 released of 33 (test videos not released)	Lena Maier-Hein et al. 2021
DAISI	DAISI leverages images and instructions to provide step-by-step demonstrations of how to perform procedures from various medical disciplines. The dataset was compiled from real surgical procedures and academic textbooks.	captions	13k images	Edgar Rojas-Munoz et al. 2020
Cholec80	80 videos of cholecystectomy surgeries performed by 13 surgeons, captured at 25 fps. The dataset is labeled with phase annotations (at 25 fps) and tool presence annotations (at 1 fps). A tool is defined as present in an image if at least half of the tool tip is visible.	phases, tools	80	Twinanda et al. 2016
CATARACTS	This dataset consists of 50 cataract surgeries annotated for two main tasks: surgical tool presence detection and surgical activity recognition. It is divided into two sets (train, test) for the tool presence detection task and three sets (train, dev, test) for the activity recognition task.	phases, steps	50	N/A
PETRAW	Recognition of all levels of granularity of the surgical workflow (phases, steps and action verbs) under different modality configurations.	phases, steps, actions	100	N/A
MISAW	The “MIcro-Surgical Anastomose Workflow recognition on training sessions” (MISAW) sub-challenge, part of MICCAI 2020. Goal: multi-granularity recognition, i.e. a single model that recognizes phases, steps and activities. Data: stereoscopic video, kinematic data and workflow annotations at three levels of granularity (phases, steps and activities).	phases, steps, activities, actions	27	Huaulmé et al. MICCAI 2021
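Cholec80 provides phase labels at 25 fps but tool presence labels at only 1 fps, so one simple way to train a joint phase/tool model is to pair each tool annotation with the phase label of the same frame. The sketch below assumes the phase labels are a per-frame list and that the t-th tool annotation corresponds to frame 25·t. The function name `align_phase_to_tools` and this input layout are hypothetical; this is not the official Cholec80 loader.

```python
from typing import Sequence

# Assumption: videos run at 25 fps and tool labels are given once per second,
# so the t-th tool annotation is taken to describe frame t * 25.
FPS = 25


def align_phase_to_tools(
    phase_per_frame: Sequence[str],
    tools_per_second: Sequence[Sequence[int]],
    fps: int = FPS,
) -> list:
    """Pair each 1 fps tool-presence vector with the phase label of the same frame.

    Returns a list of (frame_index, phase, tool_vector) tuples.
    """
    aligned = []
    for second, tools in enumerate(tools_per_second):
        frame = second * fps
        if frame >= len(phase_per_frame):
            break  # tool labels past the last phase-labeled frame are dropped
        aligned.append((frame, phase_per_frame[frame], tuple(tools)))
    return aligned


if __name__ == "__main__":
    # Illustrative labels only: 80 frames, two tools per annotation.
    phases = ["Preparation"] * 50 + ["CalotTriangleDissection"] * 30
    tools = [[1, 0], [1, 1], [0, 1], [0, 0]]
    print(align_phase_to_tools(phases, tools)[2])  # (50, 'CalotTriangleDissection', (0, 1))
```

Subsampling the dense phase labels, rather than repeating each tool label 25 times, keeps every training sample tied to a real tool annotation.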

Private datasets

ByPass40 - Strasbourg University
MitiSW - MITI group at the Klinikum rechts der Isar in Munich

Non-medical video datasets

Dataset	Task	Annotations	Procedures	Paper
Kinetics	A collection of large-scale, high-quality datasets of URL links to up to 650,000 video clips covering 400, 600 or 700 human action classes, depending on the dataset version. Each clip is annotated by humans with a single action class and lasts around 10 seconds.	action	700/400	Lucas Smaira (DeepMind) 2020
Breakfast	The Breakfast Actions Dataset comprises 10 actions related to breakfast preparation, performed by 52 different individuals in 18 different kitchens.	action	77 hours	H. Kuehne CVPR 2014
50 Salads	Motivated by the shift in activity recognition research from distinguishing full-body motion patterns to recognizing complex interactions of multiple entities, 50 Salads records salad preparation to study such interactions.	action, step	50	N/A
Epic-Kitchens-100	Described by its authors as the largest dataset in first-person (egocentric) vision: multi-faceted, audio-visual, non-scripted recordings in native environments, i.e. the wearers' homes, capturing all daily kitchen activities over multiple days.	action, verb and noun	100 hours	N/A
FineGym	A dataset built on top of gymnastics videos. It provides temporal annotations at both the action and sub-action levels, organized in a three-level semantic hierarchy.	temporal action, sub-action (three-level semantic hierarchy)	99	N/A
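Epic-Kitchens-100 annotates each action as a verb and a noun. The sketch below computes top-1 verb, noun and action accuracy under the convention that an action prediction counts as correct only when both its verb and its noun are correct. The function name `epic_style_accuracies` and the integer class-ID inputs are hypothetical; this is not the official evaluation script.

```python
from typing import Sequence


def epic_style_accuracies(
    pred_verbs: Sequence[int],
    pred_nouns: Sequence[int],
    true_verbs: Sequence[int],
    true_nouns: Sequence[int],
) -> dict:
    """Top-1 verb, noun and action accuracy over a set of labeled segments.

    Assumption: an action is correct only if both verb and noun are correct.
    """
    n = len(true_verbs)
    if n == 0 or not (len(pred_verbs) == len(pred_nouns) == len(true_nouns) == n):
        raise ValueError("all sequences must be non-empty and of equal length")
    verb_ok = [p == t for p, t in zip(pred_verbs, true_verbs)]
    noun_ok = [p == t for p, t in zip(pred_nouns, true_nouns)]
    action_ok = [v and o for v, o in zip(verb_ok, noun_ok)]
    return {
        "verb": sum(verb_ok) / n,
        "noun": sum(noun_ok) / n,
        "action": sum(action_ok) / n,
    }


if __name__ == "__main__":
    # Illustrative class IDs only.
    print(epic_style_accuracies([0, 1, 2, 3], [5, 6, 7, 8], [0, 1, 2, 0], [5, 0, 7, 8]))
    # {'verb': 0.75, 'noun': 0.75, 'action': 0.5}
```

Because an action requires both parts to be right, action accuracy can never exceed the lower of the verb and noun accuracies.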

Action bounding box detection

Dataset	Brief description	Images	Procedures	Paper
SARAS-MESAD2021	Monocular digital recordings from the da Vinci Xi robotic system, split into two sub-datasets: MESAD-Real and MESAD-Phantom. MESAD-Real contains four sessions of complete prostatectomy procedures performed by expert surgeons on real patients. MESAD-Phantom is also designed for surgeon action detection during prostatectomy, but consists of videos of procedures performed on phantoms used for surgeon training. MESAD-Real comprises 21 action classes and MESAD-Phantom a smaller set of 14; the two sub-datasets share 11 action classes.	N/A	9	N/A
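Action bounding box detection pairs a class label with a spatial box, and detection benchmarks commonly match predicted and ground-truth boxes by intersection over union (IoU). The sketch below shows that matching test. The `Box` type, the helper names and the 0.5 threshold are assumptions for illustration; the exact evaluation protocol of SARAS-MESAD is not described here.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box: (x1, y1) is the top-left corner, (x2, y2) the bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if boxes do not overlap
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(
    pred_box: Box, pred_label: str, gt_box: Box, gt_label: str, threshold: float = 0.5
) -> bool:
    """Hypothetical rule: same action class and IoU at or above the threshold."""
    return pred_label == gt_label and iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    print(iou(Box(0, 0, 10, 10), Box(5, 5, 15, 15)))  # 25 / 175, about 0.143
```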

Skill assessment and workflow recognition

Dataset	Brief description	Images	Procedures	Paper
JIGSAWS	The JIGSAWS dataset consists of three components: kinematic data (Cartesian positions, orientations, velocities, angular velocities and gripper angle describing the motion of the manipulators), video data (stereo video captured from the endoscopic camera), and manual annotations of gestures (atomic surgical activity segment labels) and skill (a global rating score based on a modified Objective Structured Assessment of Technical Skills, OSATS).	N/A	N/A	Gao et al. 2014
Cataract-101	This dataset contains 101 videos of cataract surgeries annotated with two kinds of information: the anonymized ID and experience level of the operating surgeon, and the starting points of the quasi-standardized operation phases in each video.	1.3M	101	Schoeffmann et al. 2018
HeiCo	The dataset contains data from the ROBUST-MIS 2019 challenge and the Surgical Workflow challenges of EndoVis 2017 and 2018.	10K	30	Maier-Hein et al. 2020
PETRAW	Dataset for online automatic recognition of surgical workflow by using both kinematic and stereoscopic video information on a micro-anastomosis training task.	N/A	100	N/A
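Cataract-101 annotates only the starting point of each phase, while frame-wise temporal classification needs one label per frame. The sketch below expands start points into per-frame labels, assuming each phase lasts until the next one starts (or the video ends) and that frames before the first start get a background label. The function name `starts_to_frame_labels`, the `(start_frame, phase_id)` input layout and the background value are hypothetical; the dataset's actual annotation file format is not described here.

```python
from typing import Sequence, Tuple


def starts_to_frame_labels(
    phase_starts: Sequence[Tuple[int, int]],
    num_frames: int,
    background: int = 0,
) -> list:
    """Expand (start_frame, phase_id) pairs into one phase label per frame."""
    labels = [background] * num_frames
    ordered = sorted(phase_starts)  # tolerate unsorted input
    for i, (start, phase) in enumerate(ordered):
        # Assumption: a phase runs until the next start point or the end of the video.
        end = ordered[i + 1][0] if i + 1 < len(ordered) else num_frames
        for f in range(max(start, 0), min(end, num_frames)):
            labels[f] = phase
    return labels


if __name__ == "__main__":
    print(starts_to_frame_labels([(2, 1), (5, 2)], 8))  # [0, 0, 1, 1, 1, 2, 2, 2]
```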

Repositories holding multiple datasets

MIT License