Bukva: Russian Sign Language Alphabet Dataset

We introduce Bukva, a video dataset for the Russian dactyl (fingerspelling) recognition task. The dataset is about 27 GB and contains 3757 RGB videos, with more than 101 samples for each sign of the Russian Sign Language (RSL) alphabet, including dynamic signs. It is divided into a training set and a test set by subject (user_id): the training set includes 3097 videos and the test set includes 660 videos. The total recording time is about 4 hours. About 17% of the videos are recorded in HD, and 70% are in FullHD resolution.

Downloads

Downloads	Size (GB)	Comment
dataset	~27	Original HD+, Trimmed HD+, annotations

The annotation file, annotations.tsv, contains the following columns:

	attachment_id	user_id	text	begin	end	height	width	train	length
0	df5b08f0-...	18...	А	36	76	1920	1080	False	150
1	3d2b6a08-...	9a...	А	31	63	1920	1080	True	78
2	1915f996-...	ca...	А	25	81	1920	1080	True	98

where:

attachment_id - video file name
user_id - unique anonymized user ID
text - gesture class (a letter of the Russian alphabet)
begin - start of the gesture (in the original, untrimmed video)
end - end of the gesture (in the original, untrimmed video)
height - video height
width - video width
train - boolean flag: True for the training set, False for the test set
length - video length
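
As an illustration of how these columns can be used, the sketch below parses the file with the Python standard library, splits it on the train flag, and checks that no user_id appears in both splits. The inline sample rows and their IDs are made up for the example, and the function names are hypothetical. To read the real file, pass `open("annotations.tsv", newline="", encoding="utf-8")` instead of the in-memory string.

```python
import csv
import io

# Tiny inline stand-in for annotations.tsv (IDs are made up for illustration).
SAMPLE_TSV = (
    "\tattachment_id\tuser_id\ttext\tbegin\tend\theight\twidth\ttrain\tlength\n"
    "0\tvid-a\tuser-1\tА\t36\t76\t1920\t1080\tFalse\t150\n"
    "1\tvid-b\tuser-2\tА\t31\t63\t1920\t1080\tTrue\t78\n"
    "2\tvid-c\tuser-3\tБ\t25\t81\t1920\t1080\tTrue\t98\n"
)


def load_annotations(handle):
    """Parse the TSV into dicts, converting numeric and boolean columns."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row.pop("", None)  # drop the unnamed index column
        for key in ("begin", "end", "height", "width", "length"):
            row[key] = int(row[key])
        row["train"] = row["train"] == "True"
        rows.append(row)
    return rows


def split_by_flag(rows):
    """Return (train_rows, test_rows) according to the train column."""
    train = [r for r in rows if r["train"]]
    test = [r for r in rows if not r["train"]]
    return train, test


rows = load_annotations(io.StringIO(SAMPLE_TSV))
train_rows, test_rows = split_by_flag(rows)

# The split is described as subject-based, so the user sets should not overlap.
train_users = {r["user_id"] for r in train_rows}
test_users = {r["user_id"] for r in test_rows}
users_disjoint = train_users.isdisjoint(test_users)

# Span of each annotated gesture, in whatever unit begin/end are stored in.
gesture_spans = [r["end"] - r["begin"] for r in rows]
print(len(train_rows), len(test_rows), users_disjoint, gesture_spans)
```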

After downloading, you can unzip the archive by running the following command:

unzip <PATH_TO_ARCHIVE> -d <PATH_TO_SAVE>

The structure of the dataset is as follows:

├── original
│   ├── 0a1b79d6-...
│   ├── 0a53c65e-...
│   ├── ...
├── trimmed
│   ├── 0a1b79d6-...
│   ├── 0a53c65e-...
│   ├── ...
├── annotations.tsv
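
After unpacking, it may be worth checking that every annotated video is actually present. The sketch below is one way to do that, assuming each file name is the attachment_id optionally followed by an extension (the listing above elides the full names, so this is an assumption). The helper name `missing_videos` and the sample IDs are hypothetical, and the temporary directory only stands in for the real dataset root.

```python
import tempfile
from pathlib import Path


def missing_videos(root, attachment_ids, subdir):
    """Return annotated IDs with no matching file in root/subdir."""
    folder = Path(root) / subdir
    # Compare by file stem so the check does not depend on the extension.
    present = {p.stem for p in folder.iterdir() if p.is_file()}
    return sorted(set(attachment_ids) - present)


# Demonstration on a throwaway directory that mimics the layout above.
with tempfile.TemporaryDirectory() as root:
    trimmed = Path(root) / "trimmed"
    trimmed.mkdir()
    (trimmed / "vid-a.mp4").touch()  # ".mp4" is an assumed extension
    missing = missing_videos(root, ["vid-a", "vid-b"], "trimmed")

print(missing)
```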

Models

We provide pre-trained models as baselines for Russian dactyl recognition.

Model Name	Model Size (MB)	Metric	ONNX
MobileNetV2_TSM	9.1	83.6	weights

Training

To train the models from scratch, follow the steps below.

Download the dataset using the link in the Downloads section.

Convert the annotations to txt format using constants.py. Each line of the resulting file has the form:

<path_to_video> <class_id>
<path_to_video> <class_id>
...

Prepare the environment for the mmaction2 framework, which is used to train the models.
Add the paths to your train and test txt files to the training_pipeline_tsm.py config.
Choose model config from the configs folder and start training.
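
The annotation conversion step above can be sketched as follows. This is not the repository's own script: the class-to-id mapping here is simply built from the sorted labels as a placeholder, whereas the real mapping should come from constants.py. The `trimmed` directory, the `.mp4` extension, the function names, and the sample rows are all assumptions for illustration.

```python
def build_label_map(rows):
    """Placeholder mapping: sorted unique labels -> consecutive integer ids."""
    labels = sorted({row["text"] for row in rows})
    return {label: idx for idx, label in enumerate(labels)}


def to_txt_lines(rows, label_map, video_dir, ext=".mp4"):
    """Format each row as '<path_to_video> <class_id>'."""
    return [
        f"{video_dir}/{row['attachment_id']}{ext} {label_map[row['text']]}"
        for row in rows
    ]


# Made-up rows with only the columns this step needs.
sample_rows = [
    {"attachment_id": "vid-a", "text": "Б"},
    {"attachment_id": "vid-b", "text": "А"},
]

label_map = build_label_map(sample_rows)
lines = to_txt_lines(sample_rows, label_map, video_dir="trimmed")
print("\n".join(lines))
```

In practice the training and test rows would be written to two separate txt files, and those paths are then referenced from the training config.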

Demo

usage: demo.py [-h] -p CONFIG [--mp] [-v] [-l LENGTH]

optional arguments:
  -h, --help            show this help message and exit
  -p CONFIG, --config CONFIG
                        Path to config
  --mp                  Enable multiprocessing
  -v, --verbose         Enable logging
  -l LENGTH, --length LENGTH
                        Deque length for predictions


python demo.py -p <PATH_TO_CONFIG>
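
The help text does not say how the prediction deque is used. One plausible reading, shown here only as an assumption, is that the most recent LENGTH per-window predictions are kept and the most frequent label among them is reported, which damps single-frame flicker. The class name `PredictionSmoother` is hypothetical and is not taken from demo.py.

```python
from collections import Counter, deque


class PredictionSmoother:
    """Keep the last `length` predictions and report the most common one."""

    def __init__(self, length):
        # A bounded deque silently discards the oldest item when full.
        self.window = deque(maxlen=length)

    def update(self, label):
        self.window.append(label)
        return Counter(self.window).most_common(1)[0][0]


smoother = PredictionSmoother(length=3)
raw_predictions = ["А", "А", "Б", "Б", "Б"]
smoothed = [smoother.update(label) for label in raw_predictions]
print(smoothed)
```

A longer deque gives steadier output at the cost of reacting more slowly when the signer moves to the next letter.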
