PurdueDigitalTwin / M2DAR

This repository contains PyTorch code and models for M2DAR, a solution to the AI City Challenge Track 3 on driver action recognition in untrimmed videos.

M2DAR: Multi-View Multi-Scale Driver Action Recognition with Vision Transformer

Yunsheng Ma, Liangqi Yuan, Amr Abdelraouf, Kyungtae Han, Rohit Gupta, Zihao Li, Ziran Wang

Introduction

Ensuring traffic safety and preventing accidents are critical goals in daily driving. Computer vision technologies can be leveraged toward these goals by detecting distracted driving behaviors. In this paper, we present M2DAR, a multi-view, multi-scale framework for naturalistic driving action recognition and localization in untrimmed videos.

Methodology

M2DAR is a weight-sharing, Multi-Scale Transformer-based action recognition network that learns robust hierarchical representations. It features a novel election algorithm, consisting of aggregation, filtering, merging, and selection stages, that refines the preliminary results from the action recognition module across multiple camera views.
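
The actual algorithm is implemented in tools/inference/election.py (see Inference below); the sketch that follows is only an illustration of the four stages on hypothetical per-view, per-clip class scores. The function name, arguments (threshold, merge_gap), and thresholding logic are assumptions rather than the authors' exact procedure.

import numpy as np

def election(scores_by_view, threshold=0.5, merge_gap=1):
    """Illustrative sketch of the four election stages.

    scores_by_view: dict mapping view name -> array of shape (num_clips, num_classes)
                    with per-clip class probabilities for one video.
    Returns a list of (class_id, start_clip, end_clip) segments.
    """
    # 1) Aggregation: average the class scores across camera views.
    aggregated = np.mean(np.stack(list(scores_by_view.values())), axis=0)

    # 2) Filtering: keep only clip-level predictions above a confidence threshold.
    labels = aggregated.argmax(axis=1)
    confident = aggregated.max(axis=1) >= threshold

    # 3) Merging: join nearby confident clips that share the same label.
    segments = []
    for t, (lbl, keep) in enumerate(zip(labels, confident)):
        if not keep:
            continue
        if segments and segments[-1][0] == lbl and t - segments[-1][2] <= merge_gap:
            segments[-1][2] = t
        else:
            segments.append([lbl, t, t])

    # 4) Selection: for each class, keep the segment with the highest mean score.
    best = {}
    for lbl, s, e in segments:
        score = aggregated[s:e + 1, lbl].mean()
        if lbl not in best or score > best[lbl][0]:
            best[lbl] = (score, s, e)
    return [(lbl, s, e) for lbl, (score, s, e) in best.items()]

Calling election with one score array per camera view would yield at most one refined segment per predicted class for a video; the repository's implementation is the authoritative version.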

Installation

This project is based on the PySlowFast and AiCityAction codebases. Please follow their instructions to set up the environment. Additionally, install the decord library by running the following command:

pip install decord
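
To quickly verify that decord works in your environment, decode a frame from any local video with its VideoReader API (the file path below is a placeholder):

from decord import VideoReader, cpu

# Placeholder path; point this at any local video file.
vr = VideoReader("sample.mp4", ctx=cpu(0))
print(len(vr), "frames; first frame shape:", vr[0].shape)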

Data preparation

  1. Download the AI-City Challenge 2023 Track 3 Dataset and put it into data/original. The expected contents of this folder are:

    • data/original/A1
    • data/original/A2
    • data/original/Distracted_Activity_Class_definition.txt
  2. Create an annotation file annotation_A1.csv by running the following command:

python tools/data_converter/create_anno.py
  3. Move all videos in both the A1 and A2 subsets into a single folder data/preprocessed/A1_A2_videos using the following command:
python tools/data_converter/relocate_video.py
  4. Process the annotation file to fix errors in the original annotations, such as an action end time earlier than its start time or an incorrect action label, by running the following command (see the sketch after this list):
python tools/data_converter/process_anno.py
  5. Cut the videos into clips using the following commands:
mkdir data/preprocessed/A1_clips
parallel -j 4 <data/A1_cut.sh
  6. Split processed_annotation_A1.csv into train and val subsets using the following command:
python tools/data_converter/split_anno.py
  7. Make annotation files for training on the whole A1 set:
mkdir data/annotations/splits/full
cat data/annotations/splits/splits_1/train.csv data/annotations/splits/splits_1/val.csv >data/annotations/splits/full/train.csv
cp data/annotations/splits/splits_1/val.csv data/annotations/splits/full/
  8. Download the MViTv2-S pretrained model from here and save it as data/ckpts/MViTv2_S_16x4_pretrained.pyth.
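
The actual clean-up rules for step 4 live in tools/data_converter/process_anno.py; the snippet below is only a sketch of the kind of checks it performs, with assumed column names (start_time, end_time, label) and illustrative file paths:

import pandas as pd

# Assumed schema and paths; the real annotation file is produced by create_anno.py.
df = pd.read_csv("data/annotations/annotation_A1.csv")

# Drop annotations whose end time is not after their start time.
df = df[df["end_time"] > df["start_time"]]

# Keep only valid activity classes as listed in
# Distracted_Activity_Class_definition.txt (classes 0-15 in the 2023 release).
df = df[df["label"].between(0, 15)]

df.to_csv("data/annotations/processed_annotation_A1.csv", index=False)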

Training

To train the model, set the working directory to the project root, then run:

export PYTHONPATH=$PWD/:$PYTHONPATH
python tools/run_net.py --cfg configs/MVITV2_S_16x4_448.yaml

Inference

  1. Copy video_ids.csv from the original dataset to the annotation directory:

cp data/original/A2/video_ids.csv data/annotations
mv data/annotations/video_ids.csv data/annotations/A2_video_ids.csv
  2. Create the video list using the following command:
python tools/data_converter/create_test_video_lst.py
  3. Download the backbone model checkpoint from here (the checkpoint that achieved an overlap score of 0.5921 on the A2 test set) and save it as data/ckpts/MViTv2_S_16x4_aicity_200ep.pyth.
  4. Run the DAR module using the following command, which generates preliminary results stored at data/runs/A2 in pickle format (see the inspection sketch after this list):
python tools/inference/driver_action_rec.py
  5. Run the election algorithm to refine the preliminary results from the DAR module:
python tools/inference/election.py
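
The layout of the preliminary results from step 4 is defined by tools/inference/driver_action_rec.py; a minimal way to inspect them, assuming they are written as .pkl files under data/runs/A2, is:

import pickle
from pathlib import Path

# Assumes the preliminary results are stored as *.pkl files under data/runs/A2;
# their internal layout is defined by tools/inference/driver_action_rec.py.
for path in sorted(Path("data/runs/A2").glob("*.pkl")):
    with path.open("rb") as f:
        result = pickle.load(f)
    print(path.name, type(result))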

The resulting submission file, data/inferences/submit.txt, achieves an overlap score of 0.5921 on the A2 test set.
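
If you want to sanity-check the output, the snippet below parses the submission file under the assumption that each line follows the challenge's space-delimited video_id activity_id start_time end_time format (an assumption about the file layout, not something documented here):

# Assumes each line of submit.txt is space-delimited as
# "video_id activity_id start_time end_time"; adjust if the format differs.
with open("data/inferences/submit.txt") as f:
    for line in f:
        video_id, activity_id, start, end = line.split()
        print(f"video {video_id}: class {activity_id} from {start}s to {end}s")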

Citing M2DAR

If you use this code for your research, please cite our paper:

@inproceedings{ma_m2dar_2023,
	title = {{M2DAR}: {Multi}-{View} {Multi}-{Scale} {Driver} {Action} {Recognition} with {Vision} {Transformer}},
	booktitle = {2023 {IEEE}/{CVF} {Conference} on {Computer} {Vision} and {Pattern} {Recognition} {Workshops} ({CVPRW})},
	author = {Ma, Yunsheng and Yuan, Liangqi and Abdelraouf, Amr and Han, Kyungtae and Gupta, Rohit and Li, Zihao and Wang, Ziran},
	month = jun,
	year = {2023},
}

Acknowledgements

This repo is based on the PySlowFast and AiCityAction repos. Many thanks!
