Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition

This is the official repository of "Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition", accepted by ACM Multimedia 2024 (ACM MM).

Download dataset

  1. NTU-RGB+D 60 dataset from https://rose1.ntu.edu.sg/dataset/actionRecognition/
  2. NTU-RGB+D 120 dataset from https://rose1.ntu.edu.sg/dataset/actionRecognition/
  3. NW-UCLA dataset from https://wangjiangb.github.io/my_data.html
  4. UTD-MHAD dataset from https://www.utdallas.edu/~kehtar/UTD-MHAD.html
  5. SYSU-Action dataset from https://www.isee-ai.cn/%7Ehujianfang/ProjectJOULE.html

Process dataset

  1. Refer to CTR-GCN or TD-GCN for processing and saving the skeleton data.
  2. Refer to Extract_NTU_Person for processing and saving the video data.
  3. Use multimodal LLMs (e.g., MiniGPT-4, BLIP, DeepSeek-VL, GLM-4) to obtain the text features.
First, clone the repository of the multimodal LLM you choose.
Then save the extracted text features, not the raw text content.
We suggest adopting more advanced multimodal LLMs (e.g., GLM-4V and DeepSeek-VL) and more complex prompts to obtain better text features; a rough extraction sketch is given below.
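
The exact prompting pipeline depends on which multimodal LLM you use. As a rough illustration only, the sketch below assumes a CLIP text encoder from Hugging Face Transformers and a hypothetical output file name; it encodes one LLM-generated description and stores the resulting feature rather than the text itself:

# Sketch only (not the authors' exact pipeline): encode an LLM-generated
# description with a text encoder and save the feature, not the raw text.
import numpy as np
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

# Description returned by the multimodal LLM for one sample (illustrative).
description = "A person raises both arms and waves."
inputs = tokenizer(description, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    feature = text_encoder(**inputs).pooler_output.squeeze(0)  # 512-d text feature

# Hypothetical file name; store one feature per skeleton sample.
np.save("text_feature_S001C001P001R001A001.npy", feature.numpy())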

Train Model

Please store the data of each modality at the path specified in the corresponding config file and modify the config files accordingly (a rough sketch of overriding the paths is shown below).
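
The snippet below only sketches how the paths could be overridden programmatically; the key names (train_feeder_args, test_feeder_args, data_path) are borrowed from typical CTR-GCN-style configs and may differ from the keys actually used in this repo:

# Rough sketch: load a config, point the feeders at the processed data, and save it.
# The key names and paths below are assumptions; check the actual yaml files in ./config.
import yaml

cfg_path = "./config/nturgbd120-cross-subject/joint.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

cfg["train_feeder_args"]["data_path"] = "/data/ntu120/xsub/train_joint.npz"  # hypothetical path
cfg["test_feeder_args"]["data_path"] = "/data/ntu120/xsub/val_joint.npz"     # hypothetical path

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f)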

# NTU120-XSub
python main_MMCL.py --device 0 1 --config ./config/nturgbd120-cross-subject/joint.yaml

# NTU120-XSet
python main_MMCL.py --device 0 1 --config ./config/nturgbd120-cross-set/joint.yaml

# NTU60-XSub
python main_MMCL.py --device 0 1 --config ./config/nturgbd-cross-subject/joint.yaml

# NTU60-XView
python main_MMCL.py --device 0 1 --config ./config/nturgbd-cross-view/joint.yaml

Test Model

# NTU120-XSub
python main_MMCL.py --device 0 --config ./config/nturgbd120-cross-subject/joint.yaml --phase test --weights <work_dir>/NTU120-XSub.pt

# NTU120-XSet
python main_MMCL.py --device 0 --config ./config/nturgbd120-cross-set/joint.yaml --phase test --weights <work_dir>/NTU120-XSet.pt

# NTU60-XSub
python main_MMCL.py --device 0 --config ./config/nturgbd-cross-subject/joint.yaml --phase test --weights <work_dir>/NTU60-XSub.pt

# NTU60-XView
python main_MMCL.py --device 0 --config ./config/nturgbd-cross-view/joint.yaml --phase test --weights <work_dir>/NTU60-XView.pt

Result

Method   NTU-60 X-Sub   NTU-60 X-View   NTU-120 X-Sub   NTU-120 X-Set   NW-UCLA
MMCL     93.5%          97.4%           90.3%           91.7%           97.5%
To reproduce the reported results, ensemble the scores of the six skeleton streams (a sketch of the underlying score fusion is given after the commands):

cd Ensemble
# NTU120-XSub
python ensemble.py \
--J_Score ./Score/NTU120_XSub_J.pkl \
--B_Score ./Score/NTU120_XSub_B.pkl \
--JM_Score ./Score/NTU120_XSub_JM.pkl \
--BM_Score ./Score/NTU120_XSub_BM.pkl \
--HDJ_Score ./Score/NTU120_XSub_HDJ.pkl \
--HDB_Score ./Score/NTU120_XSub_HDB.pkl \
--val_sample ./Val_Sample/NTU120_XSub_Val.txt \
--benchmark NTU120XSub

# Other benchmarks follow the same pattern.
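
For reference, this is a minimal sketch of the kind of score-level fusion ensemble.py performs; the pickle layout (a dict keyed by sample name), the number of classes, and the fusion weights are assumptions rather than the repo's actual values:

# Sketch of multi-stream score fusion; weights and file layout are assumptions.
import pickle
import numpy as np

streams = {  # score file -> fusion weight (illustrative values)
    "./Score/NTU120_XSub_J.pkl": 0.6,
    "./Score/NTU120_XSub_B.pkl": 0.6,
    "./Score/NTU120_XSub_JM.pkl": 0.4,
    "./Score/NTU120_XSub_BM.pkl": 0.4,
    "./Score/NTU120_XSub_HDJ.pkl": 0.4,
    "./Score/NTU120_XSub_HDB.pkl": 0.4,
}

with open("./Val_Sample/NTU120_XSub_Val.txt") as f:
    samples = [line.strip() for line in f if line.strip()]
# Assumed: NTU sample names end with the action id, e.g. ...A001 -> label 0.
labels = np.array([int(name[-3:]) - 1 for name in samples])

fused = np.zeros((len(samples), 120))  # 120 classes for NTU-120
for path, weight in streams.items():
    with open(path, "rb") as f:
        scores = pickle.load(f)  # assumed: {sample_name: score_vector}
    fused += weight * np.array([scores[name] for name in samples])

acc = (fused.argmax(axis=1) == labels).mean()
print(f"Top-1 accuracy: {acc:.4f}")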

Thanks

Our project is based on CTR-GCN, TD-GCN, EPP-Net, BLIP, and MiniGPT-4.

Citation

@inproceedings{liu2024mmcl,
  author = {Liu, Jinfu and Chen, Chen and Liu, Mengyuan},
  title = {Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition}, 
  booktitle = {Proceedings of the ACM Multimedia (ACM MM)}, 
  year = {2024}
}
