zjukg / Structure-CLIP

Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations

Home Page: https://arxiv.org/abs/2305.06152

Structure-CLIP

This paper introduces Structure-CLIP, an end-to-end framework that integrates scene graph knowledge to enhance multi-modal structured representations.

🔔 News

🌈 Model Architecture

[Figure: Structure-CLIP model architecture]

📚 Dataset Download

Training datasets are available here (Code: 33ri).

📕 Code Path

Code Structure

The code is organized into four parts:

  • model: the main files for the Structure-CLIP network.
  • data: the pre-training data splits and the downstream dataset.
  • checkpoints: saved checkpoints for reloading.
  • script: the training scripts for Structure-CLIP.
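As a quick sanity check after cloning, the layout above can be verified with a short script. This is a minimal sketch, assuming the four directories sit directly under the repository root with exactly the names listed; the helper `missing_parts` is hypothetical and not part of the repository.

```python
from __future__ import annotations

from pathlib import Path

# Top-level directories named in the code structure list
# (assumed to live directly under the repository root).
EXPECTED_PARTS = ("model", "data", "checkpoints", "script")


def missing_parts(repo_root: str = ".") -> list[str]:
    """Return the expected top-level directories that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_PARTS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_parts()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All four parts are present.")
```

Since `checkpoints` holds saved checkpoints, it may only appear once training has written something, so a missing entry there is not necessarily an error.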

🔬 Dependencies

  • Python 3
  • PyTorch >= 1.8.0
  • Transformers >= 4.11.3
  • NumPy
  • All experiments were run on a single A100 GPU.
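The minimum versions above can be checked from Python before training. This is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+), assuming PyTorch and Transformers are installed under their usual pip names `torch` and `transformers`. It compares the leading numeric release components as integer tuples, so 1.13 correctly ranks above 1.8; pre-release and local suffixes such as `rc1` or `+cu117` are ignored, which is a simplification rather than full PEP 440 ordering. The names `MINIMUMS`, `parse_release` and `meets_minimum` are hypothetical helpers.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the dependency list, keyed by (assumed) pip distribution name.
MINIMUMS = {"torch": "1.8.0", "transformers": "4.11.3"}


def parse_release(text: str) -> tuple:
    """Extract the leading dotted integers, e.g. '1.13.1+cu117' -> (1, 13, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if installed >= minimum, padding both release tuples with zeros to equal length."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    for package, minimum in MINIMUMS.items():
        try:
            found = version(package)
        except PackageNotFoundError:
            print(f"{package}: not installed (need >= {minimum})")
            continue
        status = "ok" if meets_minimum(found, minimum) else f"too old (need >= {minimum})"
        print(f"{package} {found}: {status}")
```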

πŸš€ Train & Eval

Run the training script:

bash script/run.sh
[--train_path TRAIN_PATH] [--test_path TEST_PATH] [--nepoch NEPOCH] [--batch_size BATCH_SIZE] [--manualSeed MANUAL_SEED]
[--lr LEARNING_RATE] [--weight_decay WEIGHT_DECAY] [--knowledge_weight KNOWLEDGE_WEIGHT] [--transformer_layer_num NUMBER] [--model_name MODEL_NAME] [--neg_loss_weight NEG_LOSS_WEIGHT]

Note:

  • To change these parameters, open and edit script/run.sh.
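To show how the options listed for the training script fit together, here is a minimal `argparse` sketch that accepts the same flag names. It is hypothetical: the value types are inferred from the flag names, no defaults are assumed (the real values are set in `script/run.sh`), and the repository's actual Python entry point may define these options differently. The function name `build_parser` is illustrative only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed for script/run.sh.

    Types are guesses from the flag names; defaults are left as None because
    the real values are set inside script/run.sh.
    """
    parser = argparse.ArgumentParser(description="Structure-CLIP training (sketch)")
    # Data locations
    parser.add_argument("--train_path", type=str)
    parser.add_argument("--test_path", type=str)
    # Optimisation schedule and reproducibility
    parser.add_argument("--nepoch", type=int)
    parser.add_argument("--batch_size", type=int)
    parser.add_argument("--manualSeed", type=int)
    parser.add_argument("--lr", type=float)
    parser.add_argument("--weight_decay", type=float)
    # Weighting terms (meaning assumed from the flag names)
    parser.add_argument("--knowledge_weight", type=float)
    parser.add_argument("--neg_loss_weight", type=float)
    # Model configuration
    parser.add_argument("--transformer_layer_num", type=int)
    parser.add_argument("--model_name", type=str)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

For example, parsing `["--nepoch", "10", "--lr", "1e-5"]` yields `nepoch=10`, `lr=1e-05`, and `None` for every flag not supplied; these values are illustrative, not the repository's settings.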

🤝 Cite

Please consider citing this paper if you use the code or data from our work. Thanks a lot :)

@inproceedings{DBLP:conf/aaai/StructureCLIP,
  author       = {Yufeng Huang and
                  Jiji Tang and
                  Zhuo Chen and
                  Rongsheng Zhang and
                  Xinfeng Zhang and
                  Weijie Chen and
                  Zeng Zhao and
                  Zhou Zhao and
                  Tangjie Lv and
                  Zhipeng Hu and
                  Wen Zhang},
  title        = {Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations},
  booktitle    = {{AAAI}},
  publisher    = {{AAAI} Press},
  year         = {2024}
}