molierflower / PaddleMM

Multi-Modal learning toolkit based on PaddlePaddle and PyTorch, supporting multiple applications such as multi-modal classification, cross-modal retrieval and image caption.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

简体中文 | English

简介

多模态学习工具包 PaddleMM 以百度 PaddlePaddle 平台为主,兼容 PyTorch 提供 torch 版本,旨在于提供模态联合学习和跨模态学习算法模型库,为处理图片文本等多模态数据提供高效的解决方案,助力多模态学习应用落地。

PaddleMM 发布作者:

近期更新

2022.5.8

  • Add model TMC

更多

特性

  • 丰富的任务场景:工具包提供多模态融合、跨模态检索、图文生成等多种多模态学习任务算法模型库,支持用户自定义数据和训练。
  • 成功的工业应用:基于工具包算法已有相关落地应用,如球鞋真伪鉴定、图像字幕生成、舆情监控等。

应用展示

  • 球鞋真伪鉴定 (更多信息欢迎访问我们的网站 Ysneaker !)
  • 更多应用

落地实践

  • 与百度人才智库(TIC)合作 智能招聘 简历分析,基于多模态融合算法成功落地。

框架

PaddleMM 包括 paddle 版本 paddlemm 包和 torch 版本 torchmm,由以下三个模块组成:

  • 数据处理:提供统一的数据接口和多种数据处理格式
  • 模型库:包括多模态融合、跨模态检索、图文生成、多任务算法
  • 训练器:对每种任务设置统一的训练流程和相关指标计算

使用

下载工具包

git clone https://github.com/njustkmg/PaddleMM.git

paddlemm 使用示例:

from paddlemm import PaddleMM

# config: Model running parameters, see configs/
# data_root: Path to dataset
# image_root: Path to images
# gpu: Which gpu to use

runner = PaddleMM(config='configs/cmml.yml',
                  data_root='data/COCO', 
                  image_root='data/COCO/images', 
                  out_root='experiment/cmml_paddle',
                  gpu=0)

runner.train()
runner.test()

或者

python run.py --config configs/cmml.yml --data_root data/COCO --image_root data/COCO/images --out_root experiment/cmml_paddle --gpu 0

torchmm 使用示例:

from torchmm import TorchMM

# config: Model running parameters, see configs/
# data_root: Path to dataset
# image_root: Path to images
# cuda: Which gpu to use

runner = TorchMM(config='configs/cmml.yml',
                 data_root='data/COCO', 
                 image_root='data/COCO/images',
                 out_root='experiment/cmml_torch',
                 cuda=0)

runner.train()
runner.test()

或者

python run_torch.py --config configs/cmml.yml --data_root data/COCO --image_root data/COCO/images --out_root experiment/cmml_torch --cuda 0

模型库 (持续更新中)

技术支撑

多模态论文

  • Chuan Qin, Hengshu Zhu, Tong Xu, Chen Zhu, Liang Jiang, Enhong Chen, Hui Xiong, Enhancing Person-Job Fit for Talent Recruitment: An Ability-aware Neural Network Approach, In Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-2018) , Ann Arbor, Michigan, USA, 2018.
  • Chen Zhu, Hengshu Zhu, Hui Xiong, Chao Ma, Fang Xie, Pengliang Ding, Pan Li, Person-Job Fit: Adapting the Right Talent for the Right Job with Joint Representation Learning, In ACM Transactions on Management Information Systems (ACM TMIS), 2018.
  • Dazhong Shen, Hengshu Zhu, Chuan Qin, Tong Xu, Enhong Chen, Hui Xiong, Joint Representation Learning with Relation-enhanced Topic Models for Intelligent Job Interview Assessment, In ACM Transactions on Information Systems (ACM TOIS) , 2021.
  • Yang Yang, Jia-Qi Yang, Ran Bao, De-Chuan Zhan, Hengshu Zhu, Xiao-Ru Gao, Hui Xiong, Jian Yang. Corporate Relative Valuation using Heterogeneous Multi-Modal Graph Neural Network. IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE), 2021. (CCF-A). Code
  • Yang Yang, De-Chuan Zhan, Yi-Feng Wu, Zhi-Bin Liu, Hui Xiong, and Yuan Jiang. Semi-Supervised Multi-Modal Clustering and Classification with Incomplete Modalities. IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE), 2020. (CCF-A)
  • Yang Yang, Chubing Zhang, Yi-Chu Xu, Dianhai Yu, De-Chuan Zhan, Jian Yang. Rethinking Label-Wise Cross-Modal Retrieval from A Semantic Sharing Perspective. Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI-2021), Montreal, Canada, 2021. (CCF-A).
  • Yang Yang, Yi-Feng Wu, De-Chuan Zhan, Zhi-Bin Liu, Yuan Jiang. Complex Object Classification: A Multi-Modal Multi-Instance Multi-Label Deep Network with Optimal Transport. Proceedings of the Annual Conference on ACM SIGKDD (KDD-2018) , London, UK, 2018. Code

更多论文欢迎访问我们的网站 njustkmg

飞桨论文复现挑战赛

  • 飞桨论文复现挑战赛 (第四期):《Comprehensive Semi-Supervised Multi-Modal Learning》赛题冠军
  • 飞桨论文复现挑战赛 (第五期):《From Recognition to Cognition: Visual Commonsense Reasoning》赛题冠军

贡献

  • PaddlePaddle 复现代码问题记录 链接
  • 我们非常欢迎您为 PaddleMM 贡献代码,也十分感谢你的反馈。

许可证书

本项目的发布受 Apache 2.0 license 许可认证。

About

Multi-Modal learning toolkit based on PaddlePaddle and PyTorch, supporting multiple applications such as multi-modal classification, cross-modal retrieval and image caption.

License:Apache License 2.0


Languages

Language:Python 100.0%