OpenDataLab's repositories
WanJuan1.0
WanJuan 1.0 multimodal corpus
opendatalab-python-sdk
SDK of OpenDataLab - https://opendatalab.org.cn
CLIP-Parrot-Bias
Parrot Captions Teach CLIP to Spot Text
opendatalab-datasets
datasets resource
labelU-Kit
Data annotation component library, provided as NPM packages
MLLM-DataEngine
MLLM-DataEngine: An Iterative Refinement Approach for MLLM
labelU-frontend
LabelU front-end library
WanJuan2.0-WanJuan-CC
WanJuan-CC is a high-quality dataset built from CommonCrawl through data extraction, rule-based cleaning, deduplication, safety filtering, and quality cleaning.
Apache-2.0