chenchongthu

Chong Chen's starred repositories

Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Language: Python · License: MIT · 382900

T2Ranking

T2Ranking: A large-scale Chinese benchmark for passage ranking.

Language: Python · 13800

We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs, and parameter-efficient methods (e.g., LoRA, P-Tuning) for easy use, providing a fine-tuning platform that makes large models easy for researchers to get started with. We welcome open-source enthusiasts to submit any meaningful PR to this repo and to integrate as many LLM-related technologies as possible.

Language: Jupyter Notebook · License: Apache-2.0 · 250200

ce_pretrain

Pretrained mixed Chinese-English BERT model

Language: Python · 100

zuowen-dataset-pt1

Essay (zuowen) dataset, part 1

1100

colbert

ColBERT for dense retrieval, including a multi-view version, with DuReader-retrieval as an example.

Language: Python · License: Apache-2.0 · 600

OpenMatch

An Open-Source Package for Information Retrieval

Language: Python · License: MIT · 14000

haystack-search-engine

A semantic search engine built on the arXiv dataset from Kaggle.

Language: Jupyter Notebook · 700

haystack

LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

Language: Python · License: Apache-2.0 · 1425700