Zehan Li's repositories
dual-cross-encoder
Dual Cross Encoder for Dense Retrieval
embeddings
Training Large-scale Text Embedding Models with 🤗 Transformers
ParaSolver
Numerical simulation of particle deformation in fluid flow
GR-for-KBQG
Graph Retrieval for Question Generation over Knowledge Base
triple2seq
PyTorch reimplementation of Serban et al.'s paper "Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus" at ACL'2016
beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
conv-diff-mp
Solving the 2D convection-diffusion equation using Julia multiprocessing
cs224n
Coding assignments in CS224N 2021
hello-world
test
datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
freeze-entity
Robust Text Classification via Entity Freezing
huggingface_hub
The official Python client for the Hugging Face Hub.
lm-evaluation-harness
A framework for few-shot evaluation of language models.
MachineLearning
Implementations of some common machine learning models from scratch with NumPy
mteb
MTEB: Massive Text Embedding Benchmark
nanotron
Minimalistic large language model 3D-parallelism training
tevatron
Tevatron - A flexible toolkit for dense retrieval research and development.
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
yologo-dataset
YOLO-based logo detection