Gabriel Martín Blázquez's starred repositories
awesome-synthetic-datasets
awesome synthetic (text) datasets
candle-ext
An extension library to Candle that provides PyTorch functions not currently available in Candle
cohere-toolkit
Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications.
StringZilla
Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc. 🦖
faster-fifo
Faster alternative to Python's multiprocessing.Queue (IPC FIFO queue)
data-is-better-together
Let's build better datasets, together!
transformer-heads
Toolkit for attaching, training, saving, and loading new heads for transformer models
distilabel-spin-dibt
Repository containing the SPIN experiments on the DIBT 10k ranked prompts
LLaMA-Factory
Unify Efficient Fine-Tuning of 100+ LLMs
fsdp_qlora
Training LLMs with QLoRA + FSDP
text-clustering
Easily embed, cluster and semantically label text datasets
LLM-Blender
[ACL2023] We introduce LLM-Blender, an innovative ensembling framework to attain consistently superior performance by leveraging the diverse strengths of multiple open-source LLMs. LLM-Blender cuts the weaknesses through ranking and integrates the strengths through generative fusion to enhance the capability of LLMs.
vertex-ai-huggingface
🤗 Collection of examples on how to train, deploy and monitor HuggingFace models in Google Cloud Vertex AI