Loubna Ben Allal's repositories
santacoder-finetuning
Fine-tune SantaCoder for Code/Text Generation.
nanotron-smol-cluster
Megatron-LM setup in the smol-cluster
Sign-Segmentation-with-Transformers
Detection of temporal boundaries in sign language videos, as part of the Object Recognition & Computer Vision course in the MVA master program.
bloom-code-evaluation
Evaluation of BLOOM on the HumanEval benchmark
bigcode-analysis
Repository for analysis and experiments in the BigCode project.
canine-mednli
CANINE for Medical Natural Language Inference on MedNLI data, as part of the Algorithms for Speech and NLP course of the MVA master program.
Link-Prediction-in-Citation-Graphs
Link prediction in a citation network using text and graph-based features, as part of the ALTEGRAD course in the MVA master program.
accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision
apps
APPS: Automated Programming Progress Standard (NeurIPS 2021)
bigcode-evaluation-harness
A framework for the evaluation of autoregressive code generation language models.
blog
Public repo for HF blog posts
data-preparation
Code used for sourcing and cleaning the BigScience ROOTS corpus
datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
DeepSeek-Coder
DeepSeek Coder: Let the Code Write Itself
hub-docs
Frontend components, documentation and information hosted on the Hugging Face website.
lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
MultiPL-E
A multi-programming-language benchmark for evaluating the performance of large language models of code.
odex
Execution-Based Evaluation for Open Domain Code Generation
presidio
Context aware, pluggable and customizable data protection and de-identification SDK for text and images
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.