pacman100

Sourab Mangrulkar's starred repositories

t-few

Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"

Language: Python · License: MIT · 42600

RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

Language: Python · License: Apache-2.0 · 1247000

petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

Language: Python · License: MIT · 911700

natural-instructions

Expanding natural instructions

Language: Python · License: Apache-2.0 · 95000

hh-rlhf

Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"

License: MIT · 157400

pandarallel

A simple and efficient tool to parallelize Pandas operations on all available CPUs

Language: Python · License: BSD-3-Clause · 364500

bigcode-dataset

Language: Jupyter Notebook · License: Apache-2.0 · 35900

cc2dataset

Easily convert common crawl to a dataset of caption and document. Image/text Audio/text Video/text, ...

Language: Python · License: MIT · 30400

annotated_deep_learning_paper_implementations

🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠

Language: Python · License: MIT · 5446700

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python · License: MIT · 1962900

deep-rl-class

This repo contains the syllabus of the Hugging Face Deep Reinforcement Learning Course.

Language: MDX · License: Apache-2.0 · 383700

P-tuning-v2

An optimized deep prompt tuning strategy comparable to fine-tuning across scales and tasks

Language: Python · License: Apache-2.0 · 196800

fashion-iq

Language: Python · 14400

ml-stable-diffusion

Stable Diffusion with Core ML on Apple Silicon

Language: Python · License: MIT · 1673500

diffusion-models-class

Materials for the Hugging Face Diffusion Models Course

Language: Jupyter Notebook · License: Apache-2.0 · 355900

flash-attention

Fast and memory-efficient exact attention

Language: Python · License: BSD-3-Clause · 1362600

torchscale

Foundation Architecture for (M)LLMs

Language: Python · License: MIT · 300400

PromptPapers

Must-read papers on prompt-based tuning for pre-trained language models.

405800

LoRA

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Language: Python · License: MIT · 1043200

electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Language: Python · License: Apache-2.0 · 232600

DeBERTa

The implementation of DeBERTa

Language: Python · License: MIT · 197300

parallelformers

Parallelformers: An Efficient Model Parallelization Toolkit for Deployment

Language: Python · License: Apache-2.0 · 77600

galai

Model API for GALACTICA

Language: Jupyter Notebook · License: Apache-2.0 · 267500

ZerO-initialization

Language: Jupyter Notebook · License: Apache-2.0 · 7300

E.T.

Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions.

Language: C · License: MIT · 8500

teach

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

Language: Python · 13400

text-generation-inference

Large Language Model Text Generation Inference

Language: Python · License: Apache-2.0 · 885500

torchdynamo-tests

Language: Python · 2000

triton

Development repository for the Triton language and compiler

Language: C++ · License: MIT · 1291900

pytorch_geometric

Graph Neural Network Library for PyTorch

Language: Python · License: MIT · 2108100