Abhinav Gupta's repositories
academicpages.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
dm_robotics
Libraries, tools and tasks created and used at DeepMind Robotics.
ede
Code for the paper "Uncertainty-Driven Exploration for Generalization in Reinforcement Learning".
evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
human-eval
Code for the paper "Evaluating Large Language Models Trained on Code"
inference
Reference implementations of MLPerf™ inference benchmarks
LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
llama
Inference code for LLaMA models
llama-recipes
Examples and recipes for Llama models
LMFlow
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
math
The MATH Dataset (NeurIPS 2021)
mlx
MLX: An array framework for Apple silicon
mlx-examples
Examples in the MLX framework
openai-cookbook
Examples and guides for using the OpenAI API
openai-quickstart-python
Python example app from the OpenAI API quickstart tutorial
OpenDevin
🐚 OpenDevin: Code Less, Make More
optax
Optax is a gradient processing and optimization library for JAX.
PromptPG
Data and code for the ICLR 2023 paper "Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning".
reward-bench
RewardBench: the first evaluation tool for reward models.
Stable-Alignment
Multi-agent social simulation + an efficient, effective, and stable alternative to RLHF. Code for the paper "Training Socially Aligned Language Models in Simulated Human Society".
stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
summarize-from-feedback
Code for "Learning to summarize from human feedback"
SWE-agent
SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It solves 12.29% of bugs in the SWE-bench evaluation set and takes just 1.5 minutes to run.
training
Reference implementations of MLPerf™ training benchmarks
trl
Train transformer language models with reinforcement learning.