Costa Huang's starred repositories
PokemonRedExperiments
Playing Pokemon Red with Reinforcement Learning
direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
summarize-from-feedback
Code for "Learning to summarize from human feedback"
text-clustering
Easily embed, cluster and semantically label text datasets
paged-attention-minimal
A minimal cache manager for PagedAttention, on top of llama3
cogment-lab
A toolkit for practical Human-AI cooperation research