Costa Huang's starred repositories
PokemonRedExperiments
Playing Pokemon Red with Reinforcement Learning
direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
summarize-from-feedback
Code for "Learning to summarize from human feedback"
RingAttention
Transformers with Arbitrarily Large Context
text-clustering
Easily embed, cluster and semantically label text datasets
docstring_parser
Parse Python docstrings in various flavors
pretraining-with-human-feedback
Code accompanying the paper "Pretraining Language Models with Human Preferences"
cogment-lab
A toolkit for practical Human-AI cooperation research