kykim0's starred repositories
nlp-bible-code
Hands-on practice materials for the book 자연어처리 바이블 (Natural Language Processing Bible).
data-selection-survey
A Survey on Data Selection for Language Models
calibration-framework
The net:cal calibration framework is a Python 3 library for measuring and mitigating miscalibration of uncertainty estimates, e.g., those produced by a neural network.
evolutionary-model-merge
Official repository of Evolutionary Optimization of Model Merging Recipes
reward-bench
RewardBench: the first evaluation tool for reward models.
awesome-llm-human-preference-datasets
A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
stable-baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
tmux-resurrect
Persists tmux environment across system restarts.
alpaca-lora
Instruct-tune LLaMA on consumer hardware
alignment-handbook
Robust recipes to align language models with human and AI preferences
ppo-implementation-details
The source code for the blog post "The 37 Implementation Details of Proximal Policy Optimization"
tikzplotlib
Save matplotlib figures as TikZ/PGFplots for smooth integration into LaTeX.
alpaca_farm
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
DirectBehaviorSpecification
Code to reproduce the Arena environment experiments from "Direct Behavior Specification via Constrained Reinforcement Learning".