kykim0

kykim0's starred repositories

dreamer

Dream to Control: Learning Behaviors by Latent Imagination

Language: Python · Apache-2.0 · 59600

nlp-bible-code

Hands-on practice materials for the Natural Language Processing Bible (자연어처리 바이블).

Language: Jupyter Notebook · 4900

KULLM

☁️ 구름 (KULLM): an LLM specialized for Korean, developed at Korea University

Apache-2.0 · 54600

SPIN

The official implementation of Self-Play Fine-Tuning (SPIN)

Language: Python · Apache-2.0 · 90100

open-instruct

Language: Python · Apache-2.0 · 109800

data-selection-survey

A Survey on Data Selection for Language Models

CC0-1.0 · 11900

LESS

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning

Language: Jupyter Notebook · MIT · 28600

mamba

Mamba SSM architecture

Language: Python · Apache-2.0 · 1177800

calibration-framework

The net:cal calibration framework is a Python 3 library for measuring and mitigating miscalibration of uncertainty estimates, e.g., by a neural network.

Language: Python · Apache-2.0 · 32700

captum

Model interpretability and understanding for PyTorch

Language: Python · BSD-3-Clause · 472000

tabular-benchmark

Language: Python · 43600

OpenFE

OpenFE: automated feature generation with expert-level performance

Language: Python · MIT · 70700

evolutionary-model-merge

Official repository of Evolutionary Optimization of Model Merging Recipes

Language: Python · Apache-2.0 · 110600

reward-bench

RewardBench: the first evaluation tool for reward models.

Language: Python · Apache-2.0 · 29400

transformer-debugger

Language: Python · MIT · 398100

awesome-llm-human-preference-datasets

A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.

MIT · 28000

enn

Language: Python · Apache-2.0 · 28400

stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

Language: Python · MIT · 841000

tmux-resurrect

Persists tmux environment across system restarts.

Language: Shell · MIT · 1096800

alpaca-lora

Instruct-tune LLaMA on consumer hardware

Language: Jupyter Notebook · Apache-2.0 · 1841900

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python · Apache-2.0 · 3578600

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · Apache-2.0 · 2317200

alignment-handbook

Robust recipes to align language models with human and AI preferences

Language: Python · Apache-2.0 · 423300

ppo-implementation-details

The source code for the blog post "The 37 Implementation Details of Proximal Policy Optimization"

Language: Python · NOASSERTION · 58500

tikzplotlib

📊 Save matplotlib figures as TikZ/PGFplots for smooth integration into LaTeX.

Language: Python · MIT · 235600

weak-to-strong

Language: Python · MIT · 245900

alpaca_farm

A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.

Language: Python · Apache-2.0 · 74100

nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Language: Python · MIT · 3489900

DirectBehaviorSpecification

Code to reproduce the Arena environment experiments from "Direct Behavior Specification via Constrained Reinforcement Learning".

Language: ASP.NET · NOASSERTION · 1900

flax

Flax is a neural network library for JAX that is designed for flexibility.

Language: Python · Apache-2.0 · 582000