Ziniu Li's repositories
policy_optimization
Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data"
RL-PPO-Keras
Keras implementation of Proximal Policy Optimization (PPO)
alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
baby-llama2-chinese
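A repository for pretraining from scratch and then running SFT on a small-parameter Chinese LLaMa2; it runs on a single 24 GB GPU and produces a chat-llama2 with basic Chinese question-answering ability.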
Chinese-LLaMA-Alpaca-2
Phase two of the Chinese LLaMA-2 & Alpaca-2 LLMs project, with local CPU/GPU training and deployment
clash-for-linux
Use Clash as a proxy tool on Linux
deep-learning-notes
Experiments with Deep Learning
go-explore
Code for Go-Explore: a New Approach for Hard-Exploration Problems
gym-minigrid
Minimalistic gridworld package for OpenAI Gym
Model-Uncertainty-in-Neural-Networks
TensorFlow implementation of Model-Uncertainty-in-Neural-Networks
random-network-distillation
Code for the paper "Exploration by Random Network Distillation"
sample-efficient-bayesian-rl
Source for the sample-efficient tabular RL submission to the 2019 NIPS workshop on Biological and Artificial RL
webpage-template
Adapted from the widely used project webpage template made by the colorful folks.