liziniu

Ziniu Li's repositories

ReMax

Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)

Language: Python | 121 2 3

policy_optimization

Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data)

Language: Python | 23 10

RL-PPO-Keras

Keras implementation of Proximal Policy Optimization (PPO)

Language: Python | 15 1 3

HyperDQN

Code for ICLR 2022 Paper (HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning)

Language: Python | 9 10

ISWBC

Code for NeurIPS 2023 Paper (Imitation Learning from Imperfection: Theoretical Justifications and Algorithms)

Language: Python | 4 10

ILwSD

Language: Python | 3 10

RLX

RLX is an RL codebase based on TensorFlow. It implements algorithms like SAC, ACER, GAIL and TRPO. It is easy to use.

Language: Python | 3 20

liziniu.github.io

Language: Python | 2 10

bib-merge

Language: Python | 1 20

alpaca_eval

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

Language: Jupyter Notebook | License: Apache-2.0 | 000

baby-llama2-chinese

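A repository for pretraining a small-parameter Chinese LLaMa2 from scratch and then applying SFT; a single 24 GB GPU is enough to run it and obtain a chat-llama2 with basic Chinese question-answering ability.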

Language: Python | License: MIT | 000

baselines

Language: Python | License: MIT | 010

cgmm

Language: Python | 020

Chinese-LLaMA-Alpaca-2

Phase-two project for Chinese LLaMA-2 & Alpaca-2 large language models, with local CPU/GPU training and deployment (Chinese LLaMA-2 & Alpaca-2 LLMs)

Language: Python | License: Apache-2.0 | 000

clash-for-linux

Using Clash as a proxy tool on Linux

Language: Shell | 000

CVAE

Language: Python | 020

dagger

Language: Python | 020

deep-learning-notes

Experiments with Deep Learning

Language: Jupyter Notebook | 010

go-explore

Code for Go-Explore: a New Approach for Hard-Exploration Problems

Language: Python | License: NOASSERTION | 020

gym-fetch-stack

Language: Python | 010

gym-minigrid

Minimalistic gridworld package for OpenAI Gym

Language: Python | License: BSD-3-Clause | 020

iclr-blog-track.github.io

Language: HTML | License: NOASSERTION | 000

Maze

Language: Jupyter Notebook | 010

Model-Uncertainty-in-Neural-Networks

TensorFlow implementation of Model-Uncertainty-in-Neural-Networks

Language: Jupyter Notebook | 020

random-network-distillation

Code for the paper "Exploration by Random Network Distillation"

Language: Python | 030

rllab-curriculum

Language: Python | License: NOASSERTION | 010

sample-efficient-bayesian-rl

Source for the sample efficient tabular RL submission to the 2019 NIPS workshop on Biological and Artificial RL

Language: Jupyter Notebook | License: MIT | 010

stable-baselines

Language: Python | License: MIT | 020

SuperMario

Language: Jupyter Notebook | 020

webpage-template

Adapted from the widely used project webpage template made by the colorful folks.

Language: HTML | 000