Wei Xiong's repositories
Decentralized-Proximal-Algorithm-with-Variance-Reduction
This is the code for the paper "PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction" (preprint).
Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning
This is an official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning", with multi-turn DPO and KTO.
multi-armed-bandit-test-framework
Code for the multi-armed bandit experiments in my undergraduate thesis.
LMFlow_RAFT_Dev
This is a development branch of LMFlow for the RAFT algorithm.
MPMAB_BEACON
This is the official implementation of the paper "Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization" (NeurIPS 2021).
multi_player_multi_armed_bandit_algorithms
Implementations of state-of-the-art algorithms for the multi-player multi-armed bandit problem.
Observe_then_Incentivize
This is the official implementation of the paper "(Almost) Free Incentivized Exploration from Decentralized Learning Agents" (NeurIPS 2021).
RLHF-Reward-Modeling-dev
Recipes for training reward models for RLHF.
awesome-offline-rl
An index of algorithms for offline reinforcement learning (offline RL).
awesome-RLHF
A curated, continually updated list of resources on reinforcement learning from human feedback (RLHF).
Markdown4Zhihu
A one-click fix for image and formula issues when importing Markdown files into Zhihu.
ai-for-grant-writing
A curated list of resources for using LLMs to develop more competitive grant applications.
alignment-handbook
Robust recipes to align language models with human and AI preferences
BanditLib
Library of contextual bandits algorithms
functionary
Chat language model that can use tools and interpret the results
NeMo-Skills
A pipeline to improve skills of large language models
RAFT
This is an official implementation of the Reward rAnked Fine-Tuning (RAFT) algorithm, also known as iterative best-of-n fine-tuning or rejection-sampling fine-tuning.
reward-bench
RewardBench: the first evaluation tool for reward models.
sample-efficient-bayesian-rl
Source code for the sample-efficient tabular RL submission to the 2019 NeurIPS Workshop on Biological and Artificial RL.
UltraFeedback
A large-scale, fine-grained, diverse preference dataset (and models).
Xwin-LM
Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
zhihu
My Zhihu content.