Wei Xiong's repositories

Decentralized-Proximal-Algorithm-with-Variance-Reduction

This is the code used for the preprint "PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction".

Language: Python | Stargazers: 15 | Issues: 1 | Issues: 0

Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning

This is an official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning" with multi-turn DPO and KTO.

Language: Python | Stargazers: 12 | Issues: 0 | Issues: 0

multi-armed-bandit-test-framework

This is the multi-armed bandit code used for my undergraduate thesis.

Language: Python | Stargazers: 5 | Issues: 1 | Issues: 0

LMFlow_RAFT_Dev

This is a sub-branch for developing the RAFT algorithm.

Language: Python | License: Apache-2.0 | Stargazers: 4 | Issues: 0 | Issues: 0

MPMAB_BEACON

This is the official implementation for the paper "Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization" in NeurIPS 2021.

Language: Python | License: MIT | Stargazers: 2 | Issues: 0 | Issues: 0

multi_player_multi_armed_bandit_algorithms

Implementations of state-of-the-art algorithms for the multi-player multi-armed bandit problem.

Language: Python | Stargazers: 1 | Issues: 1 | Issues: 0

Observe_then_Incentivize

This is the official implementation for the paper "(Almost) Free Incentivized Exploration from Decentralized Learning Agents" in NeurIPS 2021.

Language: Python | License: MIT | Stargazers: 1 | Issues: 0 | Issues: 0

RLHF-Reward-Modeling-dev

Recipes to train reward models for RLHF.

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 0 | Issues: 0

ToRA

ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting with tools [ICLR'24].

Language: Python | License: MIT | Stargazers: 1 | Issues: 0 | Issues: 0

awesome-offline-rl

An index of algorithms for offline reinforcement learning (offline-rl)

Stargazers: 0 | Issues: 0 | Issues: 0

awesome-RLHF

A curated list of reinforcement learning with human feedback resources (continually updated)

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

Markdown4Zhihu

One-click fix for problems with images, formulas, and other content when importing Markdown files into Zhihu.

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

ai-for-grant-writing

A curated list of resources for using LLMs to develop more competitive grant applications.

License: CC-BY-4.0 | Stargazers: 0 | Issues: 0 | Issues: 0

alignment-handbook

Robust recipes to align language models with human and AI preferences

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

BanditLib

Library of contextual bandits algorithms

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

functionary

Chat language model that can use tools and interpret the results

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

NeMo-Skills

A pipeline to improve skills of large language models

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

RAFT

This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or rejection sampling fine-tuning.

Stargazers: 0 | Issues: 0 | Issues: 0

reward-bench

RewardBench: the first evaluation tool for reward models.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

sample-efficient-bayesian-rl

Source for the sample efficient tabular RL submission to the 2019 NIPS workshop on Biological and Artificial RL

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

UltraFeedback

A large-scale, fine-grained, diverse preference dataset (and models).

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

Xwin-LM

Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

zhihu

My Zhihu content

License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0