Wei Xiong's repositories
Decentralized-Proximal-Algorithm-with-Variance-Reduction
This is the code for the paper "PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction" (preprint).
Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning
This is an official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning", with multi-turn DPO and KTO.
multi-armed-bandit-test-framework
Code for the multi-armed bandit experiments in my undergraduate thesis.
LMFlow_RAFT_Dev
This is a development branch of LMFlow for the RAFT algorithm.
MPMAB_BEACON
This is the official implementation of the paper "Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization" (NeurIPS 2021).
multi_player_multi_armed_bandit_algorithms
Implementations of state-of-the-art algorithms for the multi-player multi-armed bandit problem.
Observe_then_Incentivize
This is the official implementation of the paper "(Almost) Free Incentivized Exploration from Decentralized Learning Agents" (NeurIPS 2021).
RLHF-Reward-Modeling-dev
Recipes for training reward models for RLHF.
awesome-offline-rl
An index of algorithms for offline reinforcement learning (offline RL).
awesome-RLHF
A curated, continually updated list of resources on reinforcement learning from human feedback (RLHF).
Markdown4Zhihu
A one-click fix for image and formula issues when importing Markdown files into Zhihu.
ai-for-grant-writing
A curated list of resources for using LLMs to develop more competitive grant applications.
alignment-handbook
Robust recipes to align language models with human and AI preferences
BanditLib
Library of contextual bandits algorithms
functionary
Chat language model that can use tools and interpret the results
NeMo-Skills
A pipeline to improve skills of large language models
RAFT
This is an official implementation of the Reward rAnked Fine-Tuning (RAFT) algorithm, also known as iterative best-of-n fine-tuning or rejection-sampling fine-tuning.
reward-bench
RewardBench: the first evaluation tool for reward models.
sample-efficient-bayesian-rl
Source code for the sample-efficient tabular RL submission to the 2019 NeurIPS Workshop on Biological and Artificial RL.
UltraFeedback
A large-scale, fine-grained, diverse preference dataset (and models).
Xwin-LM
Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
zhihu
My Zhihu content.