Gu Pengjie's starred repositories

outer-value-function-meta-rl

Code for the paper: Debiasing Meta-Gradient Reinforcement Learning by Learning the Outer Value Function

Language: Jupyter Notebook | Stargazers: 13 | Issues: 0

efficient-kan

An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).

Language: Python | License: MIT | Stargazers: 3619 | Issues: 0

SimPO

SimPO: Simple Preference Optimization with a Reference-Free Reward

Language: Python | License: MIT | Stargazers: 542 | Issues: 0

f-divergence-dpo

Direct preference optimization with f-divergences.

Language: Python | License: Apache-2.0 | Stargazers: 9 | Issues: 0

alignment-handbook

Robust recipes to align language models with human and AI preferences

Language: Python | License: Apache-2.0 | Stargazers: 4247 | Issues: 0

Cradle

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

Language: Python | License: MIT | Stargazers: 1204 | Issues: 0

Academic-project-page-template

A project page template for academic papers. Demo at https://eliahuhorwitz.github.io/Academic-project-page-template/

Language: JavaScript | Stargazers: 1552 | Issues: 0

cpl

Code for Contrastive Preference Learning (CPL)

Language: Python | License: MIT | Stargazers: 142 | Issues: 0

direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)

Language: Python | License: Apache-2.0 | Stargazers: 1889 | Issues: 0

Synapse

[ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control

Language: HTML | License: MIT | Stargazers: 41 | Issues: 0

video-subtitle-extractor

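Extracts hard-coded subtitles from videos and generates srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files.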

Language: Python | License: Apache-2.0 | Stargazers: 5350 | Issues: 0

Language: Python | License: MIT | Stargazers: 241 | Issues: 0

LLM-Agents-Papers

A repo listing papers related to LLM-based agents

Language: Python | Stargazers: 855 | Issues: 0

lm-human-preferences

Code for the paper Fine-Tuning Language Models from Human Preferences

Language: Python | License: MIT | Stargazers: 1175 | Issues: 0

d3rlpy

An offline deep reinforcement learning library

Language: Python | License: MIT | Stargazers: 1261 | Issues: 0

Awesome-LLM-for-RecSys

Survey: A collection of AWESOME papers and resources on large language model (LLM) related recommender system topics.

License: MIT | Stargazers: 855 | Issues: 0

oprl

Official Codebase for TMLR 2023, Benchmarks and Algorithms for Offline Preference-Based Reward Learning

Language: Python | License: MIT | Stargazers: 16 | Issues: 0

AlignLLMHumanSurvey

Aligning Large Language Models with Human: A Survey

Stargazers: 643 | Issues: 0

tsne-cuda

GPU Accelerated t-SNE for CUDA with Python bindings

Language: Cuda | License: BSD-3-Clause | Stargazers: 1754 | Issues: 0

Variational-Recurrent-Models

Code for the study "Variational Recurrent Models for Solving Partially Observable Control Tasks", published as a conference paper at ICLR 2020 (https://openreview.net/forum?id=r1lL4a4tDB)

Language: Python | License: MIT | Stargazers: 49 | Issues: 0

VQ-VAE

Minimalist implementation of VQ-VAE in PyTorch

Language: Python | License: BSD-3-Clause | Stargazers: 481 | Issues: 0

vqvae

VQ-VAE implementation in PyTorch, supporting EMA and Gumbel training. Applicable to images and time series.

Language: Jupyter Notebook | Stargazers: 9 | Issues: 0

dreamerv2

PyTorch implementation of Dreamer-v2: a visual model-based RL algorithm.

Language: Python | License: MIT | Stargazers: 234 | Issues: 0

dreamerv3-torch

Implementation of Dreamer v3 in PyTorch.

Language: Python | License: MIT | Stargazers: 358 | Issues: 0

awesome-offline-rl

An index of algorithms for offline reinforcement learning (offline-rl)

Stargazers: 886 | Issues: 0

TradeMaster

TradeMaster is an open-source platform for quantitative trading empowered by reinforcement learning

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 1257 | Issues: 0

unity-ml-agents-turret-defense

A reinforcement learning agent that plays as the turret. Its goal is to let ten friendly units enter the base; it loses if an enemy unit enters the base or if two friendly units are shot.

Language: TeX | License: Apache-2.0 | Stargazers: 16 | Issues: 0

stable-diffusion

A latent text-to-image diffusion model

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 66782 | Issues: 0

OpenPSG

Benchmarking Panoptic Scene Graph Generation (PSG), ECCV'22

Language: Python | License: MIT | Stargazers: 403 | Issues: 0

PyTorch-Pretrained-ViT

Vision Transformer (ViT) in PyTorch

Language: Python | Stargazers: 761 | Issues: 0