Hideki105

PyTorch implementation of the Offline Reinforcement Learning algorithm CQL. Includes the versions DQN-CQL and SAC-CQL for discrete and continuous action spaces.

Language: Python

dace

DACE: Distribution-Aware Counterfactual Explanation [IJCAI-20]

Language: Python, License: MIT

deep-learning-from-scratch-4

Deep Learning from Scratch 4: Reinforcement Learning

Language: Jupyter Notebook, License: MIT

Diffusion-Models-pytorch

PyTorch implementation of Diffusion Models (https://arxiv.org/pdf/2006.11239.pdf)

Language: Python, License: Apache-2.0

Diffusion-Policies-for-Offline-RL

Language: Python, License: Apache-2.0

doro

Distributional and Outlier Robust Optimization (ICML 2021)

Language: Python, License: MIT

gail-pytorch

PyTorch implementation of GAIL and PPO reinforcement learning algorithms

Language: Python

linear-programming

Linear programming via the primal-dual interior-point method

Language: Python, License: MIT

gpt-2

Code for the paper "Language Models are Unsupervised Multitask Learners"

License: NOASSERTION

gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch

License: MIT

graduate_exam

Problems and solutions for the Kyoto University mathematics graduate school entrance exams

License: CC-BY-SA-4.0

GraduateSchoolEntranceExamination

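Past entrance exam problems, solutions, and related materials for the University of Tokyo Graduate School of Information Science and Technology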

Language: TeX

lm-human-preferences

Code for the paper Fine-Tuning Language Models from Human Preferences

License: MIT

manifold-optimization-book

Support page for the book 『多様体上の最適化理論』 (Optimization Theory on Manifolds)

minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

License: MIT

mogp

Mixture of Gaussian Processes Model for Sparse Longitudinal Data

License: BSD-3-Clause

nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

License: MIT

PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

License: MIT

python_simple_mppi

Python implementation of MPPI (Model Predictive Path-Integral) controller to understand the basic idea. Mandatory dependencies are numpy and matplotlib only.

License: NOASSERTION

riemannian-optimization

Optimization on Riemannian manifolds

Language: Python, License: MIT

robust-optimization

Robust optimization

Language: Python, License: MIT

robustOT

Robust Optimal Transport code

sam

SAM: Sharpness-Aware Minimization (PyTorch)

License: MIT

self-rag

This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.

License: MIT