There are 30 repositories under the exploration-exploitation topic.
Classic papers and resources on recommendation
OpenDILab Decision AI Engine
For deep RL and the future of AI.
Classic and state-of-the-art industry papers and resources on recommendation and advertising / Must-read Papers on Recommendation System and CTR Prediction
Python implementations of contextual bandits algorithms
Code to reproduce the experiments in Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation (MEEE).
A curated list of awesome exploration RL resources (continually updated)
PyTorch implementation of the ICML 2018 paper Self-Imitation Learning.
Code for the NeurIPS 2022 paper Exploiting Reward Shifting in Value-Based Deep RL.
Source for the sample-efficient tabular RL submission to the NeurIPS 2019 workshop on Biological and Artificial RL.
Comparison of two methods for dealing with the exploration-exploitation dilemma in multi-armed bandits.
Personalized and interactive music recommendation with a bandit approach.
Focuses on Reinforcement Learning related concepts, use cases, and learning approaches
Official implementation of LECO (NeurIPS'22)
Deep Intrinsically Motivated Exploration in Continuous Control
Short implementations of bandit algorithms: ETC, UCB, MOSS, and KL-UCB.
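As a rough illustration of the UCB family named above, here is a minimal UCB1 sketch (not taken from any of the listed repositories; the Bernoulli arm means and horizon are made-up example values):

```python
import math
import random

def ucb1(pull, n_arms, horizon, seed=0):
    """UCB1: pull each arm once, then repeatedly choose the arm
    maximizing empirical mean + sqrt(2 * ln(t) / n_pulls)."""
    rng = random.Random(seed)
    counts = [0] * n_arms          # pulls per arm
    sums = [0.0] * n_arms          # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1            # initialization: try every arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        r = pull(arm, rng)
        counts[arm] += 1
        sums[arm] += r
    return counts

# Hypothetical two-armed Bernoulli bandit with means 0.2 and 0.8
means = [0.2, 0.8]
counts = ucb1(lambda a, rng: 1.0 if rng.random() < means[a] else 0.0,
              n_arms=2, horizon=500)
```

Over 500 rounds the confidence bonus shrinks for frequently pulled arms, so the better arm (mean 0.8) ends up pulled far more often than the worse one.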
Some Key Points from the Deep Learning Tuning Playbook
Research Thesis - Reinforcement Learning
This project compares different Reinforcement Learning algorithms, including Monte Carlo, Q-learning, Q(λ)-learning, and ε-greedy variations.
Action elimination for multi-armed bandits.
This repository contains a variety of projects related to reinforcement learning, showcasing different approaches to implementing it in various scenarios.
This project uses Reinforcement Learning to teach an agent to drive itself, learning from its observations to maximize reward (180+ lines).
An Optimistic Approach to the Q-Network Error in Actor-Critic Methods
Human and simulated behavioral / small-scale neural data for the paper: https://www.biorxiv.org/content/10.1101/2022.10.03.510668v2
Over-parameterization = exploration?
An implementation of the Reinforcement Learning multi-armed bandit experiment using different exploration techniques.
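One of the simplest exploration techniques such experiments compare is ε-greedy; a minimal sketch follows (arm means, horizon, and ε are illustrative values, not drawn from the repository):

```python
import random

def epsilon_greedy(means, horizon=1000, epsilon=0.1, seed=1):
    """ε-greedy: with probability ε pull a uniformly random arm
    (explore); otherwise pull the arm with the best empirical
    mean so far (exploit). Arms are Bernoulli with the given means."""
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n
    values = [0.0] * n             # running empirical mean per arm
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                        # explore
        else:
            arm = max(range(n), key=values.__getitem__)   # exploit
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        # incremental mean update
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values

counts, values = epsilon_greedy([0.3, 0.7])
```

With ε = 0.1, roughly 10% of pulls are random exploration; the remaining pulls concentrate on the arm whose empirical mean is highest, so the better arm (mean 0.7) dominates the pull counts.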
Reinforcement Learning (COMP 579) Project
The exploitation vs. exploration problem framed as A/B testing with maximum profit per unit time.
OSPO is a novel metaheuristic optimization algorithm that shows promising performance across different kinds of problems.
A companion repository for 'Inverse Bayesian Optimization: Learning Human Acquisition Functions in an Exploration vs Exploitation Search Task'