rl_warmups

This repo focuses on deep reinforcement learning and contains some of the core pieces for doing basic RL.

I started this as part of my AI/ML journey but have had to put it on pause for now. As a result, it is untested and far from complete.

Algorithms

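To evaluate an Expected SARSA policy, sample the action from a probability distribution derived from the Q value of each action (for example, a softmax or ε-greedy policy over the Q values), so that higher-valued actions are chosen more often.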
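One common way to turn Q values into such a distribution is a softmax (Boltzmann) policy. The sketch below assumes that choice and uses plain Python rather than PyTorch so it runs on its own; `softmax`, `select_action`, and the `temperature` parameter are illustrative names, not code from this repo.

```python
import math
import random


def softmax(q_values, temperature=1.0):
    """Convert a list of Q values into action probabilities (Boltzmann policy)."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, temperature=1.0, rng=random):
    """Sample an action index with probability softmax(q_values)[index]."""
    probs = softmax(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Usage: a seeded generator makes the sampled action reproducible.
action = select_action([1.0, 2.0, 0.5], rng=random.Random(0))
```

A higher `temperature` flattens the distribution toward uniform (more exploration); a lower one makes it approach the greedy choice.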

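For control, calculate the target as the reward plus the discounted expected Q value of the next observation, i.e. the Q value of each next action weighted by the probability of choosing it, summed over actions: target = r + γ Σ_a π(a | s') Q(s', a), where γ is the discount factor. On terminal transitions the target is just r.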
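A minimal sketch of the Expected SARSA target, assuming the policy is a softmax over the next state's Q values (an ε-greedy policy would only change how `probs` is computed). The function names and the `done` flag are hypothetical.

```python
import math


def softmax(q_values, temperature=1.0):
    """Action probabilities pi(a | s) from Q values (Boltzmann policy)."""
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def expected_sarsa_target(reward, next_q_values, gamma=0.99, done=False, temperature=1.0):
    """r + gamma * sum_a pi(a | s') * Q(s', a); no bootstrap at terminal states."""
    if done:
        return reward
    probs = softmax(next_q_values, temperature)
    expected_q = sum(p * q for p, q in zip(probs, next_q_values))
    return reward + gamma * expected_q
```

Because the target averages over the policy instead of using one sampled next action (as plain SARSA does), it has lower variance.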

To evaluate a DQN policy, always take the action with the highest Q value.
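Greedy action selection is just an argmax over the Q values. This plain-Python sketch (with a hypothetical `greedy_action` helper) breaks ties by taking the lowest index, which is one arbitrary but deterministic choice.

```python
def greedy_action(q_values):
    """Return the index of the highest Q value (ties go to the lowest index)."""
    # max() returns the first maximal element it encounters.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```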

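For control, calculate the target as the reward plus γ (the discount factor) times the maximum predicted Q value of the next state: target = r + γ max_a Q(s', a). On terminal transitions the target is just r.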
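A minimal sketch of the DQN target and a squared-error loss, in plain Python. In the standard DQN setup, `next_q_values` would come from a separate, periodically updated target network; that detail, and batching with tensors, are left out here. `dqn_target` and `dqn_loss` are hypothetical names, and mean squared error is one choice (Huber loss is a common alternative).

```python
def dqn_target(reward, next_q_values, gamma=0.99, done=False):
    """r + gamma * max_a Q(s', a); terminal transitions do not bootstrap."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def dqn_loss(q_sa, targets):
    """Mean squared error between Q(s, a) for the taken actions and their targets."""
    return sum((q - t) ** 2 for q, t in zip(q_sa, targets)) / len(targets)
```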

To evaluate a REINFORCE policy, sample an action from the probability distribution output by the policy network (a softmax over its logits). REINFORCE learns the policy directly and does not use Q values.
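In REINFORCE the action distribution normally comes from the policy network's own outputs. The sketch below assumes those outputs are logits for a discrete action space and returns the log probability of the sampled action as well, since the loss needs it later. `log_softmax` and `sample_action` are illustrative helpers written in plain Python; in PyTorch, `torch.distributions.Categorical(logits=...)` with its `sample()` and `log_prob()` methods covers the same ground.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log of softmax(logits)."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


def sample_action(logits, rng=random):
    """Sample a ~ pi(. | s) = softmax(logits); return (action, log pi(a | s))."""
    log_probs = log_softmax(logits)
    probs = [math.exp(lp) for lp in log_probs]
    action = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    return action, log_probs[action]
```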

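For control, collect an entire episode, then unwind it from the last step to the first, computing the discounted return for each step: G_t = r_t + γ G_{t+1}, where γ is the discount factor and the return after the final step is 0.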
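Unwinding the episode backwards lets each return be computed from the one after it in a single pass. A minimal plain-Python sketch, where `rewards[t]` is assumed to be the reward received after the action at step t and `discounted_returns` is a hypothetical name:

```python
def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed by walking the episode backwards."""
    returns = []
    g = 0.0  # the return after the final step is zero
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()  # restore chronological order
    return returns
```

For example, three rewards of 1.0 with `gamma=0.5` give returns of 1.75, 1.5 and 1.0.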

The loss is the negative sum, over the episode, of each action's log probability times the return from that step: loss = -Σ_t log π(a_t | s_t) G_t.
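A minimal sketch of the REINFORCE loss for one episode, assuming the per-step log probabilities and returns have already been collected. With PyTorch tensors the same expression would be differentiated with `loss.backward()`; here plain floats keep the example self-contained, and `reinforce_loss` is a hypothetical name.

```python
import math


def reinforce_loss(log_probs, returns):
    """-sum_t log pi(a_t | s_t) * G_t over one episode."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))


# Usage: two steps with action probabilities 0.5 and 0.25 and returns 2.0 and 1.0.
loss = reinforce_loss([math.log(0.5), math.log(0.25)], [2.0, 1.0])
```

Minimizing this loss by gradient descent performs gradient ascent on the expected return, pushing up the probability of actions that were followed by high returns.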

My own implementations of common deep reinforcement learning algorithms using PyTorch.
