iandanforth / preference

Preference is an action selection method. It is an alternative to softmax, greedy, epsilon-greedy etc.

Softmax Action Selection Visualization

Visualization Demo

Description of Softmax Action Selection

The effect of the temperature (tau) in the softmax equation on the probability that an action is selected is not always obvious.

This visualization is a simple way to see that impact.
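
For reference, the softmax (Boltzmann) rule is commonly written as shown below. This is the standard textbook form; that the visualization uses exactly this parameterization is an assumption. The symbols $Q(a)$ (the value of action $a$) and $P(a)$ (its selection probability) are introduced here for illustration and do not appear in the original text.

$$
P(a) = \frac{\exp\left(Q(a)/\tau\right)}{\sum_{b} \exp\left(Q(b)/\tau\right)}
$$

Dividing every value by $\tau$ shrinks the differences between them as $\tau$ grows, so a large temperature pushes the probabilities toward uniform, while a small temperature concentrates probability on the highest-valued action.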

Things to try

  • Set temperature = 1
    • Set the value of 'a' near the value of 'b'. Notice how small changes in either value have a large impact on the selection probabilities in this regime.
    • Set temperature to 1000 and try again.
  • Try to fully recover the equiprobable action selection policy.
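
The experiments above can also be approximated outside the visualization. The following is a minimal sketch, assuming the standard softmax form with two actions 'a' and 'b'; the function name `softmax_probabilities` and the example values are hypothetical and are not taken from the repository.

```python
import math


def softmax_probabilities(values, tau):
    """Return softmax selection probabilities for a list of action values.

    Assumes the standard form exp(v / tau) / sum(exp(v_i / tau)).
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    scaled = [v / tau for v in values]
    # Subtract the maximum before exponentiating to avoid overflow;
    # this does not change the resulting probabilities.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical values for actions 'a' and 'b'.
    for tau in (1, 1000):
        for a, b in ((1.0, 1.0), (1.5, 1.0), (2.0, 1.0)):
            p_a, p_b = softmax_probabilities([a, b], tau)
            print(f"tau={tau:>4}  a={a}  b={b}  P(a)={p_a:.4f}  P(b)={p_b:.4f}")
```

Under this form, the equiprobable policy is recovered exactly when all action values are equal, and only approximately when the temperature is very large relative to the differences between values.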

Languages

  • Jupyter Notebook 99.6%
  • JavaScript 0.4%
  • CSS 0.0%
  • HTML 0.0%