Learn a control policy that optimally swings a pendulum from hanging straight down to balanced upright, subject to torque limits and (optionally) noise. Both the pendulum and the policy are animated while learning runs. Unlike dynamic programming, for instance, the policy is learned purely by forward simulation: no knowledge of the dynamics is used to build it, only the observed outcomes of the actions that were tried.
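
As a rough illustration of learning from forward simulation alone, here is a minimal sketch that uses tabular Q-learning. This is one possible model-free method, swapped in for illustration; it is not necessarily the algorithm used here. The pendulum parameters, grid sizes, reward, and all function names (`step`, `reward`, `discretize`, `train`, `greedy_torque`) are assumptions. The learner calls `step` only as a black box and never reads the equations inside it.

```python
import math
import random

# Hypothetical parameters, chosen only for illustration.
G, LENGTH, MASS, DAMPING, DT = 9.81, 1.0, 1.0, 0.1, 0.05
MAX_TORQUE = 2.0                       # torque limit
MAX_SPEED = 8.0                        # angular speed is clipped to this range
TORQUES = (-MAX_TORQUE, 0.0, MAX_TORQUE)
N_ANGLE, N_SPEED = 31, 31              # size of the discretized state grid


def step(theta, omega, torque, noise_std=0.0, rng=random):
    """Simulate one time step. theta = 0 is hanging down, theta = pi is upright."""
    torque = max(-MAX_TORQUE, min(MAX_TORQUE, torque))   # enforce torque limit
    torque += rng.gauss(0.0, noise_std)                  # optional actuator noise
    accel = -(G / LENGTH) * math.sin(theta)
    accel += (torque - DAMPING * omega) / (MASS * LENGTH ** 2)
    omega = max(-MAX_SPEED, min(MAX_SPEED, omega + accel * DT))
    theta = (theta + omega * DT) % (2 * math.pi)
    return theta, omega


def reward(theta, omega, torque):
    """Zero when upright and at rest, negative everywhere else."""
    err = math.pi - theta
    return -(err ** 2 + 0.1 * omega ** 2 + 0.001 * torque ** 2)


def discretize(theta, omega):
    """Map a continuous state to indices into the Q-table."""
    i = min(int(theta / (2 * math.pi) * N_ANGLE), N_ANGLE - 1)
    j = min(int((omega + MAX_SPEED) / (2 * MAX_SPEED) * N_SPEED), N_SPEED - 1)
    return i, j


def train(episodes=200, steps=200, alpha=0.1, gamma=0.98,
          epsilon=0.1, noise_std=0.0, seed=0):
    """Tabular Q-learning: only sampled transitions are used, never the model."""
    rng = random.Random(seed)
    q = [[[0.0] * len(TORQUES) for _ in range(N_SPEED)] for _ in range(N_ANGLE)]
    for _ in range(episodes):
        theta, omega = 0.0, 0.0                          # start hanging down
        for _ in range(steps):
            i, j = discretize(theta, omega)
            if rng.random() < epsilon:                   # explore
                a = rng.randrange(len(TORQUES))
            else:                                        # exploit
                a = max(range(len(TORQUES)), key=lambda k: q[i][j][k])
            theta, omega = step(theta, omega, TORQUES[a], noise_std, rng)
            r = reward(theta, omega, TORQUES[a])
            i2, j2 = discretize(theta, omega)
            target = r + gamma * max(q[i2][j2])
            q[i][j][a] += alpha * (target - q[i][j][a])
    return q


def greedy_torque(q, theta, omega):
    """Read the learned policy: the best known torque for a given state."""
    i, j = discretize(theta, omega)
    return TORQUES[max(range(len(TORQUES)), key=lambda k: q[i][j][k])]
```

Calling `q = train()` and then `greedy_torque(q, theta, omega)` over the state grid gives the kind of policy map that could be animated alongside the pendulum. How well the swing-up works will depend on the assumed grid resolution, reward weights, and number of episodes.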