Exactly what it says on the tin.
This repo provides an implementation of AdaSGHMC, which combines stochastic gradient Hamiltonian Monte Carlo (SGHMC) with Adam. The algorithm samples correctly from the posterior distribution in the limit alpha -> 0, beta2 -> 1, contains correction factors for uniform diagonal noise, and behaves exactly like Adam as the magnitude of the loss -> infinity.
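For orientation, here is a minimal sketch of the plain SGHMC update that AdaSGHMC builds on. It is not this repo's implementation or API: it leaves out the Adam-style preconditioning and the noise correction factors entirely. The names `sghmc_sample`, `step_size`, and `friction` are made up for illustration (and `friction` is not the same thing as the alpha mentioned above). The sketch assumes a scalar parameter, an exact gradient, and a gradient-noise estimate of zero, so the injected noise has variance 2 * friction * step_size.

```python
import math
import random


def sghmc_sample(grad_u, theta0, n_steps, step_size=0.01, friction=0.1, seed=0):
    """Plain SGHMC for a scalar parameter (illustrative, not AdaSGHMC).

    grad_u:    gradient of the negative log posterior U(theta)
    step_size: learning-rate-like step (hypothetical name)
    friction:  momentum decay term (hypothetical name)
    """
    rng = random.Random(seed)
    # Noise scale that balances the friction term, assuming the
    # gradient itself contributes no noise.
    noise_std = math.sqrt(2.0 * friction * step_size)
    theta, v = theta0, 0.0
    samples = []
    for _ in range(n_steps):
        theta += v  # position update
        v = (1.0 - friction) * v - step_size * grad_u(theta) + rng.gauss(0.0, noise_std)
        samples.append(theta)
    return samples


# Toy target: standard normal posterior, U(theta) = theta**2 / 2, so grad U = theta.
samples = sghmc_sample(lambda t: t, theta0=0.0, n_steps=100_000)
burned = samples[1_000:]  # drop a short burn-in
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
print(f"mean={mean:.3f} var={var:.3f}")
```

On this toy target the sample mean and variance should land close to 0 and 1, which is a quick sanity check that the friction and noise terms are balanced.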
TODO:
- Add utilities for deriving sensible priors from transformers
- Usage instructions
- Use cases
- Better test cases
- Parallel tempering