The cramming paper showed that it is possible to come close to BERT's performance with 24h of compute on consumer-grade hardware. That sounds fun, let's try it!
- Results :p
- Simple visualizer
- Switch to the proper dataset
- Remove duplicates from the dataset (nice to have)
- Multi-GPU support
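
For the duplicate-removal item, here is a minimal sketch of one possible approach: exact deduplication by hashing a normalized copy of each document. It assumes the dataset can be iterated as plain strings; `normalize` and `dedup_exact` are hypothetical names for illustration, not part of this repo or of the cramming code, and near-duplicate detection is out of scope.

```python
import hashlib
from typing import Iterable, Iterator


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivially different copies match."""
    return " ".join(text.split()).lower()


def dedup_exact(docs: Iterable[str]) -> Iterator[str]:
    """Yield each document the first time its normalized hash is seen.

    Hypothetical helper: only the hash digests are kept in memory,
    not the documents themselves.
    """
    seen = set()
    for doc in docs:
        digest = hashlib.sha1(normalize(doc).encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            yield doc


if __name__ == "__main__":
    sample = ["Hello  world", "hello world", "Something else"]
    print(list(dedup_exact(sample)))
```

Because it is a generator, it can wrap a streaming dataset without loading everything at once; the set of digests still grows with the number of unique documents.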