Self-Attention experiments in Vision
To do
- Add relative positional embeddings (RPE) and rotary positional embeddings
- Fix the experiment code; update the models to run without a separate config
- Test on TPUv3-8
- Run initial training runs comparing DeiT with absolute learned vs. rotary positional embeddings
- Add class-attention layers and LayerScale (CaiT)
- Add CvT
- Add TNT, Twins
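As context for the rotary-embedding item above, here is a minimal dependency-free sketch of rotary positional embeddings applied to a single feature vector. The function name `rotary_embed` and the plain-list representation are illustrative assumptions, not part of this repo; the key property it demonstrates is that the dot product between a rotated query and key depends only on their relative offset, which is what makes rotary embeddings attractive as a drop-in replacement for absolute learned positions.

```python
import math

def rotary_embed(x, pos, base=10000.0):
    """Apply a rotary positional embedding to a 1-D feature vector.

    Each pair of dimensions (2i, 2i+1) is rotated by the angle
    pos * base^(-2i/d), so relative offsets between token positions
    show up as phase differences in the attention dot product.
    (Illustrative sketch; not the repo's actual implementation.)
    """
    d = len(x)
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = x[2 * i], x[2 * i + 1]
        out[2 * i] = x1 * c - x2 * s        # 2-D rotation of the pair
        out[2 * i + 1] = x1 * s + x2 * c
    return out

def dot(a, b):
    """Plain dot product, standing in for the q·k attention score."""
    return sum(p * q for p, q in zip(a, b))
```

Because each pair is a rigid 2-D rotation, `dot(rotary_embed(q, m), rotary_embed(k, n))` is a function of `n - m` only, so shifting both positions by the same amount leaves the attention score unchanged.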