Ben Fattori's repositories
Little-GPT
GPT* - Training faster small transformers using ALiBi, Parallel Residual Connections and more!
transformer_shmap
Tensor Parallelism with JAX + Shard Map
ZeRO-transformer
Two implementations of ZeRO-1 optimizer sharding in JAX
Flax-ResNets
CIFAR10 ResNets implemented in JAX+Flax
LeagueMatchScraper
Code to scrape League of Legends matches using the Riot Games API.
RepVGG-CIFAR10
RepVGG models adapted for CIFAR10 and CIFAR100, based on RepVGG: Making VGG-style ConvNets Great Again (Ding et al.)
StochasticDepthNets
PyTorch implementation of ResNet110 as described in Deep Networks with Stochastic Depth (Huang et al.)
wtf-wikipedia-python
Raw Wikipedia XML to LM_Dataformat in under 4 hours
Monte-Carlo-Fractal-Dimensionality
An efficient random-sampling algorithm for estimating the dimension of many basic fractals, implemented in Python.
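The random-sampling idea can be sketched in a few lines of pure Python: sample points on a fractal (here the Sierpinski triangle via the chaos game), then box-count at two scales and take the slope of log N(ε) against log(1/ε). The function names and the choice of scales are illustrative assumptions, not the repo's actual code.

```python
import math
import random

def chaos_game_sierpinski(n_points, seed=0):
    """Sample points on the Sierpinski triangle via the chaos game."""
    rng = random.Random(seed)
    verts = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
    x, y = 0.25, 0.25
    pts = []
    for _ in range(n_points):
        vx, vy = rng.choice(verts)          # jump halfway toward a random vertex
        x, y = (x + vx) / 2, (y + vy) / 2
        pts.append((x, y))
    return pts

def box_count(points, eps):
    """Count occupied boxes of side eps covering the point cloud."""
    return len({(int(x / eps), int(y / eps)) for x, y in points})

def estimate_dimension(points, eps1=1 / 8, eps2=1 / 64):
    """Slope of log N(eps) vs log(1/eps) between two scales."""
    n1, n2 = box_count(points, eps1), box_count(points, eps2)
    return math.log(n2 / n1) / math.log(eps1 / eps2)

d = estimate_dimension(chaos_game_sierpinski(200_000))
```

For the Sierpinski triangle the estimate should land near the true Hausdorff dimension log 3 / log 2 ≈ 1.585, with some error from finite sampling and boundary boxes.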
GeometricDeepLearning
Introductory Geometric Deep Learning Presentation from September 2021
Python-Unigram
Unigram tokenization algorithm in Python
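The core of Unigram tokenization is a Viterbi search for the highest-scoring segmentation of a string under a unigram language model over subword pieces. A minimal sketch of that decoding step (the vocabulary and its log-probabilities below are made up for illustration; training the vocabulary via EM is a separate step):

```python
import math

def viterbi_segment(text, log_probs):
    """Best segmentation of `text` under a unigram LM over subword pieces."""
    n = len(text)
    # best[i]: (score, backpointer) for the best segmentation of text[:i]
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in log_probs:
                score = best[start][0] + log_probs[piece]
                if score > best[end][0]:
                    best[end] = (score, start)
    # Walk backpointers to recover the winning pieces
    pieces, i = [], n
    while i > 0:
        start = best[i][1]
        pieces.append(text[start:i])
        i = start
    return pieces[::-1]

# Toy vocabulary with hand-picked log-probabilities
vocab = {"h": -5.0, "e": -5.0, "l": -5.0, "o": -5.0,
         "he": -2.5, "ll": -2.5, "llo": -2.0, "hello": -9.0}
print(viterbi_segment("hello", vocab))  # → ['he', 'llo']
```

The longer pieces win here because "he" + "llo" scores -4.5, beating both the single piece "hello" (-9.0) and the character-level fallback (-25.0).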
tritonformer
Differentiable transformer in Triton, matching the performance of PyTorch + cuDNN/cuBLAS
CudaSoftmax
Softmax CUDA kernel :)
fattorib.github.io
Website
flashy_linear_attention
Flash linear attention kernels in Triton
Fundamental-Domain
Code to generate a section of the fundamental domain for the action of the special linear group on the space of (integral) binary cubic forms. The code is currently quite inefficient; I hope to optimize it in the future.
fusedswiglu
Fused SwiGLU Triton kernels
InfoGAN-Jax
InfoGAN in Jax with small Gradio app
lm-evaluation-harness
Fork of lm-evaluation-harness for evaluating my custom models
Python-BPE
I wrote Byte-Pair Encoding but it's 600x slower than 🤗
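The slowness is easy to see in the classic training loop: each merge rescans the whole corpus to recount pair frequencies, which is exactly what 🤗's Rust implementation avoids. A minimal sketch of that naive loop (function name and tie-breaking are illustrative assumptions, not the repo's actual code):

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn BPE merges from a {word: frequency} map, the naive way:
    rescan every word for pair counts on each merge (hence the slowness)."""
    # Represent each word as a tuple of symbols, starting from characters
    corpus = {tuple(word): freq for word, freq in words.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Ties broken by first occurrence; real implementations may differ
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = best[0] + best[1]
        new_corpus = {}
        for symbols, freq in corpus.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_corpus[tuple(out)] = new_corpus.get(tuple(out), 0) + freq
        corpus = new_corpus
    return merges

merges = learn_bpe({"low": 5, "lower": 2, "lowest": 3}, num_merges=2)
# → [('l', 'o'), ('lo', 'w')]
```

Each merge costs a full pass over the corpus, so training is O(merges × corpus size) in Python; production tokenizers keep incremental pair counts instead.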
ResNets-CIFAR10
PyTorch implementation of the CIFAR10 ResNets, based on Deep Residual Learning for Image Recognition (He et al.)