Zhangir Azerbayev's repositories
proof-pile
Scripts for downloading and pre-processing the `proof-pile`, a high-quality dataset of mathematical text and code.
mm-extract
Extracting human-readable pre-training data from set.mm
nn-generalization
Neural Network Generalization Reading List
llemma_formal2formal
Llemma formal2formal (tactic prediction) theorem proving experiments
math_cc_net
Tools to download and clean up Common Crawl data
doremi
PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets
infinigen
Infinite Photorealistic Worlds using Procedural Generation
lean-chat-server
Server for lean-chat
levanter
Legible, Scalable, Reproducible Foundation Models with Named Tensors and JAX
llmstep
llmstep: [L]LM proofstep suggestions in Lean 4.
mathematics_in_lean
The user home repository for the Mathematics in Lean tutorial.
mathport
Experimenting with porting ProofNet and miniF2F
miniF2F
Adding Lean 4
mizar-mirror
Stores Mizar files to make them easy to access through the REST API.
nvim-config
Neovim configuration
seqax
seqax = sequence modeling + JAX
tiny-ml-projects
Monorepo of miscellaneous machine learning projects
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
zhangir-azerbayev.github.io
Website