PutnamBench

PutnamBench is a benchmark for evaluation of theorem proving algorithms on competition mathematics problems sourced from the William Lowell Putnam Mathematical Competition years 1965 - 2023. Our formalizations currently support three formal languages : Lean 4 $\land$ Isabelle $\land$ Coq. PutnamBench comprises of over 1400 manual formalizations, aggregated over all languages.

PutnamBench aims to support research in automated mathematical reasoning by providing a multilingual benchmark for evaluating theorem-proving algorithms. It is released under permissive licenses (Apache 2.0 for Lean 4 and Isabelle, MIT for Coq), and we encourage community contributions (TODO: After initial release, as of 6/26 this is expected to be in about 2 weeks time.)

Citation

The associated paper for PutnamBench is {TODO}. Please consider including the following citation if you find PutnamBench useful. {TODO}

Statistics

Language	Count
Lean 4	606
Isabelle	550
Coq	348

We also report the number of problems in a certain category. Note that some problems fall under multiple categories.

Category	Total Quantity
Algebra	219
Analysis	176
Number Theory	97
Linear Algebra	43
Abstract Algebra	25
Geometry	22
Combinatorics	12
Set Theory	4
Probability	2

Versioning

Version: v-1
Not yet officially released. Please refrain from opening issues or making PRs until the initial release.

About

An evaluation benchmark for undergraduate competition math in Lean4, Isabelle, Coq, and natural language.

Languages

Language:Lean 41.7%Language:Isabelle 38.8%Language:Coq 17.9%Language:Python 1.5%Language:Shell 0.1%