trishullab / PUTNAM

An evaluation benchmark for undergraduate competition math in Lean4, Isabelle, Coq, and natural language.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

PutnamBench

PutnamBench is a benchmark for evaluation of theorem proving algorithms on competition mathematics problems sourced from the William Lowell Putnam Mathematical Competition years 1965 - 2023. Our formalizations currently support three formal languages : Lean 4 $\land$ Isabelle $\land$ Coq. PutnamBench comprises of over 1400 manual formalizations, aggregated over all languages.

PutnamBench aims to support research in automated mathematical reasoning by providing a multilingual benchmark for evaluating theorem-proving algorithms. It is released under permissive licenses (Apache 2.0 for Lean 4 and Isabelle, MIT for Coq), and we encourage community contributions (TODO: After initial release, as of 6/26 this is expected to be in about 2 weeks time.)

Citation

The associated paper for PutnamBench is {TODO}. Please consider including the following citation if you find PutnamBench useful. {TODO}

Statistics

Language Count
Lean 4 606
Isabelle 550
Coq 348

We also report the number of problems in a certain category. Note that some problems fall under multiple categories.

Category Total Quantity
Algebra 219
Analysis 176
Number Theory 97
Linear Algebra 43
Abstract Algebra 25
Geometry 22
Combinatorics 12
Set Theory 4
Probability 2

Versioning

  • Version: v-1
  • Not yet officially released. Please refrain from opening issues or making PRs until the initial release.

About

An evaluation benchmark for undergraduate competition math in Lean4, Isabelle, Coq, and natural language.


Languages

Language:Lean 41.7%Language:Isabelle 38.8%Language:Coq 17.9%Language:Python 1.5%Language:Shell 0.1%