DL4MATH - Reading List

🧰 Resources

Related Surveys

A Survey of Question Answering for Math and Science Problem, arXiv:1705.04530 [paper]
The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers, TPAMI 2019 [paper]
Representing Numbers in NLP: a Survey and a Vision, NAACL 2021 [paper]
Survey on Mathematical Word Problem Solving Using Natural Language Processing, ICIICT 2021 [paper]
A Survey in Mathematical Language Processing, arXiv:2205.15231 [paper]
Partial Differential Equations Meet Deep Neural Networks: A Survey, arXiv:2211.05567 [paper]

Workshops

The 1st MATH-AI Workshop: the Role of Mathematical Reasoning in General Artificial Intelligence, ICLR 2021 [website]
Math AI for Education: Bridging the Gap Between Research and Smart Education (MATHAI4ED), NeurIPS 2021 [website]
The 1st Workshop on Mathematical Natural Language Processing, EMNLP 2022 [website]
🔥 The 2nd MATH-AI Workshop: Toward Human-Level Mathematical Reasoning, NeurIPS 2022 [website]

Talks

Computer Scientist Explains One Concept in 5 Levels of Difficulty, 2022 [YouTube]

🎨 Mathematical Reasoning Benchmarks

Math Word Problems (MWP)

[AI2] Learning to Solve Arithmetic Word Problems with Verb Categorization, EMNLP 2014 [paper]
[Alg514] Learning to automatically solve algebra word problems, ACL 2014 [paper]
[IL] Reasoning about Quantities in Natural Language, TACL 2015 [paper]
[SingleEQ] Parsing Algebraic Word Problems into Equations, TACL 2015 [paper]
[DRAW] Draw: A challenging and diverse algebra word problem set, 2015 [paper]
[Dolphin1878] Automatically solving number word problems by semantic parsing and reasoning, EMNLP 2015 [paper]
[Dolphin18K] How well do computers solve math word problems? large-scale dataset construction and evaluation, ACL 2016 [paper]
[MAWPS] MAWPS: A math word problem repository, NAACL-HLT 2016 [paper]
[AllArith] Unit dependency graph and its application to arithmetic word problem solving, AAAI 2017 [paper]
[DRAW-1K] Annotating Derivations: A New Evaluation Strategy and Dataset for Algebra Word Problems, ACL 2017 [paper]
[Math23K] Deep neural solver for math word problems, EMNLP 2017 [paper]
[AQuA-RAT] Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems, ACL 2017 [paper]
[Aggregate] Mapping to Declarative Knowledge for Word Problem Solving, TACL 2018 [paper]
[MathQA] MathQA: Towards interpretable math word problem solving with operation-based formalisms, NAACL-HLT 2019 [paper]
[ASDiv] A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers, ACL 2020 [paper]
[Ape210K] Ape210k: A large-scale and template-rich dataset of math word problems, arXiv:2009.11506 [paper]
[SVAMP] Are NLP Models really able to Solve Simple Math Word Problems?, NAACL-HLT 2021 [paper]
🔥 [IconQA] IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning, NeurIPS 2021 (Datasets and Benchmarks) [paper]
🔥 [GSM8K] Training verifiers to solve math word problems, arXiv:2110.14168 [paper]
[MathQA-Python] Program synthesis with large language models, arXiv:2108.07732 [paper]
[ArMATH] ArMATH: a Dataset for Solving Arabic Math Word Problems, LREC 2022 [paper]
🔥 [TabMWP] Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning, arXiv:2209.14610, 2022 [paper]

Geometry Problem Solving (GPS)

[GEOS] Solving geometry problems: Combining text and diagram interpretation, EMNLP 2015 [paper]
[GeoShader] Synthesis of solutions for shaded area geometry problems, The Thirtieth International FLAIRS Conference, 2017 [paper]
[GEOS-OS] Learning to solve geometry problems from natural language demonstrations in textbooks, Proceedings of the 6th Joint Conference on Lexical and Computational Semantics, 2017 [paper]
[GEOS++] From textbooks to knowledge: A case study in harvesting axiomatic knowledge from textbooks to solve geometry problems, EMNLP 2017 [paper]
[GeoQA] GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning, Findings of ACL 2021 [paper]
🔥 [Geometry3K] Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning, ACL 2021 [paper]
🔥 [UniGeo] UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression, EMNLP 2022 [paper]
[GeoRE] GeoRE: A Relation Extraction Dataset for Chinese Geometry Problems, NeurIPS 2021 MATHAI4ED Workshop [paper]
[GeoQA+] An Augmented Benchmark Dataset for Geometric Question Answering through Dual Parallel Text Encoding, COLING 2022 [paper]

Theorem Proving (TP)

[HOList] HOList: An environment for machine learning of higher order logic theorem proving, ICML 2019 [paper]
[INT] INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving, ICLR 2021 [paper]
🔥 NaturalProofs: Mathematical Theorem Proving in Natural Language, NeurIPS 2021 (Datasets and Benchmarks) [paper]
🔥 [MiniF2F] MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics, ICLR 2022 [paper]

Math Question Answering (MathQA)

[Fermi] How Much Coffee Was Consumed During EMNLP 2019? Fermi Problems: A New Reasoning Challenge for AI, EMNLP 2020 [paper]
[TAT-QA] TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance, ACL-IJCNLP 2021 [paper]
[FinQA] FinQA: A Dataset of Numerical Reasoning over Financial Data, EMNLP 2021 [paper]
[NumGLUE] NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks, ACL 2022 [paper]
[MultiHiertt] MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data, ACL 2022 [paper]
🔥 Lila: A Unified Benchmark for Mathematical Reasoning, EMNLP 2022 [paper]

Other Math Tasks

[TextbookQA] Are You Smarter Than A Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension, CVPR 2017 [paper]
[FigureQA] FigureQA: An annotated figure dataset for visual reasoning, arXiv:1710.07300 [paper]
[DVQA] DVQA: Understanding data visualizations via question answering, CVPR 2018 [paper]
[RAVEN] RAVEN: A dataset for relational and analogical visual reasoning, CVPR 2019 [paper]
[MNS] Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning, AAAI 2020 [paper]
[P3] Programming Puzzles, NeurIPS 2021 (Datasets and Benchmarks) [paper]
[IsarStep] IsarStep: a Benchmark for High-level Mathematical Reasoning, ICLR 2021 [paper]
[PhysNLU] PhysNLU: A Language Resource for Evaluating Natural Language Understanding and Explanation Coherence in Physics, 2022 [paper]
[ScienceQA] Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering, NeurIPS 2022 [paper]
[PGDP5K] PGDP5K: A Diagram Parsing Dataset for Plane Geometry Problems, arXiv:2205.0994 [paper]
[ConvFinQA] ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering, arXiv:2210.03849 [paper]
[APPS, code generation] Measuring Coding Challenge Competence With APPS, NeurIPS 2021 (Datasets and Benchmarks) [paper]

🧩 Neural Networks for Math

Neural Math Word Problem Solving

[symbolic reasoning] Semantic parsing of pre-university math problems, ACL 2017 [paper]
[Equation templates] Learning fine-grained expressions to solve math word problems, EMNLP 2017 [paper]
[Dependency Graph] Unit dependency graph and its application to arithmetic word problem solving, AAAI 2017 [paper]
Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems, ACL 2017 [paper]
[expression tree] Translating a math word problem to an expression tree, EMNLP 2018 [paper]
[logical reasoning] Mapping to declarative knowledge for word problem solving, TACL 2018 [paper]
[equation templates] Template-based math word problem solvers with recursive neural networks, AAAI 2019 [paper]
[expression tree] Semantically-Aligned Universal Tree-Structured Solver for Math Word Problems, EMNLP 2020 [paper]
[Weak Supervision] Learning by Fixing: Solving Math Word Problems with Weak Supervision, AAAI 2021 [paper]
Solving Math Word Problems with Teacher Supervision, IJCAI 2021 [paper]
Analogical Math Word Problems Solving with Enhanced Problem-Solution Association, EMNLP 2022 [paper]

Neural Geometry Solving

Synthesis of geometry proof problems, AAAI 2014 [paper]
Diagram understanding in geometry questions, AAAI 2014 [paper]
Retrieving geometric information from images: the case of hand-drawn diagrams, Data Mining and Knowledge Discovery 2017 [paper]
Automatic understanding and formalization of natural language geometry problems using syntax-semantics models, International Journal of Innovative Computing, Information and Control 2018 [paper]
A Framework for Solving Explicit Arithmetic Word Problems and Proving Plane Geometry Theorems, International Journal of Pattern Recognition and Artificial Intelligence 2019 [paper]
[Knowledge] Discourse in multimedia: A case study in extracting geometry knowledge from textbooks, Computational Linguistics, 2020 [paper]

Neural Theorem Proving

DeepMath - Deep Sequence Models for Premise Selection, NeurIPS 2016 [paper]
Deep network guided proof search, arXiv:1701.06972 [paper]
Graph representations for higher-order logic and theorem proving, AAAI 2020 [paper]
Neural Theorem Proving on Inequality Problems, AITP 2020 [paper]
Latent Action Space for Efficient Planning in Theorem Proving, 2021 [paper]
Learning to Give Checkable Answers with Prover-Verifier Games, arXiv:2108.12099 [paper]
REFACTOR: Learning to Extract Theorems from Proofs, 2022 [paper]

Neural Networks for MathQA

Combining retrieval, statistics, and inference to answer elementary science questions, AAAI 2016 [paper]

Neural Networks for Other Math Tasks

🔥 Advancing mathematics by guiding human intuition with AI, Nature 2021 [paper]
Symbolic Brittleness in Sequence Models: on Systematic Generalization in Symbolic Mathematics, AAAI 2022 [paper]
🔥 Discovering faster matrix multiplication algorithms with reinforcement learning, Nature 2022 [paper]

📜 Pre-trained Models for Math

Pre-trained Language Models (PTLMs)

[GPT-2] Language models are unsupervised multitask learners, 2019 [paper]
[UnifiedQA] UNIFIEDQA: Crossing Format Boundaries with a Single QA System, EMNLP 2020 [paper]

Language Models for MWPs

Lime: Learning inductive bias for primitives of mathematical reasoning, ICML 2021 [paper]
🔥 [IconQA] IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning, NeurIPS 2021 (Datasets and Benchmarks) [paper]
MWP-BERT: Numeracy-Augmented Pre-training for Math Word Problem Solving, Findings of NAACL 2022 [paper]
TAPEX: Table Pre-training via Learning a Neural SQL Executor, ICLR 2022 [paper]
Insights into Pre-training via Simpler Synthetic Tasks, NeurIPS 2022 [paper]
Learning from Self-Sampled Correct and Partially-Correct Programs, arXiv:2205.14318 [paper]
Solving quantitative reasoning problems with language models, arXiv:2206.14858 [paper]

Language Models for Geometry Solvers

🔥 [Inter-GPS] Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning, ACL 2021 [paper]
🔥 [UniGeo] UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression, EMNLP 2022 [paper]

Language Models for Theorem Proving

Generative Language Modeling for Automated Theorem Proving, arXiv:2009.03393 [paper]
HyperTree Proof Search for Neural Theorem Proving, arXiv:2205.11491 [paper]
Proof Artifact Co-training for Theorem Proving with Language Models, ICLR 2022 [paper]
Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers, NeurIPS 2022 [paper]
[LISA] LISA: Language models of ISAbelle proofs, AITP 2021 [paper]

Language Models for MathQA

From 'F' to 'A' on the NY Regents Science Exams: An Overview of the Aristo Project, arXiv:1909.01958 [paper]
Injecting Numerical Reasoning Skills into Language Models, ACL 2020 [paper]
Injecting Numerical Reasoning Skills into Knowledge Base Question Answering Models, arXiv:2112.06109 [paper]

Language Models for Other Math Tasks

Linear algebra with transformers, TMLR 2022 [paper]
Show Your Work: Scratchpads for Intermediate Computation with Language Models, arXiv:2112.00114 [paper]

🌠 In-context Learning with LLMs for Math

Large Language Models (100B+)

🔥 [GPT-3] Language models are few-shot learners, NeurIPS 2020 [paper]
🔥 [Codex] Evaluating large language models trained on code, arXiv:2107.03374 [paper]
🔥 [PaLM] PaLM: Scaling Language Modeling with Pathways, arXiv:2204.02311 [paper]

Prompt Learning for MWPs

Calibrate before use: Improving few-shot performance of language models, ICML 2021 [paper]
Emergent Abilities of Large Language Models, Transactions on Machine Learning Research 2022 [paper]
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity, ACL 2022 [paper]
What Makes Good In-Context Examples for GPT-3?, The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures 2022 [paper]
🔥 [CoT] Chain of thought prompting elicits reasoning in large language models, arXiv:2201.11903 [paper]
🔥 Self-consistency improves chain of thought reasoning in language models, arXiv:2203.11171 [paper]
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models, arXiv:2205.10625 [paper]
🔥 [Zero-shot CoT] Large Language Models are Zero-Shot Reasoners, arXiv:2205.11916 [paper]
🔥 [CoT GPT-3 + RL] Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning, arXiv:2209.14610, 2022 [paper]
🔥 Language models are multilingual chain-of-thought reasoners, arXiv:2210.03057 [paper]
Automatic Chain of Thought Prompting in Large Language Models, arXiv:2210.03493 [paper]
Large Language Models are few(1)-shot Table Reasoners, arXiv:2210.06710 [paper]
Challenging BIG-Bench tasks and whether chain-of-thought can solve them, arXiv:2210.09261 [paper]
Scaling Instruction-Finetuned Language Models, arXiv:2210.11416 [paper]

Prompt Learning for Proving

[PaLM, Codex] Autoformalization with Large Language Models, NeurIPS 2022 [paper]
[Codex] Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs, arXiv:2210.12283 [paper]

Prompt Learning for MathQA

🔥 A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level, PNAS 2022 [paper]
🔥 Minerva: Solving Quantitative Reasoning Problems with Language Models, NeurIPS 2022 [paper]

♣️ Other Methods for Math

Early Methods

Empirical explorations of the geometry theorem machine, Western Joint IRE-AIEE-ACM Computer Conference 1960 [paper]
Basic principles of mechanical theorem proving in elementary geometries, Journal of Automated Reasoning 1986 [paper]
Automated generation of readable proofs with geometric invariants, Journal of Automated Reasoning 1996 [paper]
My computer is an honor student—but how intelligent is it? Standardized tests as a measure of AI, AI Magazine 2016 [paper]

Symbolic Methods

Learning pipelines with limited data and domain knowledge: A study in parsing physics problems, NeurIPS 2018 [paper]
Automatically proving plane geometry theorems stated by text and diagram, International Journal of Pattern Recognition and Artificial Intelligence 2019 [paper]

Pure ML Methods

Classification and Clustering of arXiv Documents, Sections, and Abstracts, Comparing Encodings of Natural and Mathematical Language, JCDL 2020 [paper]
