- A Survey of Question Answering for Math and Science Problem, arXiv:1705.04530 [paper]
- The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers, TPAMI 2019 [paper]
- Representing Numbers in NLP: a Survey and a Vision, NAACL 2021 [paper]
- Survey on Mathematical Word Problem Solving Using Natural Language Processing, ICIICT 2021 [paper]
- A Survey in Mathematical Language Processing, arXiv:2205.15231 [paper]
- Partial Differential Equations Meet Deep Neural Networks: A Survey, arXiv:2211.05567 [paper]
- The 1st MATH-AI Workshop: the Role of Mathematical Reasoning in General Artificial Intelligence, ICLR 2021 [website]
- Math AI for Education: Bridging the Gap Between Research and Smart Education (MATHAI4ED), NeurIPS 2021 [website]
- The 1st Workshop on Mathematical Natural Language Processing, EMNLP 2022 [website]
- 🔥 The 2nd MATH-AI Workshop: Toward Human-Level Mathematical Reasoning, NeurIPS 2022 [website]
- Computer Scientist Explains One Concept in 5 Levels of Difficulty, 2022 [YouTube]
- [AI2] Learning to Solve Arithmetic Word Problems with Verb Categorization, EMNLP 2014 [paper]
- [Alg514] Learning to automatically solve algebra word problems, ACL 2014 [paper]
- [IL] Reasoning about Quantities in Natural Language, TACL 2015 [paper]
- [SingleEQ] Parsing Algebraic Word Problems into Equations, TACL 2015 [paper]
- [DRAW] Draw: A challenging and diverse algebra word problem set, 2015 [paper]
- [Dolphin1878] Automatically solving number word problems by semantic parsing and reasoning, EMNLP 2015 [paper]
- [Dolphin18K] How well do computers solve math word problems? large-scale dataset construction and evaluation, ACL 2016 [paper]
- [MAWPS] MAWPS: A math word problem repository, NAACL-HLT 2016 [paper]
- [AllArith] Unit dependency graph and its application to arithmetic word problem solving, AAAI 2017 [paper]
- [DRAW-1K] Annotating Derivations: A New Evaluation Strategy and Dataset for Algebra Word Problems, ACL 2017 [paper]
- [Math23K] Deep neural solver for math word problems, EMNLP 2017 [paper]
- [AQuA-RAT] Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems, ACL 2017 [paper]
- [Aggregate] Mapping to Declarative Knowledge for Word Problem Solving, TACL 2018 [paper]
- [MathQA] MathQA: Towards interpretable math word problem solving with operation-based formalisms, NAACL-HLT 2019 [paper]
- [ASDiv] A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers, ACL 2020 [paper]
- [Ape210K] Ape210k: A large-scale and template-rich dataset of math word problems, arXiv:2009.11506 [paper]
- [SVAMP] Are NLP Models really able to Solve Simple Math Word Problems?, NAACL-HLT 2021 [paper]
- 🔥 [IconQA] IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning, NeurIPS 2021 (Datasets and Benchmarks) [paper]
- 🔥 [GSM8K] Training verifiers to solve math word problems, arXiv:2110.14168 [paper]
- [MathQA-Python] Program synthesis with large language models, arXiv:2108.07732 [paper]
- [ArMATH] ArMATH: a Dataset for Solving Arabic Math Word Problems, LREC 2022 [paper]
- 🔥 [TabMWP] Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning, arXiv:2209.14610, 2022 [paper]
- [GEOS] Solving geometry problems: Combining text and diagram interpretation, EMNLP 2015 [paper]
- [GeoShader] Synthesis of solutions for shaded area geometry problems, The Thirtieth International Flairs Conference, 2017 [paper]
- [GEOS-OS] Learning to solve geometry problems from natural language demonstrations in textbooks, Proceedings of the 6th Joint Conference on Lexical and Computational Semantics, 2017 [paper]
- [GEOS++] From textbooks to knowledge: A case study in harvesting axiomatic knowledge from textbooks to solve geometry problems, EMNLP 2017 [paper]
- [GeoQA] GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning, Findings of ACL 2021 [paper]
- 🔥 [Geometry3K] Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning, ACL 2021 [paper]
- 🔥 [UniGeo] UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression, EMNLP 2022 [paper]
- [GeoRE] GeoRE: A Relation Extraction Dataset for Chinese Geometry Problems, NeurIPS 2021 MATHAI4ED Workshop [paper]
- [GeoQA+] An Augmented Benchmark Dataset for Geometric Question Answering through Dual Parallel Text Encoding, ICCL 2022 [paper]
- [HOList] HOList: An environment for machine learning of higher order logic theorem proving, ICML 2019 [paper]
- [INT] INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving, ICLR 2021 [paper]
- 🔥 NaturalProofs: Mathematical Theorem Proving in Natural Language, NeurIPS 2021 (Datasets and Benchmarks) [paper]
- 🔥 [MiniF2F] MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics, ICLR 2022 [paper]
- [Fermi] How Much Coffee Was Consumed During EMNLP 2019? Fermi Problems: A New Reasoning Challenge for AI, EMNLP 2020 [paper]
- [TAT-QA] TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance, ACL-IJCNLP 2021 [paper]
- [FinQA] FinQA: A Dataset of Numerical Reasoning over Financial Data, EMNLP 2021 [paper]
- [NumGLUE] NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks, ACL 2022 [paper]
- [MultiHiertt] MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data, ACL 2022 [paper]
- 🔥 Lila: A Unified Benchmark for Mathematical Reasoning, EMNLP 2022 [paper]
- [TextbookQA] Are You Smarter Than A Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension, CVPR 2017 [paper]
- [FigureQA] FigureQA: An annotated figure dataset for visual reasoning, arXiv:1710.07300 [paper]
- [DVQA] DVQA: Understanding data visualizations via question answering, CVPR 2018 [paper]
- [RAVEN] RAVEN: A dataset for relational and analogical visual reasoning, CVPR 2019 [paper]
- [MNS] Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning, AAAI 2020 [paper]
- [P3] Programming Puzzles, NeurIPS 2021 (Datasets and Benchmarks) [paper]
- [IsarStep] IsarStep: a Benchmark for High-level Mathematical Reasoning, ICLR 2021 [paper]
- [PhysNLU] PhysNLU: A Language Resource for Evaluating Natural Language Understanding and Explanation Coherence in Physics, 2022 [paper]
- [ScienceQA] Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering, NeurIPS 2022 [paper]
- [PGDP5K] PGDP5K: A Diagram Parsing Dataset for Plane Geometry Problems, arXiv:2205.0994 [paper]
- [ConvFinQA] ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering, arXiv:2210.03849 [paper]
- [APPS, code generation] Measuring Coding Challenge Competence With APPS, NeurIPS 2021 (Datasets and Benchmarks) [paper]
- [symbolic reasoning] Semantic parsing of pre-university math problems, ACL 2017 [paper]
- [Equation templates] Learning fine-grained expressions to solve math word problems, EMNLP 2017 [paper]
- [Dependency Graph] Unit dependency graph and its application to arithmetic word problem solving, AAAI 2017 [paper]
- Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems, ACL 2017 [paper]
- [expression tree] Translating a math word problem to an expression tree, EMNLP 2018 [paper]
- [logical reasoning] Mapping to declarative knowledge for word problem solving, TACL 2018 [paper]
- [equation templates] Template-based math word problem solvers with recursive neural networks, AAAI 2019 [paper]
- [expression tree] Semantically-Aligned Universal Tree-Structured Solver for Math Word Problems, EMNLP 2020 [paper]
- [Weak Supervision] Learning by Fixing: Solving Math Word Problems with Weak Supervision, AAAI 2021 [paper]
- Solving Math Word Problems with Teacher Supervision, IJCAI 2021 [paper]
- Analogical Math Word Problems Solving with Enhanced Problem-Solution Association, EMNLP 2022 [paper]
- Synthesis of geometry proof problems, AAAI 2014 [paper]
- Diagram understanding in geometry questions, AAAI 2014 [paper]
- Retrieving geometric information from images: the case of hand-drawn diagrams, Data Mining and Knowledge Discovery 2017 [paper]
- Automatic understanding and formalization of natural language geometry problems using syntax-semantics models, International Journal of Innovative Computing, Information and Control 2018 [paper]
- A Framework for Solving Explicit Arithmetic Word Problems and Proving Plane Geometry Theorems, International Journal of Pattern Recognition and Artificial Intelligence 2019 [paper]
- [Knowledge] Discourse in multimedia: A case study in extracting geometry knowledge from textbooks, Computational Linguistics, 2020 [paper]
- DeepMath - Deep Sequence Models for Premise Selection, NeurIPS 2016 [paper]
- Deep network guided proof search, arXiv:1701.06972 [paper]
- Graph representations for higher-order logic and theorem proving, AAAI 2020 [paper]
- Neural Theorem Proving on Inequality Problems, AITP 2020 [paper]
- Latent Action Space for Efficient Planning in Theorem Proving, 2021 [paper]
- Learning to Give Checkable Answers with Prover-Verifier Games, arXiv:2108.12099 [paper]
- REFACTOR: Learning to Extract Theorems from Proofs, 2022 [paper]
- Combining retrieval, statistics, and inference to answer elementary science questions, AAAI 2016 [paper]
- 🔥 Advancing mathematics by guiding human intuition with AI, Nature 2021 [paper]
- Symbolic Brittleness in Sequence Models: on Systematic Generalization in Symbolic Mathematics, AAAI 2022 [paper]
- 🔥 Discovering faster matrix multiplication algorithms with reinforcement learning, Nature 2022 [paper]
- [GPT-2] Language models are unsupervised multitask learners, 2019 [paper]
- [UnifiedQA] UNIFIEDQA: Crossing Format Boundaries with a Single QA System, EMNLP 2020 [paper]
- Lime: Learning inductive bias for primitives of mathematical reasoning, ICML 2021 [paper]
- 🔥 [IconQA] IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning, NeurIPS 2021 (Datasets and Benchmarks) [paper]
- MWP-BERT: Numeracy-Augmented Pre-training for Math Word Problem Solving, Findings of NAACL 2022 [paper]
- TAPEX: Table Pre-training via Learning a Neural SQL Executor, ICLR 2022 [paper]
- Insights into Pre-training via Simpler Synthetic Tasks, NeurIPS 2022 [paper]
- Learning from Self-Sampled Correct and Partially-Correct Programs, arXiv:2205.14318 [paper]
- Solving quantitative reasoning problems with language models, arXiv:2206.14858 [paper]
- 🔥 [Inter-GPS] Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning, ACL 2021 [paper]
- 🔥 [UniGeo] UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression, EMNLP 2022 [paper]
- Generative Language Modeling for Automated Theorem Proving, arXiv:2009.03393 [paper]
- HyperTree Proof Search for Neural Theorem Proving, arXiv:2205.11491 [paper]
- Proof Artifact Co-training for Theorem Proving with Language Models, ICLR 2022 [paper]
- Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers, NeurIPS 2022 [paper]
- [LISA] LISA: Language models of ISAbelle proofs, AITP 2021 [paper]
- From 'F' to 'A' on the NY Regents Science Exams: An Overview of the Aristo Project, arXiv:1909.01958 [paper]
- Injecting Numerical Reasoning Skills into Language Models, ACL 2020 [paper]
- Injecting Numerical Reasoning Skills into Knowledge Base Question Answering Models, arXiv:2112.06109 [paper]
- Linear algebra with transformers, TMLR 2022 [paper]
- Show Your Work: Scratchpads for Intermediate Computation with Language Models, arXiv:2112.00114 [paper]
- 🔥 [GPT-3] Language models are few-shot learners, NeurIPS 2020 [paper]
- 🔥 [Codex] Evaluating large language models trained on code, arXiv:2107.03374 [paper]
- 🔥 [PaLM] PaLM: Scaling Language Modeling with Pathways, arXiv:2204.02311 [paper]
- Calibrate before use: Improving few-shot performance of language models, ICML 2021 [paper]
- Emergent Abilities of Large Language Models, Transactions on Machine Learning Research 2022 [paper]
- Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity, ACL 2022 [paper]
- What Makes Good In-Context Examples for GPT-3?, The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures 2022 [paper]
- 🔥 [CoT] Chain of thought prompting elicits reasoning in large language models, arXiv:2201.11903 [paper]
- 🔥 Self-consistency improves chain of thought reasoning in language models, arXiv:2203.11171 [paper]
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models, arXiv:2205.10625 [paper]
- 🔥 [Zero-shot CoT] Large Language Models are Zero-Shot Reasoners, arXiv:2205.11916 [paper]
- 🔥 [CoT GPT-3 + RL] Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning, arXiv:2209.14610, 2022 [paper]
- 🔥 Language models are multilingual chain-of-thought reasoners, arXiv:2210.03057 [paper]
- Automatic Chain of Thought Prompting in Large Language Models, arXiv:2210.03493 [paper]
- Large Language Models are few(1)-shot Table Reasoners, arXiv:2210.06710 [paper]
- Challenging BIG-Bench tasks and whether chain-of-thought can solve them, arXiv:2210.09261 [paper]
- Scaling Instruction-Finetuned Language Models, arXiv:2210.11416 [paper]
- [PaLM, Codex] Autoformalization with Large Language Models, NeurIPS 2022 [paper]
- [Codex] Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs, arXiv:2210.12283 [paper]
- 🔥 A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level, PNAS 2022 [paper]
- 🔥 Minerva: Solving Quantitative Reasoning Problems with Language Models, NeurIPS 2022 [paper]
- Empirical explorations of the geometry theorem machine, Western Joint IRE-AIEE-ACM Computer Conference 1960 [paper]
- Basic principles of mechanical theorem proving in elementary geometries, Journal of Automated Reasoning 1986 [paper]
- Automated generation of readable proofs with geometric invariants, Journal of Automated Reasoning 1996 [paper]
- My computer is an honor student - but how intelligent is it? Standardized tests as a measure of AI, AI Magazine 2016 [paper]
- Learning pipelines with limited data and domain knowledge: A study in parsing physics problems, NeurIPS 2018 [paper]
- Automatically proving plane geometry theorems stated by text and diagram, International Journal of Pattern Recognition and Artificial Intelligence 2019 [paper]
- Classification and Clustering of arXiv Documents, Sections, and Abstracts, Comparing Encodings of Natural and Mathematical Language, JCDL 2020 [paper]