Checking Natural Language Proofs

Reviewing a natural language proof in mathematics can be difficult, as it requires verifying numerous mental calculations. Formal language proofs, on the other hand, can be verified automatically by computers. Can we use large language model (LLM) tools to verify that a natural language proof is correct? Such a system would help in reviewing mathematical papers, student homework, and similar material. Checking a proof ought to be easier than automatic theorem proving, since the checker only has to validate the steps it is given rather than search for them.

This also touches on an aspect that is of general importance for the adoption of A.I.: how do we build A.I. tools that we can trust? In our case, how can we trust that the verdict on a proof, exam, or homework provided by an LLM-based system is indeed correct? The problem of checking natural language proofs could be a great playground for investigating this, as there is an objective, formal notion of whether a statement is true or not.

One approach to checking natural language proofs could involve translating them into formal language proofs. But how do we trust that the translation is correct? The issue of trust can perhaps be addressed by creating a modular system in which the translation module is applied independently to different parts of the proof (e.g. lemmas), and the results are combined and verified by the formal system. Further, there could be a way for humans to randomly inspect parts of the system to see that it is functioning well (see probabilistically checkable proofs). Finally, asking GPT to reproduce the intermediate steps in a proof it generates (see this video) has been shown to improve its accuracy. We could do the same, and additionally verify the intermediate steps using the formal system.
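The modular idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Lemma`, `check_proof`, `sample_for_human_review`, `toy_translate`, and `toy_verify` are hypothetical names, the translator stands in for an LLM call, and the verifier stands in for a real formal system. The toy versions handle nothing beyond sums of integers written in words.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Lemma:
    name: str
    informal: str  # natural language statement and proof of one part


@dataclass
class LemmaResult:
    name: str
    formal: str      # output of the translation module
    verified: bool   # verdict of the formal system on that output


def check_proof(
    lemmas: List[Lemma],
    translate: Callable[[str], str],
    verify: Callable[[str], bool],
) -> List[LemmaResult]:
    """Translate each lemma independently, then verify each translation."""
    results = []
    for lemma in lemmas:
        formal = translate(lemma.informal)  # no context shared between lemmas
        results.append(LemmaResult(lemma.name, formal, verify(formal)))
    return results


def sample_for_human_review(
    results: List[LemmaResult], k: int, seed: Optional[int] = None
) -> List[LemmaResult]:
    """Pick up to k results at random for a human to spot-check."""
    rng = random.Random(seed)
    return rng.sample(results, min(k, len(results)))


# Toy stand-ins (assumptions): a real system would call an LLM and a proof checker.
def toy_translate(text: str) -> str:
    return text.replace(" plus ", " + ").replace(" equals ", " = ")


def toy_verify(formal: str) -> bool:
    left, right = formal.split("=")
    return sum(int(term) for term in left.split("+")) == int(right)


if __name__ == "__main__":
    proof = [Lemma("L1", "2 plus 2 equals 4"), Lemma("L2", "2 plus 3 equals 6")]
    for result in check_proof(proof, toy_translate, toy_verify):
        print(result.name, result.formal, result.verified)
```

Because each lemma is translated in isolation, a mistranslation stays local to one part of the proof, and the randomly sampled results give a human reviewer a small, concrete set of translations to compare against the original text.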

Also, how do we take the error messages from the formal language system and translate them back into natural language to explain why the proof is incorrect?
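One possible starting point, sketched here purely as an assumption, is to have the translation step record which line of the formal proof came from which sentence of the original, so that an error reported by the formal system can at least be attached to the right natural language step. `FormalError`, `explain`, and the `source_map` dictionary are hypothetical names, not part of any existing tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FormalError:
    line: int      # line in the formal proof where the checker failed
    message: str   # raw message from the formal system


def explain(error: FormalError, source_map: Dict[int, int], steps: List[str]) -> str:
    """Attach a formal checker error to the natural language step it came from.

    source_map maps a formal-proof line number to an index into steps.
    """
    index = source_map.get(error.line)
    if index is None:
        return (
            f"The checker reported: {error.message} "
            "(could not locate the corresponding step)"
        )
    return f'Step {index + 1} ("{steps[index]}") could not be verified: {error.message}'


if __name__ == "__main__":
    steps = ["Let n be even.", "Then n + 1 is even."]
    print(explain(FormalError(line=7, message="type mismatch"), {3: 0, 7: 1}, steps))
```

This only locates the failing step. Turning the raw checker message into a readable explanation of why the step fails would still need a further component, for example an LLM prompted with the step and the message.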

Subproblems

Can we design a system that can find potential errors in a proof? For instance, we could start with a system that just proofreads English text (no mathematics).
Can we design a system to translate natural language proofs into formal (machine-checkable) language proofs?
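To make the translation subproblem concrete, here is a small illustration of what an input and output pair might look like. Lean 4 is used only as an example of a formal system (the choice is an assumption, not something this project prescribes), and the theorem name `add_comm_example` is made up for illustration. The natural language input would be: "Claim: for all natural numbers a and b, a + b = b + a. Proof: this is the commutativity of addition."

```lean
-- Formal counterpart of the claim above. `Nat.add_comm` is the
-- commutativity lemma from the Lean 4 core library.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Even in this tiny case the translator has to decide what "natural numbers" means in the formal system and which library lemma the phrase "commutativity of addition" refers to, which is where translation errors can enter.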

Relevant Resources
