CMSA New Technologies in Mathematics Seminar: Peano: Learning Formal Mathematical Reasoning Without Human Data
November 8, 2023 2:00 pm - 3:00 pm
CMSA, 20 Garden St, G10
Address: 20 Garden Street, Cambridge, MA 02138
Gabriel Poesia - Stanford Dept. of Computer Science
Recent progress in game-playing AI has prompted the question: how might we create AIs that master the game of mathematics? Modern interactive theorem provers like Lean or Coq provide most of this analogy by defining states, actions and rewards after given a theorem statement. However, they expose an infinite action space, which makes learning from scratch challenging. I'll introduce Peano, a minimal theorem-proving environment based on dependent type theory that exposes a finite action space. This feature allows an agent to start tabula rasa in a new domain and learn to solve problems. I'll first describe a case study on learning to solve simple algebra problems from five sections of the Khan Academy platform. Reinforcement learning alone fails to progress towards the hardest problems, as solutions in terms of the base action space grow longer with increasing problem complexity. Having the agent induce its own tactics -- higher-level actions that compress solutions found so far -- allows it to make steady progress, solving all problems and guiding it towards human-like solutions. Furthermore, these tactics induce an order to the problems, despite being seen at random during training. The recovered order has significant agreement with the expert-designed Khan Academy curriculum, and second-generation agents trained on the recovered curriculum learn significantly faster. Finally, I'll describe ongoing work on solving the Natural Number Game -- a popular introduction to theorem proving in Lean for mathematicians. The finite action space allows us to train agents by borrowing ideas from curiosity-driven exploration in Reinforcement Learning. Notably, simply trying to find "interesting" consequences of the hypotheses of a theorem -- as measured by the surprisal of a language model -- often leads to proving the theorem itself.