CMSA New Technologies in Mathematics Seminar: Llemma: an open language model for mathematics

October 25, 2023 2:00 pm - 3:00 pm
Virtually
Speaker: Sean Welleck, CMU Language Technologies Institute

We present Llemma: 7 billion and 34 billion parameter language models for mathematics. The Llemma models are initialized with Code Llama weights, then trained on the Proof-Pile II, a 55 billion token dataset of mathematical web data, code, and scientific papers. The resulting models show improved mathematical capabilities, and can be adapted to various tasks. For instance, Llemma outperforms the unreleased Minerva model suite on an equi-parameter basis, and is capable of tool use and formal theorem proving without any further fine-tuning. We openly release all artifacts, including the Llemma models, the Proof-Pile II, and code to replicate our experiments. We hope that Llemma serves as a platform for new research and tools at the intersection of generative models and mathematics.

 

Zoom link: https://harvard.zoom.us/j/95706757940?pwd=dHhMeXBtd1BhN0RuTWNQR0xEVzJkdz09
Password: cmsa