Will an AI be able to convert recent mathematical results into a fully formal proofs that can be verified by a mainstream proof assistant by 2025?

Plus

Ṁ8327

Dec 31

chance

ALL

Definitions used:

Under "AI" I consider any fully autonomous computer program that is not given any human input other than the mathematic
Under "recent mathematical results" I consider novel mathematical results that have been published during or after 2010.
Under "fully formal proof" I consider a sequence of proof steps written in a formal mathematical logic such as CIC or ZFC.
Under "mainstream proof assistant" I consider languages such as Isabelle/HOL, Coq, or Lean.

This question will resolve to YES if such an AI can translate and verify at least five recent mathematical results that had not previously been verified in a proof assistant before the start of 2025, and will otherwise resolve to NO.

This question is managed and resolved by Manifold.

#AI

#Math

Get

1,000

and

3.00

9 Comments

34 Holders

124 Trades

Sort by:

bought Ṁ150 YES

https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/ though will need more time to see how far this can get on this market.

AI achieves silver-medal standard solving International Mathematical Olympiad problems

Breakthrough models AlphaProof and AlphaGeometry 2 solve advanced reasoning problems in mathematics

Is one of the conditions on "recent mathematical result" that the result must be originally proven by a human? Or does something like this count? https://arxiv.org/abs/1605.00723

Mathematical Genius

Numbers dance within my head,
Formulas swirl in a deathly bed,
Equations pulsate with life anew,
And yet I still can't divide by two.

Under "AI" I consider any fully autonomous computer program that is not given any human input other than the mathematic

I think what you really want is a desiderata that they haven't guided it more than 'natural'. Like, they may have trained the model on a bunch of human written proofs -> formal language, but they haven't specifically given it this proof or any guidance specific to it.

-----

My view is that this could be done this year if someone tried, and that the main limitation is that (some organization) just hasn't strongly trained on a theorem prover yet. MetaAI did a bit of Lean stuff, but not a big model or anywhere near as generic and powerful as GPT.

Like, GPT-3 and friends knows Lean to a not terrible degree, but it is significantly more confused about Lean than it is JavaScript. This is primarily due to the (far lesser) amount of training data, but (some org) could hire a bunch of programmers

to write new Lean proofs
write new Lean proofs with extra finetuning from the human written versions
rewrite existing proofs in multiple ways to give it better generalization (and, the Lean community kinda likes golfing too much which obscures the meaning even for humans, and so probably makes this even more worthwhile)

Which would be good ways to figuratively level-up the understanding quite a bit.

Then there's the fact that Lean, and other proof assistants, have a context that you use to know what the current variables/proofs/etc are in the scope. Existing GPT and friends are writing Lean without the context, which would actually be very hard to do! Especially since various tactics would change what proof a hypothesis has, with that change only being visible from your knowledge of the tactic or through the context.

Essentially: A project could get far more competent Lean results by training with the context. This would also help guard against changes in tactic behavior (which do happen, and is a part of the confusion).

(The org could also do wackier things, like giving it the ability to query what types are available in a package -> filter them for relevant results to the problem so that it definitely fits in context -> apply them; this avoids hallucinating - which is a problem with Lean code gen currently - and makes it more dynamic)

predictedYES

(context bits)

Sagredo, a recently in-development Lean tactic, mixes GPT-4 with Lean 4 in a sort of dialogue. See: https://leanprover.zulipchat.com/#narrow/stream/219941-Machine-Learning-for-Theorem-Proving/topic/sagredo (there's a video there, but the video is unlisted so unsure if I'm supposed to link it directly)

Sagredo gives the user's theorem statement (hypothesis parameters, and goal) to GPT-4, telling it to continue the proof. Then, whenever it stops it gives GPT-4 the current context state or an error message if there is one and asks it to continue/fix-the-error.

While this isn't enough for the full market, it is a notable example that with a bit of infrastructure you can have the AI automatically proving things in a far better way with the context.

It does have issues because GPT has primarily seen Lean 3 code, not Lean 4 (which is relatively newer), and there was a variety of syntax changes + tactic changes. They typically automatically fix themselves, because they have an initial prompt that details the differences, but it does make the generation take longer.

Though I don't expect this to be an issue you can't get around by smart enough prompting. One method would be to have a program that autocorrects common mistakes, ex: writing rw a, (Lean 3) instead of rw [a] (Lean 4) so you don't have to do another prompting round. (At least I believe the API lets you alter the prompt, so then it never made the mistake at all).

It would be cool if OpenAI did deliberate training or finetuning of a model with Lean 4. I think you could probably get good results from ChatGPT 3.5 or even GPT 3 if it was just trained better on them.

predictedYES

Related questions

Related questions