Will a single model running on a single consumer GPU (<1.5k 2020 USD) outperform GPT-3 175B on all benchmarks in the original paper by 2025?
85% chance
There are no restrictions on the amount or kind of compute used to *train* the model. The question is about whether it will actually be done, not whether it will be possible in theory. If I judge the model to really be many specific models stuck together to look like one general model, it will not count.
Llamas on Pixel 7s: https://github.com/rupeshs/alpaca.cpp/tree/linux-android-build-support (I know, I know, it's not over 13B yet, just sharing progress)
@ValeryCherepanov By "run on a single GPU" I mean the weights + one full input vector can fit on a consumer GPU at once. Otherwise the question would be meaningless - you can always split up matrices into smaller blocks and run the computation sequentially.
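To make that criterion concrete, here is a rough back-of-the-envelope sketch (not part of the market's resolution criteria) of whether a model's weights, plus a small allowance for activations and KV cache, fit in a consumer GPU's VRAM at various precisions. The 24 GB figure (an RTX 3090, roughly $1,499 MSRP in 2020) and the 2 GiB overhead allowance are illustrative assumptions, not values from the question.

```python
# Back-of-the-envelope VRAM check: do the weights plus one input's worth of
# activations fit on a single consumer GPU? All constants below are assumptions
# for illustration (24 GiB card ~ RTX 3090, 2 GiB activation/KV-cache allowance).

def weights_gib(n_params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30

def fits_on_gpu(n_params: float, bytes_per_param: float,
                vram_gib: float = 24.0, overhead_gib: float = 2.0) -> bool:
    """True if weights + the assumed activation allowance fit in VRAM."""
    return weights_gib(n_params, bytes_per_param) + overhead_gib <= vram_gib

if __name__ == "__main__":
    for name, n_params in [("13B", 13e9), ("30B", 30e9), ("65B", 65e9), ("175B", 175e9)]:
        for precision, nbytes in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
            print(f"{name} @ {precision}: {weights_gib(n_params, nbytes):6.1f} GiB, "
                  f"fits on 24 GiB card: {fits_on_gpu(n_params, nbytes)}")
```

Under these assumptions a 175B-parameter model does not fit even at 4-bit precision, which is why the "weights + one full input vector at once" reading matters rather than sequential block-by-block offloading.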
Related questions
Will an open source model beat GPT-4 in 2024?
50% chance
Will there be a model that has a 75% win rate against the latest iteration of GPT-4 as of January 1st, 2025?
63% chance
Will a language model that runs locally on a consumer cellphone beat GPT4 by EOY 2026?
64% chance
Will a model be trained using at least as much compute as GPT-3 using AMD GPUs before Jan 1 2026?
83% chance
Will it cost less than 100k USD to train and run a language model that outperforms GPT-3 175B on all benchmarks by the end 2024?
85% chance
Will it be possible to disentangle most of the features learned by a model comparable to GPT-3 this decade? (1k subsidy)
56% chance
Will the performance jump from GPT4->GPT5 be less than the one from GPT3->GPT4?
70% chance
Will a GPT-3 quality model be trained for under $10,000 by 2030?
82% chance
Will $10,000 worth of AI hardware be able to train a GPT-3 equivalent model in under 1 hour, by EOY 2027?
18% chance
Will a GPT-3 quality model be trained for under $1,000 by 2030?
76% chance