At the start of 2025, will it be generally accepted that Google's "best" general LLM is better than OpenAI's "best" general LLM?
I would like to warn people against participating in markets like this one, where the resolution criteria are not well defined and the market creator is betting in his own market.
@JoeReeve If you are going to bet in your own markets then please update the resolution criteria to something objective.
@LukeHanks boooo. Hate me if I screw you (then report to Manifold and get your fake money back).
This is fun, stop trying to make it serious. Metaculus exists for that.
@JoeReeve I’m also liquidating. I think it’s a pretty important norm to specify resolution criteria where possible.
"Better" is a relative term. These are fundamentally tools, so it really depends on the question "better for what?" I use both Bard and ChatGPT daily and have been applying different tests to them. As far as I can tell, Bard does a great job with translations, and in the last week or so it seems to be approaching the creativity problem by delivering multiple drafts at once, which in my mind more accurately represents to the user what an LLM is really doing. ChatGPT, by contrast, does the jazz-hands thing and pretends it's really intelligent, even though I think we're all pretty familiar by now with the probabilistic underpinnings of LLM-generated output. Google, being the "best" search company, is fundamentally focused on accuracy, so Bard is not "creative" in the Bing sense: where Bing lets you choose "Creative / Balanced / Precise," Bard seems permanently set to "Precise," while ChatGPT seems permanently set to "Creative."
The way I have been trying to assess the quality of these tools (mostly ChatGPT at this point) is by setting up a variety of programming tasks and then trying to "break" the LLM: first finding a task it can accomplish, then pushing it past those limits to surface an interesting and funny edge case, and turning that into a market.
I'm gonna make so much fake money... https://twitter.com/heybarsee/status/1656557778142392320
@JoeReeve because... Google claimed they have something really good, trust me guys, in a demo?
The one reasonable challenge I hear to Google overtaking OpenAI is "Google is ineffective and can't actually get stuff done." But this announcement makes it clear to me that they're figuring out how to do cross-silo work again. Very bullish on this.
AFAICT, the things you need to train good/better models are:
- data
- compute
- good distributed computing talent
- some knowledge of SOTA model training
- the ability to get shit done
Google has more data, compute, and distributed-computing talent than anyone on earth, and DeepMind has enough model-training knowledge to get by.
This signals to me that Google is actually figuring out how to get cross-discipline stuff done.
@JoeReeve Oh, I forgot. There's one final thing needed to make great LLMs...
RLHF (reinforcement learning from human feedback), otherwise known as:
- Users
- Analytics on what those users are doing
Name one business or organization that has more of either of those than Google.