Is OpenAI's Q* real?
140 · Ṁ24k · Jan 1 · 87% chance

According to a recent Reuters article, OpenAI has a model called Q* which is believed by some at the company to be a big step towards AGI. Researchers were reportedly impressed by its ability to solve math problems.

This market resolves YES if OpenAI is in fact in possession of a model that can consistently solve math problems that currently-released models like GPT-4 cannot and has aced math tests given to it by researchers. It would also resolve YES if the model solves math problems that current models can already solve, but does so with significantly fewer resources (e.g., less training data, less computation) and has aced tests by researchers.

To count, it must actually be the model referred to as Q* in the Reuters article, so it wouldn't count if OpenAI later makes a model with these capabilities, or if it happens coincidentally that OpenAI had a model with these capabilities, but none of Reuters' sources knew about it.

Q* doesn't have to actually be as big of a breakthrough in AI as the researchers think it is to count as being real - it just has to have some level of increased capability as described above. It also doesn't have to actually be related to Sam Altman's firing.

This resolves to YES/NO once there is clear and convincing evidence that Q* does/doesn't exist (and have the capabilities mentioned above). I will take discussion in the comments into account when determining whether there is enough evidence. If there is no clear and convincing evidence either way by the time this market closes, and it seems very unlikely that we will get more evidence, then I may resolve it to a probability based on how likely it seems to be (for this, I would take into account the market price, but I wouldn't just resolve to MKT, because that's a terrible idea). If it does still seem like we could get more evidence, I'll extend the closure date.

https://openai.com/index/openai-o1-mini-advancing-cost-efficient-reasoning/

shows dramatically increased performance on math problems, from ~10% to ~75%.

Shouldn't this resolve YES?

@SergeyDavidoff Is there confirmation that this is the model referred to as Q* or related to it (e.g. Q* being a prototype)?

I don't think so, and they'll probably never confirm it, but it's pretty likely.

@PlasmaBallin Not from OpenAI themselves, but Reuters did link "Q*" and "Strawberry": https://www.reuters.com/technology/artificial-intelligence/openai-working-new-reasoning-technology-under-code-name-strawberry-2024-07-12/

Then there was reporting that "Strawberry" is coming out in the next week or two, and then a model with the same capabilities as were rumored for Q* came out in the predicted timeframe under the marketing name o1.

The link seems pretty clear, but an official confirmation from OpenAI is probably not forthcoming.

And the name "Q*" seems to be a combination of Q-learning and STaR, which would also map very well to the o1 capabilities, but that too is only speculation with no official confirmation.

@SergeyDavidoff
Q-learning learns a Q-function that maps state-action pairs to expected future rewards: Q: S x A -> R. However, LLMs are fine-tuned using Reinforcement Learning from Human Feedback (RLHF), which directly updates the policy (mapping inputs or states to actions, in this case, text responses) to maximize the reward. The policy is represented as P: S -> A. There's no need to explicitly construct a Q-function to fine-tune an LLM using RLHF, as the focus is on improving the model's policy directly through feedback.
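The contrast described above can be sketched on a toy problem. This is a minimal, hypothetical illustration (a 2-armed bandit with made-up reward probabilities, not anything from an actual OpenAI system): tabular Q-learning maintains an explicit value estimate Q per action, while a REINFORCE-style policy-gradient update, loosely analogous to what RLHF does at scale, adjusts the policy's action preferences directly without ever building a Q-table.

```python
# Toy contrast: tabular Q-learning (learns Q: S x A -> R) versus a
# REINFORCE-style policy update (adjusts the policy P: S -> A directly),
# on a 2-armed bandit. All names and numbers are illustrative.
import math
import random

random.seed(0)
REWARDS = {0: 0.2, 1: 0.8}  # true success probability of each arm

def pull(action):
    """Sample a 0/1 reward from the chosen arm."""
    return 1.0 if random.random() < REWARDS[action] else 0.0

# --- Q-learning: maintain an explicit value estimate per action ---
Q = {0: 0.0, 1: 0.0}
alpha = 0.1
for _ in range(2000):
    a = random.choice([0, 1])      # explore uniformly
    r = pull(a)
    Q[a] += alpha * (r - Q[a])     # move estimate toward observed reward

# --- Policy gradient: update action preferences directly, no Q-table ---
theta = [0.0, 0.0]                 # logits of a softmax policy
lr = 0.1
for _ in range(2000):
    z = [math.exp(t) for t in theta]
    probs = [x / sum(z) for x in z]
    a = 0 if random.random() < probs[0] else 1
    r = pull(a)
    advantage = r - 0.5            # fixed baseline to reduce variance
    # REINFORCE: nudge the log-probability of the taken action
    for i in range(2):
        grad = (1.0 - probs[i]) if i == a else -probs[i]
        theta[i] += lr * advantage * grad

best_by_q = max(Q, key=Q.get)
best_by_policy = max(range(2), key=lambda i: theta[i])
print(best_by_q, best_by_policy)   # both approaches should prefer arm 1
```

Both methods end up favoring the better arm, but only the first one ever represents expected rewards explicitly; the second, like RLHF, optimizes the policy itself.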

I'm unranking this market because I really don't want it to be eligible for cash prizes if that ends up happening, given that the resolution is likely to be ambiguous.

Shouldn't resolve yet (all he said was "we're not ready to talk about that yet") but Sam Altman confirmed the existence of Q* on Lex Fridman's pod.

@JosephBarnes Correct. But if we never hear about it again, that would still be a very good reason to think it never existed. Remember that Altman was accused of not being "consistently candid."

@JosephBarnes In fact, given that he obviously wanted to lend credence to a rumor that was certainly false in every meaningful sense, I would resolve this market NO right now, given all that we know at this point.

However, market author is unlikely to do that.

bought Ṁ100 YES

https://arxiv.org/pdf/2403.09629.pdf

A reasoning breakthrough called Quiet-STaR seems like a pretty big coincidence if not the same thing

@dominic Given the authors aren't affiliated with OpenAI, I'd say it's either a coincidence or a deliberate choice to capitalise on the confusion, though I'd guess coincidence. "STaR" already existed, so putting "Quiet" in front of it isn't that big a coincidence. And the leakers describing Q* used an asterisk, whereas STaR seems to always be written in full; I'm not seeing it abbreviated as an asterisk. ("Star" is how an asterisk in mathematical notation is often pronounced, but I've never heard the reverse: someone taking the word "star" and writing it as an asterisk.)

This paper doesn't sound to me like how Q* was being described. Q* was supposed to be able to solve simple problems with significantly reduced resources. The Quiet-STaR abstract, by contrast, describes the increased computational cost of their approach as a challenge to be addressed. I read this as "yes, it is more computationally expensive, but we made some optimisations to reduce the cost somewhat". Perhaps it is true that with these optimisations LLMs doing this extra invisible thinking do scale better, but it doesn't sound like the breakthrough Q* was described as.

And yeah, basically it's pretty unlikely to be the same thing because it's not coming from OpenAI - how would that work?

@chrisjbillington I agree with a lot of what you're saying. This WIRED article: https://www.wired.com/story/fast-forward-clues-hint-openai-shadowy-q-project/ does talk about "vast computational resources," though, so I'm not sure it's a given that Q* is a way to reduce computation. The quote about Q* also mentions grade-school math, which could be referring to GSM8K. And most speculatively, two of the authors are associated with "Notbad AI," something I've never heard of - could that somehow be associated with OpenAI?

@dominic I think it’s unlikely to be Q* but it seems like there’s a chance.

bought Ṁ350 NO

@dominic

Notbad AI

Lol, Google reveals no evidence of such an organisation existing. No idea.

It would also resolve YES if the model solves math problems that current models can already solve, but does so with significantly fewer resources (e.g., less training data, less computation) and has aced tests by researchers.

@PlasmaBallin Note that this is a very murky judgement call. Aside from pronouncements from the opaque, non-open OAI, we will absolutely not know how many resources the model uses!
This is too bad, as the question itself is very interesting, but the market as posed would not answer it.

@Nikola yeah, that means it is either real or he wants people to think it is real, which comes to the same thing. Sold my no.

They did not, however, break AES-192. The chance of that is zero.

@Nikola Real *and* straight? You must be joking me 🙄

@DavidBolin I don't get that at all, it's a "no comment", which is reasonably likely to be a general policy of Sam's rather than only saying "no comment" when something is real.

@dreev exactly
