
Elon Musk is talking big: https://x.com/tsarnick/status/1815493761486708993. Says that Grok 3 will come out in December and 'should be' the most powerful AI in the world.
Resolves to YES if Grok 3 is, at the time of its release, plausibly the most powerful AI in the world according to my best judgment. Has to be at least as strong as all models publicly available at the time.
Resolves to NO if it is not the most powerful.
(Resolves NO if no such model is released by 7/23/25, to ensure this doesn't go on forever.)
As of 7/23/2024, Claude Sonnet 3.5 is IMO the most powerful AI, but GPT-4o would also resolve to YES based on its position at #1 on Arena and other ways in which some people prefer it. Gemini 1.5 Pro or Advanced would not qualify, but would have counted prior to Sonnet 3.5 and GPT-4o.
(I will not take clarifying questions on my criteria here, it will be my subjective take on 'is this plausibly the best LLM I can access right now.')
Update 2025-05-01 (PST): Reasoning models are a different class of AI and do not count for the purposes of resolving this market. (AI summary of creator comment)
I do think both interpretations are reasonable, and I could argue both sides. I understand both cases, although I would still be inclined to make the same decision again.
But I have learned that once you make a decision like this, you HAVE TO stick with it. Reversing yourself makes things go crazy, even if you decide your initial decision was wrong, and after that the only options are to turn it over to the mods or to stick with what you said.
An accusation here that my actions are disingenuous is sitting at 5-0 thumbs up (and I've been outright accused of LYING, among other things; seriously, WTAF), even though the market was where it was 2 days before the ruling. I REALLY REALLY don't appreciate that, and honestly, I don't need this trouble. I hereby ask the mods to take over this question so I can wash my hands of it, and they can do whatever they decide is best.
Hope everyone's happy now. Enjoy.
What are we waiting for? This is "plausibly" the best, in the way GPT-4o and Claude 3.5 are both "best".
Are we actually waiting for something? Deciding?
@FergusArgyll Mods are discussing / deciding on the resolution, yes. It might take a few days to see how well Grok is seen to fare and whatnot.
@FergusArgyll Fine, I'll bite. Yes, moderators are fully aware of the situation. We are trying to have as many moderators as possible weigh in, rather than having one person handle it or "the first 3 that show up".
No mods wanted to write anything here for fear of influencing trading. Bayesian isn't afraid to write because he's actively trading already and can't be involved in the resolving part anyway.
As for a timeline, it's unlikely to resolve within 1 day but very likely to resolve within 1 month. If I had to guess, a week.
I think it is appropriate to close trading now -- everyone has had a chance to get a lot of predicting in and without having a single creator ready to unilaterally resolve the market, it feels very messy to leave trading open where the stray comments of any single moderator are highly likely to have undue influence on trading.
@Eliza can you confirm that in this market there are no traders that have insider knowledge on the status of the moderators' discussions? When you say that @Bayesian is actively trading because he can't be involved in the resolution, does that also mean that he does not have access to groups where other moderators are discussing this resolution? Thanks
@SimoneRomeo I can't resist someone asking questions about auditing behavior on Manifold.....(I want to note that trading did close 5 hours ago, so this is only looking at the past.)
At a certain time on 7 January 2025, the market creator asked moderators to handle the market resolution. Shortly after, the market was added to the moderation queue to keep track of its status.
After that point, all moderators and admin users on the site would have known the market was likely to be resolved by moderators. They could have had any amount of private information sharing between themselves on or off the platform. In general, I trust (and 'we' as a community, I hope) moderators to act with honor in any situation where they have non-public information gained through their positions.
---
A cursory review of the trading activity on the market shows the following privileged users had positions as of the time of the resolution delegation:
@Bayesian - about 6k No shares from Nov 2024 through Jan 2025
@dreev - 61 No shares from July 2024
@Gabrielle - 1333 No shares from Jan 2025
@Ziddletwix - 435 Yes shares from Jan 2025
@ian - 54 No shares from July 2024
@jacksonpolack - 137 No shares from July 2024
---
In the time since the resolution delegation:
@Gabrielle (7 Jan), @Ziddletwix (7 Jan), and @dreev (18 Feb) all sold their entire collection of shares in a single trade.
@ian and @jacksonpolack have not traded in the market at all and still hold their shares.
@Jacy had a flurry of trading activity on 17 February resulting in a 5.3k Yes position
@SG traded on 16, 17, and 19 February, resulting in a 1.1k Yes position
@Bayesian traded numerous times and now has a position of 28k Yes shares
I didn't see any other privileged users trading, but I may have missed them because this was a manual summary and not an automated report.
---
"does that also mean that he does not have access to groups where other moderators are discussing this resolution"
I can confirm that all of the members mentioned above would have been able to view places where moderators discussed the potential resolution of this market.
"no traders that have insider knowledge on the status of the moderators' discussions"
I'm not aware of any other traders beyond the ones listed above, but as I mentioned above, if there are any users colluding with moderators for trading on this market, that would be a breach of site norms.
@Eliza thanks for the detailed and transparent reply. This market definitely went through a suboptimal resolution process, and moderators were thrown into it. We can't change it now. I'd recommend in the future ensuring there's a separate discussion group that only non-traders have access to, in order to handle similar cases.
"I'd recommend in the future ensuring there's a separate discussion group that only non-traders have access to, in order to handle similar cases."
In theory this would be neat, in practice it's unlikely. There's no clean/concrete process designating which mods are responsible for each market. Manifold is small & scrappy & mods are basically just volunteers (janitors?) who have spent a lot of time on the site. Almost all mod queue issues are just resolved by "the mod who volunteers first". Rare, extra-ambiguous cases occasionally involve a volunteer 3-mod panel (with discussion almost always beginning once the market is closed). This is a messier, unique-ish case where additional discussion will be required. But it's hard to build a policy (let alone technical features) around fairly unique cases, & I think Manifold would have to grow a fair amount more for the mod team workflow to be so much more formalized than it currently is.
I do think it's bad if mods trade on insider info about a potential mod resolution. I cannot speak for Bayesian but I would guess his logic was that he didn't personally feel he had any material/relevant info about resolution (? dunno).
If that were a more common issue, it might be worth trying to mitigate, but it would be hard to fix given current constraints.
@Ziddletwix if markets that require more than one mod are discussed only after market closure and traders don't join the discussions actively, there's no major need to have separate discussion rooms. Ideally there would be clear guidelines on how to handle these situations. We're not betting real money but these issues may inadvertently end up eroding trust.
https://x.com/nrehiew_/status/1892365604964757562
Apparently there may have been some foul play with the graphs.
Just to point out the obvious: reasoning models are NOT any different from all other LLMs.
They are literally the same model with different fine-tuning.
Separating these two shows 0 understanding of “reasoning models”
@mathvc By that standard they are also just as bad as the non-reasoning versions (if you examine the intermediate chain of thought you will find all sorts of nonsense that doesn't get into the final output.)
@Guilhermesampaiodeoliveir per the description, GPT-4o would have resolved YES based on getting #1 in the arena. Grok 3 got #1 in the arena.
@PhilosophyBear It might be good to sanity check with our own benchmarks. In case of shenanigans with the voting? Including perhaps people voting up Grok's answers for being refreshingly non-corporate-sounding or whatever.
(My reading of the spirit of this question is whether Grok 3 is at least tied with the best models such that people like Zvi would choose to at least have it in rotation when consulting multiple LLMs on something. Hitting the top of the arena leaderboard doesn't quite feel dispositive to me. Maybe it depends on how long it holds on to first place?)
@Guilhermesampaiodeoliveir I don’t think it’s that bizarre and the description is easy to check 🤷♂️
@LiamZ I believe you, I just didn't really want to check, but I think it is bizarre. Like, a vibometer? Really.
@Guilhermesampaiodeoliveir the market was about whether it would turn out to be “plausibly” accurate and mentioned two different models that would simultaneously resolve YES based, yes, partly on vibes.