Released = available to some portion of the public (including a subset of subscribers or a limited number of API developers from members of the public).
New model = Either announced by the company as a new model, clearly a distinct model based on its numbering/naming, or selectable from some sort of menu as a distinct model. Something like "o1 fancy" would count because, while it is part of o1, it can be considered a distinct model in this market.
We plan to make one market a month like this. In the future it will be free response once that market type is supported by sweepstakes. Usually, this market will be created before the start of each month as a short-term market.
Please note that any model which came out this month prior to market creation will not count (a new model from the company should come out for the answer to resolve to yes). This includes, but is not limited to:
-OpenAI o1 and o1 pro
-Meta Llama 3.3
-Grok Aurora
Must be released before December 31st 11:59pm PST. If it is announced but not yet released to any members of the public it will not count.
FYI I will be taking a look at this shortly and making clarifications and resolving things if appropriate. The lack of a resolution on a certain answer doesn't mean we have decided that whatever was released doesn't count. It just means I haven't had time to review it yet.
Yes, this is not ideal, I apologise. For next month we may have to reconsider the operationalization of these markets, as AI companies are not making it easy lol.
@Manifold is it only LLMs that count for Meta? They released some other things here: https://ai.meta.com/blog/meta-fair-updates-agents-robustness-safety-architecture/
@Fay42 Yeah for this answer they should be a language model. In the future I will add more types for each company like I did with OpenAI
@Manifold phi-4 is planning to drop on hf sometime in the next week and is already out on azure https://x.com/peteratmsr/status/1867375567739482217
@Manifold but is that "similar" to MAI-1? MAI-1 was supposed to be 500B, this is only 14B. plausibly doesn't count as "similar" but plausibly does as well, idk
@Fizz I'm very surprised how many companies are willing to do big releases right before holidays. I feel bad for the devs