Will a flagship (>60T training bytes) open-weights LLM from Meta which doesn't use a tokenizer be released in 2025? | Manifold

Resolves YES if, in 2025, Meta releases the weights of an LLM that was trained on at least 60T bytes of data (roughly equivalent to the 15T tokens used to train the Llama 3.1 models) and that does not use standard fixed-vocabulary tokenization.
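The 60T-byte threshold matches the 15T-token figure only under an assumed average of about 4 bytes of text per token. The Python sketch below makes that conversion explicit. The 4-bytes-per-token ratio and the helper names are illustrative assumptions, not figures published by Meta.

```python
# Rough conversion between a token budget and a byte budget.
# ASSUMPTION: ~4 bytes of UTF-8 text per token on average. This is a rule of
# thumb for illustration, not a measured property of the Llama 3.1 tokenizer.
LLAMA_31_TRAINING_TOKENS = 15e12  # 15T tokens, as stated in the market
ASSUMED_BYTES_PER_TOKEN = 4.0     # hypothetical average ratio
THRESHOLD_BYTES = 60e12           # 60T bytes, the market's cutoff


def tokens_to_bytes(tokens: float, bytes_per_token: float = ASSUMED_BYTES_PER_TOKEN) -> float:
    """Estimate the raw-text size in bytes of a corpus measured in tokens."""
    return tokens * bytes_per_token


def meets_byte_threshold(training_bytes: float) -> bool:
    """Return True if a training run reaches the market's 60T-byte cutoff."""
    return training_bytes >= THRESHOLD_BYTES


estimate = tokens_to_bytes(LLAMA_31_TRAINING_TOKENS)
print(f"{estimate:.0e} bytes")         # 6e+13 bytes
print(meets_byte_threshold(estimate))  # True
```

Because the bytes-per-token ratio varies with language and tokenizer, the byte count is what the market actually measures; the 15T-token figure is only a reference point.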

A qualifying model must be released under a license roughly as permissive as the Llama 3.1 license.

This market was spurred by recent research from Meta showing a proof of concept for a tokenizer-free LLM (linked below). A qualifying model from Meta does not need to use that paper's patching technique, as long as it does not use tokenization.
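A tokenizer-free model reads raw UTF-8 bytes, so its input alphabet is just the 256 possible byte values rather than a learned fixed vocabulary. The linked Byte Latent Transformer (BLT) paper groups bytes into dynamically sized patches whose boundaries depend on the entropy of the next byte. The Python sketch below substitutes much simpler fixed-size patching, purely to show what grouping bytes into patches means. The function names and the default patch size of 4 are hypothetical and are not taken from the paper.

```python
from typing import List


def to_byte_ids(text: str) -> List[int]:
    """Map text to integer IDs with no learned vocabulary:
    each UTF-8 byte becomes one ID in the range 0..255."""
    return list(text.encode("utf-8"))


def fixed_size_patches(byte_ids: List[int], patch_size: int = 4) -> List[List[int]]:
    """Group consecutive bytes into patches of at most `patch_size` bytes.

    Static patching for illustration only; BLT instead places patch
    boundaries dynamically based on next-byte entropy."""
    if patch_size < 1:
        raise ValueError("patch_size must be positive")
    return [byte_ids[i:i + patch_size] for i in range(0, len(byte_ids), patch_size)]


ids = to_byte_ids("héllo")  # 'é' encodes to two bytes: 195, 169
print(ids)                                    # [104, 195, 169, 108, 108, 111]
print(fixed_size_patches(ids, patch_size=4))  # [[104, 195, 169, 108], [108, 111]]
```

Note that no step here consults a vocabulary file: any string, in any script, maps to byte IDs the same way, which is the property the resolution criteria care about.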

https://ai.meta.com/research/publications/byte-latent-transformer-patches-scale-better-than-tokens/