Why is Claude 3.5 Sonnet such a good model for its size?
Ṁ367
2026
80% Pretraining data composition
60% Doesn't use any Scale AI training data
49% Offline policy learning RLHF
45% Task vectors like Golden Gate Claude
Claude 3.5 Sonnet is a relatively small model (estimated at 5e24 training FLOPs). Yet it beats larger models on GPQA, LMSYS, and many other industry-standard benchmarks. It is not certain that this market can ever be resolved, but academics and the open-source community may learn in the coming years what was done to achieve such a high-quality model.
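For a rough sense of what a 5e24 FLOPs budget could correspond to, here is a minimal sketch using the common rule of thumb C ≈ 6·N·D for dense transformer training compute (N parameters, D training tokens). The function name and the tokens-per-parameter ratios are hypothetical, chosen only for illustration; the real parameter count and dataset size of Claude 3.5 Sonnet are not public, so none of these numbers should be read as facts about the model.

```python
import math


def split_compute(total_flops: float, tokens_per_param: float) -> tuple[float, float]:
    """Solve C = 6 * N * D with D = tokens_per_param * N for (N, D).

    Assumes the dense-transformer approximation C ~ 6*N*D; this is a rule
    of thumb, not a statement about how any particular model was trained.
    """
    n_params = math.sqrt(total_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


# Estimate quoted in the market description, not an official figure.
ESTIMATED_FLOPS = 5e24

# Hypothetical tokens-per-parameter ratios, for illustration only.
for ratio in (20, 100, 500):
    n, d = split_compute(ESTIMATED_FLOPS, ratio)
    print(f"ratio={ratio:>4}: ~{n / 1e9:.0f}B params, ~{d / 1e12:.1f}T tokens")
```

The same compute budget is compatible with very different model sizes depending on how much data is used per parameter, which is one reason the "pretraining data composition" answer is hard to separate from model size alone.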
This question is managed and resolved by Manifold.