Did OpenAI use muP for zero-shot hyperparameter transfer in GPT-4?
2025
81% chance

Maximal Update Parameterization (muP) is a technique published in 2022 by Yang et al. at Microsoft: https://arxiv.org/abs/2203.03466


@firstuserhere Interesting that it is in the bibliography, although the reference in the first image is from a different section of the report with its own bibliography (there, [16] actually refers to "DALL·E 2 Preview - Risks and Limitations.").

So the muP paper is in the bibliography, but not referenced anywhere.

@Stefan Yep, and even then it's not actually used in GPT-4; the report only mentions that the red team used the paper?
