Will Anthropic open-source the training code of their SAE interpretability effort?
4
Ṁ465
2028
14%
this year, fully
31%
this year, significantly incomplete
19%
next year
22%
not before 2028
14%
Other

We mean the code used to produce the Scaling Interpretability blog post.
