Will mechanistic/transformer interpretability [eg Neel Nanda] end up affecting p(doom) more than 5%?
Ṁ352223
36% chance
Related questions
Will mechanistic interpretability be essentially solved for GPT-2 before 2030? (28% chance)
Will Eliezer Yudkowsky publicly claim to have a P(doom) of less than 50% at any point before 2040? (36% chance)
Will mechanistic interpretability have more academic impact than representation engineering by the end of 2025? (67% chance)
Will MIRI meaningfully affect p(doom) by more than 5%? (37% chance)
Will my p(doom) be above 10% in 20 years (2043)? (31% chance)
Will agent foundations [eg Scott Garrabrant] end up affecting p(doom) more than 5%? (55% chance)
Will a model costing >$30M be intentionally trained to be more mechanistically interpretable by end of 2027? (see desc) (57% chance)
Will my p(doom) be above 10% in 10 years (2033)? (61% chance)
Will mechanistic interpretability be essentially solved for the human brain before 2040? (19% chance)
Are Mixture of Expert (MoE) transformer models generally more human interpretable than dense transformers? (52% chance)