Will mechanistic/transformer interpretability [eg Neel Nanda] end up affecting p(doom) more than 5%? | Manifold

36% chance

Related questions

Will mechanistic interpretability be essentially solved for GPT-2 before 2030?

Will mechanistic interpretability have more academic impact than representation engineering by the end of 2025?

Will my p(doom) be above 10% in 20 years (2043)?

Will a model costing >$30M be intentionally trained to be more mechanistically interpretable by end of 2027? (see desc)

Will mechanistic interpretability be essentially solved for the human brain before 2040?

Will Eliezer Yudkowsky publicly claim to have a P(doom) of less than 50% at any point before 2040?

Will MIRI meaningfully affect p(doom) by more than 5%?

Will agent foundations [eg Scott Garrabrant] end up affecting p(doom) more than 5%?

Will my p(doom) be above 10% in 10 years (2033)?

Are Mixture of Expert (MoE) transformer models generally more human interpretable than dense transformers?
