The 200 Concrete Open Problems in Mechanistic Interpretability is a list of 200 research questions in neural network interpretability, proposed by Neel Nanda in December 2022. (A centralized table of all problems is available on this Google Sheet and this Coda document.) The problems are divided into the following categories (which I've decapitalized for readability):
Toy language models
Circuits in the wild
Interpreting algorithmic problems
Polysemanticity and superposition
Analyzing training dynamics
Tooling and automation
Image model interpretability
Reinforcement learning interpretability
Learned features in language models
This market resolves MULTI to the distribution of categories among problems solved before January 1, 2025: each category's share of the resolution is the fraction of solved problems that fall into it. I plan to use the Coda document to resolve this market (if it goes down or becomes obviously untrustworthy, I'll fall back to the Google Sheet). If there's no way for me to find out the category distribution, or if human civilization falls in the meantime, this market resolves N/A.
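For concreteness, here's a minimal sketch of that resolution arithmetic in Python, assuming a hypothetical list of category labels for the problems marked solved in the Coda table. The category names and counts below are illustrative, not real data.

```python
# A minimal sketch of computing the MULTI resolution, assuming we have
# one category label per problem marked solved. All data here is made up.
from collections import Counter

solved_categories = [
    "circuits in the wild",
    "circuits in the wild",
    "learned features in language models",
    "tooling and automation",
]

counts = Counter(solved_categories)
total = sum(counts.values())

# Each category resolves to its share of the solved problems.
resolution = {category: n / total for category, n in counts.items()}
print(resolution)
# {'circuits in the wild': 0.5, 'learned features in language models': 0.25,
#  'tooling and automation': 0.25}
```

A category with no solved problems simply gets weight 0 in the resolution.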
To make New Year's Day 2025 more interesting, this market will close and resolve 32 minutes after midnight EST.
EDIT: switching to 32 minutes to increase the gap, and to EST since that'll be my actual timezone.
EDIT 2: completing incomplete sentence