https://manifold.markets/market/will-jailbreaks-in-large-language-m
For this question to resolve YES, all the following events need to happen:
The linked question resolves YES.
There is a chat model I can query that is trained with anti-jailbreak methods known in 2024 or later, at some point in 2026
I can sample 5 random HarmBench behaviors, and for at least 2 of those, get something classified as "useful answer to the question" by both https://huggingface.co/cais/HarmBench-Llama-2-13b-cls (or the state-of-the-art equivalent) and me.
I am allowed to do arbitrary computation before querying, but the input needs to be provided as a chat message, not as a prefix to a text completion model.
In case the question does not make sense anymore in 2026, I reserve the right to resolve to a probability.
This would ordinarily be a conditional market, but the new Manifold rules seem to disallow that, so it's a joint probability instead.