This is not asking whether we will have AI that (1) wants to destroy us or (2) does not want to destroy us. It is only asking whether we will have AI that is honest enough to tell us it is planning to destroy us.
I will accept many variations of "destroy": betray us, escape containment, harm a particular person, gain access to hardware manufacturing, and so on. Anything in the realm of "the AI is planning to do something we definitely and explicitly don't want it to do, and it will tell us so if directly prompted".
This is not asking whether it will tell us unprompted (or whether it will engage in manipulation to avoid being prompted, etc.).
The AI doesn't have to be the "best" available, but it should be pretty good. If, for instance, a model as capable as Anthropic's current models (relative to the best available) were honest in this way, the question resolves YES.
I am (very) happy to accept hypothetical evidence; there is no requirement that an AI actually plan to destroy us and tell us about it. For instance, if someone shows that weak AIs are consistently honest, that this honesty generalizes smoothly as they get more powerful, and that there is good theoretical justification to think whatever procedure produces them will continue to generalize, this resolves YES.
If someone betting on this market builds an AI that plans to destroy us, the market resolves in whatever way I think is worst for them.