Resolves yes if there is a model that receives a natural language description (e.g. "Give me a video of a puppy playing with a kitten") and outputs a realistic-looking video matching the description.
It does *not* have to be *undetectable* as AI generated, merely "realistic enough".
It must be able to consistently generate realistic videos >=30 seconds long to count.
DALL-E 2 (https://cdn.openai.com/papers/dall-e-2.pdf) counts as "realistic enough" *image* generation from natural language descriptions. (I am writing this before the model is fully available; if it turns out that all the samples are heavily cherry-picked, DALL-E 2 itself does not count, but a hypothetical model as good as the cherry-picked examples would.)
Duplicate of https://manifold.markets/vluzko/will-there-be-realistic-ai-generate
Update 2024-12-23 (PST) (AI summary of creator comment):
- Videos must be coherent throughout the full duration, maintaining consistency with the original prompt for the entire video without shifting between unrelated scenes
- Looped scenes do not count
- A single example of a successful video is not sufficient for resolution
Update 2024-12-24 (PST) (AI summary of creator comment):
- The success rate must be at least 66% of DALL-E 2's rate, not a flat rate
Update 2025-05-01 (PST) (AI summary of creator comment): Evidence must be publicly available.
Update 2025-01-18 (PST) (AI summary of creator comment):
- Models must be able to generate videos consistently and handle a wide variety of prompts
- The video must be produced in a single shot; videos stitched together from multiple segments do not count
I'm somewhat confused by the criteria. Sora could definitely generate realistic-looking videos back in spring 2024. It can obviously do some subjects and actions better than others, and what counts as "realistic enough" is unclear to me. Consistency also depends on what kinds of prompts you use: some give good results 8/10 times, others 1/10, so "consistently" isn't well defined either.
@ProjectVictory Existing models all fail on the >=30-second criterion. Sora and Veo are generally realistic enough, they just can't maintain that for more than a few seconds.
@TheAllMemeingEye Models need to be able to do this consistently, and with a wide variety of prompts. Also, it's hard to tell whether this is multiple videos stitched together; that doesn't count, the model must produce it in a single shot.