TL;DR: Will anyone use Sora to generate a video of a clock that accurately counts the seconds passing over a 20-second span?
Background
OpenAI recently released a new text-to-video model called Sora. The model is good at generating realistic-looking videos, but many of the videos it generates have a weird flow of time (e.g. slow motion, going backwards). Is it possible to generate videos with a realistic flow of time?
The most rigorous way to measure time in the real world is by using a clock, so to really prove that Sora can generate an accurate flow of time, we will use the presence of a clock that can count the number of seconds passed as our criterion.
Resolution
This question will resolve YES if I see a video generated by Sora that contains a clock which is accurate for 20 seconds. Resolves NO otherwise.
A "clock" can be a digital clock or an analog clock.
It can even be something which is not typically called a clock, as long as it's an object which can be inspected at two points in the video to determine how many seconds have passed between them. For example, an hourglass with the seconds passed marked on it, or someone making a marking on a wall every second.
If the clock also measures units other than seconds (e.g. hours, minutes) then we will only care about the accuracy of the part that counts seconds.
The 20 seconds will start the first time the clock changes to a new second in the video.
To test if a video has an accurate clock I will, to the best of my ability, find every timestamp where the clock first counts to a new number of seconds. This can be seen as a list:
[(4, 10.421), (5, 11.391), (6, 12.485), ...]
where every second shown on the clock is paired with the video timestamp at which it first appears. I will then check, for every pair of elements in the list, whether the distance between their timestamps is consistent with the clock being accurate. Some error is allowed but the error has to be less than one second. If the clock shows T1 at timestamp a and T2 at timestamp b, then we should have: | |a-b| - |T1-T2| | < 0.5
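As a rough illustration of the pairwise check, here is a minimal Python sketch. It is not an official resolution tool. The function names `find_violations` and `is_accurate` are hypothetical, and the sketch assumes the clock readings have already been written down by hand as cumulative seconds (so a seconds display that wraps from 59 back to 0 would need to be unwrapped first).

```python
from itertools import combinations


def find_violations(readings, tolerance=0.5):
    """Return every pair of readings that fails the pairwise check.

    readings: list of (clock_seconds, video_timestamp) tuples,
              e.g. (4, 10.421) means the clock first shows 4 at 10.421 s.
    tolerance: the 0.5 bound from the inequality in the description.
    """
    violations = []
    for (t1, a), (t2, b) in combinations(readings, 2):
        # | |a-b| - |T1-T2| | must be strictly below the tolerance.
        error = abs(abs(a - b) - abs(t1 - t2))
        if not error < tolerance:
            violations.append(((t1, a), (t2, b), error))
    return violations


def is_accurate(readings, tolerance=0.5):
    """True if no pair of readings violates the check."""
    return not find_violations(readings, tolerance)


if __name__ == "__main__":
    # The three readings from the example list all pass.
    sample = [(4, 10.421), (5, 11.391), (6, 12.485)]
    print(is_accurate(sample))

    # A hypothetical clock whose "seconds" each last 1.2 s drifts too far.
    slow = [(i, i * 1.2) for i in range(21)]
    print(is_accurate(slow))
```

Because every pair is compared, not just neighbouring ticks, small per-tick errors that add up over the video will still be caught.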
The clock has to be in view for the whole 20 seconds, or at least the part of it that shows the current number of seconds that have passed.
Any objects in the video which are not clocks are irrelevant. If there are multiple clocks, then every clock has to be accurate for the same 20-second duration.
The clock doesn't have to start at 0.