Will I have a >5 minute conversational video call with a fully synthetic AI character or digital avatar, with synced audio and video, which is somewhat lifelike (even if not perfect) and with conversational latency roughly comparable to human conversation before July 2025?
Other considerations:
The video should be at 720p resolution or better. The audio should sound lifelike and natural, without many artifacts.
The video and audio must be fully synthetic. They cannot be deepfakes or alterations of existing footage, though it's alright if they're prompted by pictures or short audio snippets.
There should be decent lip syncing. Even if I can see artifacts or flaws, the quality should be high enough that the median American wouldn't notice if they weren't paying much attention.
The latency can be slightly higher than an average human conversation, but it should be better than or roughly equivalent to the convenience of a normal video call with a spotty connection.
The conversational quality of the model should be equivalent to GPT-4o or better.
The product, if it exists, must be available to American citizens, although it's fine if there are KYC requirements or a paywall under $100 to use.
https://labs.heygen.com/guest/interactive-avatar/vicky Surprisingly good given that we're still 8 months to market close. I think the latency is within the range of a YES resolution, but image, audio, and conversational quality aren't there yet. The lip syncing is also quite good, and close to the threshold for a YES resolution imo.
https://loopyavatar.github.io/ Pretty good audio-to-video, probably good enough quality to count (but it's not an integrated system with audio generation; it looks like it requires all the audio up front).