DALL-E 3 successfully makes images that look photorealistic or that match a requested style. It doesn't always understand the prompt, so it may draw something different from what the creator intended. But any given image it produces could easily pass for a photo or drawing made by a human.
The same criteria apply to music. It must be able to make songs that sound like they could have been popular human-created songs. Instrumental-only is fine; it doesn't need to handle lyrics. But it does need to be able to produce an entire song, not just a clip a few seconds long.
Resolves according to my judgement; I won't bet. I like a lot of instrumental music, so if there exists an AI music generator that I like enough to replace my current human playlists, that would likely be good enough. If there isn't one, and the reason isn't just that it's too expensive for me, this'll probably resolve NO.
Suno released v4 today with "Better audio, sharper lyrics, and more dynamic song structures": https://suno.com/blog/v4
These generators are getting pretty good. There are still some issues with matching an exact style and following more detailed instructions, but DALL-E 3 has similar issues. (It can't even draw a pentagon.) So I'm leaning towards resolving YES.
(I think Jacy was expecting me to have more refined musical tastes than I actually do; my usual way of listening is to pick some random instrumental YouTube compilation and let it autoplay, and the stuff I can compose on Suno is good enough that I think I'd find it equally pleasant as background music.)
Pushing me away from a YES resolution is the fact that it seems these AI songs are still very short (~2 minutes), and don't have a proper ending, they just cut off. They also tend to be pretty consistent in their pacing, without changes in cadence or style. I think the best AI music I've heard yet is from the Fooming Shoggoths, which would definitely be enough to qualify, but that had a lot more human involvement.
How about we do a blind test like Jacy suggested? Here's a proposal, I'm open to modification:
Henri (or another volunteer for the YES team) creates 5 prompts to give to any combination of AI music generators of their choice. For each prompt they're allowed to generate up to 3 songs and pick the best one. They're also allowed to use any simple options built into the generator, like setting the style or clicking an "extend" button. They're not allowed to do anything more complicated, like regenerating a chosen part of the song or editing it in an external program.
Jacy (or another volunteer for the NO team) finds 5 songs that are entirely human-created, with no AI assistance. None of them can be longer than 4 minutes, and I'll disallow anything super weird or creative that's trying to stand out from what an AI would do, like 4'33" or a song that uses only video game sound effects. It must be "normal music". It also can't be anything I've heard before and would recognize.
All music from both teams must be entirely instrumental with no voices, and each team's 5 songs should be in significantly different styles. I'm given a file with all 10 songs in a random order and no clues to which is which. I can listen to them as many times as I want before making my final guess for each. If I get at least 9 of them right, this stays open; otherwise it resolves YES.
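For a rough sense of how strict the "at least 9 of 10" bar is against pure guessing, here is a small sketch. It assumes two hypothetical guessing models that are not part of the proposal: labeling each song independently with a coin flip, or (since the proposal fixes the split at 5 AI and 5 human) picking exactly 5 songs to call AI. Under the balanced model, mistakes always come in pairs, so "at least 9 right" effectively means "all 10 right".

```python
from itertools import combinations
from math import comb


def chance_unconstrained(n: int, threshold: int) -> float:
    """P(at least `threshold` correct) when each of n songs is guessed by a fair coin flip."""
    hits = sum(comb(n, k) for k in range(threshold, n + 1))
    return hits / 2 ** n


def chance_balanced(n: int, n_ai: int, threshold: int) -> float:
    """P(at least `threshold` correct) when the guesser labels exactly n_ai songs as AI, uniformly at random."""
    truth = set(range(n_ai))  # without loss of generality, the first n_ai songs are the AI ones
    good = total = 0
    for guess in combinations(range(n), n_ai):
        g = set(guess)
        # correct = AI songs labeled AI + human songs labeled human
        correct = len(g & truth) + (n - len(g | truth))
        good += correct >= threshold
        total += 1
    return good / total


if __name__ == "__main__":
    print(chance_unconstrained(10, 9))  # 11/1024, about 1.1%
    print(chance_balanced(10, 5, 9))    # 1/252, about 0.4%
```

Either way, passing the bar by luck alone is very unlikely, so a "stays open" outcome would mostly reflect genuine ability to tell the songs apart.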
@IsaacKing That's a clever resolution mechanism to operationalize this!
I do wonder, though, whether DALL-E 3 would actually fail this if you tried the same thing. Some of the clues you could use to detect that music is AI-generated are limitations that DALL-E 3 has analogues of. For instance, it creates images at a certain resolution (and typically squares, though the API might be able to do otherwise?). And maybe there would be telltale signs that would let you correctly categorize the AI images, especially if the human-created images were deliberately chosen from cases where DALL-E is known to be weaker.
I don't have much of a stake in this question, so I'm not pushing back very hard here! But that's what I was thinking about when you described it.
@ChrisPrichard Yeah, that's a fair point. The aspect ratio is easily changed by cropping the picture, but resolution is a real limitation. I aimed to be fair to Suno by limiting the songs to 4 minutes, which I think is similar to a resolution limit. But yeah, I think the current ones are definitely very close.
@IsaacKing I like the blind test, but I'd also venture that if a YES resolution is warranted, then presumably you should replace your current human playlists. My guess is that if you do that with May 2024 music generators and start listening to them as often as you listen to those current human playlists, you will quickly tire of it.
On the test, could you first share a playlist of instrumental music you like so both 'sides' could approximately match that? There are many different types of instrumental music, and few people enjoy all of them. In that case, I'd be willing to put in the time to gather 5 such songs.
@Jacy The reason I haven't started listening to AI music is primarily logistical; I'm not aware of any service that will constantly play hours of new music for me for free like YouTube does. If I were able to test it out, it's possible I'd discover I liked it less than YouTube; I'm not sure. I can only tell so much from short clips.
Do my personal preferred genres matter? I think the test is equally fair as long as both parties are aiming for the same thing. If the intention is that you think I'd be better at distinguishing human from AI in a genre I'm familiar with, that might be true, but it seems a little unfair, since I'm not an artist or art connoisseur and don't have that advantage when looking at DALL-E's outputs. (Though I did play some "human or AI" art games with my partner, who is a professional artist, and she wasn't significantly better at them than me.) I also think it would be less time-intensive for both parties if they're able to draw from the largest pool of possible music rather than trying to match something narrower.
@HenriThunberg, are you interested as well?
@Jacy I'll put time into responding with suggestions for test modifications, and want to get back constructively when I do. But yes, interested!
But I wanted to quickly write that one of the things I am most skeptical about is the fairness of picking a genre that Isaac is most familiar with. My experience with AI music is that it's easier to spot in a familiar genre than elsewhere, and that doesn't seem like it should be part of the test.
Will get back with more thoughts.
@Jacy @IsaacKing shall we try this? I am down to put some time into making this test happen within the next 1-2 weeks.
Isaac, could you confirm that if this test resolves NO, we take another stab at it at EOY? Otherwise I'd prefer to hold off further until the models supposedly have evolved to their high point.
@HenriThunberg I was thinking we would do this at the end of the year if ever. Isn't that a bad deal for NO holders if the outcomes are only a possible YES resolution and then possibly a drop in market price if the test fails? Presumably I should prefer a high market price anyway because I get more cheap NO shares.
I also think the non-test criteria "if there exists an AI music generator that I like enough to replace my current human playlists..." is a significantly higher bar than a test like this, so shouldn't I prefer the test to never be run in the first place?
@IsaacKing I am quite confident that your complaints below have been resolved by Suno v3 (which I don't think you've commented on yet in this market) and/or Udio.
Song length: Suno makes 2-minute clips by default, and you can extend them if you want. Udio makes 30-second clips that are designed to be stitched together into longer tracks, and doing so is very user-friendly.
Consistently instrumental: Both apps have options to remove lyrics, and both seem very reliable to me.
Guitar solo example: All my 2x2 first generations on Udio/Suno gave me decent examples of this on my first try.
Nonsensical lyrics: I don't think they're great, but they're not nonsensical. I don't know how much this improved between Suno v2 and v3. Regardless, both services offer the option of adding your own lyrics, which largely solves this.
Since this all relies on your opinion, I think this market would really benefit from new concrete goalposts, in the style of your comments below. Alternatively, a YES resolution now if you don't have such complaints. Otherwise, this risks becoming much more of a "What will Isaac think" market than a "What are AI music generator capabilities by EOY 2024" prediction market.
@HenriThunberg I think you're underestimating the bar that @IsaacKing has for a YES resolution. This is fairly concrete: "I like a lot of instrumental music, so if there exists an AI music generator that I like enough to replace my current human playlists, that would likely be good enough. If there isn't one, and the reason isn't because it's just too expensive for me, that means this'll probably resolve NO."
Despite the impressive capabilities of 2024 models, I really doubt any of them are doing well enough to replace Isaac's current human playlists. Do you think that's even plausible? Personally, I also really don't think they "sound like they could have been popular human-created songs." I'm confident blinded tests right now would have no trouble distinguishing human- and AI-created songs.
@Jacy I don't think we're necessarily in disagreement about where things stand; your comment largely makes sense to me. But yes, I think there are already two feasible ways for AI playlists to replace current ones, at least for focus music or background dinner-party elevator jazz:
A) a human could already curate a playlist of purely AI-generated songs that would be good enough.
B) the platform could automatically rank tracks by popularity within a certain genre.
To this I'd like to add that I think the current generators are more impressive than I would have expected in 2024. E.g. inpainting (select a portion of the clip that you want to make new generations for, with new instructions) which was just launched by Udio seems to be a type of DALL-E 3 functionality not considered in this market. I wonder whether that helps resolution at all.
Anyway, what most concerns me is that the goalposts we're aiming for are quite vague and getting clearer tests (like Isaac's original complaints above) would help with that.
https://twitter.com/elevenlabsio/status/1788628171044053386
I feel like this should count.
@IsaacKing any chance we could get a YES resolution on this already? Would be great to liberate some mana for charity donations this week 🙌
Of course I'm also happy to hear a NO opinion on Suno and Udio not being good enough for your criteria. I personally think they are (and have obviously bet accordingly).
A new text-to-song app has been released, though their servers got overloaded with new sign-ups.
Keen to hear your thoughts @IsaacKing
Instrumental, "high-energy, lively, infectious brass, sax, and drum music that blends modern jazz, funk, dance, electronic and hip-hop"
Generated entirely with Suno v3
Alright, I’m doubling down.
Same criteria for music. It must be able to make songs that sound like they could have been popular human-created songs. Instrumental only is fine
Suno's newest version easily makes songs that are convincing and straight up slap now.
@IsaacKing Can I suggest you check out Suno V3 and let us know your thoughts?
@OneGuy @IsaacKing link to Suno for you to experiment and share your feedback with this market. Not sure if V3 is free or paid though - https://app.suno.ai/create/