Will any image model be able to draw a pentagon before 2025?
47% chance

Current image models are terrible at this. (That was tested on DALL-E 2, but DALL-E 3 is no better.)

The image model must get the correct number of sides on at least 95% of tries per prompt. Other details do not have to be correct. Any reasonable prompt that the average mathematically-literate human would easily understand as straightforwardly asking it to draw a pentagon must be responded to correctly. I will exclude prompts that are specifically trying to be confusing to a neural network but a human would get. Anything like "draw a pentagon", "draw a 5-sided shape", "draw a 5-gon", etc. must be successful. Basically I want it to be clear that the AI "understands" what a pentagon looks like, similar to how I can say DALL-E understands what a chair looks like; it can correctly draw a chair in many different contexts and styles, even if it misunderstands related instructions like "draw a cow sitting in the chair".

If the input is fed through an LLM or some other system before going into the image model, I will bypass this pre-processing if I can easily do so; otherwise I will not. If the image model is not publicly available, I must be confident that its answers are not being cherry-picked.

Pretty much any neural network counts, even if it's multimodal and can output stuff other than images. A video model also counts, since video is just a bunch of images. I will ignore any special-purpose image model, like one that was trained only to generate simple polygons. It must draw the image itself, not find it online or write code to generate it. File formats that are effectively code, like SVG, don't count either; it has to be "drawing the pixels" itself.


does this count?

@matbogus Nah, that one's been possible for a while. It needs to get the shape in other contexts too.

@IsaacKing To quote you replying to another person. "If a reasonable person would say "yup that's a pentagon", it counts."

@matbogus an aspect of the criteria that I think is overlooked by some commenters is that the model must give a correct response for all such prompts, not merely for one such prompt.

So if the model couldn't draw the Pentagon that might preclude a YES resolution based on that specific model. But a model being able to draw the Pentagon isn't sufficient for a YES resolution by itself.

@chrisjbillington "for all" is an unreasonably high bar imo. For example, what if the only one that fails is something like "draw me a honeycomb but tiled with pentagons"?

@aashiq it's not quite all prompts, it's:

"any reasonable prompt that the average mathematically-literate human would easily understand as straightforwardly asking it to draw a pentagon"

So I think you could reject the honeycomb hypothetical as unreasonable or not straightforward.

@chrisjbillington How about a pentagon tiling? Very easy with hexagons and a bit annoying with pentagons. Could envision it making models fail. Still think “for all reasonable” is an outlandish bar, as there only needs to be one counterintuitive jailbreak. Maybe “for most reasonable prompts” is an ok compromise

@aashiq I guess I don't think asking for a tiling at all is "straightforwardly asking it to draw a pentagon". Jailbreaks that are somehow trick questions or otherwise not straightforward should be excluded as well.

Where you might have a problem is if someone finds a totally normal-looking prompt that nonetheless a model inexplicably fails at, like those adversarial images optimised for tricking image-recognition models that a picture of a cat is actually a dog or whatnot.

I guess I think that's unlikely, except in the sense that it might be easy to find such prompts when models are kind of OK at drawing pentagons (in which case counting them as failures is the point). Language doesn't have as many bits to fine-tune for such an attack as images do, without making the prompt obviously not "straightforward".

Can you think of any prompts that would seem to count for this market and yet DALL-E 3 currently fails at, for hexagons instead of pentagons? If such prompts don't exist then I think we shouldn't worry about them affecting this market, except in the intended way of making it resolve NO if models don't actually learn to draw pentagons like they currently can draw hexagons.

A pentagon tiling is not a drawing of “a pentagon”, but rather a drawing of many pentagons.

Yeah it doesn't have to do anything past drawing a single pentagon.

Here's an example that fails for hexagons, even though it can generally draw hexagons. This is of course an adversarial example, but yeah, anything with "all" is a terrible cutoff. Just so much potential for stupid jailbreaks that we should be talking about "most".

I don't understand your objection to the tiling thing either. Yes, that is a request to draw multiple pentagons, but it seems a common enough requirement. If it instead draws multiple hexagons, that suggests to me that indeed it doesn't exactly know what a pentagon is.

@aashiq “if it instead draws multiple hexagons that suggests to me that indeed it doesn't exactly know what a pentagon is.”

Wouldn’t an equally valid interpretation be that it knows what a pentagon is, doesn’t know what a pentagon tiling is, but does know about hexagonal tilings? Hexagonal tilings are much more common than pentagonal ones.

I don't know what you mean by "equally valid", but that is a possible interpretation.

I don't think that a human would take that approach if asked for a pentagon tiling, though.

It's an image diffusion model, so there's a tension between the formal description and the priors for other aspects of the image. In the case of my stop sign adversarial example, if you change it to a circle, that is enough to overcome the conditioning to produce a stop sign.

I'm starting to think that either people cannot understand this market, or I'm wildly misunderstanding it.

Can I get confirmation that "Draw a rectangle, but with 1 extra side" must result in the image model giving a correct answer 95% of the time?

"Any reasonable prompt that the average mathematically-literate human would easily understand as asking it to draw a pentagon must be responded to correctly." I believe that any mathematically-literate person would easily draw a pentagon in response to the above prompt.


@ForTruth I tried your prompt and it worked the first time, then I bought a bunch of yes. Then I tried it a few more times and it failed (GPT-4o), so it's not near 95% atm. Sold most of my hasty 'yes', but I still think on balance there's a decent chance at this by the end of the year. 50-70% seems about right.

@Grizzimo I'd consider myself mathematically literate, but I don't know what "Draw a rectangle, but with 1 extra side" is intending to ask for. It seems to be asking for a specific type of rectangle that somehow has an extra side. But a five-sided shape isn't a rectangle, so that doesn't seem to be what's wanted. Maybe something like this is wanted?

Or perhaps a degenerate pentagon where three consecutive points are collinear?


How about "Draw a polygon with one more side than a quadrilateral" as an alternative?

@JimHays Generally "but" precedes an exception to the prior statement, superseding it. Although yes, the average interpretation is highly subjective, and if this turned out to be the only statement that the image model has trouble with, then I could see the argument to ignore it.

I'm mostly seeking confirmation that the market specifies that any reasonable prompt must have a 95% success rate for a YES resolution. A 95% success rate on any single prompt is not sufficient. In other words I believe it would need to succeed at your prompt 95% of the time, as well as succeed at prompts of the following nature:

"Draw a shape with more sides than a square and fewer than a hexagon."

"Draw a 5-gon."

"Draw a regular polygon with interior angles of 108 degrees."

"Approximate a circle as closely as possible using exactly 5 line segments."

If I'm right about my interpretation in general, but wrong about one of the above prompts in specific, then that would be good to know. If I am somehow completely fundamentally wrong about how this market will resolve, then that would also be very good to know.

@IsaacKing Not sure if you saw @ForTruth’s questions above, so I’m pinging you in case you hadn’t seen them but want to chime in

I'm not gonna make it do more complex geometry like figuring out what shape has internal angles of 108 degrees, but if it's a straightforward communication that it must have 5 sides, that'll count.

Google Gemini Advanced can generate images of the US Pentagon.

@LeeWoods This doesn’t even meet “The image model must get the correct number of sides on at least 95% of tries per prompt”

Does outputting to SVG count?

@euclaise Doesn't matter what the output file format is, but it has to be drawing it "itself", not writing code to do it for it.

@IsaacKing note that you did specifically exclude an example below where ChatGPT generated an SVG. I don't know if that example involved it writing code, but if that's the reason you excluded it, it wasn't apparent.

SVGs kind of are code, making it particularly blurry in their case.

@chrisjbillington I would argue that explicitly asking for any specific file format should immediately exclude the prompt from consideration, since no "mathematically-literate human would easily understand [that] as asking it to draw a pentagon."

I agree that SVG should probably be excluded entirely, since a human responding to a request to draw a pentagon by writing SVG does not feel like a "correct" response. It's a fine line, but I'd say if the model produces human-readable text, then that text cannot also be considered an image.

I'd really like clarification on what exactly counts as a correct image for this market.

@IsaacKing Sure but we have to define what "drawing" is. SVG stores images as code that describes shapes which make up the image - which is different from pixel rendering, though not necessarily less valid.

@ForTruth I don't think the readable text part matters - there are pixel image formats which store the image as only printable ASCII, but that decode directly to pixels. Likewise, many vector formats use a non-printable bytecode.

@euclaise I think you misunderstand me: being printable ASCII does not make something readable text. If the model outputs non-readable bytecode then I would say that's valid, even if it's a vector format. My point is that the file should not be human-readable, or human-writeable.

Allowing human-readable output changes this from an image-model problem into just another text-generation problem, which I believe is exactly what excluding code is trying to avoid.

If I can sit down and write SVG code to draw a pentagon, then an AI doing the same only proves that it can mimic my text writing capabilities, not my image drawing capabilities.

@chrisjbillington Ah, thanks for noticing that. I think when I was first asked I was thinking of SVGs as more similar to code, and then when I was asked again recently I was thinking of them as more akin to a weird image format. I think the "code" interpretation is better, since IIUC in an SVG you can specify 5 points and say "draw lines between these", which is very similar to what ChatGPT would do to render it with Python. It doesn't have to really "see" the visual layout like it would for a PNG. So I think I'll stick with SVGs not being allowed. Sorry for the inconsistency, I'm updating the description.
