Will I consider Nat Friedman's tweet about AIs writing code to have been a wild exaggeration?

Ṁ2862 · Closes Dec 31 · 33% chance

Nat Friedman, former GitHub CEO and AI hype enthusiast, tweeted that "This is going to be an insane year for AIs writing code." That claim falls into the kind of vague, gesture-y and unfalsifiable AI hype proclamation that I'm quite skeptical about and find rather annoying.

I want to challenge my biases, though, so I'm making this market to register my deep skepticism and to see whether I will have been right or wrong by the end of the year. I'm a software engineer, so I'm familiar with and use most AI coding tools, including GPT-4 and Copilot, and find them of some marginal utility: say 3/10 on a subjective, vibes-based scale.

I will resolve this market YES if, by the end of the year, it turns out Nat was bullshitting, i.e. there's no step change in new AI tools that I consider "insane" or a significant improvement over, for example, current Copilot (say 6/10 on my subjective scale). Since I might be biased, I will also let my judgement be influenced by the consensus opinion of other programmers and commenters.

I will resolve this market NO if there is a clear step up in new AI tools that showcase abilities superior to current tools, or that are "insane" in some consensus-observable way. If such tools are not publicly available but there's clear evidence of their existence in some other domain, I will also lean toward resolving NO. I can't think of any reason to resolve this N/A besides force majeure, since I'm bound to have an opinion one way or another.

I'm open to better wordings of this market or more concrete ways to quantify my opinion.

#Artificial Intelligence


7 Comments


Update for @traders: I have been a solid YES all year, but I started using Claude 3.5 about two months ago, both at work and in pet projects, and it's mind-blowingly great. I'd say it has roughly halved the time I spend working on tickets, and it allowed me to ship a Chrome extension and a (non-trivial) mobile app in one month, with relative ease and only nominal input from me. Maybe these are relatively commoditized UI tasks of the sort LLMs might be good at, but this was not clearly the case five months ago. I plan to test-drive Cursor over the course of this month to see what the hype is about, but I'm no longer so sure I would resolve YES if this were to resolve today.

@diadematus try Manicode!

bought Ṁ5 NO

What do you think about the gains on SWE-bench? I haven't tried any coding agents and I'm not sure which ones are even publicly available, but I wouldn't be surprised if climbing this benchmark ends up tracking something real.

@JoshYou I really don't care about these sorts of benchmarks, just my subjective experience using these tools in my daily work (as described in the market).

Apparently even the "Devin" hype video was a lie: https://www.youtube.com/watch?v=tNmgmwEtoWE

bought Ṁ10 YES

Devin hype is dying down, and I have decided that the trend of flashy demos that go nowhere is more bullish for my skepticism, not less.

https://www.cognition-labs.com/blog: pretty good, but not "insane" IMO. I'm a little less confident in my skepticism now, though.
