I'm thinking of something like https://mentat.ai/, but that actually works.
I will provide a paragraph or so describing the change I want made. Then it should create a GitHub PR, which I will review and leave only a few comments before merging. The whole process should take less than 30 minutes. This should work fairly reliably.
I tried this yesterday and it failed haha:
https://github.com/manifoldmarkets/manifold/pull/2694
See more discussion in my post:
I bought yes because I've seen GitHub's Copilot Workspace already do promisingly well in my brief tests. By mid-2025, I can definitely see it being good enough to do real work on some codebases (especially if you have a good test suite).
If James doesn’t get accepted into AI grants, then there will be something better as an alternative; otherwise manicode will be coding features for us in a year.
@JamesGrugett, will you provide additional repo-level, AI-specific documentation as you describe in https://manifold.markets/JamesGrugett/will-manicode-be-accepted-into-ai-g ?
From a reading of the question description text, I'd say that shouldn't be allowed: the description mentions mentat.ai and "provide a paragraph or so", both of which suggest no such AI-specific handholding.
Hi, great question!
When I created this market, I didn't imagine I would be building my own AI agent for coding.
Regarding human-created context on the codebase, I do think that should be allowed! Adding a bit of documentation seems like fair game. If, however, the context were specifying in detail how to make the coding changes for the specific feature, that would seem unfair.
Also, I think a little bit of back-and-forth with the AI should be allowed, since I did specify you could leave some comments, and that it should take under 30 minutes.
I think manicode does not yet qualify, since I'm not sure it would work 90% of the time, without manual intervention or extended back-and-forth.
Thanks for clarifying.
To be frank, the fact that you are literally designing your own AI, presumably optimized for Manifold GitHub functionality, wildly changes the odds on this question. Obviously we can't know what projects will spin up over the course of the year (so fair play), but the phrasing of this question came off to me as pointing at 3rd party, general AI agents rather than Manifold-bespoke AI agents.
I understand. I will try to raise the bar of expectations if it feels like manicode is especially good at the manifold codebase compared to others. I don't really think this will be the case though.
While it is not coding, AI code review could be helpful. Take for example https://coderabbit.ai. It does a pretty nice summary as well as code review. They are also free for open source so you could try them out.
Here is an example that shows how it could be useful: https://github.com/jsonresume/jsonresume.org/pull/131#issuecomment-2236198926
Too subjective for me to bet much on. Expectations will shift as much or more than capabilities over the next year.
I think that in a year we'll see some outstanding successes when the feature is straightforward and uses a common pattern (e.g. add some CRUD route handlers to a REST API for a popular server framework).
But for more complicated things, and for codebases which go off the beaten path a bit, we'll still see broken PRs and code which superficially looks right but has an unusual number of subtle bugs.
In a year, I don't know if this market will resolve based on asking it to do something easy or hard, where the difficulty for a human might not correlate with the difficulty for an AI bot in an easily predictable way.
My general bias is that, with experience, a programmer will learn to avoid pitfalls of any tool, making the tool more useful over time, even without the tool changing at all.
I have a clear idea of what I'm looking for. It needs to be able to make good changes to the codebase for a variety of small-ish requests, which often involve some refactoring along the way. (Leaving code better after the change than before would be a good sign!)
I think this qualifies as a harder objective in your characterization. I'm totally on board with the idea that even now AI coding agents could become more useful operating within a more limited framework.
You've explored this a bit already -- do you know if any AI coding agents integrate with CI/CD to build & test the code they write? It seems like that could go a long way towards fixing the "code only superficially looks correct" issue.
If a first agent could write a comprehensive set of unit tests and end-to-end tests (including performance goals for desired level of scale), then it seems like you could let a second agent take as many implementation attempts as it needs to reach those goals.
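To make the two-agent idea concrete, here is a minimal sketch of the retry loop, not tied to any real coding agent or CI system. Every name in it (`implement_until_green`, `propose`, `run_tests`, and the toy stand-ins) is hypothetical. A real setup would presumably call a model for `propose` and a CI job for `run_tests`; here both are canned so the loop itself can be run.

```python
from typing import Callable, Optional


def implement_until_green(
    propose: Callable[[int, Optional[str]], str],
    run_tests: Callable[[str], Optional[str]],
    max_attempts: int = 5,
) -> Optional[str]:
    """Return the first candidate that passes the tests, or None if none does."""
    feedback: Optional[str] = None
    for attempt in range(max_attempts):
        # The implementing agent sees the previous failure message, if any.
        candidate = propose(attempt, feedback)
        # run_tests returns None when everything passes, else a failure message.
        feedback = run_tests(candidate)
        if feedback is None:
            return candidate
    return None


# --- Toy stand-ins (assumptions for illustration only) ---

def toy_tests(source: str) -> Optional[str]:
    """Stand-in for the first agent's test suite: checks a function named add."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # acceptable for a toy; never do this with untrusted code
        assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


# Canned "attempts": the first is subtly wrong, the second is correct.
CANNED = ["def add(a, b): return a - b", "def add(a, b): return a + b"]


def toy_agent(attempt: int, feedback: Optional[str]) -> str:
    """Stand-in for the second agent: replays canned attempts in order."""
    return CANNED[min(attempt, len(CANNED) - 1)]


result = implement_until_green(toy_agent, toy_tests)
```

The loop is only as good as the tests: a candidate that passes a weak suite is still returned, which is why the quality of the first agent's tests matters so much in this scheme.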
That doesn't help with the broader "is AI generated code clean enough to directly incorporate into my codebase?" issue though. I suspect that we'll go through a period of "AI writes custom libraries to do a specific task. Humans don't mess with them, they just use them." That's not very different from how we treat compilers. If we want to alter the library, we'll tweak our requirements and let the AI generate it again, possibly using the old library for reference.
It's a good idea! Especially with languages that have types as another layer of checking.
MentatBot seemed to make lots of errors that could be tested, but they do say that testing approaches is a key part of how it works: https://mentat.ai/blog/mentatbot-sota-coding-agent
@JamesGrugett Is this just a ploy to get us to buy more mana so we can bet this up to 99%?