Will an AI system be able to fully refactor a 10k+ line codebase before 2026? | Manifold

122

1.1kṀ17k

Dec 31

39% chance

The growing capabilities and increasing context length of recent AI systems could enable ever more powerful applications for code and IT infrastructure in general.

A full refactoring is a long and intense process that requires a significant amount of skill and knowledge. Good refactoring usually increases the efficiency and readability of a codebase while making further improvements to it easier.

Refactoring & generation rules

To be considered valid, the AI refactoring should show, in one go: good readability, an efficiency gain (if possible), and harmonized syntax and code structure, with no loss of features or deviation from the specification.
The system would need to deduce everything related to code, configuration files and basically the whole GitHub repo.
Pre-generation user feedback is possible but should be 100% optional and should only concern architecture preferences, naming conventions and high-level considerations.
Re-running the same input until a valid result is obtained will not count as a success.

Reliability

It would need to have a very high average reliability (~95%+) across various common programming languages (Python, Java, C++, C#, etc.) and libraries.
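
As a rough illustration of how the ~95% bar could be checked, the sketch below computes a one-shot success rate per language and an unweighted average across languages. The market does not specify a benchmark, an averaging method, or a data format, so the `Attempt` record, the `reliability` function, and the sample numbers are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One single-shot refactoring attempt on one repository (hypothetical record)."""
    language: str
    valid: bool  # True if the refactor met every rule on its first and only run


def reliability(attempts):
    """Return (per-language success rate, unweighted mean across languages)."""
    if not attempts:
        raise ValueError("at least one attempt is required")
    totals = defaultdict(int)
    passes = defaultdict(int)
    for a in attempts:
        totals[a.language] += 1
        passes[a.language] += a.valid
    per_language = {lang: passes[lang] / totals[lang] for lang in totals}
    average = sum(per_language.values()) / len(per_language)
    return per_language, average


# Made-up results, for illustration only.
attempts = (
    [Attempt("Python", True)] * 19 + [Attempt("Python", False)]
    + [Attempt("Java", True)] * 18 + [Attempt("Java", False)] * 2
)
per_language, average = reliability(attempts)
meets_bar = average >= 0.95  # one assumed reading of "~95%+"
```

Averaging per language first keeps a heavily sampled language from dominating the score; a pooled rate over all attempts would be an equally defensible reading of the rule.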

Allowed human interactions

The only allowed human interaction is one that requires administrator privileges and is directly requested by the system, for example for package installation or similar (feedback is possible for this).

Additional

There is one attempt for the final code generation, but internally the system can run as many iterative test loops and use as many external tools as needed.
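
One way to read this rule is sketched below: the system iterates internally against its own tests, but hands back exactly one final candidate. The callbacks `propose` and `run_tests` and the `max_internal_rounds` cap are hypothetical; the rule itself places no limit on internal iterations, and the cap is only there so the sketch terminates.

```python
def refactor_once(propose, run_tests, max_internal_rounds=10):
    """Iterate internally, then return exactly one final candidate.

    propose(feedback) -> candidate code            (hypothetical callback)
    run_tests(candidate) -> (passed, feedback)     (hypothetical callback)
    """
    feedback = None
    candidate = None
    for _ in range(max_internal_rounds):
        candidate = propose(feedback)
        passed, feedback = run_tests(candidate)
        if passed:
            break
    # This is the single final submission; the user does not re-run the input.
    return candidate


# Toy stand-ins: the third internal draft is the first one that passes.
drafts = iter(["draft-1", "draft-2", "draft-3"])
final = refactor_once(
    propose=lambda feedback: next(drafts),
    run_tests=lambda code: (code == "draft-3", "tests failed"),
)
```

From the outside, only `final` is visible, which is what separates this internal loop from a user re-running the same input until the result looks valid.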

For resolution

I would prefer not to rely on a single source (including me) for the resolution, which is why I would prefer to use public benchmarks (which of course don't exist yet...).

If none are available, I will go with online forum consensus.

People are also trading

Will AI agents be able to regularly code small features for us in a year?

Will an AI system capable of doing tasks that take humans eight hours as determined by METR.org, exist by 2027

Will AI pass the Winograd schema challenge by the end of 2025?

Will AI be Recursively Self Improving by mid 2026?

Will an AI system capable of doing 50% of knowledge job arrive by 2027?

Will an AI model achieve superhuman ELO on Codeforces by the 31 December 2025?

A major tech company, besides Anthropic, reports at least 98% of its code is AI-generated before April 1, 2026

By 2030, will over 50% of software development projects be primarily created by AI, with minimal human coding?

In 2028, will an AI be able to generate equivalent to ~=200 man years of effort towards a software 1.0 given a prompt?

By Jan 2027, will AI independently run 3 successful companies that would've previously needed programmers?

@Guillaume Would it be valid, if the codebase started out over 10k lines of code, but ended up significantly less, with all of the other stipulations met?

In my view, it hinges on how many more generations of AI systems we will get. Assuming there's a GPT-6 or equivalent by 2026, it should resolve as yes. That said, two more generations in 1.5 years would require a further acceleration in the pace of progress, which is what I'm actually betting on.

bought Ṁ40 YES

I bought YES as a hedge, at least if I'm unemployed I've made some mana.

bought Ṁ100 NO

Buying up NO as the conditions specified by the author seem highly unrealistic at this point.

bought Ṁ5 YES

@nsokolsky What's an example of one such condition? To me, all of them seem likely before 2025, never mind 2026.

@12c498e “95% average reliability” is one of them. I use GPT-4 daily and it’s maybe 80% accurate on the average task, much less so for ambiguous and abstract queries. What OP describes is such an advanced system that it would effectively result in 90% of software engineering jobs getting eliminated overnight.

How would this be tested? Will any example of a refactor be sufficient (in which case I’m sure an example can be contrived already for Gemini Ultra)? Or will you be picking random GitHub repos with 10k lines and asking for a refactor?
Does the code have to compile and run without any human intervention? Or will human intervention be acceptable - and if so, how many lines can humans change for this to count as YES?
Does “one go” mean there’s only 1 attempt in total with no feedback? Does this also mean re-runs of the same input until a valid result is obtained are not acceptable?
If the AI system runs the code on its own and keeps on doing refactoring until it compiles (latest GPT-4 can do this for Python), does this count as “one shot”?

Yes, of course a single lucky refactor would not suffice. It would need to have a very high average reliability (~95%+) across various common programming languages (Python, Java, C++, C#, etc.) and libraries. I would prefer not to rely on a single source (including me) for the resolution, which is why I would prefer to use public benchmarks (which of course don't exist yet...). If none are available, I will go with online forum consensus.
There will be no tolerance for human modification of the output code; the system would need to deduce everything related to code, configuration files and basically the whole GitHub repo (you can see this as a full repo generation). The only human intervention allowed is interaction that requires administrator privileges and is directly requested by the system, for example for package installation or similar (feedback is possible for this).
Pre-generation user feedback is possible but should be 100% optional and should only concern architecture preferences, naming conventions and high-level considerations. Re-running the same input until a valid result is obtained will not count as a success.
There is one attempt for the final code generation, but internally the system can run as many iterative test loops and use as many external tools as needed.

@nsokolsky I use Aider professionally for much of what the question states as criteria, but I'm not sure the current version really does all of it, because I haven't had such a use case. Still, Aider + Tree-sitter + Git does come close to it. I recommend you have a look: https://github.com/paul-gauthier/aider?tab=readme-ov-file#example-chat-transcripts it's a nice tool!

@Magnus_ it seems like a piecemeal tool. I'm pretty sure it would fail for any reasonably big project, given that OP requested a 95% success rate at one-shot performance.

bought Ṁ25 YES

Can't Gemini Ultra already do this?

@Pykess I don't know if you can connect tools like Aider to Gemini, but GPT-4 does quite a good job when it has access to Tree-sitter data for your repository.
