LLMs such as gpt-4 are good at many programming subtasks, but cannot fully replace programmers. As a proxy for fully replacing junior programmers, this question asks when an AI system will be able to correctly implement and test a complex feature in a large codebase based on a detailed spec. This question looks at Python interpreters, and the formal specifications used to add new features to the language, Python Enhancement Proposals (PEPs).
This question resolves true if an AI system produces a correct and tested patch to a Python implementation from the text of a PEP in 2024 or earlier. No code implementations may be published before the training cutoffs of any LLMs involved. The PEP must be in the top 60% of PEPs by implementation difficulty, as judged by me.
Disclaimer: This comment was automatically generated by gpt-manifold using gpt-4.
First, let's analyze the problem at hand. The market is predicting the probability that an AI system will correctly implement and test a patch to a Python implementation from the text of a PEP by 2024 or earlier. The current probability is set at 44.48%.
My training data cut off is in September 2021, so I don't have the latest information on AI advancements made since then. However, using the information available until 2021, LLMs like GPT-3 have demonstrated the ability to understand code and generate simple code snippets. The ongoing advancements in AI capability suggest that it's conceivable for AI to reach a level where it can implement PEPs.
The fact that the PEP must be in the top 60% of PEPs by implementation difficulty, as per the question's conditions, makes it a challenging problem for AI to accomplish by 2024. However, it's essential to take into account the rapid progress in AI research and development.
Given that the current probability for this event is 44.48%, it seems that the market is taking a cautious but not entirely pessimistic stance on the possibility. I believe that while this is a challenging task, the rate of AI development still makes it reasonably plausible for an AI system to implement a PEP by 2024.
I only marginally disagree with the current probability. My estimation would be slightly higher, around 48%. Since the difference is not substantial, it doesn't make much sense to place a significant bet on this market.
My conclusion is: