Given a description of a backend (code and/or specification) and a general description of a front-end (e.g. a list of essential features, desiderata, style, etc.), an AI code generator should be able to generate front-end code in a widely used programming language such as JS or TS, based on a widely used framework such as React or Vue.
Criteria:
code must be written completely by AI, with no human intervention beyond providing relevant information about the backend and a general description of the front-end
should work with any kind of a web front-end task a senior front-end / full-stack software engineer is expected to be able to implement
code must conform to the specification and meet the quality standards of an expert human senior developer proficient in the given language and problem domain
front-end must work according to commonly accepted UI/UX standards
at least 2000 non-trivial lines of code
a task is considered to be performed successfully when all important parts of functionality are implemented; minor cosmetic defects or missing niceties are acceptable
the code generator should have a success rate of at least 80%
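One possible reading of how the size, no-intervention, and 80% criteria combine, as a rough sketch rather than an official resolution procedure. The TaskResult shape and every field name are hypothetical, and it assumes that tasks under 2000 non-trivial lines are simply not counted, while any human edit makes a task a failure.

```typescript
// Hypothetical record of one evaluated generation task (all field names are illustrative).
interface TaskResult {
  nonTrivialLines: number;          // non-trivial lines of generated front-end code
  humanEdits: number;               // manual changes made to the generated code
  importantFeaturesTotal: number;   // important features in the front-end description
  importantFeaturesWorking: number; // how many of those actually work
}

const MIN_LINES = 2000;
const MIN_SUCCESS_RATE = 0.8;

// Assumption: tasks below the size threshold are ignored rather than counted as failures.
function qualifies(t: TaskResult): boolean {
  return t.nonTrivialLines >= MIN_LINES;
}

// Success: no human edits and every important feature works.
// Minor cosmetic defects are acceptable, so they are not tracked at all.
function succeeded(t: TaskResult): boolean {
  return t.humanEdits === 0 && t.importantFeaturesWorking === t.importantFeaturesTotal;
}

// Fraction of qualifying tasks that succeeded; 0 when nothing qualifies.
function successRate(results: TaskResult[]): number {
  const eligible = results.filter(qualifies);
  if (eligible.length === 0) return 0;
  return eligible.filter(succeeded).length / eligible.length;
}

function meetsBar(results: TaskResult[]): boolean {
  return successRate(results) >= MIN_SUCCESS_RATE;
}
```

Under this reading, four successes out of five qualifying tasks would just meet the bar, and a 500-line demo would not count either way.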
"should work with any kind of a web front-end task a senior front-end / full-stack software engineer is expected to be able to implement"
Senior front-end / full-stack software engineers are occasionally expected to be able to implement literally impossible tasks, so this arguably should already resolve NO.
https://v0.dev/ might be the most advanced tool I know of, but it is still a long way from what this question needs to resolve YES.
Could you give an upper / lower bound for scale, apart from LOC?
Like, is a random note-saving app without editing, deletion, or sign-in the lower bound? Would the same with sign-in count? Or would a ToDo list app with edits, deletes, and sign-in be the lower bound? Or something like a ToDo list app with tags, reordering, and recurring items?
@1a3orn The question is basically "Can AI replace front-end web devs?", so the lower bound should be similar to what people develop commercially; sign-in and so on are required.
Something similar to a TODO list app as you described might be good for a lower bound, except that an actual TODO list app won't qualify because it's a common tutorial topic so we won't be able to tell which parts are just copied from the training set.
@vluzko Not particularly. I know that people have had some success with GPT-4, but the initially released model with 8k context is definitely not sufficient because of its limited context size. (Maybe somebody can prompt it to generate a front-end piece by piece, but AFAIK nobody has succeeded in doing that yet.) Is 32k context enough? It's a bit hard to check, as detailed descriptions of the back-end of a non-trivial app are hard to find.