TL;DR
AI systems are rapidly automating the core engineering tasks of AI research and are nearing saturation on key benchmarks. The creative side of research is the less-automated residual, which calls for a rethink of AI R&D strategy.
Recent benchmark results indicate that AI systems can now perform most of the engineering tasks involved in AI research, with several benchmarks at or near saturation. The engineering component of AI R&D looks effectively automated, while research itself remains only partially automated.
Six key benchmarks measuring AI capabilities in core research skills, such as reproducing research, competing in Kaggle competitions, and designing GPU kernels, have all improved rapidly, approaching or reaching saturation within 16 to 21 months. For example, CORE-Bench, which tests the ability to reproduce research papers, improved from 21.5% to 95.5% and is considered effectively ‘solved’ by its authors. Similarly, MLE-Bench, which assesses performance on Kaggle competitions, advanced from 16.9% to 64.4%, placing AI near mid-tier human performance.
These trajectories indicate that the engineering aspects of AI R&D, such as reproducing experiments and optimizing hardware, are now largely within reach of automated systems. Research, which may involve creative and strategic thinking, remains less automated in Clark’s analysis, though some argue that research may itself be a form of large-scale engineering. If that view is right, the residual research tasks could be automated faster than previously thought.
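To put the two quoted trajectories on a common footing, the arithmetic can be sketched in a few lines of Python. This is a rough illustration, not Clark's method: the scores are the ones quoted above, the 16-to-21-month window is the range reported for the six benchmarks as a group (per-benchmark windows are not given here), and `gain_summary` and the 100% ceiling are assumptions made for the example.

```python
# Scores quoted in the text (percent). The window is the article's overall
# 16-21 month range, NOT a per-benchmark figure (assumption for illustration).
BENCHMARKS = {
    "CORE-Bench": (21.5, 95.5),
    "MLE-Bench": (16.9, 64.4),
}
WINDOW_MONTHS = (16, 21)


def gain_summary(start, end, window=WINDOW_MONTHS, ceiling=100.0):
    """Summarize a benchmark's improvement (hypothetical helper)."""
    gain = end - start  # percentage points gained
    return {
        "gain_points": gain,
        # Share of the remaining headroom that was closed, assuming the
        # benchmark tops out at `ceiling` percent.
        "headroom_closed": gain / (ceiling - start),
        # Average points per month if the gain took the longest or the
        # shortest window in the reported range.
        "points_per_month_min": gain / max(window),
        "points_per_month_max": gain / min(window),
    }


if __name__ == "__main__":
    for name, (start, end) in BENCHMARKS.items():
        s = gain_summary(start, end)
        print(
            f"{name}: +{s['gain_points']:.1f} points, "
            f"{s['headroom_closed']:.0%} of headroom closed, "
            f"{s['points_per_month_min']:.1f} to "
            f"{s['points_per_month_max']:.1f} points/month"
        )
```

Under these assumptions CORE-Bench gained 74.0 points and MLE-Bench 47.5, so CORE-Bench closed most of its headroom while MLE-Bench closed a bit more than half of its own.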
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swatches, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.

Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration; more shots on goal yield more discoveries. Both readings are consistent with Clark’s “vast swatches, perhaps the entirety” claim. They differ on the residual.
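The optimistic reading has a simple probabilistic form. As a minimal sketch, assume each exploration attempt is independent and succeeds with the same small probability `p`. Both are modeling assumptions, and the numbers below are hypothetical, not figures from Clark.

```python
def p_at_least_one(p, n):
    """Chance of at least one discovery in n independent attempts."""
    return 1.0 - (1.0 - p) ** n


def expected_discoveries(p, n):
    """Expected number of discoveries in n independent attempts."""
    return p * n


if __name__ == "__main__":
    p = 0.001  # hypothetical per-attempt success rate
    for n in (10, 1_000, 100_000):
        print(n, round(p_at_least_one(p, n), 4), expected_discoveries(p, n))
```

In this toy model, discoveries that are rare at low volume become near-certain at high volume, which is the optimistic reading. The pessimistic reading corresponds to `p` being zero for the insights that matter, in which case no number of attempts helps.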

Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. Five strategic dimensions matter for the institutional response, which the Clark series synthesis argues is structurally inadequate.

Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
[Comparison of the two readings; surviving labels: “Productivity multiplier years” and “Recursive loop operational”]

Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
The five audiences: industry, academia, policymakers, investors, and everyone else.
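The asymmetry argument can be laid out as a small decision table. The sketch below uses made-up ordinal costs purely to show the structure; the action names, state names, and numbers are assumptions for illustration and do not come from Clark or from any measurement.

```python
# Hypothetical costs of each (action, outcome) pair; higher is worse.
COSTS = {
    ("build", "moat_holds"): 1,    # capacity built early, but still useful
    ("build", "moat_closes"): 1,   # capacity was necessary and is in place
    ("wait", "moat_holds"): 0,     # nothing spent, nothing missed
    ("wait", "moat_closes"): 10,   # needed capacity is missing
}
ACTIONS = ("build", "wait")
STATES = ("moat_holds", "moat_closes")


def worst_case(action):
    """Largest cost an action can incur across the possible outcomes."""
    return max(COSTS[(action, state)] for state in STATES)


def minimax_action():
    """Pick the action whose worst-case cost is smallest."""
    return min(ACTIONS, key=worst_case)


if __name__ == "__main__":
    for action in ACTIONS:
        print(action, "worst case:", worst_case(action))
    print("minimax choice:", minimax_action())
```

With costs of this shape, where waiting is cheap if the moat holds but expensive if it closes, minimizing the worst case favors building.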
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
Implications for AI R&D Strategy and Innovation Pace
The rapid automation of engineering tasks in AI research could drastically reduce the time and cost associated with AI development, potentially shifting the innovation landscape. Organizations may need to reconsider how they allocate resources, as the bottleneck may shift from engineering to the more elusive, creative aspects of research. This shift could accelerate progress but also raises questions about the future role of human researchers and the nature of scientific discovery.
Progress in AI Capabilities and Benchmark Saturation
Over the past 16 to 21 months, multiple independent benchmarks—covering research reproduction, Kaggle competition performance, and kernel optimization—have shown consistent improvement, approaching or reaching their measurement limits. These benchmarks serve as proxies for AI’s ability to handle core research and engineering tasks, indicating a significant shift in AI’s practical capabilities.
Historically, AI’s role in research has been limited by the complexity and creativity involved. However, recent advances suggest that many of these tasks are now automatable, and Clark leaves open whether research itself may be reducible to engineering at scale. The question remains whether the residual research tasks involve fundamentally different skills or are simply more complex engineering problems.
“The pattern across multiple benchmarks indicates that AI can now automate vast swaths of AI engineering, with research remaining the residual challenge.”
— Thorsten Meyer
Unresolved Questions About Research Automation Speed
It is still unclear how quickly the residual research tasks—those involving creativity, strategy, and novel hypothesis generation—can be automated. While engineering appears to be effectively automated, the extent to which research itself can be reduced to engineering at scale remains an open question. The structural relationship between research and engineering may blur, but definitive timelines or thresholds are yet to be established.
Next Steps in Monitoring AI Research Automation Progress
Researchers and organizations will continue to track benchmark developments and explore whether research tasks can be further automated or if new benchmarks are needed. Attention will also focus on the evolving role of human researchers, the development of AI tools for creative scientific work, and potential shifts in research workflows over the next 32 months. Further empirical studies and strategic assessments are expected to clarify the residual gap between engineering and research automation.
Key Questions
What are the main benchmarks indicating AI automation progress?
The main benchmarks include CORE-Bench for research reproduction, MLE-Bench for Kaggle competition performance, and various kernel optimization tasks, all showing rapid advancement toward saturation.
Does this mean AI can now fully automate scientific research?
Not yet. While engineering tasks are nearing full automation, the automation of creative and strategic research activities remains uncertain and is an area of active investigation.
What are the implications for human researchers?
If engineering becomes fully automated, human researchers may focus more on hypothesis generation, strategic planning, and creative problem-solving, though the transition timeline is still uncertain.
Will this accelerate AI development timelines?
Potentially yes. Automating engineering tasks could reduce development costs and timeframes, but the overall pace depends on how quickly residual research tasks can be automated.
What are the risks of overestimating automation capabilities?
Overestimating automation could lead to strategic misalignments or complacency. It remains critical to monitor ongoing developments and validate automation claims through empirical benchmarks.
Source: ThorstenMeyerAI.com