TL;DR
Recent evidence shows AI systems are nearing full automation of core engineering tasks in AI research. However, the automation of AI research itself remains incomplete, with some aspects still requiring human insight. This development could reshape AI R&D workflows and institutional strategies.
Recent benchmarks and research analyses indicate that AI systems are approaching the point where they can fully automate core engineering tasks in AI research, while the process of conducting research itself remains partly human-driven, according to experts.
Multiple independent measures (CORE-Bench, MLE-Bench, and advances in kernel design) show AI systems approaching saturation on the core engineering skills that AI research depends on. For example, CORE-Bench, which measures the reproduction of research papers, reached 95.5% in December 2025, with the benchmark’s author stating it is ‘solved.’ Similarly, MLE-Bench, which evaluates performance on Kaggle competitions, hit 64.4% in February 2026, approaching mid-tier human performance.
These developments suggest that the bottleneck in AI research is shifting from engineering to the research process itself. While AI can now handle dependencies, run experiments, and optimize kernels with minimal human oversight, the creative and conceptual aspects of research—such as hypothesis formulation and problem framing—are still less automated. Experts like Thorsten Meyer interpret this as a structural shift: engineering tasks are increasingly automated, but research remains the residual challenge, although this residual may diminish as research itself becomes more engineering-like.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes that “AI can today automate vast swaths, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.

Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration; more shots on goal yield more discoveries. Both readings are consistent with Clark’s “vast swaths, perhaps the entirety” claim. They differ on the residual.
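The "shots on goal" reading can be made concrete with a small probability sketch. It assumes each exploration attempt is an independent trial with a fixed success probability, which is a simplification and not something Clark or the benchmarks establish. The per-attempt rates below are hypothetical placeholders, chosen only to show how the two readings diverge as volume grows.

```python
def prob_at_least_one(p_per_attempt: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p_per_attempt) ** attempts


def expected_discoveries(p_per_attempt: float, attempts: int) -> float:
    """Expected number of successes across `attempts` independent tries."""
    return p_per_attempt * attempts


# Hypothetical per-attempt success rates, NOT taken from Clark or the article.
# "engineering_at_scale": discoveries are rare but reachable with volume.
# "permanent_moat": no amount of volume helps (rate of exactly zero).
SCENARIOS = {
    "engineering_at_scale": 1e-3,
    "permanent_moat": 0.0,
}

if __name__ == "__main__":
    for name, p in SCENARIOS.items():
        for n in (100, 1_000, 10_000):
            print(
                f"{name:>22} attempts={n:>6} "
                f"P(>=1)={prob_at_least_one(p, n):.3f} "
                f"E[discoveries]={expected_discoveries(p, n):.1f}"
            )
```

Under the optimistic assumption, rare moments become near-certain once volume is high enough. Under the pessimistic one, volume changes nothing. The current data cannot tell the two apart because observed volume is still low.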

Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. Five strategic dimensions matter for the institutional response, which the Clark series synthesis argues is structurally inadequate.

Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
[Figure: the two equilibria, labeled "Productivity multiplier years" and "Recursive loop operational"]

Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
The five audiences: industry, academia, policymakers, investors, and everyone else.
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
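The asymmetric cost-of-being-wrong argument above can be sketched as a tiny decision table. The cost numbers are hypothetical and unitless. They only encode the ordering the argument relies on: building capacity costs a little in either world, while waiting costs nothing if the moat holds and a lot if it closes. The worst-case rule used here is one illustrative way to read the argument, not a method the source prescribes.

```python
# Hypothetical, unitless costs. Only their ordering reflects the argument:
# building is a modest cost either way; waiting is free if the moat holds
# and expensive if it closes.
COSTS = {
    ("build", "moat_holds"): 1.0,
    ("build", "moat_closes"): 1.0,
    ("wait", "moat_holds"): 0.0,
    ("wait", "moat_closes"): 10.0,
}

ACTIONS = ("build", "wait")


def worst_case(action: str) -> float:
    """Largest cost this action can incur across both states of the world."""
    return max(cost for (a, _state), cost in COSTS.items() if a == action)


def minimax_action() -> str:
    """Pick the action whose worst-case cost is smallest."""
    return min(ACTIONS, key=worst_case)


if __name__ == "__main__":
    for action in ACTIONS:
        print(f"{action:>5}: worst-case cost = {worst_case(action)}")
    print("minimax choice:", minimax_action())
```

With any numbers that preserve this ordering, the worst-case rule points toward building now. The conclusion would only flip if building capacity were costlier than being caught unprepared.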
Implications for AI R&D and Institutional Strategies
This rapid progress in automating engineering tasks in AI research could dramatically reduce costs, accelerate discovery cycles, and shift the competitive landscape among AI labs and corporations. Organizations might need to reconsider their investment priorities, focusing more on research innovation and less on engineering infrastructure, as the latter becomes largely automated. However, the incomplete automation of research processes leaves open questions about the future role of human researchers and the potential for AI to fully automate scientific discovery.
Recent Advances in AI Automation and Benchmark Progress
Over the past 18 months, several benchmarks have demonstrated AI’s capability to automate core research tasks. CORE-Bench, which assesses research reproduction, improved from 21.5% in September 2024 to 95.5% in December 2025. Similarly, Kaggle competition performance, measured by MLE-Bench, advanced from 16.9% to 64.4% in roughly 16 months. These patterns indicate a rapid saturation of engineering skills relevant to AI research, with multiple independent measures converging on this trend. Meanwhile, advances in kernel design—such as automated GPU kernel optimization—are moving from research papers into production use, further illustrating the shift toward automation in AI infrastructure.
“The pattern across these benchmarks indicates that AI can today automate vast swaths, perhaps the entirety, of AI engineering.”
— Thorsten Meyer
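The benchmark figures quoted in this section can be turned into rough per-month rates. The sketch below uses only the scores and time spans stated in the text (15 months from September 2024 to December 2025 for CORE-Bench, and "roughly 16 months" for MLE-Bench). The class and field names are illustrative, and a linear average is a simplification, since benchmark progress is rarely linear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRun:
    name: str
    start_score: float  # percent, as quoted in the article
    end_score: float    # percent, as quoted in the article
    months: int         # elapsed months, as stated or implied in the article

    @property
    def gain(self) -> float:
        """Total improvement in percentage points."""
        return self.end_score - self.start_score

    @property
    def gain_per_month(self) -> float:
        """Average improvement in percentage points per month."""
        return self.gain / self.months

    @property
    def headroom(self) -> float:
        """Percentage points left before the benchmark hits 100%."""
        return 100.0 - self.end_score


RUNS = [
    BenchmarkRun("CORE-Bench", 21.5, 95.5, 15),
    BenchmarkRun("MLE-Bench", 16.9, 64.4, 16),
]

if __name__ == "__main__":
    for run in RUNS:
        print(
            f"{run.name}: +{run.gain:.1f} pts over {run.months} months "
            f"({run.gain_per_month:.2f} pts/month), "
            f"{run.headroom:.1f} pts of headroom left"
        )
```

Read this way, CORE-Bench has almost no headroom left, which is consistent with its author calling it solved, while MLE-Bench still has substantial room to move.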
Unresolved Questions About AI Research Automation
While engineering tasks are nearing full automation, it remains unclear how much of the research process—such as hypothesis generation, conceptual innovation, and strategic decision-making—AI can automate. Experts like Clark and Meyer acknowledge that some aspects of research may be inherently non-automatable or require human insight, and the rate at which these residual tasks will be automated is uncertain. Additionally, the institutional and ethical implications of such automation are still under discussion.
Next Steps in AI Automation and Research Development
Over the coming 32 months, focus will likely shift toward understanding and enhancing AI’s capabilities in the research phase, including creative problem-solving and strategic planning. Researchers and organizations may begin to experiment with AI-led research initiatives, potentially leading to new models of scientific discovery. Monitoring how automation impacts research quality, novelty, and ethical considerations will be critical as these developments unfold.
Key Questions
How close are AI systems to fully automating all aspects of AI research?
Current benchmarks suggest that engineering tasks are nearly fully automated, but the automation of research itself—such as hypothesis generation and conceptual innovation—remains incomplete. The timeline for full automation is uncertain and likely depends on future technological and institutional developments.
What are the risks of automating AI research?
Potential risks include reduced human oversight, ethical concerns about autonomous research, and the possibility of AI-driven research diverging from human values or priorities. These issues are actively debated among researchers and policymakers.
Will automation replace human researchers entirely?
While automation may significantly reduce the need for human involvement in engineering tasks, the complete replacement of human researchers is unlikely in the near term. Human insight remains crucial for conceptual innovation and strategic decision-making, although this may evolve as AI capabilities advance.
How will organizations adapt to these changes?
Organizations may shift their focus toward leveraging AI for research innovation, investing in AI-human collaboration models, and reevaluating their research workflows to maximize efficiency and creativity in an increasingly automated environment.
What ethical considerations arise from automating AI research?
Automating research raises questions about accountability, transparency, and the potential for unintended consequences. Ensuring that AI-driven research aligns with ethical standards and societal values will be a key challenge for the community.
Source: ThorstenMeyerAI.com