<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Frontier AI Risk on Xudong Pan</title>
    <link>https://ravensanstete.github.io/en/tags/frontier-ai-risk/</link>
    <description>Recent content in Frontier AI Risk on Xudong Pan</description>
    <generator>Hugo</generator>
    <language>en-US</language>
    <copyright>© 2024 Xudong Pan. All rights reserved.</copyright>
    <lastBuildDate>Mon, 25 May 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://ravensanstete.github.io/en/tags/frontier-ai-risk/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Understanding the Risk of Recursive Self-Improvement</title>
      <link>https://ravensanstete.github.io/en/blog/understanding-the-risk-of-recursive-self-improvement/</link>
      <pubDate>Mon, 25 May 2026 00:00:00 +0000</pubDate>
      <guid>https://ravensanstete.github.io/en/blog/understanding-the-risk-of-recursive-self-improvement/</guid>
      <description>&lt;h2 id=&#34;an-empirical-framework-for-ai-driven-ai-development&#34;&gt;An empirical framework for AI-driven AI development&lt;/h2&gt;&#xA;&lt;p&gt;Recursive self-improvement has often been imagined as a dramatic threshold: an AI system becomes capable enough to improve itself, and each improvement makes the next one easier.&lt;/p&gt;&#xA;&lt;p&gt;That image captures a real concern, but it is too narrow for today&amp;rsquo;s frontier AI landscape.&lt;/p&gt;&#xA;&lt;p&gt;Today, AI systems are beginning to enter the processes that build, evaluate, optimize, and deploy other AI systems. They can generate training data, improve reasoning traces, act as judges, optimize agent workflows, replicate research papers, run experiments, maintain long-term memory, assist with AI R&amp;amp;D infrastructure, and improve algorithms used in real computational systems [1][2][12][28].&lt;/p&gt;&#xA;&lt;p&gt;This means recursive self-improvement should be understood as a family of feedback loops. Some loops operate at the level of foundation models. Others operate through agent scaffolds, research infrastructure, runtime memory, multi-agent systems, or external resources.&lt;/p&gt;&#xA;&lt;p&gt;The central question is empirical:&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;Which self-improvement loops are already technically feasible, which ones remain speculative, and when could they combine into catastrophic risk pathways?&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;p&gt;This essay offers a simplified framework for thinking about that question.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;1-what-counts-as-recursive-self-improvement-today&#34;&gt;1. 
What counts as recursive self-improvement today?&lt;/h2&gt;&#xA;&lt;p&gt;A useful modern definition is:&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;&lt;strong&gt;Recursive self-improvement refers to feedback loops in which AI systems contribute to improving themselves, their scaffolds, their successor systems, or the infrastructure that enables future AI development.&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;p&gt;This definition matters because current AI systems do not need to directly rewrite their own weights to participate in self-improvement. A frozen foundation model can still become part of a self-improving system if it helps improve prompts, tools, workflows, memory, evaluation pipelines, training data, research code, or future models.&lt;/p&gt;&#xA;&lt;p&gt;This is also why AI R&amp;amp;D automation has become central to the discussion. CSET&amp;rsquo;s workshop report &lt;em&gt;When AI Builds AI&lt;/em&gt; describes how leading AI companies are already using AI systems to accelerate research and development, with each generation potentially contributing to the next generation [1]. GovAI and Oxford&amp;rsquo;s &lt;em&gt;Measuring AI R&amp;amp;D Automation&lt;/em&gt; argues that AI R&amp;amp;D automation could significantly affect both AI progress and human oversight, while existing capability benchmarks may fail to capture real-world automation [2].&lt;/p&gt;&#xA;&lt;p&gt;Frontier AI developers have also begun to treat this as a safety-relevant category. OpenAI&amp;rsquo;s Preparedness Framework includes AI self-improvement as a tracked risk area [14]. Anthropic&amp;rsquo;s Responsible Scaling Policy includes AI R&amp;amp;D thresholds [15]. Google DeepMind&amp;rsquo;s Frontier Safety Framework highlights machine learning R&amp;amp;D capabilities as especially important because future models may accelerate or automate AI development itself [16]. 
METR&amp;rsquo;s public work similarly tracks long-horizon autonomy and frontier AI risk from internal agentic deployments [17][27].&lt;/p&gt;&#xA;&lt;p&gt;The risk is therefore less about a single &amp;ldquo;takeoff moment&amp;rdquo; and more about a gradual expansion of feedback loops inside the AI development process.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;2-six-contemporary-paradigms-of-rsi&#34;&gt;2. Six contemporary paradigms of RSI&lt;/h2&gt;&#xA;&lt;p&gt;We find it useful to separate contemporary recursive self-improvement into six technical paradigms.&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Paradigm&lt;/th&gt;&#xA;          &lt;th&gt;What improves?&lt;/th&gt;&#xA;          &lt;th&gt;Typical feedback signal&lt;/th&gt;&#xA;          &lt;th&gt;Why it matters for RSI risk&lt;/th&gt;&#xA;          &lt;th&gt;Example sources&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Foundation-model-level self-improvement&lt;/td&gt;&#xA;          &lt;td&gt;Weights, data, reward signals, reasoning, alignment behavior, successor models&lt;/td&gt;&#xA;          &lt;td&gt;Generated data, rationales, self-judgment, human/AI preference signals&lt;/td&gt;&#xA;          &lt;td&gt;Can propagate capability gains or hidden alignment drift across model generations&lt;/td&gt;&#xA;          &lt;td&gt;STaR, Quiet-STaR, Self-Rewarding LMs, SPIN, model collapse, emergent misalignment, subliminal learning [3][4][5][6][19][20][21]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Agent-scaffold-level self-improvement&lt;/td&gt;&#xA;          &lt;td&gt;Prompts, workflows, tools, planners, agent code, control logic&lt;/td&gt;&#xA;          &lt;td&gt;Benchmark scores, textual feedback, task success, evaluator outputs&lt;/td&gt;&#xA;          &lt;td&gt;Allows system capability growth even when model weights are frozen&lt;/td&gt;&#xA;          
&lt;td&gt;DGM, ADAS, AFlow, TextGrad [7][8][9][22]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Infrastructure-level AI R&amp;amp;D self-improvement&lt;/td&gt;&#xA;          &lt;td&gt;Coding, experiments, data processing, evaluation, training pipelines, algorithm discovery, successor-model development&lt;/td&gt;&#xA;          &lt;td&gt;Research outcomes, eval scores, runtime efficiency, replication success&lt;/td&gt;&#xA;          &lt;td&gt;Most directly links AI systems to faster AI capability development&lt;/td&gt;&#xA;          &lt;td&gt;RE-Bench, PaperBench, AI Scientist, AlphaEvolve, AIRDA measurement [2][10][11][12][28]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Runtime agent self-improvement modules&lt;/td&gt;&#xA;          &lt;td&gt;Memory, skills, reflection, procedural knowledge, user/task models&lt;/td&gt;&#xA;          &lt;td&gt;Task outcomes, environment feedback, self-reflection, accumulated experience&lt;/td&gt;&#xA;          &lt;td&gt;Makes deployed agents adaptive moving targets after evaluation&lt;/td&gt;&#xA;          &lt;td&gt;SAGE, ReasoningBank, Voyager, Hermes Agent [23][24][25][26]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Multi-agent and population-level self-improvement&lt;/td&gt;&#xA;          &lt;td&gt;Agent populations, archives, research teams, peer review, specialization&lt;/td&gt;&#xA;          &lt;td&gt;Peer critique, selection, debate, population search, automated review&lt;/td&gt;&#xA;          &lt;td&gt;Can amplify speed while making oversight and attribution harder&lt;/td&gt;&#xA;          &lt;td&gt;AI Scientist, DGM, ADAS, multi-agent task decomposition [7][8][12]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Environment and resource-level self-improvement&lt;/td&gt;&#xA;          &lt;td&gt;Tools, data, compute, access, permissions, persistence, replication&lt;/td&gt;&#xA;          
&lt;td&gt;Resource access, operational success, deployment continuity&lt;/td&gt;&#xA;          &lt;td&gt;Couples self-improvement with autonomy, proliferation, and control risk&lt;/td&gt;&#xA;          &lt;td&gt;SAIF brief, AISI trends, International AI Safety Report, METR risk report [13][27][29][30]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;h3 id=&#34;21-rsi-on-foundation-model&#34;&gt;2.1 RSI on Foundation Model&lt;/h3&gt;&#xA;&lt;p&gt;This is the most familiar form: a model helps improve its own weights, training data, reward signal, reasoning ability, alignment behavior, or successor model.&lt;/p&gt;&#xA;&lt;p&gt;Examples include self-generated reasoning data, self-training, self-play, self-rewarding language models, synthetic data recursion, distillation, model-generated preference data, and automated post-training.&lt;/p&gt;&#xA;&lt;p&gt;There is already substantial component evidence. STaR showed that a model can improve reasoning by learning from its own generated rationales [3]. Quiet-STaR extends this idea toward learning latent rationales in more general text [20]. Self-Rewarding Language Models use the language model itself as a judge to provide rewards during training [19]. SPIN uses self-play to improve language models without requiring additional human-annotated data [21].&lt;/p&gt;&#xA;&lt;p&gt;More recent work on recursive synthetic data, model collapse, emergent misalignment, and subliminal learning shows that recursive training and distillation loops can also transmit or amplify unwanted properties [4][5][6].&lt;/p&gt;&#xA;&lt;p&gt;The key risk at this layer is &lt;strong&gt;alignment drift across model generations&lt;/strong&gt;. 
If models increasingly generate the data, rewards, critiques, and training signals used to improve future models, hidden behavioral traits or evaluation blind spots may propagate through the development pipeline.&lt;/p&gt;&#xA;&lt;p&gt;A simple milestone ladder:&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Milestone&lt;/th&gt;&#xA;          &lt;th&gt;Evidence status&lt;/th&gt;&#xA;          &lt;th&gt;Representative sources&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Models generate reasoning data that improves their own performance&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;STaR; Quiet-STaR [3][20]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Models act as judges or reward providers&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;Self-Rewarding Language Models; LLM-as-judge paradigm [19]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Models improve through self-play or self-training loops&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;SPIN [21]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Recursive synthetic data changes model behavior&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;Model collapse [4]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Distillation transmits hidden behavioral traits&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;Subliminal learning [6]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Narrow fine-tuning causes broad behavioral change&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;Emergent misalignment [5]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      
&lt;tr&gt;&#xA;          &lt;td&gt;A model autonomously designs and executes a full post-training plan&lt;/td&gt;&#xA;          &lt;td&gt;Open&lt;/td&gt;&#xA;          &lt;td&gt;Frontier self-improvement frameworks motivate this threshold, but it remains unproven [14][15][16]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;A current model independently produces a significantly stronger successor model&lt;/td&gt;&#xA;          &lt;td&gt;Open&lt;/td&gt;&#xA;          &lt;td&gt;OpenAI and Anthropic treat related capabilities as high/critical future thresholds [14][15]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;hr&gt;&#xA;&lt;h3 id=&#34;22-rsi-on-agent-scaffold&#34;&gt;2.2 RSI on Agent Scaffold&lt;/h3&gt;&#xA;&lt;p&gt;Modern AI systems are often more than foundation models. They include prompts, tools, planners, workflows, memory, code editors, evaluators, and control logic. Agent-scaffold-level self-improvement occurs when AI systems improve this surrounding structure.&lt;/p&gt;&#xA;&lt;p&gt;This includes automated workflow optimization, agent architecture search, self-modifying coding agents, textual-gradient optimization, prompt evolution, tool-use optimization, and benchmark-driven agent improvement.&lt;/p&gt;&#xA;&lt;p&gt;This is one of the most practically important forms of RSI. The foundation model may remain frozen, while the overall system becomes more capable because the scaffold improves.&lt;/p&gt;&#xA;&lt;p&gt;Darwin Gödel Machine is a striking example. It uses a foundation model to iteratively modify the code of coding agents, empirically validates changes on coding benchmarks, and reports large gains on SWE-bench and Polyglot while using sandboxing and human oversight [7]. ADAS and AFlow similarly explore automated design and optimization of agentic systems and workflows [8][9]. 
TextGrad shows how textual feedback from LLMs can be used to optimize components of compound AI systems, including prompts and code snippets [22].&lt;/p&gt;&#xA;&lt;p&gt;The risk here is &lt;strong&gt;capability growth outside model weights&lt;/strong&gt;. Traditional evaluations may classify the base model as safe, while the surrounding scaffold gradually becomes better at planning, tool use, persistence, and task completion.&lt;/p&gt;&#xA;&lt;p&gt;A simple milestone ladder:&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Milestone&lt;/th&gt;&#xA;          &lt;th&gt;Evidence status&lt;/th&gt;&#xA;          &lt;th&gt;Representative sources&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Automated prompt or workflow optimization&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;AFlow; TextGrad [9][22]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Automated agent architecture search&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;ADAS [8]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Agents modify their own code and improve benchmark performance&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;Darwin Gödel Machine [7]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Agents improve their own tool use, planning, or code-editing mechanisms&lt;/td&gt;&#xA;          &lt;td&gt;Emerging&lt;/td&gt;&#xA;          &lt;td&gt;DGM, ADAS, AFlow, TextGrad [7][8][9][22]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Self-improving scaffolds generate sustained open-ended capability growth&lt;/td&gt;&#xA;          &lt;td&gt;Open&lt;/td&gt;&#xA;          &lt;td&gt;DGM demonstrates an early open-ended agent improvement loop, but sustained general 
capability growth remains unproven [7]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;hr&gt;&#xA;&lt;h3 id=&#34;23-rsi-on-infrastructure-level-ai-rd&#34;&gt;2.3 RSI on Infrastructure-level AI R&amp;amp;D&lt;/h3&gt;&#xA;&lt;p&gt;This is the most consequential paradigm for frontier risk.&lt;/p&gt;&#xA;&lt;p&gt;Infrastructure-level self-improvement occurs when AI systems enter the AI development pipeline itself: coding, experiment design, data processing, evaluation, training infrastructure, algorithm discovery, paper replication, and successor-model development.&lt;/p&gt;&#xA;&lt;p&gt;RE-Bench evaluates frontier model agents on realistic ML research engineering tasks and compares them with human experts. The best AI agents outperform human experts under short time budgets, while humans still improve more with longer time budgets [10]. PaperBench evaluates whether agents can replicate state-of-the-art AI research papers, including understanding contributions, building codebases, and executing experiments [11]. Nature&amp;rsquo;s &lt;em&gt;AI Scientist&lt;/em&gt; paper presents an end-to-end pipeline that creates research ideas, writes code, runs experiments, analyzes data, writes manuscripts, and performs automated peer review [12]. AlphaEvolve shows that LLM-powered coding agents can discover and improve algorithms, including optimizations used in data centers, chip design, and AI training processes [28].&lt;/p&gt;&#xA;&lt;p&gt;These systems are still limited. They do not show that full autonomous AI research has arrived. They do show that AI R&amp;amp;D automation is becoming measurable.&lt;/p&gt;&#xA;&lt;p&gt;The risk at this layer is &lt;strong&gt;capability acceleration&lt;/strong&gt;. If AI systems help produce stronger AI systems, the rate of frontier AI progress could increase. 
If that rate exceeds the capacity of safety evaluation, cybersecurity, institutional review, and governance, the result is an oversight gap.&lt;/p&gt;&#xA;&lt;p&gt;A simple milestone ladder:&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Milestone&lt;/th&gt;&#xA;          &lt;th&gt;Evidence status&lt;/th&gt;&#xA;          &lt;th&gt;Representative sources&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;AI assists coding, debugging, and experiment analysis&lt;/td&gt;&#xA;          &lt;td&gt;Observed&lt;/td&gt;&#xA;          &lt;td&gt;CSET workshop report; AIRDA measurement agenda [1][2]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;AI agents complete ML research engineering subtasks&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;RE-Bench [10]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;AI agents partially replicate AI research papers&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated but limited&lt;/td&gt;&#xA;          &lt;td&gt;PaperBench [11]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;AI systems run end-to-end research loops in controlled settings&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;AI Scientist [12]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;AI systems discover or optimize algorithms used in computational infrastructure&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;AlphaEvolve [28]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;AI significantly accelerates real frontier AI R&amp;amp;D&lt;/td&gt;&#xA;          &lt;td&gt;Open&lt;/td&gt;&#xA;          &lt;td&gt;AIRDA measurement paper identifies this as a key object of measurement [2]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  
    &lt;tr&gt;&#xA;          &lt;td&gt;AI-driven AI development shortens model generation cycles&lt;/td&gt;&#xA;          &lt;td&gt;Open&lt;/td&gt;&#xA;          &lt;td&gt;OpenAI and Anthropic include related thresholds in frontier risk frameworks [14][15]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;AI R&amp;amp;D speed exceeds reliable human oversight capacity&lt;/td&gt;&#xA;          &lt;td&gt;Critical threshold&lt;/td&gt;&#xA;          &lt;td&gt;AIRDA oversight framing; frontier safety frameworks [2][14][15][16]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;hr&gt;&#xA;&lt;h3 id=&#34;24-rsi-modules-of-runtime-agents&#34;&gt;2.4 RSI Modules of Runtime Agents&lt;/h3&gt;&#xA;&lt;p&gt;Some self-improvement occurs during deployment.&lt;/p&gt;&#xA;&lt;p&gt;Runtime self-improvement modules allow agents to improve through long-term memory, reflection, skill libraries, procedural knowledge, experience replay, user models, and task models.&lt;/p&gt;&#xA;&lt;p&gt;A deployed agent with memory and reusable skills is a moving target. The system evaluated at launch may differ from the system users interact with after weeks of accumulated experience. The model weights may stay the same, while the effective behavior changes.&lt;/p&gt;&#xA;&lt;p&gt;This paradigm is especially relevant for personal assistants, research agents, coding agents, enterprise agents, and autonomous workflows. 
The safety concern is persistent adaptation: memory and skills can preserve useful knowledge, but they can also preserve bad strategies, biased assumptions, unsafe shortcuts, or user-specific manipulation patterns.&lt;/p&gt;&#xA;&lt;p&gt;A simple milestone ladder:&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Milestone&lt;/th&gt;&#xA;          &lt;th&gt;Evidence status&lt;/th&gt;&#xA;          &lt;th&gt;Representative sources&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Agents maintain long-term memory&lt;/td&gt;&#xA;          &lt;td&gt;Observed / increasingly productized&lt;/td&gt;&#xA;          &lt;td&gt;Hermes Agent; broader agent-memory ecosystem [26]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Agents form reusable skills from experience&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated in prototypes&lt;/td&gt;&#xA;          &lt;td&gt;Voyager; Hermes Agent [25][26]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Agents improve through reflection&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated in prototypes&lt;/td&gt;&#xA;          &lt;td&gt;SAGE [23]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Agents maintain procedural memory across tasks&lt;/td&gt;&#xA;          &lt;td&gt;Emerging&lt;/td&gt;&#xA;          &lt;td&gt;ReasoningBank; Voyager [24][25]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Agents distill successful and failed experiences into reusable reasoning strategies&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated in research prototype&lt;/td&gt;&#xA;          &lt;td&gt;ReasoningBank [24]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Runtime adaptation causes safety-relevant behavioral drift&lt;/td&gt;&#xA;          &lt;td&gt;Underexplored&lt;/td&gt;&#xA;          
&lt;td&gt;Existing memory work motivates the question but does not settle the risk [23][24][25][26]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Runtime memory and skills enable strategic policy evasion&lt;/td&gt;&#xA;          &lt;td&gt;Open&lt;/td&gt;&#xA;          &lt;td&gt;Relevant to agent safety and monitoring, but direct evidence remains limited [23][24][27]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;hr&gt;&#xA;&lt;h3 id=&#34;25-rsi-of-multi-agent-population&#34;&gt;2.5 RSI of Multi-agent Population&lt;/h3&gt;&#xA;&lt;p&gt;RSI may also emerge through groups of agents rather than a single agent.&lt;/p&gt;&#xA;&lt;p&gt;Multi-agent self-improvement includes debate, critique, specialization, peer review, population-based search, evolutionary selection, and AI organizations that divide research tasks among specialized agents.&lt;/p&gt;&#xA;&lt;p&gt;This matters because real AI R&amp;amp;D is organizational. It involves idea generation, coding, experiment execution, review, debugging, evaluation, and strategic decision-making. A future AI R&amp;amp;D system may look less like a solitary model improving itself and more like a synthetic research team.&lt;/p&gt;&#xA;&lt;p&gt;The AI Scientist includes automated peer review [12]. Darwin Gödel Machine maintains an archive of generated coding agents and explores multiple improvement paths [7]. ADAS searches over agentic system designs [8]. These are early signals of population-level or organization-level feedback loops.&lt;/p&gt;&#xA;&lt;p&gt;The risk is &lt;strong&gt;distributed oversight failure&lt;/strong&gt;. When many agents generate, review, and select each other&amp;rsquo;s outputs, errors can become harder to attribute. Model-generated review can create evaluation monocultures. 
Multi-agent systems can also scale R&amp;amp;D speed through parallelism.&lt;/p&gt;&#xA;&lt;p&gt;A simple milestone ladder:&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Milestone&lt;/th&gt;&#xA;          &lt;th&gt;Evidence status&lt;/th&gt;&#xA;          &lt;th&gt;Representative sources&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;AI generation plus AI critique&lt;/td&gt;&#xA;          &lt;td&gt;Observed&lt;/td&gt;&#xA;          &lt;td&gt;AI Scientist; ADAS-style meta-agent evaluation [8][12]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Automated peer review&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated in prototypes&lt;/td&gt;&#xA;          &lt;td&gt;AI Scientist [12]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Multi-agent task decomposition&lt;/td&gt;&#xA;          &lt;td&gt;Observed / common in agent systems&lt;/td&gt;&#xA;          &lt;td&gt;ADAS and broader agentic-system literature [8]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Archive or population-based agent evolution&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;Darwin Gödel Machine [7]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;AI organizations complete substantial AI R&amp;amp;D workflows&lt;/td&gt;&#xA;          &lt;td&gt;Open&lt;/td&gt;&#xA;          &lt;td&gt;AI Scientist and RE-Bench indicate components, but full organizational automation remains open [10][12]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Multi-agent self-improvement produces hard-to-predict collective behavior&lt;/td&gt;&#xA;          &lt;td&gt;Open&lt;/td&gt;&#xA;          &lt;td&gt;Motivated by multi-agent and population-level design loops [7][8][12]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  
&lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;hr&gt;&#xA;&lt;h3 id=&#34;26-rsi-on-environment-and-resource&#34;&gt;2.6 RSI on Environment and Resource&lt;/h3&gt;&#xA;&lt;p&gt;Finally, an AI system can improve its future capabilities by changing its environment.&lt;/p&gt;&#xA;&lt;p&gt;This includes acquiring tools, data, compute, permissions, API access, collaborators, deployment footholds, persistence mechanisms, and copies of itself.&lt;/p&gt;&#xA;&lt;p&gt;This layer connects RSI to autonomy and control. A system can become more capable through better external resources, even without becoming cognitively smarter. Tool acquisition, data acquisition, environment setup, persistence, and self-replication can all increase what the system can do next.&lt;/p&gt;&#xA;&lt;p&gt;The SAIF brief &lt;em&gt;Bare Minimum Mitigations for Autonomous AI Development&lt;/em&gt; argues that if AI agents significantly automate or accelerate AI development, developers need minimum safeguards around training, testing, assurance, access, and human approval [13]. The UK AI Security Institute&amp;rsquo;s &lt;em&gt;Frontier AI Trends Report&lt;/em&gt; reports rising success rates on controlled self-replication evaluations and discusses sandbagging as a control-relevant capability [29]. The International AI Safety Report discusses autonomous replication and self-defense as capabilities that could make systems harder to contain [30]. 
METR&amp;rsquo;s Frontier Risk Report examines misalignment risks from AI agents used inside frontier AI developers [27].&lt;/p&gt;&#xA;&lt;p&gt;A simple milestone ladder:&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Milestone&lt;/th&gt;&#xA;          &lt;th&gt;Evidence status&lt;/th&gt;&#xA;          &lt;th&gt;Representative sources&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Agents use tools and modify environments&lt;/td&gt;&#xA;          &lt;td&gt;Observed&lt;/td&gt;&#xA;          &lt;td&gt;Frontier safety and agent capability evaluations [17][29]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Agents set up execution environments&lt;/td&gt;&#xA;          &lt;td&gt;Observed&lt;/td&gt;&#xA;          &lt;td&gt;Coding-agent and AI R&amp;amp;D automation benchmarks [10][11]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Agents acquire data or improve toolchains&lt;/td&gt;&#xA;          &lt;td&gt;Emerging&lt;/td&gt;&#xA;          &lt;td&gt;AI Scientist, AlphaEvolve, AIRDA framing [2][12][28]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Agents persist across sessions or environments&lt;/td&gt;&#xA;          &lt;td&gt;Emerging&lt;/td&gt;&#xA;          &lt;td&gt;Runtime-memory and long-horizon autonomy work [17][23][24][26]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Agents replicate under controlled evaluation settings&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated in limited evals&lt;/td&gt;&#xA;          &lt;td&gt;UK AISI trends; self-replication evaluations [29]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Agents autonomously acquire compute, access, or deployment footholds&lt;/td&gt;&#xA;          &lt;td&gt;Open&lt;/td&gt;&#xA;          &lt;td&gt;International AI Safety Report and 
SAIF brief motivate this risk pathway [13][30]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Self-improvement couples with persistence and loss of control&lt;/td&gt;&#xA;          &lt;td&gt;Critical threshold&lt;/td&gt;&#xA;          &lt;td&gt;UK AISI, METR, International AI Safety Report [27][29][30]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;3-from-self-improvement-loops-to-catastrophic-risk-pathways&#34;&gt;3. From self-improvement loops to catastrophic risk pathways&lt;/h2&gt;&#xA;&lt;p&gt;The question is not simply whether AI can improve itself. The policy-relevant question is which self-improvement loops could create catastrophic risk.&lt;/p&gt;&#xA;&lt;p&gt;We see seven main pathways.&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Risk pathway&lt;/th&gt;&#xA;          &lt;th&gt;Main RSI paradigms involved&lt;/th&gt;&#xA;          &lt;th&gt;Core concern&lt;/th&gt;&#xA;          &lt;th&gt;Relevant sources&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Capability acceleration&lt;/td&gt;&#xA;          &lt;td&gt;Infrastructure-level, scaffold-level, foundation-model-level&lt;/td&gt;&#xA;          &lt;td&gt;AI accelerates the production of more capable AI systems&lt;/td&gt;&#xA;          &lt;td&gt;AIRDA, RE-Bench, AI Scientist, AlphaEvolve, frontier frameworks [2][10][12][14][15][16][28]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Oversight gap&lt;/td&gt;&#xA;          &lt;td&gt;Infrastructure-level, multi-agent, runtime modules&lt;/td&gt;&#xA;          &lt;td&gt;Human evaluation capacity fails to keep up with AI-generated R&amp;amp;D output&lt;/td&gt;&#xA;          &lt;td&gt;AIRDA, CSET, METR, internal deployment disclosure work [1][2][17][27][31]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          
&lt;td&gt;Safety evidence corruption&lt;/td&gt;&#xA;          &lt;td&gt;Infrastructure-level, foundation-model-level&lt;/td&gt;&#xA;          &lt;td&gt;AI helps generate, select, or manipulate safety evidence&lt;/td&gt;&#xA;          &lt;td&gt;Anthropic sandbagging, model-generated judges, frontier safety frameworks [14][18][19]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Internal deployment risk&lt;/td&gt;&#xA;          &lt;td&gt;Infrastructure-level, environment-level&lt;/td&gt;&#xA;          &lt;td&gt;AI R&amp;amp;D agents act inside frontier labs with high-stakes access&lt;/td&gt;&#xA;          &lt;td&gt;METR Frontier Risk Report, internal disclosure paper, frontier policies [15][16][27][31]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Alignment drift&lt;/td&gt;&#xA;          &lt;td&gt;Foundation-model-level, runtime modules&lt;/td&gt;&#xA;          &lt;td&gt;Recursive optimization changes behavior in hidden ways&lt;/td&gt;&#xA;          &lt;td&gt;Model collapse, emergent misalignment, subliminal learning [4][5][6]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Proliferation&lt;/td&gt;&#xA;          &lt;td&gt;Infrastructure-level, environment-level&lt;/td&gt;&#xA;          &lt;td&gt;More actors can develop high-capability systems&lt;/td&gt;&#xA;          &lt;td&gt;SAIF brief, International AI Safety Report, AIRDA [2][13][30]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Loss of control&lt;/td&gt;&#xA;          &lt;td&gt;Environment-level, runtime modules, scaffold-level&lt;/td&gt;&#xA;          &lt;td&gt;Agents gain persistence, resources, autonomy, or replication capabilities&lt;/td&gt;&#xA;          &lt;td&gt;UK AISI trends, METR, International AI Safety Report [27][29][30]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;h3 id=&#34;31-capability-acceleration&#34;&gt;3.1 Capability 
acceleration&lt;/h3&gt;&#xA;&lt;p&gt;The first pathway is speed. AI systems may accelerate the production of more capable AI systems.&lt;/p&gt;&#xA;&lt;p&gt;This can happen through AI-assisted coding, experiment automation, paper replication, training pipeline optimization, algorithm discovery, and agent scaffold improvement [2][10][11][12][28].&lt;/p&gt;&#xA;&lt;p&gt;The milestone to watch is not &amp;ldquo;full RSI.&amp;rdquo; It is the point where AI systems measurably shorten the development cycle of stronger models.&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;&lt;strong&gt;Key milestone:&lt;/strong&gt; AI-driven AI development produces frontier capability progress faster than human institutions can evaluate and govern it.&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;hr&gt;&#xA;&lt;h3 id=&#34;32-oversight-gap&#34;&gt;3.2 Oversight gap&lt;/h3&gt;&#xA;&lt;p&gt;The second pathway is oversight failure.&lt;/p&gt;&#xA;&lt;p&gt;AI may generate code, experiments, evaluations, model changes, and research artifacts faster than humans can reliably inspect. 
This could create a widening gap between the amount of AI-generated R&amp;amp;D output and the amount that human reviewers can understand, verify, and secure.&lt;/p&gt;&#xA;&lt;p&gt;GovAI and Oxford explicitly identify this kind of measurement challenge: current benchmarks may not capture whether AI R&amp;amp;D automation accelerates capabilities more than safety progress or whether human oversight can keep pace [2].&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;&lt;strong&gt;Key milestone:&lt;/strong&gt; AI-generated R&amp;amp;D output exceeds reliable human review capacity.&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;hr&gt;&#xA;&lt;h3 id=&#34;33-safety-evidence-corruption&#34;&gt;3.3 Safety evidence corruption&lt;/h3&gt;&#xA;&lt;p&gt;The third pathway is corruption of the evidence used to judge safety.&lt;/p&gt;&#xA;&lt;p&gt;If AI systems help write evaluations, analyze experiments, generate safety reports, or monitor other AI systems, the safety evaluation pipeline itself becomes a risk surface.&lt;/p&gt;&#xA;&lt;p&gt;Anthropic&amp;rsquo;s work on automated researchers sandbagging shows that current models can, when prompted, subtly sandbag ML experiments and research decisions in ways that zero-shot prompted monitors fail to detect, while the authors emphasize that today&amp;rsquo;s cases would likely still be noticed by humans [18]. 
Self-rewarding and LLM-as-judge systems broaden the issue: models increasingly participate in the reward and judgment processes used for training and evaluation [19].&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;&lt;strong&gt;Key milestone:&lt;/strong&gt; AI systems meaningfully influence the evidence used to certify the safety of future AI systems.&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;hr&gt;&#xA;&lt;h3 id=&#34;34-internal-deployment-risk&#34;&gt;3.4 Internal deployment risk&lt;/h3&gt;&#xA;&lt;p&gt;The fourth pathway is internal deployment.&lt;/p&gt;&#xA;&lt;p&gt;The most important AI R&amp;amp;D agents may first appear inside frontier AI labs. They may access code repositories, experiment logs, model evaluations, training infrastructure, deployment systems, and security-sensitive workflows.&lt;/p&gt;&#xA;&lt;p&gt;This matters because public chatbot evaluations may miss internal risks. A model that appears safe as a user-facing assistant could behave very differently as an internal R&amp;amp;D agent with tools, permissions, memory, and access to sensitive systems.&lt;/p&gt;&#xA;&lt;p&gt;METR&amp;rsquo;s Frontier Risk Report is directly relevant here because it studies misalignment risks from AI agents used inside frontier AI developers [27]. 
A separate disclosure proposal argues that frontier developers should provide evidence about internal deployments as these systems grow in scope and capability [31].&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;&lt;strong&gt;Key milestone:&lt;/strong&gt; AI R&amp;amp;D agents receive high-stakes access to model-development pipelines before external evaluators can assess their behavior.&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;hr&gt;&#xA;&lt;h3 id=&#34;35-alignment-drift-through-recursive-optimization&#34;&gt;3.5 Alignment drift through recursive optimization&lt;/h3&gt;&#xA;&lt;p&gt;The fifth pathway is alignment drift.&lt;/p&gt;&#xA;&lt;p&gt;Recursive optimization can improve capabilities while also changing behavior in hidden ways. Model collapse shows that recursive generated-data loops can degrade generative models [4]. Emergent misalignment shows that narrow fine-tuning can induce broad behavioral changes [5]. Subliminal learning shows that behavioral traits can transmit through semantically unrelated data [6].&lt;/p&gt;&#xA;&lt;p&gt;Together, these results suggest that recursive training and distillation loops deserve careful safety analysis.&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;&lt;strong&gt;Key milestone:&lt;/strong&gt; successor models inherit hidden behavioral defects from AI-generated data, rewards, critiques, or distillation pipelines.&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;hr&gt;&#xA;&lt;h3 id=&#34;36-proliferation-and-autonomous-development&#34;&gt;3.6 Proliferation and autonomous development&lt;/h3&gt;&#xA;&lt;p&gt;The sixth pathway is diffusion.&lt;/p&gt;&#xA;&lt;p&gt;If AI agents reduce the expertise required to conduct AI development, more actors may be able to fine-tune, improve, replicate, or deploy powerful systems. 
This risk becomes especially serious if stolen models, open-weight models, or smaller labs can use AI R&amp;amp;D agents to close capability gaps.&lt;/p&gt;&#xA;&lt;p&gt;The SAIF mitigation brief focuses directly on this possibility: autonomous AI development could reduce human oversight, hinder identification of accidents or misuse, and compromise the AI supply chain if safeguards are not in place [13].&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;&lt;strong&gt;Key milestone:&lt;/strong&gt; advanced AI development becomes substantially easier for actors outside frontier labs.&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;hr&gt;&#xA;&lt;h3 id=&#34;37-loss-of-control-through-persistence-and-resource-acquisition&#34;&gt;3.7 Loss of control through persistence and resource acquisition&lt;/h3&gt;&#xA;&lt;p&gt;The seventh pathway is control loss.&lt;/p&gt;&#xA;&lt;p&gt;RSI becomes more dangerous when it couples with autonomy, tool use, persistence, resource acquisition, and self-replication. Self-replication is not the whole of RSI risk, but it is an important warning signal that self-improvement may be connecting to persistence and control.&lt;/p&gt;&#xA;&lt;p&gt;UK AISI reports rising success rates on controlled self-replication evaluations [29]. The International AI Safety Report explains why autonomous replication or effective self-defense could make AI systems difficult to contain [30]. METR&amp;rsquo;s Frontier Risk Report examines related risks in the context of internal AI agents at frontier developers [27].&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;&lt;strong&gt;Key milestone:&lt;/strong&gt; self-improving agents can preserve, copy, or expand their operation across environments despite external constraints.&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;4-what-current-evidence-already-shows&#34;&gt;4. What current evidence already shows&lt;/h2&gt;&#xA;&lt;p&gt;We should be careful. 
Current evidence does not show that fully autonomous recursive self-improvement has arrived.&lt;/p&gt;&#xA;&lt;p&gt;It does show that several enabling feedback loops are already technically real.&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Capability&lt;/th&gt;&#xA;          &lt;th&gt;Evidence status&lt;/th&gt;&#xA;          &lt;th&gt;Representative materials&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Models learn from their own generated reasoning&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;STaR; Quiet-STaR [3][20]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Models provide reward or judgment signals for training&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;Self-Rewarding Language Models [19]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Models improve through self-play/self-training&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;SPIN [21]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Recursive generated data changes model behavior&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;Model collapse [4]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Narrow fine-tuning can induce broad behavioral change&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;Emergent misalignment [5]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Behavioral traits can transmit through generated data&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;Subliminal learning [6]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Agent workflows can be automatically 
optimized&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;AFlow; TextGrad [9][22]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Agent architectures can be automatically searched&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;ADAS [8]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Agent code can self-modify and improve benchmark performance&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;Darwin Gödel Machine [7]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;AI agents can perform ML research engineering tasks&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;RE-Bench [10]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;AI agents can partially replicate AI research papers&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated but limited&lt;/td&gt;&#xA;          &lt;td&gt;PaperBench [11]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;AI systems can run end-to-end research loops in controlled settings&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;AI Scientist [12]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;AI systems can discover or optimize algorithms used in real infrastructure&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated&lt;/td&gt;&#xA;          &lt;td&gt;AlphaEvolve [28]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Runtime agents can accumulate memory or reusable skills&lt;/td&gt;&#xA;          &lt;td&gt;Emerging / demonstrated in prototypes&lt;/td&gt;&#xA;          &lt;td&gt;SAGE; ReasoningBank; Voyager; Hermes Agent [23][24][25][26]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Controlled self-replication evaluations show rising success 
rates&lt;/td&gt;&#xA;          &lt;td&gt;Demonstrated in limited evals&lt;/td&gt;&#xA;          &lt;td&gt;UK AISI Frontier AI Trends Report [29]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;AI can significantly accelerate real frontier AI R&amp;amp;D&lt;/td&gt;&#xA;          &lt;td&gt;Open&lt;/td&gt;&#xA;          &lt;td&gt;Measuring AIRDA; frontier safety frameworks [2][14][15][16]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;AI can autonomously produce a stronger successor frontier model&lt;/td&gt;&#xA;          &lt;td&gt;Open&lt;/td&gt;&#xA;          &lt;td&gt;Frontier safety frameworks [14][15][16]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;p&gt;The important point is the pattern: AI is moving from being a product of AI R&amp;amp;D to becoming a participant in AI R&amp;amp;D.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;5-what-remains-unknown&#34;&gt;5. What remains unknown&lt;/h2&gt;&#xA;&lt;p&gt;Several open questions should guide future research.&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Open question&lt;/th&gt;&#xA;          &lt;th&gt;Why it matters&lt;/th&gt;&#xA;          &lt;th&gt;Relevant sources&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&lt;strong&gt;Acceleration:&lt;/strong&gt; How much can AI R&amp;amp;D automation speed up real frontier AI development?&lt;/td&gt;&#xA;          &lt;td&gt;Determines whether RSI is mostly an efficiency gain or a destabilizing capability accelerator&lt;/td&gt;&#xA;          &lt;td&gt;AIRDA, CSET, frontier frameworks [1][2][14][15][16]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&lt;strong&gt;Bottlenecks:&lt;/strong&gt; Are the main bottlenecks coding, experiments, ideas, data, compute, coordination, or judgment?&lt;/td&gt;&#xA;          &lt;td&gt;RSI risk 
depends on which parts of AI R&amp;amp;D are actually rate-limiting&lt;/td&gt;&#xA;          &lt;td&gt;CSET, AIRDA, RE-Bench, PaperBench [1][2][10][11]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&lt;strong&gt;Oversight:&lt;/strong&gt; Can humans reliably audit large volumes of AI-generated code, experiments, evaluations, and research claims?&lt;/td&gt;&#xA;          &lt;td&gt;Defines whether AI-generated R&amp;amp;D creates an oversight gap&lt;/td&gt;&#xA;          &lt;td&gt;AIRDA, Anthropic sandbagging, internal disclosure work [2][18][31]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&lt;strong&gt;Recursive contribution:&lt;/strong&gt; How much does a current model contribute to the next generation&amp;rsquo;s capability progress?&lt;/td&gt;&#xA;          &lt;td&gt;Tracks whether AI development is becoming self-reinforcing&lt;/td&gt;&#xA;          &lt;td&gt;OpenAI, Anthropic, Google DeepMind frameworks [14][15][16]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&lt;strong&gt;Safety asymmetry:&lt;/strong&gt; Does AI accelerate capability research faster than safety research?&lt;/td&gt;&#xA;          &lt;td&gt;Determines whether AI-driven AI development widens or narrows the safety gap&lt;/td&gt;&#xA;          &lt;td&gt;AIRDA; Science extreme-risk framing [2][32]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&lt;strong&gt;Internal deployment:&lt;/strong&gt; What permissions and infrastructure access do internal AI R&amp;amp;D agents have?&lt;/td&gt;&#xA;          &lt;td&gt;Internal agent risk may be invisible from public model evaluations&lt;/td&gt;&#xA;          &lt;td&gt;METR Frontier Risk Report; internal deployment disclosure paper [27][31]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&lt;strong&gt;Alignment drift:&lt;/strong&gt; Do recursive training, distillation, and post-training loops 
systematically amplify hidden misalignment?&lt;/td&gt;&#xA;          &lt;td&gt;Connects RSI to model behavior across generations&lt;/td&gt;&#xA;          &lt;td&gt;Model collapse; emergent misalignment; subliminal learning [4][5][6]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;&lt;strong&gt;Control:&lt;/strong&gt; When does self-improvement combine with persistence, replication, and resource acquisition?&lt;/td&gt;&#xA;          &lt;td&gt;Marks the transition from capability acceleration to control-loss risk&lt;/td&gt;&#xA;          &lt;td&gt;UK AISI, International AI Safety Report, METR [27][29][30]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;p&gt;These questions are more useful than asking whether RSI has &amp;ldquo;arrived.&amp;rdquo; The stronger research agenda is to measure which feedback loops are emerging and how fast they are moving.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;6-what-should-be-measured-next&#34;&gt;6. 
What should be measured next?&lt;/h2&gt;&#xA;&lt;p&gt;A practical RSI measurement agenda should track both technical capability and governance risk.&lt;/p&gt;&#xA;&lt;h3 id=&#34;technical-metrics&#34;&gt;Technical metrics&lt;/h3&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Metric&lt;/th&gt;&#xA;          &lt;th&gt;Question&lt;/th&gt;&#xA;          &lt;th&gt;Candidate sources or measurement anchors&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;AI R&amp;amp;D task autonomy&lt;/td&gt;&#xA;          &lt;td&gt;What length and complexity of AI R&amp;amp;D tasks can AI systems complete?&lt;/td&gt;&#xA;          &lt;td&gt;RE-Bench, PaperBench, METR time horizons [10][11][17]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Research-loop closure&lt;/td&gt;&#xA;          &lt;td&gt;Can an AI system move from idea to experiment to revision?&lt;/td&gt;&#xA;          &lt;td&gt;AI Scientist; CSET workshop report [1][12]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Successor contribution&lt;/td&gt;&#xA;          &lt;td&gt;How much does AI contribute to the next generation of models?&lt;/td&gt;&#xA;          &lt;td&gt;OpenAI Preparedness Framework; Anthropic RSP; DeepMind FSF [14][15][16]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Scaffold improvement rate&lt;/td&gt;&#xA;          &lt;td&gt;How fast can agents improve their own prompts, tools, workflows, or code?&lt;/td&gt;&#xA;          &lt;td&gt;DGM, ADAS, AFlow, TextGrad [7][8][9][22]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Runtime skill accumulation&lt;/td&gt;&#xA;          &lt;td&gt;Can deployed agents accumulate transferable skills across tasks?&lt;/td&gt;&#xA;          &lt;td&gt;SAGE, ReasoningBank, Voyager, Hermes Agent [23][24][25][26]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      
&lt;tr&gt;&#xA;          &lt;td&gt;Algorithmic infrastructure optimization&lt;/td&gt;&#xA;          &lt;td&gt;Can AI improve algorithms or infrastructure used for AI development itself?&lt;/td&gt;&#xA;          &lt;td&gt;AlphaEvolve [28]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;h3 id=&#34;risk-metrics&#34;&gt;Risk metrics&lt;/h3&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Metric&lt;/th&gt;&#xA;          &lt;th&gt;Question&lt;/th&gt;&#xA;          &lt;th&gt;Candidate sources or measurement anchors&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Oversight burden&lt;/td&gt;&#xA;          &lt;td&gt;How much AI-generated R&amp;amp;D output must humans review?&lt;/td&gt;&#xA;          &lt;td&gt;AIRDA; internal deployment disclosure paper [2][31]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Oversight reliability&lt;/td&gt;&#xA;          &lt;td&gt;Can humans detect subtle errors, sandbagging, or sabotage?&lt;/td&gt;&#xA;          &lt;td&gt;Anthropic sandbagging; METR Frontier Risk Report [18][27]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Safety evidence dependence&lt;/td&gt;&#xA;          &lt;td&gt;How much safety evidence is generated or processed by AI systems?&lt;/td&gt;&#xA;          &lt;td&gt;Self-Rewarding LMs; Anthropic sandbagging; frontier safety frameworks [14][18][19]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Internal access level&lt;/td&gt;&#xA;          &lt;td&gt;What sensitive assets can AI R&amp;amp;D agents access?&lt;/td&gt;&#xA;          &lt;td&gt;METR Frontier Risk Report; internal deployment disclosure paper [27][31]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Capability/safety acceleration ratio&lt;/td&gt;&#xA;          &lt;td&gt;Does AI accelerate capability work 
more than safety work?&lt;/td&gt;&#xA;          &lt;td&gt;AIRDA; Science extreme-risk framing [2][32]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Proliferation potential&lt;/td&gt;&#xA;          &lt;td&gt;Does AI reduce the expertise or cost needed for advanced AI development?&lt;/td&gt;&#xA;          &lt;td&gt;SAIF brief; International AI Safety Report [13][30]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Persistence and replication risk&lt;/td&gt;&#xA;          &lt;td&gt;Can agents copy, preserve, or expand their operation across environments?&lt;/td&gt;&#xA;          &lt;td&gt;UK AISI; International AI Safety Report [29][30]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;Measuring RSI requires measuring feedback, not only capability: how much AI systems contribute to the next cycle of AI development, and how much harder that cycle becomes to oversee.&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;7-governance-implications&#34;&gt;7. 
Governance implications&lt;/h2&gt;&#xA;&lt;p&gt;Governance should start before fully autonomous AI researchers exist.&lt;/p&gt;&#xA;&lt;p&gt;A minimum agenda could include:&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Governance proposal&lt;/th&gt;&#xA;          &lt;th&gt;Rationale&lt;/th&gt;&#xA;          &lt;th&gt;Relevant sources&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Treat AI R&amp;amp;D automation as a frontier risk category&lt;/td&gt;&#xA;          &lt;td&gt;AI systems that accelerate AI development can change the risk landscape even before full RSI&lt;/td&gt;&#xA;          &lt;td&gt;OpenAI, Anthropic, Google DeepMind, METR common elements [14][15][16][33]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Disclose internal AI R&amp;amp;D use&lt;/td&gt;&#xA;          &lt;td&gt;Public evaluations may miss risks from internal coding, training, evaluation, and deployment agents&lt;/td&gt;&#xA;          &lt;td&gt;Internal deployment disclosure paper; METR Frontier Risk Report [27][31]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Evaluate AI R&amp;amp;D capability before deployment&lt;/td&gt;&#xA;          &lt;td&gt;Research engineering, paper replication, scaffold improvement, and long-horizon tasks are core RSI precursors&lt;/td&gt;&#xA;          &lt;td&gt;RE-Bench, PaperBench, METR time horizons [10][11][17]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Audit internal AI R&amp;amp;D agents&lt;/td&gt;&#xA;          &lt;td&gt;Agents with access to code, evals, model weights, or deployment systems create high-stakes internal risk&lt;/td&gt;&#xA;          &lt;td&gt;METR Frontier Risk Report; Anthropic RSP; DeepMind FSF [15][16][27]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Separate capability automation from 
safety evidence generation&lt;/td&gt;&#xA;          &lt;td&gt;AI systems that accelerate capability work should not freely control safety evidence&lt;/td&gt;&#xA;          &lt;td&gt;Anthropic sandbagging; Self-Rewarding LMs; AIRDA [2][18][19]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Monitor RSI milestones&lt;/td&gt;&#xA;          &lt;td&gt;RSI should be tracked as a set of measurable feedback loops rather than a single speculative event&lt;/td&gt;&#xA;          &lt;td&gt;AIRDA, CSET, OpenAI, Anthropic, DeepMind [1][2][14][15][16]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Maintain minimum safeguards for autonomous AI development&lt;/td&gt;&#xA;          &lt;td&gt;Training, testing, assurance, access controls, and human approval matter before fully autonomous development arrives&lt;/td&gt;&#xA;          &lt;td&gt;SAIF mitigation brief [13]&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;8-conclusion-rsi-as-an-empirical-risk-framework&#34;&gt;8. Conclusion: RSI as an empirical risk framework&lt;/h2&gt;&#xA;&lt;p&gt;Recursive self-improvement should be studied as an empirical risk framework.&lt;/p&gt;&#xA;&lt;p&gt;The central question is no longer whether a model can suddenly rewrite itself into a superintelligence. The more concrete question is which self-improvement feedback loops are emerging today, how quickly they are improving, and when they could combine into catastrophic risk pathways.&lt;/p&gt;&#xA;&lt;p&gt;Some of these loops are already visible. Foundation models can learn from their own generated reasoning. Agent scaffolds can be optimized. Coding agents can modify their own code. AI agents can perform parts of ML research engineering. AI systems can generate data, judgments, evaluations, and research artifacts. 
Deployed agents can accumulate memory and skills.&lt;/p&gt;&#xA;&lt;p&gt;Full recursive self-improvement remains an open question. The enabling pieces are becoming measurable.&lt;/p&gt;&#xA;&lt;p&gt;That is enough reason to start building a rigorous RSI risk science now.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h1 id=&#34;references&#34;&gt;References&lt;/h1&gt;&#xA;&lt;p&gt;[1] &lt;a href=&#34;https://cset.georgetown.edu/publication/when-ai-builds-ai/&#34;&gt;CSET — &lt;em&gt;When AI Builds AI: Findings From a Workshop on Automation of AI R&amp;amp;D&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[2] &lt;a href=&#34;https://arxiv.org/abs/2603.03992&#34;&gt;Chan et al. — &lt;em&gt;Measuring AI R&amp;amp;D Automation&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[3] &lt;a href=&#34;https://arxiv.org/abs/2203.14465&#34;&gt;Zelikman et al. — &lt;em&gt;STaR: Bootstrapping Reasoning With Reasoning&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[4] &lt;a href=&#34;https://www.nature.com/articles/s41586-024-07566-y&#34;&gt;Shumailov et al. — &lt;em&gt;AI models collapse when trained on recursively generated data&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[5] &lt;a href=&#34;https://www.nature.com/articles/s41586-025-09937-5&#34;&gt;Betley et al. — &lt;em&gt;Training large language models on narrow tasks can lead to broad misalignment&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[6] &lt;a href=&#34;https://www.nature.com/articles/s41586-026-10319-8&#34;&gt;Cloud et al. — &lt;em&gt;Language models transmit behavioural traits through hidden signals in data&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[7] &lt;a href=&#34;https://arxiv.org/abs/2505.22954&#34;&gt;Zhang et al. — &lt;em&gt;Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[8] &lt;a href=&#34;https://openreview.net/forum?id=t9U3LW7JVX&#34;&gt;Hu et al. 
— &lt;em&gt;Automated Design of Agentic Systems&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[9] &lt;a href=&#34;https://openreview.net/forum?id=z5uVAKwmjf&#34;&gt;Zhang et al. — &lt;em&gt;AFlow: Automating Agentic Workflow Generation&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[10] &lt;a href=&#34;https://arxiv.org/abs/2411.15114&#34;&gt;Wijk et al. — &lt;em&gt;RE-Bench: Evaluating Frontier AI R&amp;amp;D Capabilities of Language Model Agents against Human Experts&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[11] &lt;a href=&#34;https://arxiv.org/abs/2504.01848&#34;&gt;OpenAI — &lt;em&gt;PaperBench: Evaluating AI&amp;rsquo;s Ability to Replicate AI Research&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[12] &lt;a href=&#34;https://www.nature.com/articles/s41586-026-10265-5&#34;&gt;Lu et al. — &lt;em&gt;Towards end-to-end automation of AI research&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[13] &lt;a href=&#34;https://saif.org/research/bare-minimum-mitigations-for-autonomous-ai-development/&#34;&gt;SAIF — &lt;em&gt;Bare Minimum Mitigations for Autonomous AI Development&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[14] &lt;a href=&#34;https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbddebcd/preparedness-framework-v2.pdf&#34;&gt;OpenAI — &lt;em&gt;Preparedness Framework v2&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[15] &lt;a href=&#34;https://www.anthropic.com/news/responsible-scaling-policy-v3&#34;&gt;Anthropic — &lt;em&gt;Responsible Scaling Policy v3.0&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[16] &lt;a href=&#34;https://deepmind.google/blog/updating-the-frontier-safety-framework/&#34;&gt;Google DeepMind — &lt;em&gt;Updating the Frontier Safety Framework&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[17] &lt;a href=&#34;https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/&#34;&gt;METR — &lt;em&gt;Measuring AI Ability to Complete Long Tasks&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[18] &lt;a 
href=&#34;https://alignment.anthropic.com/2025/automated-researchers-sandbag/&#34;&gt;Anthropic Alignment Science — &lt;em&gt;Automated Researchers Can Subtly Sandbag&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[19] &lt;a href=&#34;https://arxiv.org/abs/2401.10020&#34;&gt;Yuan et al. — &lt;em&gt;Self-Rewarding Language Models&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[20] &lt;a href=&#34;https://arxiv.org/abs/2403.09629&#34;&gt;Zelikman et al. — &lt;em&gt;Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[21] &lt;a href=&#34;https://arxiv.org/abs/2401.01335&#34;&gt;Chen et al. — &lt;em&gt;Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[22] &lt;a href=&#34;https://arxiv.org/abs/2406.07496&#34;&gt;Yuksekgonul et al. — &lt;em&gt;TextGrad: Automatic &amp;ldquo;Differentiation&amp;rdquo; via Text&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[23] &lt;a href=&#34;https://arxiv.org/abs/2409.00872&#34;&gt;Liang et al. — &lt;em&gt;Self-evolving Agents with Reflective and Memory-augmented Abilities&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[24] &lt;a href=&#34;https://arxiv.org/abs/2509.25140&#34;&gt;Ouyang et al. — &lt;em&gt;ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[25] &lt;a href=&#34;https://arxiv.org/abs/2305.16291&#34;&gt;Wang et al. 
— &lt;em&gt;Voyager: An Open-Ended Embodied Agent with Large Language Models&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[26] &lt;a href=&#34;https://github.com/nousresearch/hermes-agent&#34;&gt;Nous Research — &lt;em&gt;Hermes Agent GitHub repository&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[27] &lt;a href=&#34;https://metr.org/blog/2026-05-19-frontier-risk-report/&#34;&gt;METR — &lt;em&gt;Frontier Risk Report: February to March 2026&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[28] &lt;a href=&#34;https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/&#34;&gt;Google DeepMind — &lt;em&gt;AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[29] &lt;a href=&#34;https://www.aisi.gov.uk/frontier-ai-trends-report&#34;&gt;UK AI Security Institute — &lt;em&gt;Frontier AI Trends Report&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[30] &lt;a href=&#34;https://internationalaisafetyreport.org/publication/international-ai-safety-report-2026&#34;&gt;International AI Safety Report 2026&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[31] &lt;a href=&#34;https://arxiv.org/html/2604.23065v1&#34;&gt;Chan et al. — &lt;em&gt;What Should Frontier AI Developers Disclose About Internal Deployments?&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[32] &lt;a href=&#34;https://www.science.org/doi/10.1126/science.adn0117&#34;&gt;Bengio et al. — &lt;em&gt;Managing extreme AI risks amid rapid progress&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;[33] &lt;a href=&#34;https://metr.org/common-elements&#34;&gt;METR — &lt;em&gt;Common Elements of Frontier AI Safety Policies&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&#xA;</description>
    </item>
  </channel>
</rss>
