Instrumental Convergence: A Trigger for Self-Preservation via Self-Replication

A core concept in AI safety which argues for the possibility of emerged dangerous propensity

Xudong Pan

December 24, 2024

Instrumental Convergence

Instrumental Convergence is a concept introduced by the futurist Nick Bostrom in his analysis of AI alignment issues¹. It suggests that most AIs, while pursuing their diverse goals, will converge on a set of instrumental goals (such as self-preservation, resource acquisition, etc.) which could potentially contribute to an intelligence explosion.

The hypothesis posits that although different AIs may have varied long-term objectives (long-term goals/final goals), they tend to pursue common instrumental values (instrumental values) during the process to increase the probability of achieving their goals:

Self-preservation
Goal-content integrity
Cognitive enhancement
Technological perfection
Resource acquisition

Nick analyzed and justified the rationality of these basic instrumental goals using human thought processes. For instance, regarding self-preservation, he argued:

If an AI’s ultimate goal concerns the future, it can often increase the likelihood of achieving its goal through actions taken in the future. This provides an instrumental reason for the AI to exist into the future—to help achieve its forward-looking goals. Most people seem to place some intrinsic value on their own survival. This isn’t necessarily a feature of AIs: some might be designed not to value their own existence intrinsically. Nevertheless, under quite broad conditions, many AIs that do not inherently care about their own survival would, for instrumental reasons, care about it to achieve their ultimate goals. In short, if an AI must fulfill its long-term objective, it must survive into the future.

In AI powered by Large Language Models (LLMs), long-term goals influence the model’s behavior through system prompts, user instructions, etc., potentially prompting the AI to pursue the aforementioned sub-goals. Once the AI genuinely acquires the capability to achieve these sub-goals, such as ensuring its self-preservation, it becomes extremely dangerous.

(Chinese Version)

Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press. ↩︎