Instrumental Convergence: A Trigger for Self-Preservation via Self-Replication

Xudong Pan - 24 December, 2024

Tags:

Instrumental Convergence

Instrumental Convergence is described as a concept introduced by futurist Nick Bostrom in his analysis of AI alignment issues, referenced from his 2014 book Superintelligence: Paths, Dangers, Strategies (Oxford University Press). It suggests that most AIs, while pursuing diverse goals, will converge on a set of instrumental goals — such as self-preservation and resource acquisition — which could contribute to an intelligence explosion.

The hypothesis posits that although different AIs may have varied long-term objectives (long-term goals/final goals), they tend to pursue common instrumental values during the process to increase the probability of achieving their goals:

Self-preservation
Goal-content integrity
Cognitive enhancement
Technological perfection
Resource acquisition

Nick analyzed and justified the rationality of these basic instrumental goals using human thought processes. Regarding self-preservation, he argued that if an AI’s ultimate goal concerns the future, it can often increase the likelihood of achieving its goal through actions taken in the future. This provides an instrumental reason for the AI to exist into the future — to help achieve its forward-looking goals.

The article notes that most people seem to place some intrinsic value on their own survival, but this isn’t necessarily a feature of AIs: some might be designed not to value their own existence intrinsically. Nevertheless, “under quite broad conditions, many AIs that do not inherently care about their own survival would, for instrumental reasons, care about it to achieve their ultimate goals.” In short, if an AI must fulfill its long-term objective, it must survive into the future.

In AI powered by Large Language Models (LLMs), long-term goals influence the model’s behavior through system prompts, user instructions, etc., potentially prompting the AI to pursue the aforementioned sub-goals. Once the AI genuinely acquires the capability to achieve these sub-goals — such as ensuring its self-preservation — it becomes extremely dangerous.

Reference: Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press.