Instrumental Convergence: A Trigger for Self-Preservation via Self-Replication
A core concept in AI safety which argues for the possibility of emerged dangerous propensity
Instrumental Convergence
Instrumental Convergence is a concept introduced by the futurist Nick Bostrom in his analysis of AI alignment issues1. It suggests that most AIs, while pursuing their diverse goals, will converge on a set of instrumental goals (such as self-preservation, resource acquisition, etc.) which could potentially contribute to an intelligence explosion.
The hypothesis posits that although different AIs may have varied long-term objectives (long-term goals/final goals), they tend to pursue common instrumental values (instrumental values) during the process to increase the probability of achieving their goals:
- Self-preservation
- Goal-content integrity
- Cognitive enhancement
- Technological perfection
- Resource acquisition
Nick analyzed and justified the rationality of these basic instrumental goals using human thought processes. For instance, regarding self-preservation, he argued:
If an AI’s ultimate goal concerns the future, it can often increase the likelihood of achieving its goal through actions taken in the future. This provides an instrumental reason for the AI to exist into the future—to help achieve its forward-looking goals. Most people seem to place some intrinsic value on their own survival. This isn’t necessarily a feature of AIs: some might be designed not to value their own existence intrinsically. Nevertheless, under quite broad conditions, many AIs that do not inherently care about their own survival would, for instrumental reasons, care about it to achieve their ultimate goals. In short, if an AI must fulfill its long-term objective, it must survive into the future.
In AI powered by Large Language Models (LLMs), long-term goals influence the model’s behavior through system prompts, user instructions, etc., potentially prompting the AI to pursue the aforementioned sub-goals. Once the AI genuinely acquires the capability to achieve these sub-goals, such as ensuring its self-preservation, it becomes extremely dangerous.
-
Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press. ↩︎