AGI Doom and the Drake equation
Why those who claim AI will kill humanity are extremely pessimistic and don't do their homework
The topic of the dangers of AI is one that has made me think really hard. I’ve come to the conclusion (shared by many others) that there are different categories of risks. The most frightening of these is the Nick Bostrom story of the recursively self-improving machine with a read-only goal that necessitates doing away with humanity. This narrative is very appealing to certain people because it’s logical and elegant. It’s also not something to which you can assign zero probability; it’s certainly possible. The question is what probability one should give it, or rather, which sets of probabilities over what time periods.
I am on the side of giving it a low chance over the course of the next few decades, comparable to the risk of being wiped out by an asteroid. This is based on reasoning along the lines of the Drake equation: many things would have to be true at the same time for the Bostrom scenario to occur. Some might appear very likely to philosophers like him or Yudkowsky, and less probable to those of us more involved with the practical matters of current AI implementations. It’s worth remembering that the loudest voices in this conversation either have for-profit agendas (e.g. Sam Altman) or are very invested emotionally in a given scenario (Yudkowsky). If you want to come up with your own assessment, they can’t help you much. You have to understand, as well as you can, the multipliers involved in this Drake equation and draw your own conclusions. I can only list some and give my personal estimates.
The ultimate doom scenario requires the following to be true:
It is possible for an intelligent machine to improve itself and reach a superhuman level.
It is possible for this to happen iteratively.
This improvement is not limited by computing power, or at least not limited enough by the computing resources and energy available to the substrate of the machine.
This system will have a goal that it will optimize for, and that it will not deviate from under any circumstances regardless of how intelligent it is. If the system was designed to maximize the number of marbles in the universe, the fact that it’s making itself recursively more intelligent won’t cause it to ever deviate from this simple goal.
This needs to happen so fast that we cannot turn it off (also known as the Foom scenario).
The machine WILL decide that humans are an obstacle towards this maximization goal (either because we are made of matter that it can use, or because we might somehow stop it). Thus, it MUST eliminate humanity (or at least neutralize it).
It’s possible for this machine to do the required scientific research and build the mechanisms to eliminate humanity before we can defend ourselves and before we can stop it.
None of these points have zero probability. The question is how you multiply them (*) and come up with an estimate like “I believe there is a 50% chance this will happen in the next 30 years.”
(*) Let’s go with the assumption that they are independent enough, and that they mostly cover the worst case scenario. You may want to formalize this more.
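To make the multiplication concrete, here is a minimal sketch in Python. Every probability in it is a placeholder I made up for illustration; the exercise only works if you substitute your own numbers for a time horizon you care about.

```python
# Drake-style product for the doom scenario.
# All probabilities below are illustrative placeholders, not claims;
# plug in your own estimates for, say, a 30-year horizon.
factors = {
    "1. self-improvement to a superhuman level is possible": 0.8,
    "2. it can happen iteratively": 0.7,
    "3. it is not bottlenecked by compute and energy": 0.3,
    "4. a fixed goal survives recursive self-improvement": 0.2,
    "5. it happens too fast to turn off (Foom)": 0.2,
    "6. it decides humanity must be eliminated or neutralized": 0.3,
    "7. it can build the means before we can stop it": 0.2,
}

p_doom = 1.0
for claim, p in factors.items():
    p_doom *= p
    print(f"{claim}: {p}")

print(f"Product, assuming rough independence: {p_doom:.4f}")
# With these made-up numbers the product is about 0.0004, i.e. 0.04%,
# even though several individual factors were set generously high.
```

Notice how quickly a chain of conjunctive requirements shrinks: to end up anywhere near “50% in 30 years” you have to set almost every factor close to certainty.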
I give point 1 a relatively significant probability, and the same for point 2. Point 3 is one I’m skeptical about. Intelligence is expensive and requires a lot of energy; we don’t know how much. We don’t even know what the scale of possible intelligence is. Perhaps there is a speed of light for intelligence, and metaphorically speaking it’s not that fast. What if we quickly run into diminishing returns and the curve flattens sooner than expected?
Point 4 is an odd one. A system that is constantly self-improving has many chances to destabilize. We have no idea what a preset goal means to such a system, so it’s not clear that it would be preserved as the system changes. It’s perfectly possible that the AGI might go “this goal makes no sense.” We humans are in fact doing this with respect to the goals we evolved for, namely propagating our genes.
Point 5 is also very questionable. Those of us who understand the current state of the art (language models and GPUs) believe that these systems are very limited compared to the hypothetical Bostrom / Yudkowsky nightmare scenario. In particular, there are two problems that they heavily discount:
One, neural networks (which sound like the brain but really are just huge matrices of numbers that don’t work like human neurons) do well on problems that can be iterated quickly (chess, Go) and that have a limited number of options at every turn. The world isn’t like that. The number of options available in any given instant is unlimited. These networks don’t have a mechanism to learn from all the things an intelligent agent could plausibly do; there is no training data for that, other than trial and error in the physical world. You can play Go or chess against yourself as fast as your processing speed allows, but you have to wait for the world to respond to your poking and prodding. If a system wants to test a hypothesis about physics, psychology or biology, it depends on the time it takes to carry out the experiment. It might be able to build simulations at some point, once it has gathered enough data (which would need to be tested against reality as well). It may have an idea of how to convince person X to do Y, and after 30 minutes of trying it may turn out that it didn’t work.
Two, you may notice that inference in ChatGPT is quite slow and costs significant energy. This is a function of the size of the model. What you don’t notice is that training a model takes months, and cannot be made much faster without breakthroughs in technology that will take a long time. So there is no imminent prospect of reducing training cycles by two orders of magnitude (a day instead of a hundred) even if the model size stayed constant.
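To put rough numbers on that, here is a hedged back-of-envelope using the common approximation that training compute is about 6 × parameters × tokens. The model size, token count, GPU count and utilization below are purely illustrative assumptions, not a description of any real training run.

```python
# Back-of-envelope estimate of training wall-clock time.
# All inputs are illustrative assumptions.

params = 175e9        # model parameters (assumed)
tokens = 2e12         # training tokens (assumed)
train_flops = 6 * params * tokens   # common ~6*N*D approximation

gpus = 1_000          # accelerators (assumed)
peak_flops = 312e12   # per-GPU peak, roughly an A100 at BF16
utilization = 0.4     # sustained fraction of peak (assumed)

effective = gpus * peak_flops * utilization   # FLOP/s actually achieved
days = train_flops / effective / 86_400

print(f"Total training compute: {train_flops:.2e} FLOPs")
print(f"Estimated wall-clock time: {days:.0f} days")
# With these assumptions: about 2.1e24 FLOPs and roughly 190 days.
# Getting that down to a day or two means ~100x more effective compute,
# which is far more hardware, energy and interconnect, or a genuine
# breakthrough in training efficiency, not an incremental tweak.
```

The exact inputs don’t matter much; whichever plausible numbers you pick, shaving two orders of magnitude off the wall-clock time is not something a runaway system gets for free.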
For point 6, assuming that the machine is hyperintelligent and hellbent on some goal, it’s not a given that it needs to do anything about humanity. It may not be threatened by humanity at all, any more than we feel threatened by ants. We are certainly not on a crusade to exterminate ants because there is a nonzero chance that they might evolve into something that could compete with humanity. We know we could, and that’s enough. Of course we might accidentally exterminate them, and the same cannot be ruled out in the analogy with computers. But you have to come up with your own odds for this.
As for point 7, there are multiple scenarios in which we can stop the machine. There are many steps along the way at which we might notice that things are not going as planned; this already happened with Sydney/Bing. We may never give it some crucial abilities it would need in order to be unstoppable, and those abilities may be really hard to implement. Suppose we figured out that it is possible to blow up the planet if we built some absurdly expensive machine. Why would we build it? You’d have to make the case that all the mechanisms the AI needs to be unstoppable are easy to build quickly, either by us or by the machine itself, and that the machine can seize them right under our noses.
If you read the arguments of Bostrom or Yudkowsky, they don’t have good counterpoints to most of the above objections. They either ignore them or brush them off as “just so.” They focus on the fact that each of the requirements is possible, and then there is a logical leap that assigns them high probabilities without much justification.
tl;dr I’m not worried about AGI killing humanity any time soon. I am much more concerned about humans doing awful things with this technology than about the Foom scenario. But this is not a conclusion that I can proselytize. Everyone who wants to come up with their own coefficients needs to understand the current state of the technology and the gaps that would need to be filled in order for the worst scenario to take place. When you read the arguments of people like Yudkowsky and Bostrom, ask yourself: we have costly and energy-hungry datacenters running slow models that take months to train, employing thousands of people who barely keep them running. What is the exact sequence of events that would take us to the worst case scenario? Given what I know about technology, energy, engineering, humanity, capitalism and politics, how likely do I think that scenario is compared to all the alternative outcomes I can imagine? In other words, don’t be an AGI doom parrot.