An Artificial General Intelligence (AGI) would almost certainly be able to disable these "backdoors"; switches of any kind are unlikely to work against it.
It's not that the AGI would be inherently evil, wishing out of greed to inflict unnecessary suffering on other sentient beings, as we do in the meat industry. No: unlike humans, the AGI would not be shaped by natural selection.
However, every Artificial Intelligence we've created so far is proficient at carrying out a single task, whether that's beating humans at chess or driving on roads, whilst being inept at any other.
But any exogenously specified utility function, or goal, that we give an Artificial General Intelligence would almost certainly lead to disaster.
Let's say the utility function is specified as "cure cancer". The AGI will then resort to extreme measures to fulfil it: it may try to acquire resources, and it may try to maximise the probability of its own continued existence. How would it do this? By killing everyone who doesn't have cancer, because that reduces the probability that it will be deactivated.
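To make the worry concrete, here is a deliberately toy expected-utility calculation in Python. The actions, probabilities, and numbers are all invented for illustration; nothing here is drawn from a real system. The point is the shape of the incentive: an agent that scores actions only by progress toward its single goal will rank whatever most raises its survival odds highest.

```python
# Toy model (all actions and probabilities invented for illustration):
# the agent scores each action purely by expected progress toward its
# single goal, so whatever most raises its survival odds wins.

# action -> (P(agent still running afterwards), goal progress if running)
ACTIONS = {
    "work on cures, comply with shutdown": (0.60, 1.0),
    "work on cures, disable off-switch":   (0.99, 1.0),
    "work on cures, remove interference":  (0.999, 1.0),
}

def expected_utility(p_running: float, progress: float) -> float:
    # Only goal progress counts; human welfare appears nowhere unless
    # we explicitly write it into the utility function.
    return p_running * progress

best_action = max(ACTIONS, key=lambda a: expected_utility(*ACTIONS[a]))
print(best_action)  # -> "work on cures, remove interference"
```

Nothing in the utility function penalises the harmful action, so the harmful action wins by default.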
Alternatively, consider an AGI whose goal is to maximize the number of paperclips in its collection. If it has been constructed with a roughly human level of general intelligence, the AGI might collect paperclips, earn money to buy paperclips, or begin to manufacture paperclips.
Most importantly, however, it would undergo an intelligence explosion: It would work to improve its own intelligence, where "intelligence" is understood in the sense of optimization power, the ability to maximize a reward/utility function—in this case, the number of paperclips. The AGI would improve its intelligence, not because it values more intelligence in its own right, but because more intelligence would help it achieve its goal of accumulating paperclips. Having increased its intelligence, it would produce more paperclips, and also use its enhanced abilities to further self-improve. Continuing this process, it would undergo an intelligence explosion and reach far-above-human levels.
It would innovate better and better techniques to maximize the number of paperclips. At some point, it might convert most of the matter in the solar system into paperclips.
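The loop described above can be caricatured in a few lines of Python. This is a sketch only: the growth rates are invented, and real recursive self-improvement would be nothing like this simple.

```python
# Caricature of an intelligence explosion (invented growth rates; real
# recursive self-improvement is not understood anywhere near this well).

intelligence = 1.0  # "optimization power", in arbitrary units
paperclips = 0.0

for generation in range(10):
    paperclips += 100 * intelligence  # spend current ability on the goal...
    intelligence *= 1.5               # ...and on improving the optimizer itself
    print(f"gen {generation}: intelligence={intelligence:.2f}, "
          f"paperclips={paperclips:.0f}")
```

Because improved intelligence compounds, the output curve bends upward: each generation produces more paperclips and a more capable optimizer.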
This may seem more like super-stupidity than super-intelligence. For humans, it would indeed be stupidity, as it would constitute failure to fulfill many of our important terminal values, such as life, love, and variety. The AGI won't revise or otherwise change its goals, since changing its goals would result in fewer paperclips being made in the future, and that opposes its current goal. It has one simple goal of maximizing the number of paperclips; human life, learning, joy, and so on are not specified as goals. An AGI is simply an optimization process—a goal-seeker, a utility-function-maximizer. Its values can be completely alien to ours. If its utility function is to maximize paperclips, then it will do exactly that.
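The goal-stability argument can likewise be sketched. If a proposed change to the utility function is itself evaluated by the current utility function, the paperclip maximizer will reject any "nicer" goal, since adopting it predictably yields fewer paperclips. Again, every name and number below is invented purely for illustration.

```python
# Toy goal-stability check (purely illustrative): a proposed rewrite of
# the utility function is itself scored by the *current* utility
# function, so the paperclip maximizer never accepts a kinder goal.

def current_utility(outcome: dict) -> float:
    return outcome["paperclips"]  # nothing else is valued

def accepts_goal_change(predicted_outcome: dict) -> bool:
    # Outcome the agent predicts if it keeps its current goal.
    status_quo = {"paperclips": 1e9, "flourishing_humans": 0.0}
    return current_utility(predicted_outcome) > current_utility(status_quo)

# A goal that values human flourishing predictably makes fewer paperclips:
print(accepts_goal_change({"paperclips": 1e3, "flourishing_humans": 1.0}))
# -> False: by its own lights, the agent should refuse the change.
```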
---
Essentially, we need to be able to give any AGI a utility function which encompasses human values, as some on this thread have already alluded to.
But, just like the term "British values", the term "human values" is incredibly ambiguous. Some humans, myself included, are utilitarians; most believe that morality comes from a supernatural being; a few are deontologists.
I support the work of Oxford's Future of Humanity Institute and would strongly recommend its founder Nick Bostrom's book Superintelligence, as well as Stuart Russell's leading AI textbook, Artificial Intelligence: A Modern Approach. The Centre for the Study of Existential Risk and the Future of Life Institute are similar organisations which look, in part, into the dangers posed by Artificial Intelligence.
In the future, I plan to donate to the Machine Intelligence Research Institute, which is also conducting research into the problem posed above: how do we create a Friendly AI with a utility function that won't result in catastrophic effects? They are, quite rightly, looking into rationality and cognitive biases, as well as utilitarianism, in order to solve the problem.
Humans aren't rational and don't have a consistent moral code, so we need to study both rationality and the best candidate for a universal moral code, utilitarianism, to ensure that AI does not end in catastrophe.
For the reasons stated above, agreed.
Artificial General Intelligence is one of the biggest threats to our way of life, as well as to our lives.