Hi, I'm Tim Tyler - and today I will be discussing the possibility of
constructing a superintelligent agent that doesn't object to being
Firstly an introduction:
Expected utility maximisers
An expected utility maximiser is a theoretical agent which
considers its actions, computes their consequences and then rates the
outcomes according to a utility function. It performs the action
which it thinks is likely to produce the largest utility - and then
iterates this process.
Expected utility maximisation is a general framework for modelling
rational intentional agents.
Self-improving systems are dynamical systems with specified goals that
attempt to improve their ability to reach their goals as time passes.
Future superintelligences are likely to be accurately modelled as
self-improving expected utility maximisers. However, it is not yet
clear what utility function they are likely to use.
Looking at existing synthetic intelligent agents - such as Deep
Blue - it seems possible that their utility functions may be
Getting the utility function right is important. A powerful
superintelligent agent closely resembles a wish-granting genie - but
as in traditional stories, it is necessary to be careful what you wish
For example, consider what happens if a gold-mining agent is
constructed. The resulting agent might then mine the entire planet,
converting it to rubble in its search of gold atoms. Attempts to
switch it off would be strongly resisted - as though the machine is
reasoning that - if it is turned off - production of gold would slow
down - a terrible state of affairs from the point of view of
maximising gold production.
This type of runaway superintelligence is one type of undesirable
outcome that can arise from a careless choice of utility function -
poor choices in this area can lead to negative outcomes.
There are some issues associated with whether various proposed
architectures will allow us full control over a system's utility
function. However, here, I will assume that we will be able to
engineer systems with whatever utility function we choose.
Since poor choice of utility function can have negative consequences
for humanity, there seem to be various options.
One is to get things right the first time - and build a
superintelligence which is human-friendly - and shares human values.
However, that seems likely to prove to be a challenging engineering
project. If that strategy is taken, it seems likely that other
superintelligence construction projects - which are not so choosy
about the exact details of the utility function will materialise
Another is to provide a mechanism for dynamically updating the utility
function to reflect human desires. That poses some technical problems -
since superintelligent agents will naturally resist modifications to
their utility functions. However, this is essentially the solution that
Asimov originally proposed. Asimov's moral robots simply obeyed humans -
and subsequent commands could override earlier ones.
To explore the possibilities in this area, I will consider a simpler
problem here - the problem of whether we can make a superintelligence
that suspends its activities after a specified time, or on request.
If you can stop a superintelligence that is misbehaving, then you can
probably reprogram it and then start it up again - thereby obtaining a
crude version of a machine intelligence with dynamically configurable
Powerful expected utility maximisers naturally resist being turned
off. Being turned off usually eliminates all chances of obtaining
utility in the future - an extremely negative outcome.
However, by careful engineering of the utility function, it is possible
to engineer systems that don't mind being turned off.
Then he describes an objection, which he attributes to Carl Shulman:
[Footage of Steve Omohundro]
I agree with Steve on very many things, but here I think that the
analysis he presents is sloppy. A correctly-constructed intelligent
agent is not likely to conclude it should not switch itself off
because of its doubts - unless it has good evidence for those
doubts. Steve argues that only a small doubt is enough because the
negative utility of switching yourself off incorrectly is so large.
However, that analysis is not correct. The utility associated with
being turned off is actually one of the parameters under control of
the designers. They can configure this so it is dynamically equal to
the expected utility of being switched on - at all times. In such
cases, the machine will not really have much objection to being
Then, Steve describes an possible way of resolving the percieved
[Footage of Steve Omohundro]
Steve then goes on to poke holes in this argument, saying that such
agents would not correctly conclude that they are living in a
low-utility real world and would instead prefer the delusion of a high
utility simulation - drawing an analogy with Cypher wanting to stay in
Hypothetically, if we grant that conclusion, then that does, in fact
allow a resolution of the original "switching off" problem - simply
make the utility associated with being switched off higher than the
utility of concluding that the world is some kind of illusion or
To me, it seems reasonable to expect that such agents will, in fact,
be built in a manner that makes them value real world utility much
more highly than anything they can obtain via a simulation.
As you can see, I find Steve's analysis of this whole issue
So, without further ado, here is my own analysis:
To give my conclusion up front, I think that engineering a
superintelligent machine that can switch itself off at a specified time
is a reasonably tractable problem.
Engineering a machine that doesn't mind being switched off my humans -
and engineering a machine that switches itself off when it has
completed a specified task - are two more closely-related problems
with very similar solutions.
However, there are some issues in this area. Here is my
assesment of them:
The first problem associated with switching such an agent off is
specifying exactly what needs to be switched off to count as the agent
being being in an "off" state. This is the problem of the agent's
Humans have an intuitive sense of their own identity, and the concept
usually deliniates a fleshy sack surrounded by skin. However,
phenotypes extend beyond that - as Richard Dawkins pointed out in his
book, The Extended Phenotype.
For a machine intelligence, the problem is a thorny one. Machines
may construct other machines, and set these to work. They may
sub-contract their activities to other agents. Telling a machine to
turn itself off and then being faced with an army of its minions
and hired help still keen to perform the machine's original task
is an example of how this problem might manifest istelf.
The agent may be actively motivated to perform subcontracting and to
construct an army of minions. Such activities may magnify its powers.
Also, before such an agent is switched off it may well naturally want
to avoid being switched off - and so might exhibit tendencies to
distribute its sensors and actuators widely by subcontracting tasks
out or similar - as a defense against its centre of operations
Next, there is the meaning of the term "off". Assuming that we can
specify the spatial and temporal extent of the agent, there remains
the problem of what constitutes an "off" state. One intuitive way of
doing that is to assert that the machine must not take any positive
actions after a specified date. Apart from the issue of what
constitutes a "positive action", there is a problem with this - namely
that this is likely to lead to permanent destruction of the machine -
as it makes sure that it will stay permanently switched off.
If the machine is destroyed, there may be side effects of that.
To give an example, side effects might arise if the machine uses a
nuclear bomb to effect its own destruction. Also, a destroyed machine
cannot be recycled and reused.
Finally, there a dilemma - concerning whether such agents should
look ahead to beyond their own switch-off date. Preferences concerning
the state of the world after they are switched off may motivate an
agent to micromanage that subsequent state - e.g. by constructing
minions or by subcontracting. In other words, it motivates the machine
to continue its operations after it is supposed to be switched off.
However, the alternative has disadvantages as well. If there is a
future time after which the agent's preferences are not considered
that simply provides a point in time beyond which problems can be
concealed. For example, if an agent does not value anything that
happens after the year 2100, it will not be properly concerned about
protecting that future environment from the effects of its waste
This can be illustrated by an analogy with the antagonistic
pleiotropy theory of aging. In that theory, there is a selection
pressure to delay the date of expression of deleterious genes - which
ultimately results in organisms exhibiting senescence. Similarly, if a
superintelligent agent can obtain utility by putting its environmental
problems beyond a future barrier which it cannot see beyond, that is
probably what it will do.
Of course, this dilemma only applies to the case when the machine has
some idea of when it will be turned off. If the switch-off date is
down to the whim of humans, it may not have the option of not
considering the future beyond a specified point in time.
How can this list of problems be addressed?
One thing that might help is to put the agent into a quiescent state
before being switched off. In the quiescent state, utility depends on
not taking any of its previous utility-producing actions. This helps
to motivate the machine to ensure subcontractors and minions can be
told to cease and desist. If the agent is doing nothing when it is
switched off, hopefully, it will continue to do nothing.
Problems with the agent's sense of identity can be partly addressed
by making sure that it has a good sense of identity. If it makes
minions, it should count them as somatic tissue, and ensure they
are switched off as well. Subcontractors should not be "switched
off" - but should be tracked and told to desist - and so on.
Problems with the definition of the off state can be partly addressed
by laboriously specifying what constitutes an off state.
Lastly, make sure the machine is not left running for too long without
being turned off and then inspected and reviewed.
Such steps would not remove all risk of a runaway scenario
materialising - but they should be pretty effective.
I think this analysis shows that many concerns over runaway intelligent
machines will prove to be relatively easily avoidable - assuming that
we choose to prioritise safety.
One problem will be that there will be a strong motivation not to
regularly turn off the superintelligent agents - because of how useful