At the end of September 2019, a fascinating debate broke out spontaneously in the comment section of a Facebook post.
Yann LeCun, Chief AI Scientist at Facebook, had posted an article he had co-written with Tony Zador (an American neuroscientist) entitled "Don't Fear the Terminator: Artificial intelligence never needed to evolve, so it didn't develop the survival instinct that leads to the impulse to dominate others."
Soon a debate ensued between him, Stuart Russell, and Yoshua Bengio.
Yoshua Bengio is a professor at the Department of Computer Science and Operations Research at the Université de Montréal and scientific director of the Montreal Institute for Learning Algorithms (MILA).
Yann LeCun and Yoshua Bengio are often referred to as two of the three godfathers of modern AI, and in particular of deep learning (along with Geoff Hinton). The three were jointly awarded the 2018 Turing Award "for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing."
Below are their contributions to the debate. (Many other people participated, but for clarity I did not include most of them; the full debate can be found here.) I have retained a few contributions from others that help in understanding the debate.
What seems missing in all these musings is the impartial arbiter of physics. Let's take the simple:
In a fistfight, exactly what advantage does super-intelligence confer?
Then, abstracting it out and paying attention to inviolable physical constraints, what advantage does super-intelligence confer over billions of years of evolution?
We can muse about both these things, but there are likely theorems lurking and that's what's needed.
Physics has a way of setting limits on the power of intelligence.
- First, there is a limit on how much computational power you can pack in a given volume (just because of thermal dissipation).
- Second, there is a limit on communication bandwidth per volume (because of energy) and on latency (because of the speed of light).
Hence there is a limit on the amount of computation per unit volume.
More importantly, the smarter the machine, the larger and more power-hungry it will need to be, and the more vulnerable it will be to physical attacks.
I don't think we can predict the behavior of an intelligence that will be several orders of magnitude more advanced than the intelligence of the whole humanity combined.
A virus can't come close to predicting the behavior of your intelligence, which is several orders of magnitude more advanced than the combined intelligence of billions of viruses.
But it can still kill you.
The point is that if we can build a super-intelligent AI that ends up threatening us (for some unforeseen reason), we can build another system, with access to the same amount of resources, whose only purpose will be to disable the first one. It will almost certainly succeed. For the same reason a virus can kill you.
It might even kill you if you get in the way.
- 1. Once the robot has brought you coffee, its self-preservation instinct disappears. You can turn it off.
- 2. One would have to be unbelievably stupid to build open-ended objectives in a super-intelligent (and super-powerful) machine without some safeguard terms in the objective.
- 3. One would have to be rather incompetent not to have a mechanism by which new terms in the objective could be added to prevent previously-unforeseen bad behavior. For humans, we have education and laws to shape our objective functions and complement the hardwired terms built into us by evolution.
- 4. The power of even the most super-intelligent machine is limited by physics, and its size and needs make it vulnerable to physical attacks. No need for much intelligence here. A virus is infinitely less intelligent than you, but it can still kill you.
- 5. A second machine, designed solely to neutralize an evil super-intelligent machine will win every time, if given similar amounts of computing resources (because specialized machines always beat general ones).
- Bottom line: there are lots and lots of ways to protect against badly-designed intelligent machines turned evil.
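LeCun's points 2 and 3 can be made concrete with a small sketch. This is purely illustrative (the class, names, and toy "coffee" state are my own, not from the debate): a task reward combined with safeguard penalty terms that live inside the objective itself, with a mechanism for adding new terms when unforeseen bad behavior surfaces.

```python
# Hypothetical sketch of LeCun's points 2 and 3; all names are illustrative.

class Objective:
    def __init__(self, task_reward):
        self.task_reward = task_reward  # e.g. "coffee was delivered"
        self.safeguards = []            # penalty terms, extendable after deployment

    def add_safeguard(self, penalty):
        """Point 3: new terms can be added when unforeseen bad behavior appears."""
        self.safeguards.append(penalty)

    def value(self, state):
        # Point 2: safeguard terms are part of the objective itself, so
        # "maximize the objective" already encodes the constraints.
        return self.task_reward(state) - sum(p(state) for p in self.safeguards)


# Toy state: did the robot get the coffee, and did it harm anyone on the way?
obj = Objective(task_reward=lambda s: 1.0 if s["coffee"] else 0.0)
obj.add_safeguard(lambda s: 100.0 if s["harmed_human"] else 0.0)

print(obj.value({"coffee": True, "harmed_human": False}))  # 1.0
print(obj.value({"coffee": True, "harmed_human": True}))   # -99.0
```

Because the penalty dwarfs the task reward, an agent maximizing this objective has no incentive to "kill you on the way to the coffee," which is exactly the kind of safeguard term the argument assumes.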
It's not that I don't understand it. I think it would only be relevant in a fantasy world in which people would be smart enough to design super-intelligent machines, yet so ridiculously stupid as to give them moronic objectives with no safeguards.
Here is the juicy bit from the article where Stuart calls me stupid:
<<Russell took exception to the views of Yann LeCun, who developed the forerunner of the convolutional neural nets used by AlphaGo and is Facebook’s director of A.I. research. LeCun told the BBC that there would be no Ex Machina or Terminator scenarios, because robots would not be built with human drives—hunger, power, reproduction, self-preservation. “Yann LeCun keeps saying that there’s no reason why machines would have any self-preservation instinct,” Russell said. “And it’s simply and mathematically false. I mean, it’s so obvious that a machine will have self-preservation even if you don’t program it in because if you say, ‘Fetch the coffee,’ it can’t fetch the coffee if it’s dead. So if you give it any goal whatsoever, it has a reason to preserve its own existence to achieve that goal. And if you threaten it on your way to getting coffee, it’s going to kill you because any risk to the coffee has to be countered. People have explained this to LeCun in very simple terms.” >>
The point is that the behaviors you are concerned about are easily avoidable by simple terms in the objective. In the unlikely event that these safeguards somehow fail, my partial list of escalating solutions (which you seem to find terrifying) is there to prevent a catastrophe. So arguing that emotions of survival, etc., will inevitably lead to dangerous behavior is completely missing the point.
Yes, but why would we be so stupid as to not include brakes?
- Yann LeCun and Tony Zador argue that humans would be stupid to put in explicit dominance instincts in our AIs.
- Stuart Russell responds that the drive need not be explicit: dangerous or immoral behavior may simply arise out of imperfect value alignment and out of instrumental subgoals the machine sets for itself to achieve its official goals.
- Yann LeCun and Tony Zador respond that we would be stupid not to program the proper 'laws of robotics' to protect humans.
- Stuart Russell is concerned that value alignment is not a solved problem and may be intractable (i.e. there will always remain a gap, and a sufficiently powerful AI could 'exploit' this gap, just like very powerful corporations currently often act legally but immorally).
- Yann LeCun and Tony Zador argue that we could also build defensive military robots designed to kill only regular AIs gone rogue through lack of value alignment.
- Stuart Russell did not explicitly respond to this but I infer from his NRA reference that we could be worse off with these defensive robots because now they have explicit weapons and can also suffer from the value misalignment problem.
hmm. not quite what i'm saying.
We fix our children's hardwired values by teaching them how to behave.
We fix human value misalignment by laws. Laws create extrinsic terms in our objective functions and cause the appearance of instrumental subgoals ("don't steal") in order to avoid punishment. The desire for social acceptance also creates such instrumental subgoals driving good behavior.
We even fix value misalignment for super-human and super-intelligent entities, such as corporations and governments.
This last one occasionally fails, which is a considerably more immediate existential threat than AI.
They have to do with designing (not avoiding) explicit instrumental objectives for entities (e.g. corporations) so that their overall behavior works for the common good. This is a problem of law, economics, policies, ethics, and the problem of controlling complex dynamical systems composed of many agents in interaction.
What is required is a mechanism through which objectives can be changed quickly when issues surface. For example, Facebook stopped maximizing clickthroughs several years ago and stopped using the time spent in the app as a criterion about 2 years ago. It put in place measures to limit the dissemination of clickbait, and it favored content shared by friends rather than directly disseminating content from publishers.
There will be mistakes, no doubt, as with any new technology (early jetliners lost wings, early cars didn't have seat belts, roads didn't have speed limits...).
But I disagree that there is a high risk of accidentally building existential threats to humanity.
Existential threats to humanity have to be explicitly designed as such.
This is very much unlike humans, whose objective can only be shaped through extrinsic objective functions (through education and laws), that indirectly create instrumental sub-objectives ("be nice, don't steal, don't kill, or you will be punished").
Sort of like the discriminator in GANs...
- designing objectives for super-human entities is not a new problem. Human societies have been doing this through laws (concerning corporations and governments) for millennia.
- the defensive AI systems designed to protect against rogue AI systems are not akin to the military, they are akin to the police, to law enforcement. Their "jurisdiction" would be strictly AI systems, not humans.
They are all trained *in advance* to optimize an objective, and subsequently execute the task with no regard to the objective, hence with no way to spontaneously deviate from the original behavior.
As of today, as far as I can tell, we do *not* have a good design for an autonomous machine, driven by an objective, capable of coming up with new strategies to optimize this objective in the real world.
We have plenty of those in games and simple simulations. But the learning paradigms are way too inefficient to be practical in the real world.
The Facebook story is unremarkable in that respect: when bad side effects emerge, measures are taken to correct them. Often, these measures eliminate bad actors by directly changing their economic incentives (e.g. removing the economic incentive for clickbait).
Perhaps we agree on the following:
- (0) not all consequences of a fixed set of incentives can be predicted.
- (1) because of that, objective functions must be updatable.
- (2) they must be updated to correct bad effects whenever they emerge.
- (3) there should be an easy way to train minor aspects of objective functions through simple interaction (similar to the process of educating children), as opposed to programmatic means.
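Points (1) through (3) above can be sketched in a few lines. This is a minimal illustration under my own assumptions (the weight names and the clickbait scenario are borrowed from the Facebook example earlier, but the code is not from the debate): an objective whose terms can be updated when bad side effects surface, plus a term nudged through simple repeated feedback rather than reprogramming.

```python
# Hypothetical sketch of points (1)-(3); all names and numbers are illustrative.

# (1) the objective function is updatable: its terms are data, not fixed code.
weights = {"engagement": 1.0, "clickbait_penalty": 0.0}

def objective(metrics):
    return (weights["engagement"] * metrics["engagement"]
            - weights["clickbait_penalty"] * metrics["clickbait"])

# (2) a bad side effect emerges (clickbait flourishes), so the operators
# update the objective to penalize it.
weights["clickbait_penalty"] = 5.0

# (3) minor aspects are trained through simple interaction: nudge a weight
# a little after each piece of human feedback, like educating a child.
def feedback(term, disapproval, lr=0.1):
    weights[term] += lr * disapproval

for _ in range(10):
    feedback("clickbait_penalty", disapproval=1.0)

# After the update plus ten rounds of feedback, clickbait_penalty is 6.0,
# so heavy clickbait now scores negatively despite decent engagement.
print(objective({"engagement": 3.0, "clickbait": 1.0}))  # -3.0
```

The design choice this illustrates is the one the whole list turns on: keeping the objective's terms editable at runtime, so corrections do not require rebuilding the system.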