Human Compatible: AI and the Problem of Control by Stuart Russell, Viking, October 2019, 352 pages

There is no shortage of books that discuss the dangers of AI and how AI will take over the world as we know it. What makes Human Compatible different is that it is one of the first books on the topic to come from what may be called the horse’s mouth.

Instead of crying doomsday or engaging in denialism, Stuart Russell convincingly argues that AI can pose an existential risk to humankind. Without delving too much into how AI is changing societal rules, the book dwells on a future where humans are relegated to second rank in terms of intelligence, and on the resulting Gorilla problem (just as humans now decide how gorillas should live). To prevent such an outcome, he does not come up with unrealistic proposals like banning AI research, engaging policymakers, etc. Instead, his solution is aimed at modifying the field from within rather than from without. He suggests a new framework for setting objectives for AI (one he has been working on for a while, and which builds on inverse reinforcement learning) so that they are better aligned with ever-uncertain human preferences, instead of the current trend of single-mindedly pursued goals.

The odds of an AI student coming across another book co-authored by Stuart Russell, Artificial Intelligence: A Modern Approach, are probably the same as those of a Statistics student coming across books by Hastie and Tibshirani. What makes this book valuable is that while enough has been written about the hazards and safety issues of AI, most of those books have been written by philosophers or historians. Human Compatible is probably the first book written by an expert in the field itself.

In some sense, what the author sets out to do in this book is simple: he wants to make sure that AI continues to serve the human interest. In addressing this challenge, what makes the book more realistic and likable is that it dwells more on present and near-future AI capability than on Terminator- and Matrix-type scenarios. He spends more time discussing recommender systems, biased AI algorithms, self-driving cars, and AlphaGo than machines that will eventually come to enslave us. That is not to say the author ignores this potential, for central to his problem statement is the Gorilla Problem. It may briefly be described as follows: machines as currently designed will gain control over humans, at best keeping us as pets and at worst creating a hostile environment for us and slowly driving us extinct, as we have done with gorillas.

The author fully recognizes that although that day may still be a few decades away, it never hurts to start early and be prepared. We must start paying attention to this problem, as the ramifications are enormous. It doesn’t help matters that, other than in a few pockets, the field of AI is beset with denialism regarding this possibility. The author spends a whole chapter comprehensively demolishing the most commonly espoused defences, such as “… but maybe there’s no such thing as intelligence, so this claim is meaningless…” or “… it is like worrying about overpopulation on Mars”. His cynicism is evident when he makes the point that once automobiles became more prevalent, wagon horses ended up as animal food.

Chapter 7 – “AI: A Different Approach” and Chapter 8 – “Provably Beneficial AI” are the core of the book. These chapters dwell on the control problem in AI. The author’s suggestion is that instead of being given goals such as blindly following the master’s command, an AI should try to work out what he or she actually wants. The idea is also essential because, firstly, human preferences are not always transitive. Secondly, so far there is little work on the concept of delayed reward (i.e., the human equivalent of hard work and persistence).
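To see why non-transitive preferences are awkward for a machine with a single fixed objective, here is a minimal Python sketch. The function name and the food items are my own illustrative assumptions, not something taken from the book. It looks for a three-way cycle (a over b, b over c, c over a) in a set of pairwise preferences; when such a cycle exists, no single numeric score can rank the options consistently.

```python
from itertools import permutations

def find_preference_cycle(prefers):
    """Return a triple (a, b, c) with a > b, b > c and c > a, or None.

    `prefers` is a set of (better, worse) pairs. Only cycles of length
    three are checked, which is enough for this illustration.
    """
    items = {x for pair in prefers for x in pair}
    for a, b, c in permutations(sorted(items), 3):
        if (a, b) in prefers and (b, c) in prefers and (c, a) in prefers:
            return (a, b, c)
    return None

# Hypothetical preferences: the first set loops back on itself,
# the second can be ranked pizza > pasta > salad.
cyclic = {("pizza", "pasta"), ("pasta", "salad"), ("salad", "pizza")}
consistent = {("pizza", "pasta"), ("pasta", "salad"), ("pizza", "salad")}

print(find_preference_cycle(cyclic))      # some rotation of the cycle
print(find_preference_cycle(consistent))  # None
```

A machine told to maximize one fixed score would have to break such a cycle arbitrarily, which is one reason to treat the human's preferences as something to be estimated rather than hard-coded.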

For example, imagine a robo-butler that is ordered to make a cup of coffee. If for some reason the kids in the house prevent it from going to the kitchen, what should it do? Should it kick the kids out of the kitchen to reach the immediate goal of making coffee, or should it be aware that the master prefers kids over coffee? Or, if ordered to make fish and chips and finding the refrigerator empty, should it take some fish from the home aquarium?

The author’s recommendation is to enable machines to differentiate between reward signals and actual rewards, which, for the time being, are treated as the same thing in reinforcement learning. What this means is that the machine needs to infer its objective probabilistically from the evidence rather than be handed a precise target.

The author proposes three principles for implementing this (do they remind you of Asimov’s laws?):

  1. The machine’s only objective is to maximize the realization of human preferences.
  2. The machine is initially uncertain about what those preferences are.
  3. The ultimate source of information about human preferences is human behaviour.

Going back to the previous example of coffee, suppose the machine is about to kick the kids out, and just then the master comes in and says, “No! Not like that.” The robot understands that its action may decrease utility, and it immediately revises its probability distribution over the master’s preferences regarding coffee and kids.
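The revision step in the coffee example can be sketched as a small Bayesian update. This is only an illustration under made-up assumptions: the two hypotheses, the probabilities, and the utility numbers below are invented for the example and do not come from the book.

```python
def update_belief(prior, likelihoods):
    """Bayes' rule over a discrete set of hypotheses."""
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Assumed utilities of each action under each hypothesis about the master.
UTILITY = {
    "kids_over_coffee": {"push_past_kids": -10.0, "wait_for_kids": 1.0},
    "coffee_over_kids": {"push_past_kids": 2.0, "wait_for_kids": -1.0},
}
ACTIONS = ["push_past_kids", "wait_for_kids"]

def expected_utility(belief, action):
    """Average the action's utility over the robot's current belief."""
    return sum(belief[h] * UTILITY[h][action] for h in belief)

def best_action(belief):
    return max(ACTIONS, key=lambda a: expected_utility(belief, a))

# The robot starts out (wrongly) fairly sure that coffee comes first.
prior = {"kids_over_coffee": 0.1, "coffee_over_kids": 0.9}

# Assumed chance that the master shouts "No!" under each hypothesis.
likelihood_of_no = {"kids_over_coffee": 0.95, "coffee_over_kids": 0.05}

posterior = update_belief(prior, likelihood_of_no)

print(best_action(prior))      # action chosen before the correction
print(best_action(posterior))  # action chosen after hearing "No!"
```

With these numbers the robot initially favours pushing past the kids, and a single “No!” shifts enough probability towards the kids-first hypothesis that waiting becomes the better choice. The point is that the correction is treated as evidence about what the master wants, not as a new hard-coded rule.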

As the author points out, this may also solve the wireheading problem: if the reward is defined so that it can’t be observed directly, then the AI will know that hacking its own reward signal won’t earn it a higher score. In other words, to receive a master’s love, it will not resort to guns.

The approach has merit, but the devil lies in the implementation, for the techniques to do so are still not developed. There is a complex Bayesian basis here: start with a prior and keep updating the probabilities gradually. However, human preferences are complicated and don’t fit into a neat logical system, as we are still Homo sapiens and not Homo economicus. I fail to see how current techniques built around a fixed, hard objective can be used to encode abstract objectives (which form the basis of the recommended approach). It is one thing to train by giving millions of concrete examples and another to train through concepts of inferred preferences.

But solutions to all tough problems start small, and there are some labs trying to do things along these lines, including the author’s own. One may not agree with the views or the solutions in the book, but that doesn’t take away from the fact that the problem could become increasingly evident. Researchers the world over will gradually start thinking about solutions. Nevertheless, the book is one of the first essential steps: it raises a question and attempts to answer it partially.
