Malevolent Artificial Intelligence (Sam Harris is Wrong, Part 4)

The second-ever post on this blog is about the question of malevolent AI. I will now revisit the question in more detail, having had time to refine my thoughts.

I don’t think the question of malevolent AI, or, more broadly, the “AI alignment problem,” can be simply dismissed. If my view has changed at all since the last post, it is on that point. At the same time, I believe Sam Harris is too fatalistic. Benevolent AI is not impossible. I believe it is just hard.

In the previous post, I gave the paperclip maximizer thought experiment about runaway AI intentions:

You design an AI to maximize the production of paperclips. The AI, more intelligent than you could ever imagine but without human intuition, converts every atom in the universe into paperclips, causing undue havoc along the way.
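To make the thought experiment concrete, here is a toy sketch in Python (every name in it is invented for illustration, not taken from any real system). The “reward” counts paperclips and literally nothing else, so the plan that produces one more paperclip wins no matter what else it wrecks:

```python
# Toy illustration of a naively specified objective (all names are hypothetical).
# The reward counts paperclips and nothing else, so a plan that ruins everything
# but yields one extra paperclip still scores higher.

def reward(world_state: dict) -> int:
    """The only thing this objective can see is the paperclip count."""
    return world_state["paperclips"]

plans = [
    {"paperclips": 1_000, "humans_ok": True},
    {"paperclips": 1_001, "humans_ok": False},  # catastrophic, but scores higher
]

best_plan = max(plans, key=reward)
print(best_plan)  # picks the catastrophic plan: the objective never mentions "humans_ok"
```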

This thought experiment highlights, or should I say presumes, that an AI will not have what we generally call “common sense.” It seems like an intuitive accusation. How can a computer have common sense?

For example, your home computer is very smart in certain areas, like multiplying large numbers together. But it is very dim-witted, by human standards, in other areas, like questions of intuition.

This is true of all existing computers. However, to say that an AI system will never have common sense is a fallacy of origin. The premise of the AI alignment dilemma is that computers in the future will be inconceivably different than they are now. So why, on the topic of common sense, do we assume that computers will stay the same?

I would even go so far as to say that an artificial general intelligence, an AGI, will by definition have common sense. The jump from mere AI to AGI entails the ability to understand context, infer motivations, and evaluate information holistically. That’s common sense.

OK, so “common sense” is a property that intelligent computers might have. But who’s to say they will? What if the AI’s intelligence is just narrowly tailored to creating paperclips, and that’s it?

Well, first of all, I doubt that a machine will be smart enough to destroy human society without also being smart enough to have “common sense.” But the alignment problem still stands: there’s no guarantee that such an AI will care about our morals. It’s a fair concern, but addressing it forces me to get into technical details.

You know, I’ve listened to Sam Harris discuss AI in a lot of his podcast episodes. And so it’s strange, I think, that to the best of my knowledge he has never once used the term “genetic algorithm,” or even “neural network.” Does he know what these concepts are? They’re just, you know, the groundwork of the most sophisticated AI systems. If I sound snarky, remember that this guy has given TED Talks about artificial intelligence (that’s TED, not TEDx), so he’s dressed up as an expert.

Harris’s ignorance is understandable. He’s a philosopher, not an engineer or scientist. He admits as much. His philosophical analysis is fine enough, but he’s speaking authoritatively about what is fundamentally an engineering problem. His ideas aren’t useless, but they are inherently limited without the computer science knowledge. Harris himself left philosophy (in his academic studies) for neuroscience because of a version of this exact problem. He, of all people, should appreciate it.

Here is how Harris describes the potential of runaway AI evolution (the singularity): computers make smarter computers, and those make smarter computers, and so on. This is how dilettantes describe runaway AI evolution. This description isn’t that bad, for a non-technical person; it captures the recursive nature of AI evolution. However, this second-order analysis is also not strictly correct.

While, in a certain sense, AI algorithms do generate smarter versions of themselves, they currently do so in the same way that parents can give birth to smarter children. See the problem? Mothers don’t “make” their children consciously. Inheritance is semi-random. And their children aren’t necessarily smarter. It’s the environment that does the selection.

We have an AI equivalent, “genetic algorithms,” which mimic the selective pressures of evolution. These and back-propagation are some of our strongest tools. Put simply, you can’t just tell a machine to “make a smarter machine.” The reason is simple: a machine doesn’t know what “smart” means. It doesn’t know what constitutes intelligence. The job of breeding more and more intelligent AIs is, in part, figuring that out.
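To show what I mean, here is a minimal genetic-algorithm sketch in Python, with a deliberately fake fitness function standing in for “smart” (every specific number and name is an illustrative assumption). Notice that the algorithm only knows how to select, recombine, and mutate; everything it “wants” lives inside that one human-written fitness function:

```python
import random

# Minimal genetic-algorithm sketch (all specifics are illustrative assumptions).
# Each "individual" is just a list of numbers standing in for a machine's design.
# The crucial point: fitness() is where a human has to write down what "smart"
# means; the algorithm itself only selects, recombines, and mutates.

def fitness(individual):
    # Stand-in objective: prefer designs whose values sit near some target.
    # For real intelligence, writing this function is the hard, unsolved part.
    return -sum((x - 0.5) ** 2 for x in individual)

def crossover(a, b):
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(individual, rate=0.1):
    return [x + random.gauss(0, 0.1) if random.random() < rate else x
            for x in individual]

population = [[random.random() for _ in range(8)] for _ in range(30)]

for generation in range(50):
    # Selection: keep the top half, judged only by fitness().
    population.sort(key=fitness, reverse=True)
    survivors = population[: len(population) // 2]
    # Reproduction: children are semi-random recombinations of two parents.
    children = [mutate(crossover(random.choice(survivors), random.choice(survivors)))
                for _ in range(len(population) - len(survivors))]
    population = survivors + children

print(round(max(fitness(ind) for ind in population), 4))  # best fitness after 50 generations
```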

Specifically, the type of intelligence we want to optimize for is general intelligence. This is where AIs run into trouble. “General intelligence,” and especially “human-like intelligence,” is hard to nail down. To properly inculcate it, we would need to generate large numbers of tests for it, even though the concept itself is quite poorly defined. You could try generating an AI with the sole purpose of judging intelligence, but to do that competently, the judge would almost have to be that intelligent already.
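To illustrate, suppose we tried to hand-write a battery of tests to serve as the judge (everything below is invented for the example). The battery can only probe what someone thought to write down, and a candidate that simply memorized the answer key aces it:

```python
# Sketch of a hand-written "test battery" used as a fitness measure (all items invented).
# Every test has to be authored by a person, and the battery can only probe what
# someone thought to write down; it is not a definition of intelligence.

tests = [
    ("2 + 2", "4"),
    ("reverse the word 'cat'", "tac"),
    ("opposite of 'hot'", "cold"),
]

def score(candidate):
    """candidate is any function mapping a question string to an answer string."""
    return sum(candidate(question) == answer for question, answer in tests)

# A "dumb" candidate that memorized the answer key scores perfectly,
# which says more about the tests than about the candidate.
answer_key = dict(tests)
print(score(lambda question: answer_key.get(question, "")))  # 3 out of 3
```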

IQ tests are a fine enough (actually quite good) way to judge humans, but only under the pretense that the people we’re testing are humans in the first place and already display some level of general intelligence. Everything else can come down to a test of what is essentially pattern-matching.

The most promising way to train the most advanced AI algorithms is, more or less, to give them examples of human intelligence, and then to set up systems that automatically judge them on their ability to match the given examples. These examples will need to be continually updated, becoming more detailed and sophisticated as the AI improves. This will be a largely manual process, so long as humans are the best judges of the type of intelligence they have themselves.
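Here is a minimal sketch of that idea, with made-up numbers. The “automatic judge” is nothing fancier than a distance between the model’s answer and the human-provided answer, so the examples themselves carry the whole definition of what counts as good:

```python
# Minimal sketch of "learn by matching human-provided examples" (all data made up).
# The automatic judge is just the gap between the model's answer and the example
# answer; the examples themselves define what "good" means.

# Hypothetical human examples: inputs paired with the answers a human would give.
examples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]  # here, y = 2x + 1

w, b = 0.0, 0.0  # a deliberately tiny "model": one weight, one bias
learning_rate = 0.01

for step in range(2000):
    for x, human_answer in examples:
        prediction = w * x + b
        error = prediction - human_answer   # how far from the human example
        w -= learning_rate * error * x      # nudge the model toward the example
        b -= learning_rate * error

print(round(w, 2), round(b, 2))  # approaches 2.0 and 1.0, the pattern in the examples
```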

By partaking in this process, we are constantly refreshing the picture of what we want the AI to emulate. There will be many opportunities to correct course. It is an iterative process, so we will always be able to say, “no, that’s not quite what I meant for you to do; back up, try something else.”
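Schematically, the loop looks something like this (every function here is a hypothetical stand-in, not a real system). The structural point is simply that a human check sits inside every round:

```python
import random

# Sketch of the iterative correction loop described above (every function is a
# hypothetical stand-in). A human check sits inside the loop, so "no, back up,
# try something else" happens every round.

def propose_behavior(examples):
    # Stand-in for the AI generating a new behavior from the current examples.
    return random.choice(examples) + random.gauss(0, 1.0)

def human_approves(behavior):
    # Stand-in for human judgment; here we pretend anything near 10 is "what I meant."
    return abs(behavior - 10.0) < 0.5

examples = [8.0, 9.0, 11.0]  # initial, manually curated demonstrations

for round_number in range(100):
    candidate = propose_behavior(examples)
    if human_approves(candidate):
        examples.append(candidate)   # accepted behavior becomes a new example
    # rejected candidates are simply discarded: "back up, try something else"

print(len(examples), "examples in the curated set after 100 rounds of review")
```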

Sam Harris is wrong that, unless we change our approach, we will lose control of intelligent machines before they become smart enough to share our goals. Starting now, and up until the moment the AI has “common sense,” humans will be able to keep the machines aligned with us. I’m not saying it’s easy, but I have confidence that it will be done.