why training a.i. isn’t like training your pets
When we last looked at a paper from the Singularity Institute, it was an interesting work by Dr. Shane Legg asking whether we actually know what we're really measuring when we try to evaluate intelligence. While I found a few points that seemed a little odd to me, the broader point Dr. Legg was pursuing was very much valid, and there were some equations to consider. However, that paper isn't exactly representative of most of the things you'll find coming from the Institute's fellows. Generally, what you'll see are sprawling philosophical treatises filled with metaphors, trying to make sense of a technology that either doesn't really exist and is treated as a black box with inputs and outputs, or is imagined by the author as a combination of whatever a popular science site reported about new research ideas in computer science. The end result of this process tends to be a lot like this warning about the need to develop a friendly or benevolent artificial intelligence system, based on a rather fast and loose set of concepts about what an AI might decide to do and what will drive its decisions.
Usually, when you first work on a project that tries to train a computer to make decisions about items in vast datasets, or to draw conclusions from a training set and then extend those conclusions to evaluate real world data, you don't start by worrying about how you're going to reward it or punish it. Those ideas work with a living organism but not a collection of circuits. Evolution created brains which release hormones to make us feel as if we're floating on a cloud when we take actions necessary for our survival or reproduction, even when those actions are as abstract as getting a promotion at work or making a new friend, because the same basic set of reward mechanisms extends into social behavior.
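To make that workflow concrete, here's a minimal sketch of what such a project typically looks like, assuming a library like scikit-learn is available and using a made-up toy dataset purely for illustration. The model is fit to labeled training examples and then applied to data it hasn't seen, and at no point is there a reward or a punishment anywhere in the process, just function calls that return labels.

```python
# A toy illustration of supervised learning: train on labeled examples,
# then extend those conclusions to new, unseen data. The numbers below are
# made up for the example; a real project would load an actual dataset.
from sklearn.tree import DecisionTreeClassifier

# Training set: each row is an item, each label says what that item is.
training_items = [[5.1, 3.5], [4.9, 3.0], [6.7, 3.1], [6.3, 2.5]]
training_labels = ["small", "small", "large", "large"]

model = DecisionTreeClassifier()
model.fit(training_items, training_labels)   # "training" is just fitting parameters

# Real world data the model has never seen; it simply outputs its best guess.
new_items = [[5.0, 3.4], [6.5, 2.8]]
print(model.predict(new_items))              # e.g. ['small' 'large']
```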
And so, the easiest way to teach an organism is with what's known as operant conditioning. Wanted behaviors are rewarded, unwanted ones are punished, and the subject is basically trained to do something based on this feedback. It's a simple and effective method since you're not required to communicate the exact details of a task to your subject. Your subject might not even be human, and that's ok because eventually, after enough trial and error, it will get the idea of what it should be doing to avoid punishment and earn the reward. But while this works by plugging into an organism's existing behavior-consequence circuitry and hijacking it for your goals, no such circuitry exists in a machine.
Computers aren't built with a drive to survive or seek anything, and they're just as content to sit on a desk and gather dust as they are plowing through gigabytes of data. Though really, content is a bad word since, without the chemistry required for emotion, they don't feel anything. And this is why, when creating a training algorithm or an artificial neural network, we focus on algorithm design and on eliminating errors by setting bounds on what the computation is supposed to do, rather than on rewarding the computer for a job well done. No reward is needed, just the final output. This is why warning us about the need to program a reward for cooperation into budding AI systems seems rather absurd, to say the least.
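If you want to see what that error elimination actually looks like, here's a bare-bones sketch in plain Python with made-up numbers. The only "feedback" in the loop is a numerical error that the code is written to shrink; there is no reward channel, and nothing is felt.

```python
# A minimal sketch of how "learning" looks in code: the program adjusts a
# parameter to reduce a numerical error, because that's what the loop is
# written to do. The data points are invented for this example.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]   # made-up (input, target) pairs

weight = 0.0          # the single parameter being fit (targets are roughly 2*x)
learning_rate = 0.01

for step in range(1000):
    # Squared-error gradient for a one-parameter linear model y = weight * x.
    gradient = sum(2 * (weight * x - y) * x for x, y in data)
    weight -= learning_rate * gradient        # step downhill on the error

print(round(weight, 2))   # ends up near 2.0, the slope that minimizes the error
```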
Sure, a sufficiently advanced AI in charge of crucial logistics and requiring a certain amount of resources to run might decide to stockpile fuel or draw more energy than it needs, outside of its imposed limits. However, it won't do so because it performed a calculation evaluating how successful it would be in stealing those resources. Instead, its behavior would be due to a bug, or a bad sensor reporting a critical shortage of, say, ethanol in some reservoir, and the AI reacting with the decision to pump more ethanol into that reservoir to meet its human-set guidelines. Fix the sensor, or simply override the command and tell the AI to ignore the sensor, and you stop the feared resource grab. Yes, it's actually that easy, or at the very least it should be for someone with access to the AI's dashboard.
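To put the scenario in perspective, the "resource grab" is nothing more exotic than a threshold check. Here's a rough sketch, where read_ethanol_level, pump_ethanol, and ignore_sensor are hypothetical stand-ins for whatever the real control system would expose:

```python
# A rough sketch of the control loop behind the feared resource grab.
# The function names are hypothetical placeholders, not a real system's API.
MINIMUM_LEVEL = 500.0   # human-set guideline for the reservoir, in liters

def top_off_reservoir(read_ethanol_level, pump_ethanol, ignore_sensor=False):
    if ignore_sensor:
        return  # the "override from the dashboard": one flag ends the behavior
    level = read_ethanol_level()
    if level < MINIMUM_LEVEL:
        # A faulty sensor that always reports a shortage makes this branch fire
        # on every pass; the fix is the sensor, not the machine's "motives".
        pump_ethanol(MINIMUM_LEVEL - level)
```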
There's so much anthropomorphism in the singularitarians' concepts for friendly AI that it seems as if those who write them forget they're dealing with plastic, metal, and logic gates rather than a living being with its own needs and wants which may change in strange ways when it starts thinking on its own. It won't. It will do only what it knows how to do, and even something like accepting another task after it has finished the previous one has to be coded; otherwise, the entire application runs once and shuts down. This is why a web service has a routine that listens for commands until it's terminated, directs them to objects that apply human-built logic to the data the service received, and then sends the response back to the interface once those objects are done, or an error if something went wrong. Note how at no point during all this does the programmer send any message praising the web service for doing its job.
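For the sake of illustration, here's roughly what that flow looks like using Python's standard library; the handler name and the placeholder logic are hypothetical, but the shape is the same for any service: listen until terminated, apply coded logic, send back a response or an error, and nothing else.

```python
# A bare-bones web service: a loop that listens for requests, applies
# human-written logic, and returns a response or an error. No praise involved.
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class OrderHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            result = {"status": "ok", "path": self.path}  # placeholder business logic
            body = json.dumps(result).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)       # the response goes back to the interface
        except Exception:
            self.send_error(500)         # or an error does, and that's the whole story

if __name__ == "__main__":
    # Listens for commands until the process is terminated.
    HTTPServer(("localhost", 8000), OrderHandler).serve_forever()
```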
If we wanted to reward a computer, we'd need to build an application so that it knows how to distinguish praise from scolding and how it should react when it receives either or both, depending on where it is in the computational process. Forgoing the standard flow of giving computers discrete tasks and referring them to a logical set of steps on how to carry them out, and instead trying to develop a system that mimics the uncertainty and conflicting motivations of an organism which may or may not cooperate with us, sounds like a project to turn some of our most reliable and valuable tools into liabilities just waiting to backfire on us. Maybe a bit of research into the practical goals of AI development, rather than daydreams of moody automatons in the world of tomorrow, should be a prerequisite for writing a paper on managing AI behavior…