When someone learns what I do, I often find myself explaining why, despite what they may have heard, AI is not becoming conscious and taking over the world. I usually try demystifying machine learning by making an analogy to something familiar that would never be considered capable of that kind of domination. So when a fellow dog owner and I had this conversation recently at the dog park, I used our dogs as the example, and although it’s an imperfect analogy, it seems to do the trick.
How a dog learns
If you want your dog to do something on command, you start by getting her to do it, and then saying something or showing her something at the same time and giving her a treat. After seeing this over and over, your dog starts picking up a pattern, and forming an association between the auditory signal (e.g. vocal command) or visual signal (e.g. hand gesture) and the desired action.
So if you’re successful, when you say sit, she realizes it’s not just random noise: there’s a strong correlation between taking your word as input, outputting her butt on the ground, and getting a reward.
Ask any dog owner, however, and they’ll tell you it’s not over yet. She may have mastered sitting in your living room when you say the word just so, but she has no idea what to do when you try the same thing in the kitchen, or on the field outside, or when someone else says it.
She memorized a behavior under one very specific set of circumstances, but hasn’t learned she needs to apply it in others that aren’t exactly the same. To her, sitting in the living room isn’t the same as sitting in the field, and she only knows to do it in places where it’s been taught. The input to her isn’t just the word, but the conditions under which it was said too. That’s why you need to repeat the same training under different conditions – places, times of day, emotional states, people, and ways of saying it. The more inputs (conditions under which you ask her to sit) you give her, the better she will learn to sit when the input isn’t exactly the same as what you’ve taught her before. She learns to generalize.
So boiling this down, we have a learner (the dog), a desired output (sitting), and a set of inputs (conditions where you ask her to sit). The bigger the number of inputs, the more general her understanding of sitting will be.
But here’s the thing – there’s a limit to how general she gets. Even if you travel the world, training her to sit everywhere, she doesn’t really know what sitting is. She doesn’t know there are different kinds of sitting, or that she does it by bending her knees (she doesn’t know she has knees), or that other animals can sit too, or that it’s related to standing but the opposite. In other words, she doesn’t have the same semantic mapping of that concept that you and I have. Her brain isn’t capable of understanding all those relations. All she knows is that when she hears that sound she should park it.
From a dog to machine
Now replace the dog learner above with an algorithm that solves a math equation, and that is machine learning. We provide a large number of inputs and outputs (images with faces, documents with topics, financial transactions with fraud) and expect our algorithm to find enough correlations between the inputs and the output to set the right values for the variables in an equation. We can then use that equation to predict the correct output for an input that isn’t exactly like any of the ones we’ve seen before.
Successfully building a machine learning model assumes that we know what conditions (called features in machine learning) are important to include in the input, so we’re not overfitting to some very specific circumstances without realizing it, and that we have enough input examples to sufficiently represent those conditions.
Crucially, however, the learning we’re talking about is taking the input and counting which conditions have a significant association with what’s explicitly given as the output. In the dog’s case, she should learn that none of the conditions (time of day, weather outside, room in the house, etc.) are important except one, because each of them co-occurs with the command to sit only some of the time – all except hearing the sound sit.
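That kind of counting can be sketched in a few lines. This is a toy illustration, not a real training algorithm: the episodes and condition names (sound, room, time of day) are invented, and "learning" is literally tallying how often each condition co-occurs with the reward.

```python
from collections import Counter

# Each training episode is the set of conditions present when the dog was
# rewarded for sitting (positive) or not rewarded (negative).
positive = [  # "sit" was said and a treat followed
    {"sound:sit", "room:living", "time:morning"},
    {"sound:sit", "room:kitchen", "time:evening"},
    {"sound:sit", "room:field", "time:morning"},
]
negative = [  # no treat followed
    {"sound:stay", "room:living", "time:morning"},
    {"sound:down", "room:kitchen", "time:evening"},
]

pos_counts = Counter(c for episode in positive for c in episode)
neg_counts = Counter(c for episode in negative for c in episode)

def predictiveness(condition):
    # How much more often the condition shows up with the reward than without.
    return (pos_counts[condition] / len(positive)
            - neg_counts[condition] / len(negative))
```

Here `predictiveness("sound:sit")` comes out at 1.0 – it is present in every rewarded episode and no unrewarded one – while conditions like `room:living` show up on both sides and score near zero. Nothing here "understands" sitting; it's arithmetic over co-occurrence counts.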
In the case of predicting which topics a document is talking about, the input conditions we use to represent it are typically the words in the document, and the algorithm will associate the presence of words like health and hospital with the Health topic output. Not because it knows what health or hospitals are – it doesn’t – but because it counts how often those particular character sequences (what we call words) occur and, assuming things are working properly, finds that they more often than not occur in documents a human has previously labeled as Health. The algorithm will use any character sequence occurring disproportionately in Health-related documents as long as it’s useful for predicting the output – even if it’s garbage that, when a human looks at it, isn’t Health-related at all.
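A stripped-down sketch of that counting, with a made-up four-document corpus and made-up labels. A real topic classifier would weight and normalize these counts, but the principle is the same: character sequences tallied against human-provided labels.

```python
from collections import Counter

# Tiny invented corpus: (document text, human-provided topic label).
labeled_docs = [
    ("the hospital improved patient health outcomes", "Health"),
    ("new health study on hospital nutrition", "Health"),
    ("the team won the championship game", "Sports"),
    ("game highlights from the final match", "Sports"),
]

# Count how often each character sequence ("word") co-occurs with each label.
word_counts = {}
for text, label in labeled_docs:
    for word in text.split():
        word_counts.setdefault(label, Counter())[word] += 1

def predict(text):
    # Score each label by how often the document's words appeared with it.
    scores = {label: sum(counts[w] for w in text.split())
              for label, counts in word_counts.items()}
    return max(scores, key=scores.get)
```

`predict("hospital health report")` returns `"Health"` only because those byte sequences were counted more often under that label – the word `report` contributes nothing, and the function would happily use any garbage token that happened to correlate.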
In the case of predicting whether there’s a face in a photo, the conditions representing the image are typically the pixels composing it. The only place where one set of pixels is called a face and the other isn’t is in your head. To the math equation both are just sets of pixels, just a combination of color settings, and the output label face is just a set of characters that you as the human have applied to one set of pixels but not the other.
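To make that concrete, here's a deliberately crude sketch with invented pixel values: a "face classifier" that is nothing but a threshold on a number computed from a grid. It happens to separate the two example grids, but it clearly has no concept of a face.

```python
# Two invented 3x3 grayscale "images" – to the math, just grids of numbers.
face_photo = [
    [200, 180, 175],
    [190, 120, 130],
    [185, 140, 145],
]
not_a_face = [
    [40, 42, 39],
    [41, 40, 43],
    [38, 44, 41],
]

def mean_brightness(image):
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)

# "Training" on these two examples: a threshold halfway between them.
threshold = (mean_brightness(face_photo) + mean_brightness(not_a_face)) / 2

def predict(image):
    # The label "face" is just a character sequence a human chose.
    return "face" if mean_brightness(image) > threshold else "no_face"
```

Real face detection uses far richer combinations of pixel values than a brightness average, but the point holds at any scale: the output label is an arbitrary string, and the input is only ever numbers.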
But the algorithm doesn’t know what a face is, or what health is, any more than a dog knows what sit is.
A large part of the misinterpretation of machine learning is that we use the word – learning – as shorthand to represent what the algorithm is doing. Learning comes with all sorts of baggage that presupposes certain things. It’s easier if we replace it with computing, or counting, or something else that explicitly denotes that what’s happening is a math operation.
So let’s rewrite what machine learning is doing. Let’s go from:
It learned how to recognize faces
It learned how to classify documents into topics
to:
It solved for the variables in a math equation based on the co-occurrence of pixels and an arbitrary human-provided label.
It solved for the variables in a math equation based on the co-occurrence of character sequences and an arbitrary human-provided label.
Doesn’t sound quite as impressive or scary, does it?
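"Solving for variables in a math equation" can be shown literally. In this toy sketch (data and feature invented), we solve for two variables, w and b, in the equation p = 1 / (1 + e^-(w·x + b)) so its output matches human-provided labels, using plain gradient descent:

```python
import math

# x: how many times the word "hospital" occurs in a document (invented counts).
# y: 1 if a human labeled the document "Health", 0 otherwise.
data = [(0, 0), (0, 0), (1, 1), (2, 1), (3, 1), (0, 0)]

w, b = 0.0, 0.0
for _ in range(2000):
    for x, y in data:
        # Current prediction from the equation.
        p = 1 / (1 + math.exp(-(w * x + b)))
        # Nudge the variables to reduce the prediction error.
        w -= 0.1 * (p - y) * x
        b -= 0.1 * (p - y)

def predict(x):
    return 1 / (1 + math.exp(-(w * x + b))) > 0.5
```

That's the whole trick: two numbers adjusted until the equation's output agrees with the labels. Nothing in it knows what a hospital, a document, or health is.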
Rodney Brooks has a nice explanation of how we confuse performance with competence. When a person can recognize faces in photos well, we assume they could recognize a stop sign, or know if the face is that of an adult or child, or know if the face is that of a friend. These are all tightly coupled competencies for us. But we then assume that a machine learning model that can label faces has the same competence as a person who can. It doesn’t. It has no idea it’s labeling faces, and thus none of those other tightly coupled competencies come along. So maybe we can build a machine learning model to recognize stop signs. It’s likely we can, if we can differentiate one set of pixels from another. But it’s going to take the same amount of work it took to build a machine learning model to recognize faces.
A little validation goes a long way
Say your dog can now sit fantastically. Kudos. She doesn’t think “it’d be great to know more!”, and she won’t just start assuming jumping up is a good thing and do it, or start rolling over, lying down, swimming or anything else. You could probably train her to do those things, but each one will require starting from scratch and repeating the process all over again.
The first time we trained a dog to perform some action based on a vocal command, we discovered something. We didn’t know it could be done. After that, though, each new vocal command is just validating that dogs can be trained to respond to vocal commands. Some commands may take more work to train and more repetitions, and we may not be sure how well a dog can do them, but when she does, it’s not really that surprising, nor does it advance our knowledge of dog training or of dogs that much.
So when news comes out that there’s an AI model that can assign sentiment to tweets, and then another AI model can predict how a Supreme Court justice is going to vote, and then another one can predict stock market movement from press releases, it naturally feels like we’re advancing our knowledge of AI. In reality, though, we’re just teaching the same dog new tricks. We’re most often using the same math equation and algorithm to find correlations; the only difference is the data that goes into them. So go us for validating that AI works on different data, but it’s not surprising, nor is it a sign AI is getting more powerful.
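"The same dog, new tricks" is easy to show: one counting routine, handed made-up sentiment data and then made-up topic data. The algorithm is identical in both cases; only the data changes.

```python
from collections import Counter

def train(examples):
    # The same counting procedure regardless of what the labels "mean".
    counts = {}
    for text, label in examples:
        for word in text.split():
            counts.setdefault(label, Counter())[word] += 1
    def predict(text):
        scores = {lbl: sum(c[w] for w in text.split())
                  for lbl, c in counts.items()}
        return max(scores, key=scores.get)
    return predict

# "Sentiment analysis" and "topic classification" – same routine, new data.
sentiment = train([("love this great phone", "positive"),
                   ("terrible awful battery", "negative")])
topics = train([("hospital health study", "Health"),
                ("championship game final", "Sports")])
```

Announcing each of these as a separate AI breakthrough is a bit like announcing each new vocal command your dog responds to as an advance in dog science.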
So what about the cases where it seems the algorithm discovers something unintentionally, like inventing a new language? Well, what if when you said sit your dog did five circles and then barked? Did she invent a new way of sitting? No, you’d get annoyed she isn’t doing what you asked her to. She failed to learn what you wanted, she learned some noise instead.
If, on the other hand, you explicitly train a dog to discover cadavers or drugs, she will discover them in all sorts of new situations, but she’s still only doing exactly what you trained her to do. And if she brings back a packet of thyme, she didn’t add thyme discovery to her repertoire; she failed to discover drugs.
Every machine learning algorithm we have is explicitly executing a known set of operations on the data we give it. By definition, it has no capacity for anything beyond what’s explicitly encoded in the input, nor can it predict anything other than what is given as the output.
So it doesn’t make any sense to talk about a model wanting to learn anything beyond the task we trained it for. It’s an equation, and equations don’t want things. Only humans want things, and in this case they want the equations to do things.