The recent advances in language modeling with GPT-3 got me thinking: at what point does a quantitative change in a machine’s language generation ability cross a boundary into a qualitative change in our assessment of its intelligence or creativity?
When a sand heap met Eubulides
How many grains of sand can you take from a sand heap until it’s not a heap? Or more personally, how many hairs on your head can you afford to lose before you’re bald, or pounds before you’re thin? Maybe it’s fun to annoy someone with one of these Sorites paradoxes, attributed to the Greek philosopher Eubulides, precisely because they arise when language is imprecise. They expose that words we use without hesitation, like heap, bald, thin, or even intelligent and creative, words we think we know exactly the meaning of, actually have boundaries that turn out to be quite vague once you really start to dig into them.
You can think of what’s going on here as a quantitative change (in grains of sand, hair, or weight) leading to a qualitative change that ascribes a property to something, like being a heap, bald, or thin.
[W]e have seen that the alterations of being in general are not only the transition of one magnitude into another, but a transition from quality into quantity and vice versa, a becoming-other which is an interruption of gradualness and the production of something qualitatively different from the reality which preceded it – Hegel
Marx and Engels later developed the idea into the law of the passage of quantitative changes into qualitative changes, and it finally arrived in the most familiar, and widely misattributed, form you’ve likely heard:
Quantity has a quality of its own – Various
While it’s not what any of them had in mind, at what point does a quantitative change in a machine’s language generation ability cross a boundary into a qualitative change in our assessment of its intelligence or creativity?
Language Models and GPT-3
The release of GPT-3 from OpenAI has shown that an incredibly wide variety of language generation applications – from writing fiction to poems to computer code – can be performed by a fairly typical language model scaled up and trained on the largest amount of data yet.
Language models have been used in the NLP community for decades, becoming increasingly complicated and relying on more and more data. “Language model” is the technical term for a mathematical model of language produced by an algorithm that uses existing written text to calculate the probabilities of words appearing next to each other: specifically, how likely the next word or sequence of words is, given a previous sequence of words. Once these probabilities have been computed (this is the training step), the model can be used to generate new text: start with a word or phrase as a prompt, and keep choosing the most probable next word for as long as you want.
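To make this concrete, here is a minimal Python sketch of the simplest kind of language model, a bigram model, which conditions only on the single previous word. It is an illustrative toy under that assumption, not how GPT-3 works internally: GPT-3 is a neural network that conditions on long stretches of preceding text, whereas this sketch just counts adjacent word pairs. The function names `train_bigram` and `generate` and the tiny corpus are hypothetical examples.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Estimate P(next word | previous word) by counting adjacent word pairs."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    # Normalize each word's follower counts into conditional probabilities.
    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        model[prev] = {word: n / total for word, n in followers.items()}
    return model


def generate(model, prompt, max_words=10):
    """Greedy decoding: repeatedly append the single most probable next word."""
    words = prompt.lower().split()
    for _ in range(max_words):
        followers = model.get(words[-1])
        if not followers:
            break  # the last word never had a successor in the training text
        # Among ties, max() keeps the first candidate, i.e. the one seen first in training.
        words.append(max(followers, key=followers.get))
    return " ".join(words)


corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigram(corpus)
print(model["the"]["cat"])  # 2 of the 6 words following "the" are "cat"
print(generate(model, "the dog", max_words=6))  # the dog sat on the cat sat on
```

Prompted with “the dog”, greedy decoding produces “the dog sat on the cat sat on”: locally plausible word by word, but it soon loops, the kind of clunky repetition that made older model output easy to spot.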
When built well, they generate syntactically fluent language, although it used to be fairly easy to tell when text was generated from a model – it was clunky, repetitive, and lost coherence within at most a few sentences.
The algorithm used to build GPT-3 is still only trained by predicting the next sequence of words, but it is doing so for a model with 175 billion parameters – several orders of magnitude more than most previous language models – and on a huge amount of data taken directly from the internet (i.e. produced by us); a very impressive engineering feat.
Fluency, fool me once
The most striking aspect of the language produced by GPT-3 is how fluent it is across a variety of genres, how well it stylistically adapts to the given prompt, and how long the coherence of the generated text lasts.
It’s natural to associate the fluency of language with how intelligent the process that generated the language must be. In other words, it’s hard to separate thinking up something to say from being able to say it well. What to say from how to say it. It’s a human bias that helps explain why we’re taken in by a smooth talker before realizing there’s little substance, or, conversely, why we assume a lack of cognitive ability in someone who can’t express themselves well.
What to say starts by purposefully selecting some concept to represent in language. Whether the concept is an abstract idea in your mind or a spreadsheet table, it is a form of data, and you want to transform it into language as correctly and faithfully as possible. If you express your idea in language well enough to allow the reader to interpret what you’re saying correctly, your language has sufficient adequacy or accuracy.
How to say it comes down to fluency: whether the language used is understandable, regardless of what it is you’re saying. You can write an exceptionally fluent essay on bees, but if you were trying to give someone a quinoa recipe, it would be completely inadequate. A process, whether human or machine, can generate fluent language describing Mars or Elon Musk, and that language doesn’t need any connection to reality or truth to be comprehensible.
Fluency without adequacy is easy to imagine. Fluency is on the surface; it’s visible. It can be untethered from trying to represent anything specific and still come off fine.
What’s harder to imagine is adequacy without fluency. For me to assess the adequacy of what you’re saying, I need to know that you’re trying to give me a recipe, and not talk about bees. Or I need to trust that whoever (or whatever) wrote the facts about Mars I’m reading knew what they (or it) were talking about. In either case, I need to be able to create an interpretation of the concept and data you’re relaying through language. But for me to create any interpretation at all, your language first needs to be coherent, that is, fluent, enough.
Adequacy requires selecting something specific to represent, and being able to compare how well it’s represented. I think that’s why fluency is both easier to manufacture artificially and able to give the impression of adequacy. Our cognitive bias is to default to truth. If language is fluent, we understand it; if we understand it, we create an interpretation of what is being said; if we create an interpretation, we assume it accurately represents the concept and data it set out to represent. Why else would someone take the time to write it, right? 🙂
Maybe I’m a language model too
When we write or speak, words usually come out of our mouths or our hands without any conscious awareness of how they got there. We have an unconscious process for generating the next word. Are we similar to a language model, finding the most probable next word based on our prior experience with language? Is our ability to write not only fluently but adequately a matter of having several orders of magnitude more parameters in our brains than current language models, and having seen lots and lots of text?
Certainly the things we say are not always correct; that is, what we say is not always adequate to what we mean, whether we think it is or not. People make mistakes. I misremember and make things up; how is that different from what the language model is doing?
Adaptability is where it’s at
Going one step further, the most impressive part of GPT-3 is likely not the fluency of the language it generates, but the ease with which it can perform different tasks with only a few prompting examples. Most machine learning models are trained to perform a specific, discrete task, like predicting the sentiment of a restaurant review, or answering trivia questions, but GPT-3 has shown an impressive ability to perform many different kinds of language generation without being specifically trained to do so.
Adaptability is a core human trait. You build models of the world in your mind: of your house, your friends, yourself. You use the model of the world you’ve built from all your prior experience to go into novel situations and make reasonable decisions. Not only do you not forget how to brush your teeth just because the color, size, or shape of the toothbrush changed, but if you intend to brush your teeth and there’s no toothbrush around, you can make something that will act like a toothbrush from completely different materials.
Adaptability is closely tied to creativity, the ability to create something new and worthwhile. Adequacy is critical in a legal memo or biography, and it’s relatively easy to judge the adequacy by comparing these fact-based writings to some reality, but what about fiction, poetry, and other forms of creative writing? How useful or measurable is adequacy there? Is fluency sufficient for creativity?
Even the simplest language models from decades ago can be said to create something new. Maybe that’s sufficient to say any such process is being creative, but that doesn’t seem like a satisfactory answer.
If you didn’t know a word of French, but randomly picked words from a French dictionary until you filled 100 pages, and happened to produce a coherent work of fiction, were you being creative? Taken a bit further, if you have a monkey, a typewriter, and infinite time, eventually it will type out any book you can think of, but it’s unlikely you’d call that creative.
Where do we draw the line? It seems like we need to look at the worthwhile aspect of creativity, but how do we measure whether a work of fiction is worthwhile? (That sounds awfully close to asking what the purpose of art is…)
Intent to the rescue?
It seems like the adequacy and creativity questions about language models, including GPT-3, come down to introspection and intent. A typical flow in human conversation can be seen as four steps. First, you intentionally choose what to say (let’s leave free will out of this for now). You start with an intent: a concept or idea of what you want to say. Second, you choose words to transform that intent or concept into language. Third, the listener hears, or the reader reads, the words. Fourth, they interpret the words into a concept in their mind.
You use language for a reason: to transform something conceptual into words. That concept can take the form of a sales report, where your words reference customers, transactions, dollars, profits and losses; or it can be a creative idea for a novel, where you imagine a character, and the words describe a person, their hair color, how they walk, their own thoughts and concepts (I know, meta).
The point is that, for you, words represent something: they refer to objects, whether real or imagined. Words are connected to your other perceptions of the world, and to the actions you can take.
When GPT-3 produces sequences of characters, that’s all they are, even though we see them as meaning-carrying words. For GPT-3, the words it produces do not refer to any concept or intent it is trying to represent, or to any action it is trying to take. There is no concept behind the words. When it produces a poem about Elon Musk on Mars, it has no concept of who Elon Musk is, what Mars is, or what a poem is; no connection to any objects.
When you read text produced by a language model like GPT-3, the four steps above differ in one very important way. The language model doesn’t have its own intent. It’s not an agent acting in the world. A human has to start by prompting GPT-3 with the seed text. The language model takes your concept, which you have already transformed into words, so you are still doing the first two steps; the model simply continues the second step by generating the words most likely to occur next in the sequence.
The human prompter is more like a teacher assigning an essay topic that the student (GPT-3) must write about. We as humans are still reading and interpreting a meaning, because for us words actually have meaning and refer to objects, but those references were not intended by the model. The fact that we can interpret them is a result of fluency, not adequacy.
Even for creative writing, there’s a reason why someone wrote a poem or a novel, and one or more concepts they were trying to express. Maybe we need to separate creativity into three forms: the process of introspection, the effort that goes into properly translating a concept into language, and the final linguistic expression.
GPT-3 has certainly produced writing that is funny, sarcastic, or makes you think, so it would qualify for the third form of creativity. But since it has no understanding of the words, it is not, through any intent of its own, trying to be funny, sarcastic, or make you think. Those are your interpretations, and they could even be the result of GPT-3 reusing large sequences of words it was given directly from people’s writing on the internet.
Many of the examples of its writing are also cherry-picked by humans. Maybe it would be unfair to do otherwise; after all, many human attempts at writing fail. But are we then judging it by our human standards, or simply choosing a biased sample: the small percentage of monkeys who happened to type something good?
A lack of intent to be funny or make you think can also be true of writing produced by people, so perhaps the effect on the reader is what matters. If some future language model can produce thousands of novels a day whose storylines and characters resonate with readers and sell, despite the model having no intent to do so, maybe my belief that something critical is missing on the writer’s side will come to seem quaint.
Intelligence is as intelligence does
It’s clear that, with its size and training data, GPT-3 has achieved language generation capabilities that force us to sharpen some of our questions about machine intelligence. It has taken fluency, adaptability, and perhaps even a form of creativity to a level we have not seen before in language models. While some qualitative transitions in our interpretations of its writing seem justified, GPT-3 should not be seen as having qualitatively crossed the boundary into the general type of intelligence we associate with people. Without the ability to connect the words it produces to concepts in the world beyond other words, how can it be said to understand, and without understanding what it’s saying, how can something be intelligent?
If in the future this mathematical model of language is coupled with other types of models for vision, action, and other perceptions, we may have something that does have concepts that imbue its language with adequacy. We may also need to be more exact in our definition of what “intelligent” or “intelligence” means, and to define different kinds of intelligence. There has certainly been continuous progress in the biological sciences in understanding the cognitive behaviors, abilities, and limitations of humans and other animals. But if the 70-year struggle to define precisely what AI is, and the trouble with far simpler-seeming terms like heap, are any indication, precision in our definitions may be a moving target. Maybe there’s a range of behavior where it’s truly indeterminate whether something is exhibiting intelligence or creativity. Or maybe the meaning comes down to how we use the words, and what function they serve in our everyday language. If we think of something as intelligent or creative, then it is.