I am deeply skeptical of the resource allocation surrounding the current crop of so-called generative AI.
It's not that I am an AI skeptic. Not only do I use AI daily and find it incredibly useful, I am even what you might call an AI maximalist. I believe that transformer-based neural networks will be instrumental in revolutionizing many industries and will foment a fundamental change in modern societies on a scale seldom seen in human history.
But I also believe that we are mostly on the wrong track, and that careless language combined with lazy thinking is at least partially at fault.
I posit that there are fundamental misconceptions about the utility and capability of transformer networks, misconceptions that are unfortunately guiding billions, if not trillions, of dollars of investment and consuming countless hours of some of the world's brightest people on projects that will fail spectacularly to deliver on the promise of their inception.
We are building wrong. We are training wrong. We are thinking wrong, and we are speaking wrong.
First, let's address one of the problems we are trying to solve that cannot be solved, because it's a feature, not a bug: “hallucinations.” The word “hallucination” has its origins in the 17th century, derived from the Latin hallucinatus, which comes from alucinari, meaning “to wander in the mind.”
We have come to call any seemingly nonsensical, counterfactual, or otherwise useless or misguided output from LLMs a “hallucination.” But in this context, a “hallucination” is not a malfunction, an error, or any other kind of unusual operation of the LLM. The model is, in fact, doing exactly what it was built to do, and the objective quality of its output is exactly the same as when it produces output that is deemed useful.
If the un-useful output is a hallucination, then every output from the model is also a hallucination. To call it a “hallucination” is to anthropomorphize the model and relate to it as if it were a human, for whom mental dysfunction would be the likely explanation for clearly counterfactual or nonsensical thought.
So what -can- we do about undesirable LLM outputs? We can train more accurate models, with more data and context included in the training set. We can train for accuracy rather than “cleverness.” We can reduce the random perturbations of the model in generating text. Nonetheless, models will still sometimes produce suboptimal predictions, so we can have them double-check their outputs against external sources, iterate over outputs to create a “thought-like” self-dialogue, or apply any number of other post-output fixes that may be effective at verifying the usefulness of the syntactic output.
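To make a couple of those knobs concrete, here is a minimal sketch, assuming the Hugging Face transformers library and a small placeholder checkpoint, of turning down the sampling temperature and bolting on a crude post-output verification pass. The verify() stub is purely illustrative; a real check would consult an external source.

```python
# Sketch: low-temperature decoding plus a post-output verification pass.
# "gpt2" and verify() are placeholders, not recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM checkpoint; purely illustrative
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def generate(prompt: str, temperature: float = 0.2) -> str:
    """Generate a continuation with low temperature to damp random perturbations."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=60,
        do_sample=True,
        temperature=temperature,  # lower = less random, more conservative output
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

def verify(answer: str) -> bool:
    """Placeholder for an external check (search, calculator, human review)."""
    return "blue" in answer.lower()  # trivial stand-in for a real fact-check

answer = generate("Why is the sky blue?")
if not verify(answer):
    # Retry even more conservatively if the external check fails.
    answer = generate("Why is the sky blue? Answer carefully and factually.", temperature=0.1)
print(answer)
```

Note that lowering the temperature trades variety for consistency; it narrows the model's wandering, it does not make the model more correct.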
Nothing we can do will absolutely ensure that the output is accurate or useful. Ultimately, that will need to be decided by the interactions of the output with the external world. This mismatch of expectations stems from a deeper fundamental misunderstanding of the nature of LLMs, and of transformer networks in general.
“Generative AI” is not generative.
LLMs are fundamentally not creative agents. Somehow, we collectively lost sight of the fact that transformer architecture AI is a fundamentally extractive process for identifying, mining, and applying the semantic relationships in large data sets.
When we ask an LLM why the sky is blue, what we are really asking is “what would be the most likely explanation for why the sky is blue to appear in the training set?” If the knowledge we are asking about is actually in the training data set, it is probable that the LLM will give an explanation derived directly from that semantic pattern, alloyed with our question. If an associated theory or fact is not within the dataset, the LLM is likely to produce an answer that is semantically similar to extant answers to semantically similar questions. Either way, the encoded data within the model is being alloyed with our question to derive a set of words and meanings that constitutes a plausible continuation of {<system prompt> “why is the sky blue?…….”}.
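To see the “plausible continuation” framing concretely, here is a minimal sketch, again assuming a small placeholder checkpoint, that simply prints the model's top-ranked candidates for the next token given the prompt so far; that ranking is all the model ever computes.

```python
# Sketch: the model only scores candidate next tokens given the prompt.
# "gpt2" is a placeholder checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Why is the sky blue? The sky appears blue because"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores for the next token only
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, 5)
for p, idx in zip(top.values, top.indices):
    # Each line is one candidate continuation token and its probability.
    print(f"{tokenizer.decode(int(idx))!r}  {p.item():.3f}")
```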
It is worth noting that even if the technology were as limited in utility as this explanation might imply, it would still be an enormous step forward in language processing over what was possible pre-GPT.
Because human cultural data contains a huge amount of inferred information not overtly apparent in the data set - information that LLMs are very adept at teasing out - many smart people have confused LLM output with a generative rather than an extractive process. LLMs no more “generate” text than a well “generates” water. Language models “merely” infer (often unseen or uncharacterized) semantic structures from a gigantic training set, alloying them with a seed (the prompt) to create their output.
Do not misinterpret this. In practice, LLMs are much, much more than mere text prediction engines. They are an incredible tool for digesting data and extrapolating probable conclusions. Armed with a substantial subset of human cultural output, they are able to produce a facsimile of human interaction while providing access to an enormous body of knowledge and capability. Their potential utility is enormous and largely unexplored.
In fact, LLMs and their brethren constitute a kind of universal algorithm, a sonic screwdriver if you will, with which we can solve any (mostly) solved problem set by merely presenting the problems and enough known solutions so that the hidden algorithmic relationships can be teased out into the model parameters. We don’t need to elucidate or even understand the underlying algorithms to automate solutions.
But it only works on the class of solved problems where we can provide solution examples for training, where the solution, even if unelucidated, is semantically predicted in the training set. Even then, there is no guarantee that any given answer will be conformal or useful - a feedback loop or verification is a critical step.
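As a toy illustration of that idea, here is a sketch, with a deliberately trivial stand-in rule, of teasing a hidden relationship out of nothing but example problem/solution pairs, followed by the verification step that checks the learned mapping against the rule the model was never shown directly.

```python
# Toy sketch: "present problems and known solutions, let the model absorb the
# hidden rule." The modular-arithmetic rule below stands in for any
# unelucidated relationship; it is never hard-coded into the model, which
# only ever sees example pairs.
import torch
import torch.nn as nn

def hidden_rule(x: torch.Tensor) -> torch.Tensor:
    return (3 * x + 1) % 7  # the "unknown" algorithm we never tell the model

xs = torch.arange(0, 7).repeat(200)   # training problems
ys = hidden_rule(xs)                  # their known solutions
x_onehot = torch.eye(7)[xs]           # encode inputs as one-hot vectors

model = nn.Sequential(nn.Linear(7, 32), nn.ReLU(), nn.Linear(32, 7))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for _ in range(300):                  # fit the example pairs
    opt.zero_grad()
    loss = loss_fn(model(x_onehot), ys)
    loss.backward()
    opt.step()

# Verification step: check the learned mapping against the rule itself.
test = torch.arange(0, 7)
pred = model(torch.eye(7)[test]).argmax(dim=-1)
print((pred == hidden_rule(test)).float().mean().item())  # ideally 1.0
```

The input space here is tiny, so the network simply absorbs the whole mapping; the point is only that the rule was recovered from examples, and that the final check against reality, not the training itself, is what tells us the answers are usable.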
Current types of AI can’t solve novel problems except by trial and error.
Some problems remain unsolved, but they are hypothetically solvable, and the solution lies within a more or less well-defined set of potential solutions.
Insofar as an unsolved problem can be recast as a solved system for generating and testing hypotheses, we may potentially assail many unsolved problems using LLMs as well. Here, we leverage the “creativity” of random data to seed potential solutions. While the result may be truly novel, the creative heavy lifting was done by a random number generator, not by some special insight or revelation on the part of the LLM. In this case, the LLM presents as low-cost, scalable labor.
Hypothesize, test, modify, repeat. AI can be our legion of Edisons, but not our Tesla. LLMs will never have a flash of genius or intuit something new. They may tease out hidden data, but they will not generate any new knowledge except by chance.
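A bare-bones version of that loop looks something like the sketch below; propose() and test() are placeholders, with an LLM (or, here, a plain random number generator) supplying the hypotheses and an external test deciding what survives.

```python
# Sketch: hypothesize, test, modify, repeat. The "creativity" lives in the
# random proposal step; the external test is what actually finds the answer.
import random

def test(candidate: float) -> float:
    """Score a hypothesis against the real problem; here, how close x**2 is to 2."""
    return abs(candidate * candidate - 2.0)

def propose(current: float) -> float:
    """Creativity supplied by a random number generator, not insight."""
    return current + random.uniform(-0.1, 0.1)

best = 1.0
for _ in range(10_000):
    candidate = propose(best)          # hypothesize
    if test(candidate) < test(best):   # test
        best = candidate               # modify, then repeat
print(best)  # converges toward sqrt(2), roughly 1.41421
```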
The key takeaway here is that AI works better where there is a feedback loop inherent in the process. This is very promising for some potential applications.
Even limited AI will change everything.
AI is a tool to explore and mine one of the most valuable and useful resources that humanity has access to. The body of human cultural knowledge and experience is the underlying resource, and AI is the drill with which we pull out the parts we want. We steer the drill with prompting, and we train models to better encompass the entirety, or a specialized subset, of that cultural knowledge. Not only can it retrieve this knowledge, but it can apply it in context. That capability is fundamentally new, and it is the core value proposition of the current type of artificial intelligence.
Subsets of computer programming, many of the most vexing problems in robotics, navigation of constrained data spaces such as translation, tagging, indexing, logging, parsing, data transformations… those are all strong target candidates for transformer architecture automation, but creative thought is not on that list.
AI will not write award-winning literature. It does not model the state of mind of the reader, and by design it chooses well-known tropes, unsurprising story lines, and predictable outcomes.
Interestingly, LLMs -can- write good children’s stories, where the principal goal is actually to teach the language of story - the characters, tropes, and plots that are the stories of our culture.
AI won’t enthrall and inspire with its creativity or unorthodox plot lines. It won’t write an interesting article, perform an engaging interview, or employ the imagination of the reader except in the most tedious of ways. It will reformulate and recite elements from a thousand different stories to tell a new one, but not in a way that creates something meaningfully new.
Perhaps fortunately, most labor is squarely within the realm of fully solved problems. We need robots to do work that requires almost no thinking for a human, but that rather numbs the mind. That encompasses at least 25% of all labor, perhaps much more if we include unpaid labor. We need OCR that interprets charts, tables, and images. We need forms filled out and questions answered from a pile of documents. We need data classified, entered, and filed. We need to give people access to information through an interface that has no learning curve and anticipates their needs with uncanny accuracy. We need an over-the-shoulder helper and second opinion when reviewing images, briefs, and papers, a skeptic to curb our vanity, and a rubber ducky that can blow through the boilerplate and keep us from having to reinvent the wheel for the nth time.
These are all tasks that AI -can- be trained to do, even if our current training priorities and paradigms have made limited inroads into the problem space. Even without “AGI” (whatever that has grown to mean), this is enough to turn society on its ear, increasing productivity to well beyond the point of utility in many sectors, sparking a third industrial revolution with orders of magnitude decreases in the cost of many classes of labor.
I think all of this will happen, for better or for worse. But most of it will happen after this bubble has burst, after the fanciful aspirations of a brilliant, all-knowing super-product have faded.
Perhaps someday we will manage to create an actual creative engine, and perhaps LLMs and related technologies will be a stepping stone on this path either in nature or in utility. But that is not the bubble we are in.
Right now we should be focusing on the things AI can already do much, much better than was possible before. Serving as a handy assistant that can find the answers to well-defined and knowable questions, automating repetitive cognitive tasks, searching for semantic links and relationships in information, providing an interface to datasets, translating human-language instructions into software, endowing robots with language guidance and generalized capability… all of these applications are showing great promise and do not inherently collide with the limits of the technology.
By learning to leverage these strengths, we will perhaps learn how to construct new ones, or perhaps even understand intelligence well enough to take our next significant step.