The generative AI revolution is written in a new programming language. (no, I don't mean prompting)
No, really. Just bear with me here. This isn't about using LLMs to write code, or anything like that.
Discussion of Large Language Models frequently centers on the observation that they are merely computational machines executing predefined algorithms to predict the next most likely word in a sequence.
While it is true that LLMs operate as statistical predictors, merely determining the most probable next element in a sequence of tokens, this framing fails to capture the profound implications of their capabilities. More importantly, it fails to explain the mechanism that has produced the myriad unexpected properties arising from semantic text prediction.
To be clear, I don't mean to diminish the significance of recent advancements in artificial neural network architectures. The transformer architecture, which underpins modern state-of-the-art LLMs, is instrumental to their capabilities. Seminal works such as 'Attention Is All You Need' (arXiv:1706.03762) laid the groundwork for developing architectures sophisticated enough to navigate the intricacies of the cultural-linguistic landscape.
Prior to being programmed, an LLM produces a pseudorandom string of characters in response to each input. Even in this useless state, the LLM is functioning correctly: it creates a string of characters probabilistically anchored to its random starting weights. The machine operates as designed, running a program of random instructions on the input and returning the result.
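To make that concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the public "gpt2" checkpoint, contrasting a randomly initialized transformer with a programmed one. Both runs exercise exactly the same machinery; only the loaded weights, the software, differ.

```python
# Minimal sketch (assumes the Hugging Face `transformers` library and the
# public "gpt2" checkpoint): same machine, two different "programs".
from transformers import GPT2Config, GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
prompt = tokenizer("The meaning of life is", return_tensors="pt")

untrained = GPT2LMHeadModel(GPT2Config())          # random starting weights
trained = GPT2LMHeadModel.from_pretrained("gpt2")  # programmed with cultural data

for model in (untrained, trained):
    model.eval()
    out = model.generate(**prompt, max_new_tokens=20, do_sample=True,
                         pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(out[0]))
```

The untrained model still runs and still samples tokens; it simply has nothing meaningful to say.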
For an LLM to be useful, like any computer, it must be programmed. A computer without software has no tangible capability; it can only manipulate and represent data in accordance with whatever programming it is given. Unfortunately, for ANN-based machines such as transformers, we have decided to call this programming "training," which ascribes to the algorithm a degree of agency that the transformer machine does not possess.
To suggest that an LLM “predicts text” is equivalent to saying that a piano "plays music." It certainly does not. A person plays music using their mind and body and the piano as an interface. The person can be said to be playing music in their mind, but the piano by itself is utterly inert. It's the software that embodies the agency.
Programming, or "training," a transformer machine is the process of creating the software that the machine will execute on its input. With contemporary LLMs, this programming consists of porting a significant subset of written (and often visual) human culture, decompiled into a machine-readable semantic form through the tokenizer.
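As an illustration, assuming the same transformers library and the "gpt2" vocabulary, the first step of that decompilation looks like this: human text is broken into sub-word tokens and mapped onto the integer vocabulary the machine actually consumes.

```python
# Sketch of the tokenizer's role (assumes `transformers` and the "gpt2" vocabulary).
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
ids = tokenizer.encode("Culture is the software.")

print(ids)                                   # integer token ids
print(tokenizer.convert_ids_to_tokens(ids))  # sub-word pieces, not whole words
print(tokenizer.decode(ids))                 # round-trips back to the original text
```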
Transformer-architecture machines are not magic AI boxes but rather code evaluation engines for a high-dimensional representational computer language. This language is "written" by transpiling tokenized human cultural data through a computationally intensive process.
When an LLM is programmed with vast amounts of human cultural-linguistic data, its true capabilities emerge as if by "magic." Suddenly, the statistical babble machine can communicate and "understand" complex ideas, reason about those ideas, synthesize new hypotheses, maintain a coherent internal state, and interact with various tools placed at its disposal.
These capabilities are not properties of the transformer machine. They are properties of the software that runs on it, distilled from human culture: the cultural data transpiled into a semantic vector space. This software is human culture, ported to a transformer-architecture target device.
Humanity has worked toward machine understanding of language since the inception of programmable digital computing nearly eight decades ago. All programming languages and assembler mnemonics are reflections of this struggle.
One of the key insights enabling our recent advances in this endeavor has been that written language, while powerful, acts merely as a sparse representation of deeper meanings. Written communication is 10% content and 90% indirect reference to assumed-shared cultural knowledge.
By combining ANN vector-computing engines such as transformer architectures with methods for representing language data in a vector-space model, we have finally gained the ability to encode the deep semantic payload embedded within language in a machine-interpretable way.
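A hedged sketch of what that encoding looks like in practice, assuming the sentence-transformers library and its public "all-MiniLM-L6-v2" model: sentences with related meanings land near one another in the vector space, regardless of their surface wording.

```python
# Semantic encoding sketch (assumes `sentence-transformers` and "all-MiniLM-L6-v2").
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode([
    "The pianist performed a sonata.",
    "A musician played a piece on the piano.",
    "The compiler emitted machine code.",
])

# The two piano sentences score far higher with each other than either
# does with the compiler sentence, despite sharing almost no words.
print(util.cos_sim(vectors, vectors))
```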
Encoding these meanings allows us to execute cultural software (or culture-as-software), with transformer architecture machines serving as the compiler target.
It is important to note that when an LLM is programmed, or "trained," it is not memorizing words. It is being programmed with a staggeringly complex web of interrelated conditional rules and semantic relationships. It is a program, a dynamic process defined and characterized by those relationships, not some species of lookup table. The transformer machinery evaluates the "human culture" program and returns its most probable output, much as a Python REPL evaluates a code fragment and returns the result of the input script.
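To ground the REPL analogy, here is a minimal sketch, again assuming torch, transformers, and the "gpt2" checkpoint, of a single "evaluation" step: the machine returns a probability distribution over possible next tokens, and that distribution is shaped entirely by the cultural program loaded into its weights.

```python
# One "evaluation" step of the culture program (assumes `torch`, `transformers`, "gpt2").
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores for the next token
probs = torch.softmax(logits, dim=-1)

# The five most probable continuations, according to the loaded "software".
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item())!r}: {p.item():.3f}")
```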
As currently employed, the architecture of LLMs serves primarily as a vessel for encoding and representing the rich tapestry of human culture in a profoundly interactive way. This has already proven extremely useful across many fields and has become a significant productivity multiplier, even in its infancy. The future of culture-as-code is largely unexplored, and we have likely barely scratched the surface of its potential.
The so-called 'magic' of LLMs is not intrinsic to their computational structure but rather arises from the profound depth and breadth of human cultural data they use as their operating system. Ultimately, the 'magic' lies within human culture itself.
It is imperative that we not only recognize the role of our cultural heritage in enabling synthetic intelligence but also actively engage in its curation, so that the richness of human thought and creativity continues to be represented and to retain its value well into the future.