LLMs run culture as code. This may have implications for how to make them better.
Fundamentally, LLMs are computer programs running on a neural-network (virtual) computer, most often one using a design known as the transformer architecture.
The discussion surrounding large language models often centers on the idea that LLMs are merely computational machines executing predefined algorithms to predict the next most likely word in a sequence.
While it is true that LLMs operate as statistical predictors, determining the most probable next token in a sequence, this understanding fails to capture the profound implications of their capabilities. More importantly, it fails to explain the mechanism that has produced so many unexpected capabilities from semantic text prediction.
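To make the "statistical predictor" framing concrete, here is a deliberately toy sketch in Python: a bigram model that predicts the most probable next token from raw counts. A real LLM learns these probabilities with a neural network conditioned on the entire preceding context, not a lookup table, but the prediction objective is the same.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for "a significant subset of written human culture".
corpus = "the red ball fell to the ground . the blue ball rose to the sky .".split()

# Count how often each token follows each other token (a bigram model).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token: str) -> str:
    """Return the statistically most probable next token."""
    return follows[token].most_common(1)[0][0]

print(predict_next("ball"))  # -> 'fell' (ties break by insertion order)
```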
Like any computer, an LLM must be programmed with meaningful instructions for it to be useful. For machines based on artificial neural networks (ANNs), such as transformers, this programming is usually referred to as "training."
For LLMs, programming consists of configuring the transformer machine's parameters, or weights, with a significant subset of written (and often visual) human culture. This cultural data, mined from the web and other sources, is transformed into a machine-readable form through the tokenizer.
The tokenizer itself only splits text into tokens: words, language fragments, and sub-word pieces. It is training that distills the semantic relationships between these fragments and the ideas related to them into vectors. Through self-supervised next-token prediction (typically refined later with reinforcement learning), this distilled structure is imprinted into the model's "weights" as an interrelated web of semantic assertions.
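As an illustration of that division of labor, here is a minimal sketch using the Hugging Face transformers library and the public GPT-2 checkpoint (both assumptions; the text above names neither): the tokenizer produces token IDs, while the learned embedding matrix inside the model maps each ID to a semantic vector.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The tokenizer converts text into token IDs: the machine-readable form.
ids = tokenizer.encode("the red ball fell to the ground")
print(tokenizer.convert_ids_to_tokens(ids))  # e.g. ['the', 'Ġred', 'Ġball', ...]

# The learned embedding matrix maps each token ID to a vector; these vectors,
# and the weights above them, are where the semantic web is imprinted.
embeddings = model.get_input_embeddings()  # shape: (vocab_size, hidden_dim)
print(embeddings.weight[ids[0]].shape)     # torch.Size([768]) for GPT-2
```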
The goal of "training" the transformer model is to convert the semantic data from a list of vectors into a coherent program. Once the program is derived from the training data, it can be loaded into compatible transformer machines.
When LLMs are programmed with vast amounts of human cultural-linguistic data, sometimes they display unexpected capabilities.
Suddenly, the statistical babble machine can communicate and "understand" complex ideas, reason about those ideas, synthesize new hypotheses, maintain a coherent internal state, and interact with various tools.
Although it may seem as though these unexpected properties were spontaneously generated, they were in fact hidden in plain sight within the training data.
It's easy to miss that a simple statement like "the red ball fell to the ground" contains a vast amount of information and connotations, with strong logical rules implied.
Consider the implications of that simple sentence:
- There is a ball. A ball is a sphere.
- The ball reflects red light, so we may assume the scene contains light.
- Gravity is a force that causes objects to fall. Falling, in this case, means moving toward the ground, so there is something called the ground.
- Some unknown original force must have acted upon the ball to separate it from the ground.
- The ball is an object that does not occur naturally, so other agents must be present, and so on.
From this simple sentence, a great deal of implied information about the world may be extrapolated, and logical relationships may be defined. A substantial fraction of this information is embedded in the transformer program through the tokenization and training process, but with billions of pages of data, not just one simple sentence.
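One hypothetical way to picture this extrapolation is as an explicit web of assertions. The sketch below hand-codes a few of the triples implied by the sentence; a trained model encodes relations like these implicitly, distributed across billions of weights rather than stored as discrete facts.

```python
# Hand-coded assertions implied by "the red ball fell to the ground".
# (Illustrative only: a real model never stores facts this explicitly.)
triples = [
    ("ball", "is_a", "sphere"),
    ("ball", "reflects", "red light"),
    ("scene", "contains", "light"),
    ("gravity", "causes", "falling"),
    ("falling", "moves_toward", "ground"),
    ("ball", "made_by", "other agents"),
]

def related(entity: str) -> list[tuple[str, str, str]]:
    """Return every assertion that mentions the given entity."""
    return [t for t in triples if entity in (t[0], t[2])]

print(related("ball"))  # all assertions touching "ball"
```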
The "intelligent" interactive nature of LLMs is not an intrinsic property of the transformer machine. The transformer machine distills this capability into it’s program from cultural training data.
These remarkable capabilities are part and parcel of the cultural data transpiled to the semantic vector program that runs on the transformer machine. This program is a fragment of human culture, recompiled and ported to a transformer architecture target device.
This may seem like a distinction without a difference, but it is not. It informs the selection and creation of training data, since it elucidates the source of unanticipated capability: information implied, but not necessarily symbolically represented, in the training data.
Humanity has worked to create machine understanding of language since the inception of programmable digital computing nearly eight decades ago. All programming languages and assembler mnemonics reflect this struggle.
One of the key insights enabling our recent advances in this endeavor is that written language, while powerful, acts merely as a sparse representation of deeper meanings. Written communication is perhaps 10% explicit content and 90% indirect reference to assumed shared cultural knowledge.
Utilizing ANN computing engines such as transformer architectures, we can meaningfully encode the deep semantic payload embedded within language in a machine-interpretable way.
Encoding these meanings allows us to execute cultural software (or culture-as-software), with transformer architecture machines serving as the compiler target.
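A small demonstration of that machine-interpretable encoding, assuming the sentence-transformers package and its all-MiniLM-L6-v2 checkpoint (neither named in the original): sentences with the same underlying meaning land close together in vector space even when they share almost no words.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# An embedding model encodes whole sentences as vectors whose geometry
# tracks meaning rather than surface wording.
model = SentenceTransformer("all-MiniLM-L6-v2")
a, b, c = model.encode([
    "the red ball fell to the ground",
    "a crimson sphere dropped earthward",
    "parliament passed the budget bill",
])

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(a, b))  # high: a paraphrase with almost no shared words
print(cosine(a, c))  # low: unrelated meaning
```

Exact scores vary by model, but the paraphrase should score far higher than the unrelated sentence.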
The capabilities of LLMs are not intrinsic to their computational architecture but rather an emergent property of the profound depth and breadth of human cultural data they use as their operating system. Ultimately, the 'magic' lies within human culture itself.
LLMs serve primarily as a mechanism for encoding and representing the rich tapestry of human culture in a profoundly interactive way. This has proven extremely useful in many fields and has become a significant productivity multiplier, even in its infancy. The future of culture-as-code technology is largely unexplored; we have likely barely scratched the surface of its potential.
We need to recognize the role of our cultural heritage in enabling synthetic intelligence. To make the most of this resource, we must actively engage in its curation to ensure that the richness of human thought and creativity continues to be represented well into the future.