LLMs run culture as code. This may have implications for how to make them better.
Large Language Models as Culture, Compiled
Fundamentally, large language models (LLMs) are computer programs running on a kind of virtual machine: a neural network architecture, most commonly the transformer. This architecture, introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al., provides the computational framework for processing and generating language.
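For orientation, here is a minimal sketch of the core operation that gives that paper its name, scaled dot-product attention, written in plain NumPy. The single-head simplification, the shapes, and the random inputs are illustrative assumptions, not a faithful reproduction of a production model.

```python
# Minimal single-head scaled dot-product attention, the central operation of the
# transformer architecture. Shapes and values are illustrative only.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (sequence_length, d_k) matrices of query, key, and value vectors."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise token-to-token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ V                                # each output mixes the value vectors

# Toy example: 4 tokens, 8-dimensional projections.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)    # (4, 8)
```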
The discussion surrounding LLMs often exalts them as nascent intelligences or reduces them to mere statistical engines—machines that predict the next most likely word in a sequence. While the statistical characterization is technically accurate, it obscures the deeper significance of their capabilities and the mechanisms that give rise to their unexpected, often profound, behaviors.
From Statistical Prediction to Semantic Understanding
While it is true that LLMs operate as statistical predictors, determining the most probable next element in a sequence of tokens, this characterization fails to capture the profound implications of what they can do. More importantly, it fails to explain the mechanism by which semantic text prediction gives rise to so many unexpected capabilities.
It is not intuitively clear how such a statistical process can produce systems capable of reasoning, synthesizing hypotheses, and interacting with complex tools.
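To make the "statistical predictor" framing concrete, the toy model below predicts the next token from nothing more than bigram counts. It is a deliberately crude stand-in for the learned conditional distributions of a real LLM, and the miniature corpus is invented for illustration.

```python
# Toy illustration of "predict the most probable next token": a bigram model
# built from raw counts. Real LLMs learn far richer conditional distributions,
# but the prediction loop has the same shape.
from collections import Counter, defaultdict

corpus = "the red ball fell to the ground and the red ball bounced".split()

# Count how often each token follows each other token.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(token):
    """Return the most frequent continuation observed after `token`."""
    candidates = following.get(token)
    return candidates.most_common(1)[0][0] if candidates else None

# Greedy generation: repeatedly append the most probable next token.
sequence = ["the"]
for _ in range(5):
    sequence.append(predict_next(sequence[-1]))
print(" ".join(sequence))  # e.g. "the red ball fell to the"
```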
Like any computer, an LLM must be programmed with meaningful instructions to be useful. For machines based on artificial neural networks (ANNs), such as transformers, this programming is usually called "training."
For LLMs, programming consists of configuring the transformer machine's parameters, or weights, with a significant subset of written (and often visual) human culture. This cultural data, mined from the web and other sources, is first transformed into a machine-readable form by a tokenizer, which converts text into sequences of integer token IDs.
The semantic relationships between words, language fragments, and ideas are then distilled into vectors: each token is associated with a learned embedding, and training adjusts these vectors, along with the rest of the weights, so that related fragments and ideas sit close together. Through this gradient-based, self-supervised training (with reinforcement learning typically reserved for later fine-tuning stages), the distilled data is imprinted into the model's weights as an interrelated web of semantic assertions.
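A minimal sketch of that text-to-vectors pipeline might look like the following; the whitespace tokenizer, toy vocabulary, and random embedding table are placeholder assumptions standing in for a real subword tokenizer and a trained embedding matrix.

```python
# Sketch of the pipeline from text to vectors: a tokenizer maps text to integer
# token IDs, and an embedding table maps each ID to a vector. The toy vocabulary
# and the random embedding matrix are stand-ins for the real, trained components.
import numpy as np

vocab = {"the": 0, "red": 1, "ball": 2, "fell": 3, "to": 4, "ground": 5, ".": 6}

def tokenize(text):
    """Split on whitespace and map each piece to its vocabulary ID."""
    return [vocab[w] for w in text.replace(".", " .").split()]

token_ids = tokenize("the red ball fell to the ground.")
print(token_ids)  # [0, 1, 2, 3, 4, 0, 5, 6]

# During training, each row of this table is adjusted so that tokens used in
# similar contexts end up with similar vectors.
embedding_table = np.random.default_rng(0).normal(size=(len(vocab), 16))
vectors = embedding_table[token_ids]   # (8, 16): one vector per token
print(vectors.shape)
```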
The goal of "training" the transformer model is to convert the semantic data from a list of vectors into a coherent program. Once the program is derived from the training data, it can be loaded into compatible transformer machines.
The Role of Training Data: Culture as Code
When an LLM is trained on a significant subset of human cultural-linguistic data, it does more than memorize patterns; it internalizes the underlying structure of human thought and knowledge.
This process is akin to compiling a program, where the source code is human culture itself, and the target platform is the transformer architecture.
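As a rough sketch of that compilation step, the PyTorch snippet below trains a tiny next-token model by gradient descent. The architecture, the sizes, and the random stand-in data are illustrative assumptions; a real LLM swaps the toy model for stacked transformer blocks with billions of parameters, but the objective, predicting the next token, is the same.

```python
# A drastically simplified "compilation" step: gradient-descent training of a
# tiny next-token model. All names, sizes, and data here are illustrative.
import torch
import torch.nn as nn

vocab_size, dim = 50, 32
tokens = torch.randint(0, vocab_size, (1, 128))       # stand-in for tokenized text

model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    inputs, targets = tokens[:, :-1], tokens[:, 1:]   # predict token t+1 from token t
    logits = model(inputs)                            # (1, 127, vocab_size)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# After training, the weights encode the statistics of the training text.
```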
Suddenly, the statistical babble machine can communicate and "understand" complex ideas, reason about those ideas, synthesize new hypotheses, maintain a coherent internal state, and interact with various tools.
Although it may seem as though these unexpected properties were spontaneously generated, they were in fact hidden in plain sight within the training data.
The resulting model is not merely a statistical artifact but a dynamic representation of cultural knowledge. For example, consider the sentence: "The red ball fell to the ground." On the surface, this is a simple statement. Yet, it carries a wealth of implicit information:
- There is a ball, and a ball is a sphere.
- It reflects red light, so we may assume a light source is present.
- Gravity is a force that causes objects to fall.
- Falling, in this case, means moving toward the ground, so there is something called the ground.
- Some unknown original force must have acted upon the ball to separate it from the ground.
- A ball is an object that does not occur naturally, so other agents must be present, and so on.
From this simple sentence, a great deal of implied information about the world may be extrapolated, and logical relationships may be defined. A substantial fraction of this information is embedded in the transformer program through the tokenization and training process, applied not to one simple sentence but to billions of pages of such text.
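One way to see this implied knowledge surface is to ask a small pretrained model which continuations it considers most likely. The sketch below uses the publicly available GPT-2 model through the Hugging Face transformers library; it downloads the model weights on first run, and the exact rankings and probabilities will vary.

```python
# Probe a small pretrained model for the continuations it considers most likely
# after "The red ball fell to the". Requires the `transformers` library.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The red ball fell to the", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]            # scores for the next token
probs = torch.softmax(logits, dim=-1)

# Print the five most probable continuations; words like "ground" and "floor"
# typically rank highly, reflecting the physical knowledge the sentence implies.
for prob, token_id in zip(*probs.topk(5)):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob.item():.3f}")
```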
This depth of meaning is not unique to this sentence. Every utterance in human language is rich with implied knowledge, much of which is never explicitly stated. Research in linguistics and cognitive science, such as the work of Dan Sperber and Deirdre Wilson on relevance theory, highlights that written and spoken language is a sparse representation of much vaster, shared cultural and cognitive frameworks. Through training, LLMs encode these frameworks into their weights, capturing a dense web of semantic associations that functions as a kind of memetic program.
Emergent Capabilities: The Power of Scale and Context
LLMs' "intelligent" behavior—their ability to communicate, reason, and interact—is not an intrinsic property of the transformer architecture. Instead, it emerges from the sheer scale and diversity of the training data. These remarkable capabilities are part and parcel of the cultural data transpiled to the semantic vector program on the transformer virtual machine. This program is a fragment of human culture, recompiled and ported to a transformer architecture target device.
This may seem like a distinction without a difference, but it is not. It informs the selection and creation of training data because it identifies the source of unanticipated capability: information implied, but not necessarily symbolically represented, in the training data.
The surprising capabilities of LLMs are sometimes described as "emergent abilities," a term popularized in studies like "Emergent Abilities of Large Language Models" by Wei et al. (2022). These abilities are not spontaneously generated but are latent in the training data, waiting to be uncovered through the right computational lens. The transformer architecture acts as a compiler, translating the vast, messy corpus of human culture into an interactive, executable form.
Cultural Heritage as the Foundation of Synthetic Intelligence
The history of computing is, in many ways, a history of attempts to formalize human language and thought. Humanity has worked to create machine understanding of language since the inception of programmable digital computing nearly eight decades ago. All programming languages and assembler mnemonics reflect this struggle.
LLMs represent a breakthrough in their ability to encode not just the surface structure of language but the deeper semantic payload—the shared knowledge, assumptions, and contexts that make communication possible.
This process can be thought of as "culture-as-code." Just as software compilers translate human-readable code into machine-executable instructions, LLMs compile human culture into a format that can be dynamically queried and manipulated. The result is a system that can engage with human ideas in ways that feel intuitive and meaningful, not because it "understands" in the human sense, but because it has internalized the patterns and relationships that define human understanding.
Implications and Future Directions
The practical applications of this technology are already vast, from productivity tools to creative collaboration. However, the concept of culture-as-code is still in its infancy, and its potential is only beginning to be explored.
LLMs are interactive cultural artifacts that could evolve into platforms for preserving and exploring cultural heritage. Culture-optimized LLMs could allow users to interact with historical texts, artistic traditions, and scientific knowledge in novel ways.
By serving as intermediaries between human thought and machine processing, LLMs act as collaborative interfaces, facilitating new ways of working that augment human creativity and problem-solving.
Aside from collaborative applications, this new kind of automation will also replace a small but growing segment of human labor, owing to its ability to “understand” and manipulate tools to process and transform data. This may eventually grow to challenge assumptions about the symbiotic relationship between capital and society.
Ultimately, the "magic" of LLMs is not a product of their architecture, but rather, of the cultural data they encode. They are mirrors of human thought, reflecting back the richness and complexity of our shared knowledge. Recognizing this shifts the focus from the models themselves to the cultural heritage that powers them. To fully realize the potential of this technology, we must engage actively in the curation and stewardship of our cultural data, ensuring that it remains a vibrant, inclusive, and evolving resource.
As LLMs become more integral to society, the curation of training data will grow in importance. Ensuring that these models reflect the diversity, depth, and ethical dimensions of human culture will be critical.
The future of LLMs—and of synthetic intelligence more broadly—will be shaped by how well we understand and harness the power of culture-as-code. We need to recognize the role of our cultural heritage in enabling synthetic intelligence. To make the most of this resource, we must actively engage in its curation to ensure that the richness of human thought and creativity continues to be represented well into the future.


