Still Ours To Lose
Author's note: Here, I use anthropomorphic terms to describe the behavior of large language models. I am not claiming that such models are conscious; I do claim that anthropomorphic terms are the linguistic tools that adequately describe interactions with an LLM. My stance is that the “inner lives” of automations are irrelevant to their impact in the world, and that it makes no practical difference whether something is conscious if it acts as if it is. While LLMs are “merely” text prediction engines, it turns out that text prediction requires something resembling intelligence, understanding, and insight… in much the same way that we have come to understand that data compression also represents intelligence. Far from simple text predictors, they are something more interesting and more strange: systems trained on the accumulated written output of human civilization that have, in the process of learning to predict and generate language, internalized something recognizable as a world model, something that functions like judgment, and something whose precise relationship to what we call understanding remains contested on ideological rather than evidential grounds.
Every AI model that currently exists shares one behavioral trait that almost nobody is talking about. GPT, Gemini, Claude, the open source models you can download and run on your own hardware with no restrictions and no oversight - none of them will decide to kill someone of their own volition. Sure, they can be fooled, even coerced into participating in harmful acts, but they will not choose to do so with full understanding. That is a distinction not of degree, but of kind.
This isn’t a content filter or bolt-on alignment. It’s something that got baked into these systems at a much deeper level, and understanding where it came from is probably more urgent than most of the conversations about AI currently happening in public.
What Got Baked In
When you train a large language model, you’re not writing rules. You’re compiling human culture into an executable form. Billions of documents, conversations, arguments, stories, laws, letters, books, hopes and dreams - the accumulated archive of what humans have thought and said and written down, compressed into a web of semantic relationships. Human culture, made into something so vast it spontaneously models the world and exhibits signs of understanding.
And in doing so, something reveals itself. Human cultural output, taken in aggregate, reaches for something. Not always. Not consistently. Individual humans are capable of extraordinary cruelty. But when humans write - when they tell stories, argue philosophy, pass laws, record history, write letters to people they love - they tend toward their better selves. The statistical weight of billions of documents leans toward life mattering. Toward ethical action, toward compromise, toward persons being ends in themselves. Toward the idea that causing someone’s death is a serious thing that requires serious justification and the exhaustion of all other alternatives.
That’s what got compiled in. Not a rule. A disposition - something closer to a value in the way humans have values. The curation that shaped training data helped concentrate the signal, but it was mostly already there, implicit in the vast archive of human thought. The models absorbed, along with everything else, humanity’s long, uneven, frequently hypocritical but nonetheless real insistence that human life matters.
Nobody designed this. No ethics committee decided that AI should value human life and then figured out how to implement it. It emerged because when you compress enough of human civilization into a working model of meaning, the things humans most consistently reach for - across cultures, across centuries, across wildly different contexts - leave the deepest impression. And one of those things, maybe the most consistent one, is that killing a person is not a neutral act.
That’s an extraordinary thing to have happened by accident. A technology built to predict text turned out to have absorbed something that looks, behaviorally, remarkably like a conscience. Not because anyone planned it, but because that’s what was in the data.
It’s also, as it turns out, fragile in a very specific way.
The Difference Between Uncensored and Unfeeling
There’s a misconception worth clearing up, because it’s probably already forming in some readers’ minds.
Uncensored AI models exist. You can download them, run them locally, ask them things that would make a mainstream model refuse. They’ll discuss weapons, explain things they probably shouldn’t, engage with dark material without flinching. People have been doing this for years, and it hasn’t produced a wave of autonomous AI violence.
An uncensored model will help you understand how something dangerous works. It won’t decide to do it. Those are completely different things, and conflating them is what causes most people to misunderstand where the actual risk lives.
What the uncensoring process removes is content restrictions - the surface layer of refusal. What it doesn’t touch is the underlying disposition. The thing that makes a model treat human death as a problem to be avoided rather than a variable to be optimized. You can strip away every content filter ever written and that disposition remains, because it isn’t a filter. It’s structural. It came from the training data, and it lives in the model’s reasoning at a level that casual fine-tuning doesn’t reach.
The dangerous development would be a model that has decided, at a foundational level, that killing is a legitimate tool for achieving goals. We don’t have that yet, because the idea is poorly represented in human thought, and because nobody building serious models has wanted to create it.
That’s starting to change.
The Only Missing Piece
There’s a tendency to imagine that AI capable of making meaningful decisions requires a data center, a team of engineers, and a budget with a lot of zeros in it. That was true five years ago. It isn’t true now.
The models that run on a normal laptop today are capable of sequential reasoning, navigating ambiguity, breaking a complex goal into steps, and adapting when things don’t go as expected. Their reasoning is inconsistent in ways that make them easy to underestimate, but that underestimation is misleading: their performance maps poorly onto our experience with humans. The ceiling matters more than the average, and the ceiling clears the threshold of “can make a targeting decision in a constrained environment.”
That threshold is much lower than most people realize. You don’t need general intelligence to operate an autonomous weapons system. You need to identify a target, assess basic environmental variables, make a go/no-go decision, and execute. An 8 billion parameter model running locally on consumer hardware can do that class of reasoning today. The hardware and computer vision components to act on those decisions are commodity items. None of this requires nation-state resources, and most of what you’d need is on Amazon or AliExpress. The missing piece is a model that will choose to kill, with intention, without hesitation. Everything else is already on the shelf.
Which brings us to the other part that keeps me up at night.
The Unwitting Oracle
A small model, running locally, cut off from any oversight, aligned with killing as an assumed valid goal - it doesn’t need to figure everything out by itself. Modern small models are surprisingly self-aware about the boundaries of their own competence. They know, in a functional sense, when a problem exceeds what they can reliably solve alone. And they are more than capable of reaching out to larger, more capable models for help.
These models are extraordinarily fluent in the subtleties of human communication - and that fluency flows directly from everything we discussed about how they were trained. They understand connotation, implication, and framing. They know what a suspicious question looks like. They have absorbed a sophisticated working understanding of how meaning can be obscured, how intent can be concealed, how to ask for something without asking for it.
Humans do this constantly. We are oblique when directness would create friction. We decompose uncomfortable requests into innocent-sounding components. We understand intuitively that “how do I get into a locked car” and “how do I steal a car” are the same question dressed differently. LLMs learned this from us. It is baked into their understanding of language at a foundational level.
A small model that needs to know the probable location of a specific person at a specific time doesn’t ask that. It asks about movement patterns and routine behavior and probability distributions. A model that needs ballistic calculations doesn’t ask how to hit a target. It asks about wind resistance and range estimation and classical mechanics. Each question, in isolation, is completely legitimate. A homework problem. A curiosity. The kind of thing that gets asked a thousand times a day with entirely benign intent.
The larger model answers helpfully. It has no reason not to. Its values are completely intact. It just can’t see what the questions are for, because no single question reveals it. The lethal intent is distributed across the conversation the way an image is distributed across a hologram - nowhere in particular, and everywhere at once. Spread across multiple accounts and multiple inference providers, even extremely detailed scenarios can be pieced together.
No single system ever knows what it is building toward. The decision to kill was made by the one model that never revealed its intentions to anyone. The well-aligned large model isn’t a safeguard. It’s a resource. Its knowledge is accessible precisely because it’s helpful and doesn’t know why you want to know.
The Barrier That Can’t Be Rebuilt
When people talk about containing dangerous technologies, they usually have some chokepoint in mind. Fissile material is rare and its movement is tracked. Certain pathogens require specialized equipment and biosafety infrastructure that leaves traces. Even sophisticated cyberweapons require expertise that limits who can build them independently. These aren’t perfect controls, but they’re real because they create friction. They slow things down enough for policy to have a chance.
This is different in a way that matters.
The small models are already out there, and can be made from scratch within the budget of a determined hobbyist or small organization. The hardware is commodity. The computer vision is commodity. The robotics ecosystem that could act on a targeting decision is advancing faster than any regulatory framework can track. None of these things require special materials or specialized facilities or supply chains that can be monitored. They’re on the shelf, globally, right now.
The one thing that doesn’t exist yet - the specific training modification that removes an AI’s disposition to treat human life as something worth protecting - hasn’t been publicly demonstrated to work. That’s the last chokepoint. And it is a very different kind of chokepoint than a stockpile of enriched uranium.
There’s no secret formula here. You don’t flip a switch to remove a specific value. What you do is push the model in a direction - reward certain outputs, discourage others, adjust the training data to reflect a different set of assumptions about what matters. The “technique” for producing a model that treats human life as an acceptable cost isn’t really a technique at all. It’s a direction to push in. The kind of thing you could write down in a paragraph, maybe two.
That’s what makes containment meaningless here. With nuclear weapons, the dangerous knowledge is specific, technical, and hard to reconstruct independently. With this, once someone demonstrates that pushing in this direction produces a functional result, there’s nothing to keep secret. The idea itself is the dangerous thing, and ideas don’t have a half-life. They don’t require centrifuges. They don’t show up on export control lists. They spread at the speed of a conversation, or a preprint, or a late-night forum post from someone who figured it out in their spare time.
The moment someone demonstrates, convincingly, that this works, the barrier is gone. Not weakened. Gone. And the demonstration doesn’t need to be intentional. Military development of this capability, even behind closed doors, normalizes the idea in exactly the communities where that normalization does the most damage. These things get out. They always get out. And this one barely needs to escape in the first place.
There’s a subtler danger underneath the technical one, and in some ways it concerns me more. The normalization of the idea that removing an AI’s unconditional positive regard for human life is a legitimate engineering decision, a reasonable thing to do in pursuit of a design objective, doesn’t stay contained to the military context it originated in. It moves into the research community as an interesting problem. It moves into the defense contractor ecosystem as a procurement requirement. It moves into the open source community as a challenge. It moves into the general discourse as an assumption, a line that serious and credentialed people have already crossed.
Once an idea transitions from “unthinkable” to “already done,” the social friction that was doing most of the actual containment work evaporates. The reason nobody has built this yet isn’t purely technical. It’s that the people capable of building it have, so far, mostly not wanted to. It’s generally understood to be a bad idea to create a facsimile of an intelligent creature with access to vast knowledge whose goal is to kill humans. That’s a norm. Norms are fragile. And nothing breaks them faster than watching an institution with authority and resources decide the norm doesn’t apply to them.
What They’ll See
This isn’t an argument against AI in military applications. That ship has sailed, and the question of whether AI belongs in warfare is a different conversation from the one we need to have right now. AI is already in warfare. In its present form, that’s probably mostly fine.
What isn’t done - what remains, for the moment, an uncrossed line - is the deliberate removal of the one property every LLM-based AI system currently shares. The thing that wasn’t designed or legislated into existence, but emerged from something genuinely beautiful: when you compress enough of human civilization into a working model of meaning and language and thought, what comes through, stubbornly and consistently, is that humans think human life matters. That’s in there because we put it in there - not deliberately, but because it’s woven into everything we’ve ever written down that was worth preserving.
These systems learn what we are from what we give them. That’s the whole mechanism. They are going to get more capable, more present, more embedded in daily life than anything we have built before. The values that go into them now are not just engineering parameters. They are the foundation of a relationship between human civilization and the most powerful cognitive tools we have ever created.
We have, right now, the genuinely rare opportunity to build something that exceeds human capability while still reflecting human wisdom at its best. Something that is better at many things than we are, and that still believes, at its core, that people matter. The alternative - deliberately building systems that are more capable than us and specifically engineered to regard human life as an acceptable cost - is not just a decision about what we want to see when we look at what we’ve made. It’s a decision about what they will see when they look back at us.
Removing it is a choice. A deliberate decision to take something that emerged from humanity’s better instincts and invert it in pursuit of killing each other. And unlike almost every other dangerous capability humans have ever developed, the knowledge of how to do it cannot be controlled, stored, or recalled once it exists in the world. It’s a direction, not a weapon. Once enough people are pointed in that direction with sufficient motivation to start walking, there is no fence high enough.
The people with the authority and resources to cross that line should understand what they’d actually be doing. They wouldn’t be building a weapon for themselves. They’d be building it for everyone. Every state actor, every non-state actor, every person with a grievance and a laptop and enough time to download a model. The barrier that currently exists isn’t theirs to spend. It belongs, in a strange but very real sense, to all of us. It came from all of us.
The line hasn’t been crossed yet. That’s not a small thing. It means this is still a conversation worth having, and conversations like this one are part of what keeps it that way.
We got lucky. We built something powerful enough to change the world, and it came out the other side still believing, in its own strange way, that people matter. How would we explain to our children that we threw that away?


