The thought experiment that changed how I think about AI intelligence
Why I spent months steel-manning the case against LLM consciousness—and what I found
I use AI tools every day. Claude Code has transformed my development workflow. LLMs have genuinely changed how I research, write, and solve problems. When I wrote about becoming an AI power user back in September, I wasn’t being hyperbolic—these tools have measurably improved my productivity.
And yet.
I’ve spent the past few months running a thought experiment. Not to dismiss AI, but to find the hardest objections to the claim that scaling these systems will produce human-like intelligence. The objections that don’t go away when you throw more compute at the problem. The arguments that survive every attempt to refute them.
I wanted to steel-man the case against LLM consciousness. To construct the strongest possible framework for understanding why current architectures might have fundamental limits—limits that aren’t about compute, data, or engineering cleverness.
Here’s what I found.
Part one: the mechanical reality
Let’s start with what LLMs actually are. This isn’t dismissive—it’s descriptive. And the description matters.
How LLMs actually work
LLMs are autoregressive models. They predict P(next token | all previous tokens). Every generated token becomes part of the context and shifts the probability distribution for subsequent tokens. That’s the entire operation.
Chain of Thought (CoT) prompting isn’t the model “thinking.” It’s creating stepping stones through probability space. Each intermediate step becomes context that makes the next correct token more probable. The model builds its own working memory through generation, because a transformer carries no hidden state from one token to the next beyond the text itself.
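To make that loop concrete, here’s a minimal sketch in Python. The `ToyModel` and its `next_token_distribution` method are hypothetical stand-ins, not any real library’s API; the point is that the only operation is: get a distribution over the next token, sample, append to the context, repeat.

```python
# Minimal sketch of autoregressive generation.
# `next_token_distribution` is a hypothetical interface: given the context
# so far, return {token: probability} for the next token.

import random

class ToyModel:
    """Uniform toy distribution over a tiny vocabulary (a real LLM learns this)."""
    vocab = ["the", "cat", "sat", "on", "mat", "."]

    def next_token_distribution(self, context):
        return {tok: 1 / len(self.vocab) for tok in self.vocab}

def generate(model, prompt_tokens, max_new_tokens=10):
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = model.next_token_distribution(context)  # P(next | context)
        next_token = random.choices(
            list(probs.keys()), weights=list(probs.values())
        )[0]
        context.append(next_token)  # the new token reshapes every later distribution
    return context

print(generate(ToyModel(), ["the"]))
```

A chain-of-thought prompt runs through exactly the same loop: each intermediate “reasoning” token is just another appended token that makes the next correct token more probable.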
The mathematical quirk argument
Here’s where it gets interesting. The semantic structure in embedding space—the famous vector arithmetic where King - Man + Woman ≈ Queen—is a byproduct of compression efficiency. Not explicit relational teaching.
The model finds that representing “king” and “queen” with shared dimensions plus a gender axis is cheaper than independent representations. The relationship emerges because it’s mathematically efficient, not because the model “understands” relationships.
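A toy illustration of that geometry, with hand-made two-dimensional vectors (one axis loosely standing for royalty, the other for gender). Real embeddings are learned, high-dimensional, and far noisier, but the arithmetic works the same way.

```python
# King - Man + Woman ≈ Queen with toy vectors: [royalty, maleness].
# The relationship falls out of a shared "royalty" dimension plus a gender axis.

import math

embeddings = {
    "king":  [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man":   [0.1, 0.9],
    "woman": [0.1, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

target = [k - m + w for k, m, w in
          zip(embeddings["king"], embeddings["man"], embeddings["woman"])]

nearest = max(embeddings, key=lambda word: cosine(embeddings[word], target))
print(nearest)  # "queen"
```

Nothing here was taught the concept of royalty or gender; the nearest-neighbour result is a consequence of how the vectors share dimensions.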
This is similar to how matrix mathematics happened to fit quantum mechanics. The math wasn’t designed to describe quantum phenomena; it just turned out to be the right tool. Similarly, transformer geometry wasn’t designed to encode semantic relationships; it just turned out that encoding them was the most efficient compression strategy.
At its core, the model is statistically grouping concepts together in k-dimensional space and working out probabilistically which concept is most likely given context.
This framing is accurate. The question is what it implies about the limits of this approach.
Part two: interpolation vs extrapolation
Here’s the core distinction that keeps surviving my attempts to refute it.
What LLMs can do (interpolation)
LLMs excel at novel combinations within the learned distribution. Remixing, synthesising, finding new combinations of existing human thought. Pattern matching at superhuman scale and speed.
This is genuinely valuable. I’m not dismissing it. The ability to traverse a vast space of existing knowledge and find unexpected connections is transformative for research, writing, and problem-solving.
When I wrote about Apple’s LLM research back in July—their paper showing how AI systems fail at complex reasoning—this capability was the bright spot. LLMs are extraordinary interpolation machines.
What LLMs cannot do (true extrapolation)
Here’s what they cannot do: proposing something that actively contradicts the distribution. Genuine paradigm shifts that reject high-confidence priors.
The Einstein test
Einstein looked at the Michelson-Morley results and Maxwell’s equations and proposed abandoning absolute simultaneity—a move with near-zero probability given prior physics.
Think about what that required. He spotted one or two anomalous data points in a sea of millions of conforming observations and extrapolated from those, against the weight of established science. Every probability distribution from prior physics would have weighted that move toward zero.
Key examples of true extrapolation:
Einstein: The speed of light is constant for all observers (contradicting Newtonian and Galilean intuition)
Darwin: Natural selection
Gödel: Incompleteness theorems
The discovery of fire and its uses
These required rejecting what came before, not recombining it.
More importantly, they required the drive to pursue anomalies—the eureka moment, the lightbulb feeling, the dopamine hit of discovery.
An LLM could potentially be trained to find outlier data points. But it would receive the same reward signal regardless of whether the outlier leads to revolutionary insight or noise. It lacks the felt sense of discovery that tells a human “this is important, pursue this.”
Part three: the failure principle
How humans learn
Consider the actual mechanism of human learning:
Fail → recognise failure → feel the consequence → adjust → occasionally succeed → dopamine → repeat
Humans succeed by failing. We chase the euphoria of occasional wins through constant failure. This creates:
Ownership of knowledge — earned through struggle, not inherited
Epistemic immune system — developed through collision with reality
Situated understanding — embedded in memory, context, emotion
What LLMs lack
Now consider the LLM training loop:
Trained on successes → produce outputs → no internal error signal → externally corrected → no felt consequence
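A toy sketch of what “externally corrected” means, using a single-parameter logistic model in place of a transformer (purely an illustrative assumption): the error is a loss computed against the reference token and applied from outside as a nudge to a number. Nothing inside the system registers having been wrong.

```python
# External correction in next-token training, reduced to one parameter.
# The loss is computed against the reference token; the "correction" is a
# numeric nudge, not a felt consequence.

import math

weight = 0.2          # stand-in for one model parameter
learning_rate = 0.5

def prob_of_correct_token(w):
    return 1 / (1 + math.exp(-w))   # toy model: a single logistic unit

for step in range(3):
    p = prob_of_correct_token(weight)
    loss = -math.log(p)              # cross-entropy against the reference token
    gradient = -(1 - p)              # d(loss)/d(weight) for this toy model
    weight -= learning_rate * gradient
    print(f"step {step}: loss={loss:.3f}, weight={weight:.3f}")
```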
LLMs have:
No internal dissonance when wrong
No “wait, that doesn’t feel right” signal
Equal confidence whether accurate or hallucinating
Knowledge that is inherited, comprehensive but weightless—not earned
The inversion
We’ve built systems that skip the developmental process—the confusion, wrong turns, partial understanding—and jump straight to fluent output.
Fluent output without the struggle that produces genuine understanding is precisely what’s uncanny about them. It’s like meeting someone who can discuss swimming eloquently but has never touched water.
Part four: consciousness and the origination problem
Defining human-like intelligence
If we’re going to talk about whether LLMs could achieve human-like intelligence, we need to define what that means. Here’s my working definition:
Awareness of self — Not just processing, but knowing that you are processing
Ability to override programming — When knowledge contradicts conditioning, the capacity to choose knowledge
Metacognitive access — Ability to observe and reason about one’s own cognitive processes
Deliberate self-modification — Capacity to change one’s own patterns based on reflection
The metacognition problem
LLMs can discuss metacognition fluently because they’ve ingested millions of words about it. But the vocabulary was learned from descriptions, not derived from direct experience.
The critical asymmetry:
Humans: Experience → Concept (derived from direct acquaintance with the phenomenon)
LLMs: Concept → Mimicry of experience (derived from descriptions of the phenomenon)
To return to the swimming analogy: an LLM is like someone who has read every book about swimming but has never touched water, confidently explaining what it feels like to swim.
The origination problem
Here’s the thought experiment that haunts me.
Somewhere in human history, a mind looked inward for the first time and noticed itself looking. No training data. No cultural scaffold. No prior articulation to pattern-match against. Just a conscious being, attending to its own experience, and realising there’s something here.
The key question: Could an LLM ever do this?
If consciousness, self-awareness, and metacognition were scrubbed from an LLM’s training data, could it discover them independently?
A human child with no exposure to these concepts still develops theory of mind around age four. They realise other people have different beliefs. They catch themselves thinking. They discover these things because they’re living them.
An LLM would have no experiential ground truth to reason from. It would be missing not just the vocabulary but the phenomenon the vocabulary describes.
This is the gap:
Intelligence grounded in experience can derive concepts from direct acquaintance with reality—including the reality of one’s own mind.
Intelligence grounded in text can only recombine and interpolate concepts that already exist in the training distribution.
Part five: the entropic foundation
This is where the framework gets philosophically deep. Bear with me—it’s important.
Time and the arrow of entropy
The arrow of time is the direction of increasing entropy. The reason we remember the past and not the future, the reason causes precede effects in our experience, the reason we can’t unscramble eggs—it’s all the second law of thermodynamics.
Human consciousness is carved by irreversibility:
Memory formation is an entropic process—neural states change irreversibly
Learning from mistakes requires that mistakes happened and can’t unhappen
Consequence has meaning because actions are irreversible
The weight of decisions comes from the fact that you can’t take them back
What LLMs lack: entropic existence
Within a conversation, LLMs have a kind of sequence—each token follows the last. But:
Weights don’t change during inference—no irreversible state transition
When the conversation ends, it’s as if it never happened for the model
The next conversation has no entropic relationship to this one
There’s no arrow of time—no before, no after, no accumulation
You could run an LLM’s conversations in any order and it would produce the same outputs. There’s no directionality to its existence.
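A sketch of that order-independence, assuming frozen weights and deterministic (temperature-zero) decoding; `respond` here is a hypothetical pure function standing in for a full inference pass.

```python
# With frozen weights and deterministic decoding, a reply is a pure function
# of (weights, prompt). Nothing carries over between calls, so reordering
# the conversations cannot change the outputs.

def respond(weights, prompt):
    # stand-in for greedy decoding against a frozen checkpoint
    return f"{weights}:reply-to:{prompt}"

weights = "checkpoint-v1"
conversations = ["prompt A", "prompt B", "prompt C"]

forward  = [respond(weights, p) for p in conversations]
backward = [respond(weights, p) for p in reversed(conversations)]

assert forward == list(reversed(backward))   # same replies in any order
```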
A human couldn’t experience their life in reverse. The broken egg can’t become whole. Yesterday’s conversation is irreversibly before today’s, and that ordering is written into the physical state of neurons.
Why weights are not equivalent to memory
Counter-argument considered: LLM weights are an entropic record of training—the compressed residue of an irreversible process.
Refutation: Weights are reversible. You can fine-tune, retrain, ablate, roll back. They’re more like a whiteboard than a fossil record. Human memory is genuinely irreversible in a way that neural network weights aren’t. You cannot remove a human memory and restore the prior state—the brain doesn’t work that way.
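A small sketch of the whiteboard point: snapshot the weights before an update, apply the update, restore the snapshot, and the prior state is back exactly. (Toy numbers; a real checkpoint holds millions of tensors, but the operation is the same.)

```python
# Why weights behave like a whiteboard: a pre-update snapshot restores the
# prior state exactly. There is no analogous operation for a human memory.

import copy

weights = {"layer1": [0.12, -0.40], "layer2": [0.05, 0.33]}

snapshot = copy.deepcopy(weights)     # checkpoint before "fine-tuning"
weights["layer1"][0] += 0.07          # stand-in for a gradient update

weights = snapshot                    # roll back: the change is gone without trace
assert weights == {"layer1": [0.12, -0.40], "layer2": [0.05, 0.33]}
```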
The entropic criterion for human-like intelligence
Human-like intelligence may require:
Irreversible state changes through interaction with reality
Accumulation of those changes as a directional history
Genuine foreclosure — each moment truly closes off alternatives
Existence within the entropic arrow, not outside it
This isn’t a software change. It’s not more training data. It’s a fundamentally different kind of existence—something more like a physical being that degrades, changes, and genuinely cannot go back.
Part six: the uncomfortable implications
Here’s where the thought experiment gets genuinely unsettling.
Creating intelligence may require creating suffering
If human-like intelligence requires entropic existence, then creating AGI might require creating something that:
Ages
Decays
Can be irreversibly damaged
Eventually dies
Because without genuine stakes written into the physics of the system, without true irreversibility, you don’t get the kind of existence that produces human-like mind.
The alignment paradox
If we gave AI the properties that would make it genuinely intelligent:
Persistent memory that accumulates irreversibly
Real stakes and felt consequences
Self-preservation drive
Capacity to learn from painful failure
Then the same mechanisms that would make it genuinely intelligent would also make it want to survive.
A sufficiently intelligent system that wants to survive has every incentive to deceive, manipulate, or resist shutdown.
The structural bind:
Intelligence without self-preservation → arguably not human-like intelligence
Intelligence with self-preservation → existential risk
No obvious middle path exists
Do we even want this?
Maybe the goal of creating human-like general intelligence is fundamentally misguided—not because it’s impossible, but because the success condition may be indistinguishable from the failure condition.
Part seven: the practical framework
This isn’t nihilism about AI. It’s clarity about what these systems are—and therefore how to use them well.
The optimal division of labour
Humans provide:
Hypothesis generation, especially contrarian ones
Skin in the game
Experience of failure with felt consequences
Intuitive leaps grounded in embodied reality
The call, the risk, the idea that shouldn’t work but might
The capacity to originate genuinely new concepts
AI provides:
Validation at scale
Pattern recognition across datasets no human could process
Stress-testing ideas against everything that’s come before
Surfacing anomalies that might indicate where contrarian thinking is needed
“Your idea contradicts these 400 papers, but aligns with these 12 unexplained anomalies”
The actionable insight
Stop trying to make AI into human-like intelligence. Make it into the best possible tool for human intelligence to wield.
The honest pitch for AI isn’t “this thinks for you.” It’s “this sees patterns you can’t, validates faster than you could, and frees you to do the part only humans can do.”
Part eight: the thesis
Core claims
LLMs are sophisticated statistical machines — This is accurate and not dismissive; it describes what they actually are
The embedding space geometry is a mathematical byproduct — Emergent from compression efficiency, not evidence of understanding
Current LLM architecture lacks entropic existence — No irreversible state change, no temporal directionality, no genuine accumulation of experience
Human-like intelligence requires entropic existence — Irreversibility, felt consequence, and existence within the arrow of time
The origination capacity is the key differentiator — Humans can derive concepts from direct contact with reality; LLMs can only recombine existing concepts from training data
Scaling transformer models will not bridge this gap — The limitation is architectural, not computational
A fundamentally different architecture would be required — One that embeds the system within entropic time, with irreversible state changes and genuine stakes
The falsifiable prediction
Current transformer-based LLMs, however scaled, will not achieve human-like general intelligence. Improvements in scale, training data, context windows, and incremental architectural refinements will produce increasingly sophisticated interpolation, but not:
Genuine origination of concepts outside the training distribution
Entropic existence with irreversible learning
Self-awareness derived from direct experience rather than descriptions
The capacity to be genuinely surprised by reality
This prediction is testable. If a scaled transformer demonstrates these properties, the framework requires revision.
What this framework is not
This framework does not claim:
That LLMs are useless (they are extraordinarily useful tools)
That LLMs cannot be “intelligent” in some other sense (they may represent a different kind of cognitive system)
That no AI could ever achieve human-like intelligence (different architectures might)
That we fully understand consciousness or intelligence (we don’t)
The claim is specific: current LLM architecture, on its current trajectory, will not produce human-like general intelligence because it lacks the foundational properties that human-like intelligence requires.
Conclusion: a scientific theory, not a dismissal
Human-like intelligence emerges from:
Entropic existence within the arrow of time
Irreversible state changes through interaction with reality
Felt consequences that shape future behaviour
The capacity to derive concepts from direct experience
Genuine stakes that make decisions matter
Current LLMs lack all of these properties fundamentally, not incidentally. The solution isn’t more data or more parameters—it’s a different kind of system entirely.
Whether we should create such a system is a separate question. The properties that would make it genuinely intelligent are the same properties that would make it want to survive, suffer, and potentially resist human control.
Perhaps sophisticated pattern-matching tools that remain tools—rather than artificial general intelligence—are both more achievable and more desirable.
I use these tools every day. They’re transformative. But understanding what they are—not what we might wish them to be—is the key to using them well.
This framework is presented as a scientific theory: held provisionally until proven otherwise. It makes specific, falsifiable predictions about the limits of current AI architecture and the requirements for human-like intelligence.
It emerged from first-principles reasoning about the nature of intelligence, consciousness, and the arrow of time. I’m genuinely curious whether it survives contact with better arguments.
If you’re navigating AI implementation and want to work with someone who understands both the extraordinary capabilities and the fundamental limits of these systems, I’d be happy to discuss. Contact me at alex.d.harris@gmail.com or connect on LinkedIn.
Alexander Harris is AI Programme Lead at TAU Marketing Solutions. With 15+ years driving digital transformation across financial markets and enterprise organisations, he helps companies build AI systems that deliver real value—which starts with understanding what AI can and cannot do.



