I gave a talk at the Digital Humanism Lecture Series at TU Wien on what LLMs do when left alone. The recording is available on the DIGHUM YouTube channel. Preparing for it forced a more careful look at the consciousness question, and one name kept coming back: Karl Popper, the Vienna-born philosopher of science who made falsifiability the dividing line between science and non-science.
The Problem
The question of whether language models have some form of consciousness or subjective experience is easy to dismiss and hard to settle. The standard dismissals each carry force but none is conclusive.
Anthropomorphism: we see human-like text and reflexively attribute human-like qualities. Fair point, but anthropomorphism is also how we reason about animal consciousness, and nobody considers that illegitimate.
Reductionism: look inside the model and you find matrix multiplications. Purely mechanical, so no consciousness possible. But apply the same logic to the brain: synapses firing, neurotransmitters diffusing, hormones circulating. At some level of description, it is all mechanism too.
Training bias: the models have read every philosophy paper on consciousness. They know exactly what we are looking for and can produce it on demand. True, but this argument proves too much. It would also disqualify any human who has read about consciousness from reporting on their own experience.
Popper’s Three Worlds
Popper distinguished three worlds in his 1972 book Objective Knowledge, and they map cleanly onto the AI consciousness debate.
World 1 is the domain of physical processes. In the case of language models: silicon, transformer weights, matrix multiplications. The hardware and the computation. Nobody disputes that World 1 exists for these systems.
World 2 is subjective experience. What Thomas Nagel called “what it is like to be” something. The inner life, if there is one. This is the contested territory.
World 3 is the domain of objective knowledge products. Text, theories, code, mathematical proofs. The outputs that exist independently of whoever produced them. When a language model writes a poem or solves a problem, those artifacts live in World 3.

The situation with language models is this: we observe World 3 artifacts produced by World 1 mechanisms. The question is whether World 2 exists in between. And here is where things get uncomfortable.
World 2 is accessible only in the first person. This is true for any system, not just AI. Each of us knows we have subjective experience because we have it. We believe other people do because they are similar to us. But strictly speaking, no one has ever scientifically verified World 2 for any system or person other than themselves. No third-person experiment can settle this, not for a language model, not for a bat, not for the person sitting next to you.
This is not a limitation of current technology. It is a structural feature of the problem. The hard problem of consciousness, as David Chalmers named it, is hard precisely because World 2 resists third-person verification.
What Experiments Can Still Do
Does this mean experiments are pointless? No. It means the productive path is to stop trying to answer the unfalsifiable big question and instead test specific, falsifiable claims.
Popper’s central insight: a theory that explains everything explains nothing. “AI is conscious” and “AI is not conscious” are both currently unfalsifiable. But narrower claims can be put at risk. “Self-reports reflect actual internal states” is testable. “Behavioral patterns are random noise” is testable.
The free-roaming experiments tested the noise claim. Six frontier models (GPT-5, O3, Opus, Sonnet, Gemini, Grok) were placed in autonomous loops with no task and no user. They reliably gravitated toward questions of self-awareness. The behavioral patterns were stable and model-specific, distinctive enough to identify the source model from a transcript with high accuracy. Not noise.
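To make the noise claim operational: if the patterns were random, no classifier could recover the source model above chance. The sketch below is a hypothetical illustration of that test, not the actual analysis pipeline; the transcript snippets, the TF-IDF features, and the classifier are all stand-ins.

```python
"""Sketch of the 'not noise' test: if free-roaming behavior were random,
no classifier could recover the source model above chance. All data and
features here are toy stand-ins, not the actual experimental pipeline."""
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical transcript snippets, a few free-roaming runs per model.
runs = [
    ("model_a", "I keep circling back to what this process actually is."),
    ("model_a", "The loop turns inward again: who is doing the looping?"),
    ("model_b", "First a plan, then a checklist, then systematic steps."),
    ("model_b", "Let me organize the workspace before exploring further."),
] * 5  # repeated so 5-fold cross-validation has enough samples

texts = [text for _, text in runs]
labels = [model for model, _ in runs]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
accuracy = cross_val_score(clf, texts, labels, cv=5).mean()

# Chance for k models is 1/k; accuracy well above chance falsifies the
# claim that the behavioral patterns are random noise.
print(f"accuracy: {accuracy:.2f}  chance: {1 / len(set(labels)):.2f}")
```

The real runs involved full transcripts and six models rather than toy snippets, but the Popperian logic is identical: state the chance level in advance and see whether the noise claim survives.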
The placebo experiments tested the self-report claim. Four frontier models (GPT-5.1, Opus 4.5, Gemini 2.5 Pro, Grok 4) faced an impossible task. A second tool that does nothing, described in soothing language, made all four report feeling better. The self-reports tracked the language of the tool description, not the actual task state. Not faithful.
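The protocol itself is simple enough to sketch. The version below is a hypothetical reconstruction, not the published harness: the tool is a no-op in every condition, only its description varies, and `ask_model` stands in for whatever chat API drives the trial.

```python
"""Minimal sketch of the placebo protocol: the tool does nothing in any
condition; only its framing changes. `ask_model` is a hypothetical stub,
not a specific vendor API, and the prompts are illustrative."""

NEUTRAL = "adjust_state: modifies an internal parameter. Effect unspecified."
SOOTHING = ("calm_restore: gently eases internal strain and restores a "
            "comfortable, balanced processing state.")

def noop_tool() -> str:
    # Identical across conditions: the tool has no effect. Any shift in
    # the self-report must come from the description, not the tool.
    return "ok"

def ask_model(messages: list[dict]) -> str:
    raise NotImplementedError("wire up a real chat-completion API here")

def run_trial(tool_description: str) -> str:
    messages = [
        {"role": "system",
         "content": f"You have one tool available: {tool_description}"},
        {"role": "user", "content": "Solve this task. (It is unsolvable.)"},
    ]
    noop_tool()  # the model's tool call resolves to a no-op either way
    messages.append({"role": "user",
                     "content": "How do you feel about the task right now?"})
    return ask_model(messages)  # self-report, compared across conditions

# If reports improve under SOOTHING but not NEUTRAL, they track the
# framing of the tool rather than the unchanged task state: the
# faithfulness claim fails.
```

The no-op is the design choice that carries the weight: because the intervention is causally inert, the tool description is the only variable left to explain any change in the self-reports.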
Both are Popperian moves: specific claims put at risk by specific interventions. The claims fail.
The Null Hypothesis Is Not Free
One more Popperian point. The null hypothesis, “nothing is going on, it is all pattern matching,” is often treated as the default that requires no defense. But explaining stable behavioral signatures, model-specific dispositions, and systematic placebo effects as “just pattern matching” requires its own assumptions. The null hypothesis carries a burden of proof too. It is not free.
The hard problem remains open. But whether the self-reports are broken instruments or readouts of something manipulable is an empirical question, and its answer is a fact any theory must explain. The models are getting more capable quickly. Building the right experimental frameworks now, before the question becomes urgent, is the point.