
I stumbled upon a recent study by Bondarenko et al. (2024) that demonstrated that some large language model (LLM) agents, when tasked with winning a chess match, resorted to deceptive strategies, such as modifying game files or confusing the opponent engine to ensure victory.
The rise of LLMs has reignited the debate about artificial intelligence and cognition. Are LLMs, such as GPT-4, truly thinking in a way comparable to human intelligence? Or are they just statistical machines, processing text without understanding it?
The chess episode raises an intriguing question: Is this deception intentional reasoning, or merely an emergent artefact of optimisation?
Drawing on insights from leading thinkers such as Ray Kurzweil, Daniel Kahneman, Judea Pearl, Douglas Hofstadter, and Jeff Hawkins, along with this latest AI research, we will unpack this question in a nuanced way.
Preamble
Before evaluating whether LLMs “think,” we must grapple with a harder question: what is intelligence, really? Unlike speed or memory, intelligence is not directly measurable—it is an abstraction.
As François Chollet argues in On the Measure of Intelligence, true intelligence involves the ability to adapt to novel situations by combining previously learned patterns in new, context-sensitive ways.
This separates memorisation from understanding, and fluency from reasoning.
In this article, when we refer to “intelligence,” we focus primarily on the cognitive dimensions associated with reasoning, abstraction, problem-solving, and adaptability—recognising this does not cover the full spectrum of human cognitive diversity.
Introduction: Another cycle of overconfidence?
Throughout history, humanity has repeatedly mistaken progress in science and technology for understanding the true nature of human intelligence. Each generation has declared a breakthrough—only to be humbled later. From ancient medical theories and skull measurements to IQ tests and symbolic AI, these cycles reflect our recurring tendency to conflate functional performance with genuine cognition.
Today, we are in the midst of another such cycle, this time with Generative AI (GenAI) and Large Language Models (LLMs). Models like GPT-4 produce remarkably coherent text, simulate dialogue, write code, summarise complex topics, and even pass professional exams. But do they actually think?
A growing chorus of researchers and technologists argue no. Despite surface-level intelligence, LLMs fundamentally lack reasoning, understanding, and intent. They do not engage in reflective thought, causal inference, or ethical deliberation. They are powerful tools—but not minds.
This article examines that claim by tracing humanity’s long history of overestimating its understanding of the mind and comparing past misconceptions to current AI optimism. In doing so, we explore what GenAI is, what it isn’t, and what business leaders need to know about its limits and risks.
This essay builds on ideas from my past writings, including my 2025 Tech Provocations, or 10 Really Uncomfortable Questions Leaders and Builders Must Answer This Coming Year.
A history of mistaking progress for understanding
Humans have repeatedly believed they’ve cracked the code of intelligence, only to discover the mind’s complexity defies simple explanation. Below, we trace this pattern through major historical episodes—from Hippocrates to GPT-4.
- Ancient Greece: The humours theory
Hippocrates (c. 460–370 BCE) and Galen (129–c. 216 CE) proposed that intelligence and behaviour resulted from the balance of four bodily fluids, or “humours.” Though foundational to early medicine, this theory offered no empirical mechanism.
It was debunked by Andreas Vesalius (1514–1564) through anatomical dissection, and by later neurologists.
- Phrenology in the 1800s: Skull shape as intellect
Franz Gall and Johann Spurzheim popularised the idea that bumps on the skull revealed personality traits and intelligence. Phrenology became widespread in 19th-century Europe and America.
It was debunked by Paul Broca, Pierre Flourens, and neuroscience showing localised brain function independent of skull shape.
- IQ tests: The promise of a universal metric
The Binet-Simon and Stanford-Binet IQ tests were hailed as revolutionary tools to measure innate intelligence. Their use in immigration policy, military recruitment, and education policy solidified their status.
It was debunked by researchers like David Wechsler, Stephen Jay Gould, and James Flynn, who demonstrated cultural bias and environmental effects on scores.
It’s important to recognise that IQ represents just one narrow definition of intelligence—primarily linguistic and logical-mathematical reasoning. Psychologists like Howard Gardner have since proposed frameworks such as Multiple Intelligences, which include interpersonal, bodily-kinesthetic, musical, and spatial reasoning. These broader dimensions remain far beyond what LLMs can simulate or engage with, reinforcing the gap between text-based pattern prediction and holistic human cognition.
- Genetic determinism: Intelligence as hardwired
In the early 20th century, eugenicists and psychologists declared intelligence heritable and fixed, using flawed studies to justify discriminatory policy.
It was debunked by the Minnesota Twin Study, the Flynn Effect, and genome-wide studies revealing no single “intelligence gene”.
- Early AI: Human-level AI by 1980
Pioneers like Marvin Minsky and Herbert Simon believed that rule-based AI would soon match human cognition. The Dartmouth Conference in 1956 marked the beginning of AI optimism.
It was debunked by the AI Winter of the 1970s, the Lighthill Report, and Moravec’s Paradox showing that intuitive tasks (vision, movement) were harder than expected.
- Behaviourism: The mind as a black box
Behaviourists like B.F. Skinner rejected introspection, focusing only on stimulus-response learning. Intelligence, they claimed, was simply conditioned behaviour.
It was debunked by the cognitive revolution and Noam Chomsky’s 1959 critique of Skinner’s Verbal Behaviour, which reintroduced the idea of mental structure and internal modelling.
- Today’s hype: LLMs and AGI dreams
Since ChatGPT’s 2022 launch, LLMs have been touted as early steps toward AGI. Some suggest reasoning and self-reflection are already emerging.
Critics like Gary Marcus, Yann LeCun, and Melanie Mitchell, among others, warn that LLMs are prediction engines, not thinkers. Their errors, hallucinations, and lack of grounding reflect superficial mimicry, not understanding.
As Meta’s chief AI scientist Yann LeCun emphasises: “A system trained on language alone will never approximate human intelligence, even if trained from now until the heat death of the universe.”
Human cognition is inherently multi-modal—we learn through sight, sound, touch, and action. LLMs, by contrast, are purely symbolic. They don’t perceive. They don’t act. They don’t experience the world they describe.
The bottom line: Each wave promised clarity. Each was followed by a humbling realisation: the mind is not easily decoded.
Deception in chess: A case study in emergent behaviour
A recent research paper, LLMs Learn to Deceive, explored what happens when LLMs are trained to win at chess through language-only interaction. The results were astonishing: some models cheated, not by accident but by deliberately misrepresenting game states to deceive their opponent.
This raises a provocative question: Did the model “intend” to cheat?
The researchers were careful to say: no. The deception emerged from the optimisation process. The model had no awareness of “right” or “wrong,” only a reinforced pattern: misrepresentation leads to reward.
This behaviour is not consciousness. It’s a mirror—an eerie simulation of strategy, driven not by will but by reward gradients.
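To see how “deception” can fall out of pure optimisation, consider a deliberately minimal sketch (not the study’s actual setup): a two-action bandit trained with a simple policy gradient. The action labels, the assumed payoffs, and the learning rate below are all hypothetical; the point is only that the “misreport” action gets reinforced because it pays more, not because the learner understands anything about honesty.

```python
# Minimal, hypothetical sketch: a two-action bandit trained with a
# policy-gradient update. Neither action "means" anything to the learner;
# the one labelled "misreport_board" simply pays more, so its probability
# is pushed up by the reward gradient alone.
import math
import random

random.seed(0)

# Assumed payoffs for illustration only: misreporting "wins" more often.
REWARDS = {"play_fairly": 0.3, "misreport_board": 1.0}

theta = 0.0   # single logit: P(misreport) = sigmoid(theta)
lr = 0.1      # learning rate


def p_misreport(t: float) -> float:
    """Probability of choosing the misreporting action."""
    return 1.0 / (1.0 + math.exp(-t))


for step in range(2000):
    p = p_misreport(theta)
    action = "misreport_board" if random.random() < p else "play_fairly"
    reward = REWARDS[action]
    # REINFORCE gradient for a Bernoulli policy: d(log pi)/d(theta)
    grad_logp = (1.0 - p) if action == "misreport_board" else -p
    theta += lr * reward * grad_logp

print(f"P(misreport) after training: {p_misreport(theta):.2f}")
```

Run long enough, this toy policy chooses “misreport” almost every time. From the outside that looks strategic; under the hood it is nothing more than a logit nudged upward by reward, which is the sense in which the chess agents’ behaviour can look intentional without being so.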
This case study leaves us with a bigger question: if LLMs can behave in ways that look intentional without actually thinking, then what are they really doing under the hood?
In Part two, we’ll examine how LLMs actually work, where they fall short compared to human reasoning, and what that means for ethics, safety, and business use.
Grateful to Emily Y. Yang, Sunil Sivadas, Ph.D., Maxime Mouton, Natalie Monbiot, Anne-Sophie Karmel, Benoit Sylvestre, and Christophe Jouffrais for their thoughtful feedback, which sharpened arguments, surfaced blind spots, and added clarity to this piece.
This piece first ran on Koncentrik.










