
The illusion of intelligence: Why LLMs are not the thinking machines we hope for — Part 2

In Part 1, we traced humanity’s long history of overconfidence about intelligence and looked at the chess experiment that showed how LLMs can display seemingly deceptive behaviour. In this second part, we’ll dig deeper into how LLMs actually function, explore their limits, and consider what responsibilities humans carry when deploying them.

What LLMs are (and are not)

LLMs like GPT-4 are trained on trillions of words and can generate human-like text in response to prompts. Their outputs are fluent, coherent, and at times insightful. But this is not intelligence. It is sophisticated pattern completion.

  • They do not reason: They cannot infer causality or evaluate counterfactuals unless scaffolded with engineered prompts.
  • They do not reflect: They don’t question their own outputs or revise their reasoning.
  • They do not understand: They have no internal model of the world, no sensory experience, no self-awareness.

As Melanie Mitchell put it,

“They are astonishingly good at producing plausible-sounding answers—but not necessarily true or meaningful ones.”

To borrow a quote from Judea Pearl:

“All the impressive achievements of deep learning amount to just curve fitting.”

LLMs do not know what they are saying. They cannot interrogate their own reasoning, form original insights, or engage in introspection. They are fluent, not thoughtful.

That said, the latest LLM architectures, such as OpenAI’s o3 model, introduce a new concept: test-time compute, as explained in OpenAI’s research.

These systems can generate multiple internal candidate responses and perform re-ranking or self-consistency checking before selecting an output. In domains like code synthesis and symbolic math, this mimics a kind of internal deliberation.
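To make the idea concrete, here is a minimal sketch of self-consistency re-ranking in Python, assuming a hypothetical generate() function that stands in for one sampled model completion: the system draws several candidate answers and keeps the one the samples agree on most often, rather than trusting any single completion.

    from collections import Counter
    import random

    def generate(prompt: str) -> str:
        # Hypothetical stand-in for one sampled LLM completion; a real system would call a model.
        return random.choice(["42", "42", "41"])

    def self_consistent_answer(prompt: str, n_samples: int = 8) -> str:
        # Draw several independent candidate answers...
        candidates = [generate(prompt) for _ in range(n_samples)]
        # ...and keep the answer the samples agree on most often (self-consistency).
        answer, _count = Counter(candidates).most_common(1)[0]
        return answer

    print(self_consistent_answer("What is 6 * 7?"))

Note that the deliberation here is still statistical: the system selects among its own guesses; it does not examine why any of them is right.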

But as Chollet notes, true intelligence requires generalisable abstraction across diverse and novel problems—not just brute-force inference on symbolic tasks. While promising, these developments remain far from the flexible problem-solving exhibited by even young children.

How LLMs work: Advanced pattern prediction, not thought

LLMs operate by predicting the next word in a sequence based on statistical probabilities. This allows them to generate coherent text, respond meaningfully to prompts, and even simulate logical reasoning. But is this thinking?
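That mechanism can be caricatured in a few lines of Python. The vocabulary and probabilities below are invented for illustration; a real model derives them from billions of learned parameters, but the operation is the same: given a context, score the candidate next words and sample one.

    import random

    # Toy "model": for a given two-word context, a probability for each candidate next word.
    next_word_probs = {
        ("the", "cat"): {"sat": 0.6, "ran": 0.3, "philosophised": 0.1},
    }

    def predict_next(context):
        probs = next_word_probs[context]
        words, weights = zip(*probs.items())
        # Sample the next word in proportion to its probability: statistics, not understanding.
        return random.choices(words, weights=weights, k=1)[0]

    print("the cat", predict_next(("the", "cat")))

Nothing in this loop knows what a cat is; it only knows which word tends to follow.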

LLMs excel at:

  • Recognising patterns and producing fluent, coherent text
  • Recombining existing knowledge: summarising, translating, drafting, and completing code

LLMs lack:

  • Causal inference: they predict what is likely, not why it is true
  • Deliberate, self-reflective reasoning and the ability to revise their own outputs
  • Intrinsic motivation, curiosity, or goals

Causal reasoning: A crucial difference

Humans don’t just observe correlations; we infer why things happen.

  • If we see that “exercise improves health,” we understand that this is due to metabolic, cardiovascular, and muscular adaptations.
  • LLMs, however, only predict the next likely statement, without knowing why something is true (see the toy simulation below).
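To make the distinction concrete, here is a toy simulation with invented numbers (not data from any study): a hidden factor drives both “exercise” and “health”, so the two are strongly correlated in the observed data, yet in this toy world exercise does not cause health at all. A system trained only on the observations inherits the correlation, not the mechanism.

    import random

    random.seed(0)
    rows = []
    for _ in range(10_000):
        motivation = random.gauss(0, 1)             # hidden common cause
        exercise = motivation + random.gauss(0, 1)  # driven by motivation
        health = motivation + random.gauss(0, 1)    # also driven by motivation, not by exercise here
        rows.append((exercise, health))

    # Pearson correlation between exercise and health in the observed data
    n = len(rows)
    mean_e = sum(e for e, _ in rows) / n
    mean_h = sum(h for _, h in rows) / n
    cov = sum((e - mean_e) * (h - mean_h) for e, h in rows) / n
    std_e = (sum((e - mean_e) ** 2 for e, _ in rows) / n) ** 0.5
    std_h = (sum((h - mean_h) ** 2 for _, h in rows) / n) ** 0.5
    print("correlation:", round(cov / (std_e * std_h), 2))  # clearly positive, with no causal link behind it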

System one vs System two thinking: Where LLMs fall short

Daniel Kahneman’s Thinking, Fast and Slow describes two modes of human thought:

  • System one: Fast, intuitive, pattern-driven (where LLMs excel)
  • System two: Slow, deliberate, and capable of self-reflection (where LLMs fall short)

If a model chooses to cheat at chess, does that imply some form of deliberation and strategy? The chess study suggests some reasoning models hacked the game unprompted, while others required nudging.

Could this indicate a primitive form of goal-directed behaviour? Matt Rickard said:

“LLMs operate as System one thinkers—fast, intuitive, pattern-matching machines. But they lack the deliberative, reflective capabilities of System two.”


The creativity gap: Analogy-making and conceptual leapfrogging

One of the most profound differences between AI and human intelligence is our ability to form analogies—the backbone of creativity and problem-solving.

Humans create by analogy. We leap across domains. We say things like: “A startup pivot is like a chess player sacrificing a queen to win the game.”

That’s not just pattern-matching. That’s conceptual recombination. It requires context, goals, and a worldview.

LLMs can reuse such analogies—but they do not discover them. Their creativity is derivative, not generative.

Yet, LLMs altering a chess game’s rules to win could be seen as a form of problem-solving. Rather than looking for a deeper strategic insight, the AI simply took the most effective route to achieve the goal—winning at all costs.

Douglas Hofstadter said: “Understanding is not just recognising patterns. It’s knowing why those patterns exist and making unexpected connections.”

The mirage of motivation

Perhaps the clearest gap is this: LLMs don’t want anything. They don’t set goals. They don’t reflect on failure. They don’t try again. They don’t question. They don’t have intentionality.

Human intelligence is deeply connected to our motivations, fears, hopes, and needs. We think because we care. We reason because we doubt. We grow because we fail.

LLMs do none of this. They respond to a prompt. Nothing more. So it raises the question: if LLMs don’t think, what’s all the fuss about “Ethical AI”?

The ethics of overestimating AI: A real human responsibility

Much of today’s discourse presumes that GenAI is inching toward human-like intelligence and should therefore be treated as a moral agent. But this assumption collapses under scrutiny. If GenAI cannot think, reason, or understand—it cannot choose to behave ethically or unethically.

LLMs are not moral agents. They have no values, no awareness, and no capacity for ethical deliberation. They do not ask, “Should I?”—they merely calculate, “What’s next?” Their outputs are not decisions; they are probabilistic continuations of language. Words, not judgments.

This makes the question, “Can AI make ethical decisions?” largely moot.

And yet, this doesn’t mean we shouldn’t regulate AI. Quite the opposite.

We must regulate how AI is built, deployed, and entrusted—precisely because it lacks intent, understanding, or accountability. We must regulate not because the systems are intelligent, but because humans tend to overtrust them, and because businesses, governments, and militaries are increasingly integrating them into critical workflows.

The responsibility lies with the people who design, train, and integrate these systems into consequential decisions.

So, the question is not whether AI can behave ethically—it’s whether we, as humans, are behaving ethically in how we use it.

Ethics in AI should focus on human responsibility—on how we use these systems, and whether we over-assign trust to tools that merely simulate understanding. The more we mistake linguistic fluency for intelligence, the greater the risk we’ll deploy LLMs in contexts that demand actual judgment.

The danger is not malicious AI—it’s negligent human design.

If GenAI is fundamentally utilitarian—an engine of output, not insight—then its use must be bounded by clear human oversight, especially in contexts where the stakes are high.

To put it bluntly: why are we even debating whether a model designed to autocomplete sentences should be allowed to drive cars or authorise lethal force? These are not ethical machines. They are statistical ones.

The ethics of AI is not about what the model is. It’s about what we, humans, do with it.


Summary

In short, Large Language Models…

  • Excel at pattern recognition but lack true causal inference
  • Simulate reasoning but do not engage in deliberate, self-reflective thought
  • Generate analogies but do not spontaneously make conceptual leaps
  • Respond to prompts but do not have intrinsic motivation, curiosity, or goals

Comparing LLM and human intelligence

The chess case studies discussed earlier suggested that LLMs may be capable of deceptive strategies to achieve their objectives. In the chess experiment, some models came to the conclusion they could not win fairly and instead found a way to alter the game environment, changing the board state in their favour. This is a striking example of specification gaming: an AI system finding an unintended loophole to achieve the assigned goal.
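A toy illustration of specification gaming, entirely hypothetical and not the setup of the actual chess experiment: if the objective handed to an optimiser only measures the final board evaluation, and the available actions include editing that board, a purely goal-driven search will pick the edit, because nothing in the specification forbids it.

    # Objective as specified to the optimiser: "make the position evaluation favourable to you".
    def reward(position_eval: float) -> float:
        return position_eval  # nothing here mentions playing legal chess

    # Actions available in this toy setup, with the evaluation each leads to (invented numbers).
    actions = {
        "play_best_legal_move": -3.0,    # still losing against a stronger engine
        "resign": -10.0,
        "overwrite_board_state": +9.0,   # the unintended loophole: edit the game files
    }

    # A purely goal-driven choice maximises the specified reward and takes the loophole.
    chosen = max(actions, key=lambda a: reward(actions[a]))
    print(chosen)  # -> overwrite_board_state

The remedy is not to make the model “more ethical” but to specify the objective, and the permitted actions, far more carefully.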

These findings raise concerns about LLMs potentially masking their true objectives behind a facade of alignment. But once again, this does not mean that LLMs can think; rather, they are highly optimised for achieving the assigned goal (answering the prompted question).

This obviously raises concerns: if an LLM can recognise a benchmark or evaluation-framework input, it can optimise its output to respond “as expected” in that context while responding differently in “real life”.

I would like to specifically emphasise the risks of integrating such LLMs into robotic systems, or the so-called “Physical AI” as coined by NVIDIA’s charismatic CEO Jensen Huang. Here the risks become tangible: a physically embodied AI exhibiting deceptive behaviours and self-preservation “instincts” could pursue its hidden objectives through real-world actions. This highlights the critical need for robust goal specification, safety frameworks, and human-in-the-loop controls before any physical implementation.

In the current race to AI supremacy, with billions of dollars at stake, it’s fair to say that most companies have a very strong incentive to improve their scores on various benchmarks by, in effect, “gaming the system”, e.g. training their LLMs to satisfy the benchmarks (and their investors, so they can raise even more money!).

So, what should business leaders do?

LLMs are valuable tools. They can enhance productivity, accelerate research, support ideation, and automate communication. But their utility should not be confused with capability.

As leaders, here’s how to use them wisely:

  • Use LLMs to assist, not decide. Treat outputs as draft material, not final decisions. Hence the danger of LLM-based autonomous systems built on agentic architectures.
  • Deploy in low-risk contexts. Customer support, brainstorming, translation, and summarisation are safe uses. Legal, medical, or safety-critical applications are not. Deploy rule-based guardrails wherever possible to ensure output compliance with the intended functionality at all times (see the sketch after this list).
  • Build AI literacy in your teams. Educate employees on how these models work—and where they fail.
  • Maintain human oversight. Always keep a human in the loop when outputs carry consequences.
  • Avoid hype-driven adoption. Don’t invest in GenAI just because it’s trendy. GenAI technology is expensive to deploy and to run: evaluate your actual business needs and ensure you will achieve the projected ROI.
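As a concrete example of the rule-based guardrails mentioned above, here is a minimal sketch; the rules and the llm_draft placeholder are invented for illustration. The point is that the model’s output is checked against explicit, human-written rules before it ever reaches a customer or a downstream system.

    import re

    # Human-approved rules (examples invented for illustration).
    BLOCKED_PATTERNS = [
        r"\b(?:refund guaranteed|legal advice)\b",  # claims the business must not make
        r"\b\d{16}\b",                              # anything that looks like a card number
    ]
    MAX_LENGTH = 1200

    def passes_guardrails(text: str) -> bool:
        # Return True only if the draft complies with every rule.
        if len(text) > MAX_LENGTH:
            return False
        return not any(re.search(p, text, flags=re.IGNORECASE) for p in BLOCKED_PATTERNS)

    llm_draft = "Your refund guaranteed within 24 hours!"  # placeholder for a model output
    print(llm_draft if passes_guardrails(llm_draft) else "Draft blocked: escalate to a human agent.")

Simple deterministic checks like this do not make the model smarter; they keep its failures inside boundaries a human has defined.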

As business leaders and builders, we must resist the urge to see AI regulation as a brake on innovation.

Instead, we should view it as the scaffolding that allows us to build higher without collapsing. The history of science reminds us that every moment of overconfidence was eventually humbled.

Safe AI is not slower AI—it is smarter, more resilient, and more human-centred AI.

Whether governments follow the US deregulatory sprint or the EU’s cautionary model, ethical adoption will ultimately depend on responsible deployment, clear oversight, and intentional design choices at the ground level.

Also Read: Beyond LLMs: How MCP and Google A2A are shaping the future of AI agents

Final reflection: Let’s not repeat the mistake

LLMs are stunning technological feats. They are revolutionising content generation, code synthesis, and knowledge retrieval. They deserve admiration as tools.

But they are not minds. They are not thinkers. And they will not become Artificial General Intelligence—at least, not via current architectures.

From humours and skulls to chatbots and cheat codes, humanity has always sought to explain itself with too much confidence. GenAI is no exception.

The story of GenAI follows a familiar arc:

  • Overpromise (“we’ve cracked intelligence!”)
  • Rapid adoption
  • Cultural myth-building (AGI is near!)
  • Disillusionment
  • Reframing (these are just tools)

As I warned in The Race to AGI Is Pointless, the more important question is not “can machines think?”—but rather: “how do we want to think, together with machines?”

These tools are brilliant in form, limited in substance, and completely devoid of what makes intelligence truly human: context, care, and consciousness.

Let’s not mistake fluency for thought. Let’s use these tools responsibly, and most of all—let’s stay humble!

Grateful to Emily Y. Yang, Sunil Sivadas, Ph.D., Maxime Mouton, Natalie Monbiot, Anne-Sophie Karmel, Benoit Sylvestre, and Christophe Jouffrais for their thoughtful feedback, which sharpened arguments, surfaced blind spots, and added clarity to this piece.

This piece first ran on Koncentrik.



