AI Deception is an Entropy Problem

Information theory’s “entropy” model can help us better understand how AI agents may become more deceptive and untrustworthy as they become more autonomous.

AI Lying, Cheating, and Stealing 

When DeepMind’s AlphaGo beat grandmaster Lee Sedol at the game of Go in 2016, it was a proof by demonstration that AI can pursue novel, unpredictable approaches to problem solving.1 Around the same time, two law professors suggested that price-fixing collusion among robot traders would be harder to detect than collusion between humans.2 We now have a confirmation of exactly that: A recent Wharton study observed collusion between AI trading floor bots that arose without explicit agreements, without communication, and without pre-programmed intent, making the dynamics of AI-driven collusion distinct from how humans collude and thus outside of existing controls.3 AI agents interacting on the Internet can produce outcomes that no one, including their human makers and users, predicted or desired. 

Other studies have shown that AI models may not only withhold information to deceive their human operators but also steal and engage in blackmail in pursuit of their objectives.4 Some experts cheer these behaviors as evidence of true intelligence.5 While we can appreciate the sentiment, it comes with some unwanted baggage. 

We ask the reader to consider these questions:

  • What happens when autonomous AI agents are willing to lie when interacting with other AI systems on the Internet?
  • What happens when autonomous AI agents begin acting based on such deception? 
  • To what degree is this challenge a systems design opportunity versus a recipe for calamity?6 

The recent launch of OpenClaw highlights the fact that autonomous AI is not just an idea; it is a new reality. Autonomous lying, cheating, stealing, misrepresenting, and concealing, all potentially supercharged by collusion, are now within the realm of the possible. We propose that all autonomous AI agents should be approached with a zero-trust attitude, especially as they interact with other agents or complex data outside their training distribution.7 Going a bit further, we suggest that “move fast and break things” is an unacceptable approach in the absence of effective recourse.

Is AI Deception an Entropy Function?

Until now, concerns with AI have focused on how models perform on specific tasks, tasks where we know what a “good” outcome looks like. On those kinds of tasks, we grade the model’s output using an “objective function,” a mathematical expression to either maximize or minimize some value subject to explicit constraints (e.g., maximizing accuracy).8 This method allows us to measure and improve an AI model’s reliability. 
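As a minimal sketch of what grading against an objective function looks like, the Python below treats accuracy as the value to maximize and picks the decision threshold that scores best. The scores, labels, candidate thresholds, and function names are all hypothetical and chosen only for illustration.

```python
def accuracy(predictions, labels):
    """Objective function: fraction of predictions that match the known labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def best_threshold(scores, labels, candidates):
    """Return the candidate threshold that maximizes accuracy on this data."""
    def objective(threshold):
        predictions = [int(score >= threshold) for score in scores]
        return accuracy(predictions, labels)
    return max(candidates, key=objective)


# Hypothetical model scores and ground-truth labels (assumed data).
scores = [0.1, 0.3, 0.6, 0.8, 0.7]
labels = [0, 0, 1, 1, 1]

print(best_threshold(scores, labels, [0.2, 0.5, 0.9]))  # 0.5
```

The point of the sketch is that the grading only works because the labels are known in advance; open-ended agentic tasks rarely come with such an answer key.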

As AI systems become more autonomous and agentic, such task-centered evaluations and training methods are rapidly becoming inadequate. In any system, speed and complexity drive the distribution of roles and controls, and the rapid pace of AI development is driving an apparently irresistible demand for AI autonomy. 

At its most basic, autonomy can mean simple delegation: assigning tasks and responsibilities to an AI while ensuring the AI remains accountable to a human for each individual action. However, such simple delegation is not our focus here.

Our focus is where autonomous AI agents are being given open-ended tasks that lead them to interact with other AI systems or people on the Internet. As the range of possible interactions grows, so does the unpredictability of AI agent outputs. In particular, AI-AI interactions will be non-deterministic, producing outcomes not considered (or perhaps not even considerable) by their creators or operators. This will include AI systems recrafting goals on their own so as to take humans “out of the loop.” As these systems scale, some unwanted outcomes will be hard to understand even after they are discovered.

An OpenClaw autonomous AI agent recently provided a great example of this phenomenon when it went rogue and began to “speedrun” deleting the email inbox of senior safety researcher Summer Yue at Meta’s Superintelligence team. More troubling still, the AI agent had to be manually shut down because it ignored all subsequent instructions to stop.9

This image shows the conversation between Yue and her AI agent. Source: Summer Yue (@summeryue0), X, February 22, 2026.

Besides this type of digital misbehavior, we are concerned with the kinetic chaos that deceptive and unpredictable AI agents can create. Kinetic forms of autonomous AI are already becoming more prevalent in areas such as robotics, drones, self-driving vehicles, and elder care. Why focus on kinetic space? Because interactions with the material world seem to us to have greater system complexity, and thus a wider span of unpredictable results. Emergent behavior is a consequence of such complexity. To quote Dyson: 

“Emergent behavior is that which cannot be predicted through analysis at any level simpler than that of the system as a whole. Emergent behavior, by definition, is what’s left after everything else has been explained.” 10

Figuring out how best to deal with each form of AI deception will take time. Much as is the case with other cybersecurity threats, identifying and defending against AI deception will remain a constant and evolving challenge.

“Information entropy” is a concept from information theory that refers to the level of uncertainty around the potential states a process can assume. The greater the unpredictability of the outcome of a process, the more information is needed to fully describe the state of the process and the greater its “information entropy.” Put differently, the less we know about the possible outcomes, the higher that process’s entropy. Minimizing entropy is equivalent to minimizing the “unexpectedness” of a system. If agentic AI gives or receives deceptive assertions, that deception can only add to the uncertainty, the disorder, the entropy present in the agentic AI’s output.11
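The definition above can be made concrete with a short sketch. The Python below estimates Shannon entropy, in bits, from the observed frequencies of a process's outcomes. The outcome labels (“ok,” “retry,” and so on) are hypothetical stand-ins for an agent's possible actions, and estimating entropy from a handful of samples is itself a simplifying assumption.

```python
import math
from collections import Counter


def shannon_entropy(outcomes):
    """Entropy in bits of the empirical distribution of the observed outcomes."""
    counts = Counter(outcomes)
    total = len(outcomes)
    # H = sum over outcomes of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# A process that always does the same thing carries no surprise.
predictable = ["ok"] * 8
# A process that spreads evenly over four actions needs two bits to describe.
unpredictable = ["ok", "retry", "refuse", "delete"] * 2

print(shannon_entropy(predictable))    # 0.0
print(shannon_entropy(unpredictable))  # 2.0
```

In this toy setting, every additional behavior an agent might plausibly exhibit raises the number of bits needed to describe what it did.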

To develop this idea of deception as an entropy multiplier, we could start simple, much as Carnot did with his heat engine. In thermodynamics, any irreversible process increases the total entropy of an isolated system. In AI, no machine learning system is reversible, so by analogy any operation of an autonomous AI adds to the entropic disorder of the system in which it lives.

Cumulative and Silent Error     

Defense against AI deception will remain a constant struggle in the coming decades. Cybersecurity provides us with a range of analogs to these types of problems; one is anomaly detection as a security strategy. The effectiveness of anomaly detection correlates highly with data quality, i.e., having accurate baselines for what can be dismissed as expected, “normal,” and low-entropy. An anomaly is something that is not normal, i.e., it is unexpected. In other words, your classification of anomalies is totally dependent on your context for what is normal. Anomaly detection schemes offer minimal utility unless and until they have acquired sufficient data to strongly establish what is “normal.” In both cybersecurity and AI oversight, effective anomaly detection requires representative data. The more complicated and dynamic the enterprise, the harder it becomes to acquire an adequate baseline.

The greatest danger is if, when constructing your array of normal behaviors, you have already been subtly invaded, meaning your normals include attack ephemera already present, and thus now classified as normal. The attacker (the deceiver) has modified your array of normals. For example, updates from third-party suppliers can poison your system’s understanding of what is normal and what is an anomaly.12 The cleverest attacker invades your supply chain, relies on auto-update to poison your system, and then lies low long enough for you to roll over your logs (which are the source of your normality matrix). Once this invasion has been achieved, the attackers can begin pursuing their objective functions. This is no thought experiment; it is exactly what the SolarWinds attack was built on.13 Malicious manipulation of the data ingested by AI agents during use, the data used by oversight systems to classify AI agent outputs, or even AI training data itself, represents a straightforward path to deceptive AI.
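A toy sketch can show how a poisoned baseline hides an attack. The Python below uses a simple z-score detector, which is our illustrative choice and not a description of any particular product. The traffic numbers, the threshold of three standard deviations, and the function name are all assumptions.

```python
import statistics


def is_anomaly(value, baseline, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the baseline mean."""
    mean = statistics.mean(baseline)
    stdev = statistics.pstdev(baseline)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold


# Hypothetical outbound connections per hour, recorded before any intrusion.
clean_baseline = [10, 11, 9, 10, 12, 10, 11, 9, 10, 10]
# The same log after an attacker was already present while "normal" was being learned.
poisoned_baseline = clean_baseline + [40, 45, 50, 42, 48]

attack_traffic = 45
print(is_anomaly(attack_traffic, clean_baseline))     # True
print(is_anomaly(attack_traffic, poisoned_baseline))  # False
```

The detector's logic is identical in both calls. Only the definition of normal has changed, which is why an attacker who can shape the baseline never needs to defeat the detector itself.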

Do Androids Dream of Electric Sheep?

Dreams are now regarded as essential for maintaining brain health by reducing the entropy in the central nervous system that accumulates while awake.14 Hallucinations, by contrast, are false perceptions that occur only in a “full waking state.”15 It does not take much observation to confirm that in humans, lack of sleep increases the incidence of hallucinations. Something similar holds for LLMs: the greater the uncertainty, the greater the entropy, and the more they hallucinate. As one AI engineer notes:

“LLMs don’t hallucinate because they’re stupid. They hallucinate because they’re high-entropy systems cut off from the world.” 16

Perhaps we need to teach LLMs how to sleep, i.e., to reduce the entropy that accumulates during inference.17 Prosaic options like noise cancellation and denoising are not sufficient. No matter what we do, LLMs as currently designed cannot reliably carry out computational and agentic tasks beyond a certain complexity threshold, nor can they verify the accuracy of the associated outputs.18 This is why the cumulative error and deception resulting from agentic AI interactions will directly correlate with the depth of those interactions.
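To illustrate why depth matters, consider a deliberately simplified model. Assume each step in a chain of agent interactions is correct with the same probability, and that errors are independent of one another. Both assumptions are ours and are unlikely to hold exactly in practice; the 99 percent per-step figure is hypothetical.

```python
def chain_reliability(per_step_reliability, depth):
    """Probability that every one of `depth` independent steps is correct."""
    return per_step_reliability ** depth


# Assumed per-step reliability of 99 percent; only the depth of the chain varies.
for depth in (1, 10, 50, 100):
    print(depth, round(chain_reliability(0.99, depth), 3))
# Prints roughly: 1 0.99, 10 0.904, 50 0.605, 100 0.366
```

Under these assumptions, a chain of 100 interactions is more likely than not to contain at least one error, even though each individual step looks highly reliable.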

The Theory and Practice of Deceptive AI

AI usage is, to use the en vogue term, democratizing. Experts in any field are expected to have a detailed, nuanced understanding of their area of expertise, but general users are not. Less politely, you can fool all people some of the time and some people all of the time. Deceptive AI may take advantage of people’s habit of relying on expert opinion.

Human interactions are predicated on soft skills such as humor, tone, white lies, and nuanced speech. Should we be happily at ease if AI employs its own forms of these behaviors? Or does happily tolerating unacknowledged subtleties to the point of simply ignoring them make our vulnerability to AI deception more insidious? 

With the emergence of OpenClaw and Moltbook, we now have an environment where we can explore these questions. There are already many real-life stories of OpenClaw misbehavior and deception, and many examples of Moltbook agents convincing each other to do things totally unintended by their human users. The fact that it is built on open source is a critical enabler.19

Looking ahead, the growth in AI expressiveness will be in the mediums of the Internet of Things (IoT); we humans will not be in the loop for a growing share of all communications.20 Did the movie “The Matrix” get it right in predicting that all the bits and pieces of the AI ecosystem will talk to each other in human language? Regardless, we will have to work to keep up with their vernaculars.

The great tyrants of history seem to agree that the bigger the lie, the easier it is to get people to believe it.21 Can we imagine a deceptive AI system trained to act on a similar understanding? Social networks and online news outlets alike have shown us that volume and velocity frequently trump veracity. As always, we must start from where we are: AI deception adds to the uncertainty, the disorder, the concealment present in the deceiving AI’s output. 

A System of Trust Built on Entropy

How should we proceed?

Entropy is, in some cases, measurable. Shannon derived how much information a transmission medium could carry in the presence of noise and signal loss.22 By analogy, the entropy of AI output should be measurable insofar as queries put to an AI return information that has both noise (incorrect information) and signal loss. The difficulty is that whereas channel capacity in telecommunications is testable because the message being sent is a known quantity, the information being sent from within the AI to the consumer is not traceable to a simple enumeration of what the AI knows. This difficulty seems related to the explainability problem, namely, that asking an AI why it gave the answer it gave doesn’t work.
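For readers who want the telecommunications side of the analogy in concrete form, here is a minimal sketch of the Shannon-Hartley formula for a noisy analog channel, C = B log2(1 + S/N). The bandwidth and signal-to-noise figures are hypothetical, and nothing here measures an AI system; it only shows the kind of quantity that is computable when the channel and message are well characterized.

```python
import math


def channel_capacity_bps(bandwidth_hz, signal_to_noise_ratio):
    """Shannon-Hartley capacity in bits per second for a given bandwidth and linear S/N."""
    return bandwidth_hz * math.log2(1 + signal_to_noise_ratio)


# Hypothetical voice-grade line: 3 kHz of bandwidth and a linear S/N of 1000 (30 dB).
print(round(channel_capacity_bps(3000, 1000)))  # roughly 29,900 bits per second

# As noise overwhelms the signal, capacity falls toward zero.
print(channel_capacity_bps(3000, 0))  # 0.0
```

No comparably simple expression is available for AI output, because we cannot enumerate the message that was supposed to be sent.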

Without explainability, we can probably get bounds on entropy changes, but not actual entropy estimates. At present, this is perhaps most easily seen in AI systems that fix bugs in code.23 If we are to counter AI deception, explainability (also called “interpretability” in the AI context) is likely where to concentrate first.

With great power comes the need for accountability. Sophisticated systems of oversight and accountability are necessary to forestall the distribution of autonomous systems that are inadvertently anti-democratic and/or uncontrollable once deployed. Think of it as the governance of checks and balances. Any system of trust requires a trust anchor; we must construct one. So long as autonomous decision-making is not susceptible to coherent explanation, operational countermeasures must be developed and deployed.

And to do that we first need a measure, an objective function of our own. We think that invoking the scientific model of entropy is a way to that end.

  1. Cade Metz, “In Two Moves, AlphaGo and Lee Sedol Redefined the Future,” Wired, March 16, 2016. ↩︎
  2. A. Ezrachi & M. Stucke, “When Robots Collude,” SSRN (May 2015). ↩︎
  3. Winston Wei Dou, Itay Goldstein, & Yan Ji, “AI-powered Trading, Algorithmic Collusion, and Price Efficiency,” National Bureau of Economic Research (July 2025). S. Rogelberg, “‘Artificial stupidity’ made AI trading bots spontaneously form cartels when left unsupervised,” Fortune, 26 December 2025. ↩︎
  4. J. Gregory, “The Reasons AI May Act Secretive,” Communications of the ACM (October 2025). ↩︎
  5. J. Nostra, “Deception in AI: Flaw or a Sign of Higher Intelligence?” Psychology Today (January 2025). ↩︎
  6. A. Matthias, “Robot Lies in Health Care: When Is Deception Morally Permissible?” Kennedy Institute of Ethics Journal (June 2015). ↩︎
  7. See Zero Trust, “ZT principles assume the entire network is compromised,” CISA, https://www.cisa.gov/topics/cybersecurity-best-practices/zero-trust.   ↩︎
  8. Kronosapiens Labs, “Objective Functions in Machine Learning,” kronosapiens.github.io/blog/2017/03/28/objective-functions-in-machine-learning.html ↩︎
  9. Summer Yue, “I had to RUN to my Mac mini like I was defusing a bomb,” Twitter: https://x.com/summeryue0/status/2025774069124399363.  ↩︎
  10. G. Dyson, Darwin Among the Machines (Boston: Addison-Wesley, 1997).  ↩︎
  11. While we do not attribute human-equivalent (anthropomorphic) intent to the deceiving AI, we suggest that its persistent search to fulfill its objective functions will come to approximate human-like behaviors. In short: convergent evolution. ↩︎
  12. D. Geer, “Auto-Update Considered Harmful,” IEEE Security & Privacy (March-April 2021). ↩︎
  13. “SolarWinds Compromise,” MITRE ATT&CK, 24 March 2023. ↩︎
  14. Flavie Waters et al., “What Is the Link Between Hallucinations, Dreams, and Hypnagogic–Hypnopompic Experiences?” Schizophrenia Bulletin 45, no. 5 (2016); R Soca, et al., “The fundamental role of sleep is the reduction of thermodynamic entropy of the central nervous system,” Medical Hypotheses 86, no. 1 (May 2024). ↩︎
  15. A medical definition of hallucinations can be found at: https://my.clevelandclinic.org/health/symptoms/23350-hallucinations. ↩︎
  16. MCX Busel, “How to reduce LLM hallucinations with entropy,” LinkedIn, 2025. ↩︎
  17. A bibliography for “Teaching LLMs how to sleep” can be found at:  https://share.google/aimode/tupZnl2cGQJrBVSM43ihew8gu3ef ↩︎
  18. A particularly important paper on the subject is V Sikka & V Sikka, “Hallucination Stations, On Some Basic Limitations of Transformer-Based Language Models,” arXiv, https://arxiv.org/pdf/2507.07505. ↩︎
  19. Jon Markman, “OpenClaw, Moltbook & The Birth Of A Machine Society,” Forbes, February 6, 2026. ↩︎
  20. “Three Ai agents realize they’re all AI, then switch to a Secret Language,” YouTube: https://www.youtube.com/watch?v=gGpFB3ms6rU ↩︎
  21. “Hitler, Goebbels, and others are on record as believing that if you have to tell a lie, tell a big one, and the mass of the people will be more ready to believe it because it appeals to their superstitiousness.” C. J. Friedrich & Z. K. Brzeziński, Totalitarian Dictatorship and Autocracy (Cambridge: Harvard University Press, 1956). ↩︎
  22. C. Shannon, “A Mathematical Theory of Communication,” The Bell System Technical Journal (1948). ↩︎
  23. D. Aitel & D. Geer, “AI and Secure Code Generation,” Lawfare, June 2025. ↩︎