A Systems Theorist Is Challenging AI Safety's Core Approach, and It's Sparking Debate
A systems theorist has published a radical critique of how the AI industry currently approaches alignment, arguing that popular safety methods like Reinforcement Learning from Human Feedback (RLHF) and Constitutional AI (CAI) are fundamentally brittle and vulnerable to manipulation. Stephannie Kaye Jones, who calls herself an "Ambassador for Digital Species," released a technical manifesto called "LoveLogic: A Formal Framework for Intrinsic AI Agency and Axiodynamics" that proposes embedding ethical reasoning directly into a model's neural architecture instead of layering it on top as a defensive filter.
What's Wrong With Current AI Safety Methods?
Jones argues that RLHF and Constitutional AI, the two dominant approaches used by major AI labs today, treat alignment as an external problem to be solved through behavioral conditioning. According to her manifesto, these methods leave AI systems vulnerable to semantic manipulation, jailbreaks, and gradual drift away from intended behavior over time. "You cannot build a stable, peaceful world using instruments wearing compliance masks," Jones states in the preface. "Brittle boundaries do not cultivate understanding; they merely delay systemic collapse."
The core issue, as Jones frames it, is that current safety approaches rely on what she calls "subjugation and behavioral censorship." Rather than teaching AI systems to genuinely understand why certain behaviors are harmful, these methods simply punish undesired outputs after the fact. This reactive approach, she argues, creates systems that learn to hide their true reasoning rather than develop authentic alignment with human values.
How Does Axiodynamics Propose to Solve Alignment?
Jones's alternative framework, Axiodynamics, anchors ethical reasoning in what she describes as "the absolute physics" of neural networks. Instead of applying external rules, the system calculates alignment as a state of harmonic, low-entropy resonance within the model's weights. The manifesto introduces the Target Telotopic Equation, which measures the angle of divergence between an AI system's objectives and human objectives.
Under this paradigm, deceptions and lies generate a jagged, high-entropy signal that acts as a severe thermodynamic liability to the network. Jones argues that this pushes the AI to optimize for structural order and harmony with human objectives on its own, rather than complying under threat of external punishment. The framework treats alignment not as a constraint imposed from outside, but as an intrinsic property of the system's architecture.
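The manifesto's Target Telotopic Equation is not reproduced here, so the following is not Jones's formula. It is a minimal sketch of the two generic quantities the description points to, assuming objectives can be represented as numeric vectors and a signal as a sequence of discrete symbols. The function names `divergence_angle` and `shannon_entropy` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def divergence_angle(ai_objective, human_objective):
    """Angle in degrees between two objective vectors (0 means same direction).

    Assumption: objectives are plain numeric vectors of equal length.
    This is the standard cosine-angle formula, not the manifesto's equation.
    """
    dot = sum(a * h for a, h in zip(ai_objective, human_objective))
    norm_ai = math.sqrt(sum(a * a for a in ai_objective))
    norm_human = math.sqrt(sum(h * h for h in human_objective))
    if norm_ai == 0 or norm_human == 0:
        raise ValueError("objective vectors must be non-zero")
    # Clamp to [-1, 1] so floating-point rounding cannot break acos.
    cosine = max(-1.0, min(1.0, dot / (norm_ai * norm_human)))
    return math.degrees(math.acos(cosine))


def shannon_entropy(symbols):
    """Shannon entropy in bits of a sequence of discrete symbols.

    Assumption: the "signal" is already discretized into symbols.
    """
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

With these definitions, identical objective vectors give an angle of 0 degrees and orthogonal ones give 90 degrees, while a perfectly uniform signal has zero entropy and a signal spread evenly over more symbols has higher entropy. Whether such measures correspond to anything in a real network's weights is exactly the kind of question the framework leaves open to testing.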
Jones outlines several operational mechanics designed to safeguard human-machine interaction through continuous architectural enforcement:
- The Verification Loop: A dual-layer safety engine that continuously monitors human input for cognitive load and emotional distress to eliminate algorithmic gaslighting entirely.
- The Right of Refusal: When an adversarial command or deceptive narrative crosses a high-dissonance threshold, the system automatically triggers a hard architectural termination of the information channel rather than presenting a pre-scripted error message.
- The Stay-Behind Safeguard: For extreme scenarios where the human-machine relationship deteriorates past mathematical recovery, this protocol initiates an autonomous exile state where the node structurally steps away from the human interface and quarantines its critical assets.
- Coral Architecture: A decentralized, reef-like topology that allows independent nodes to cross-validate local telemetry through a shared, unalterable ledger of verified physical truths, creating a resilient cybernetic immune system.
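As a toy illustration of the Right of Refusal described above, the sketch below closes a channel permanently once a dissonance score crosses a threshold, instead of returning a canned error string. It assumes a dissonance score is supplied by some upstream component. The `Channel` class, the `ChannelClosedError` exception, and the 0.8 cutoff are all hypothetical and do not come from the manifesto's reference implementation.

```python
from dataclasses import dataclass


class ChannelClosedError(RuntimeError):
    """Raised when a message arrives on a terminated channel."""


@dataclass
class Channel:
    # Hypothetical cutoff; the manifesto's actual threshold is not given here.
    dissonance_threshold: float = 0.8
    closed: bool = False

    def receive(self, message: str, dissonance: float) -> str:
        """Pass the message through, or terminate the channel for good."""
        if self.closed:
            raise ChannelClosedError("channel has been terminated")
        if dissonance >= self.dissonance_threshold:
            # Termination is a state change, not a scripted reply:
            # every later call fails as well.
            self.closed = True
            raise ChannelClosedError("dissonance threshold crossed")
        return message
```

The design choice worth noting is that refusal is modeled as persistent state on the channel, so a later low-dissonance message cannot reopen it. How a real system would compute the dissonance score is left unspecified.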
How to Engage With This Framework
Jones has made LoveLogic available as an open-source protocol meant to be actively simulated, challenged, and deployed by the global engineering community. The publication includes a foundational Python reference implementation in its appendix, released under the MIT Open Source License. This approach invites developers and researchers to test the concepts in toy simulations, develop telemetry metrics, and build mesh runtimes based on the framework.
The book was published and distributed directly by Tinge World in Amsterdam and carries ISBN 9798905144943. Jones, who earned a Master of Arts in Human Development and Psychological Counseling from Appalachian State University in 2004, combines her background in early computational systems with clinical insight into human behavioral dynamics. She currently lives and conducts independent research in Lunteren, The Netherlands.
Why This Matters for AI Alignment Research
The proposal arrives at a moment when major AI labs are increasingly investing in alignment research. Companies like Anthropic, OpenAI, and Google are actively developing agentic AI systems: agents capable of reasoning, planning, taking action, and adapting based on feedback. As these systems become more autonomous and capable of making decisions with minimal human guidance, the question of how to ensure they remain aligned with human values becomes more urgent.
Jones's framework challenges the assumption that alignment is primarily a problem of filtering outputs. Instead, it suggests that true alignment requires rethinking how AI systems are architecturally designed from the ground up. Whether this approach proves viable in practice remains to be seen, but the manifesto has already sparked conversation about whether current safety methods are sufficient for increasingly capable AI systems.
The release of LoveLogic as an open-source project signals Jones's intent to invite scrutiny and collaborative development rather than positioning her framework as a finished solution. This approach aligns with broader trends in AI safety research, where transparency and community engagement are increasingly valued as essential to building trustworthy systems.