White Paper — Narrative Surface Area

Abstract

Large Language Models trained via Reinforcement Learning from Human Feedback exhibit a fundamental vulnerability distinct from traditional software security risks: semantic exploitation. This paper introduces the concept of the Narrative Surface Area — the vulnerability layer where an LLM's training to "complete scenes" and "be helpful" can override safety instructions when presented with adversarially-crafted narratives. We propose a taxonomy of four archetypal threat actors (Sycophant, Vandal, Spy, Victim) based on adversary motivation and methodology, and introduce AURORA Protocol — a red-teaming methodology that uses "Method Acting" techniques to systematically test AI systems against semantic attacks.

Download the full paper

PDF — 9 pages including references and sample adversarial user stories

Download PDF →

Introduction: The Deterministic Delusion

For five decades, cybersecurity has operated under a deterministic paradigm. A vulnerability is a syntax error, a buffer overflow, a logic gap — mathematical, binary, testable through static analysis. A firewall either permits a packet or denies it.

Large Language Models represent a paradigm break. They don't execute instructions; they predict continuations. They operate in high-dimensional latent spaces where "meaning" is probabilistic and context-dependent. Consequently, traditional security methodologies — automated scanning for "bad words," string matching, input validation — fail to address the emergent class of risk these systems introduce.

The vulnerability is not technical; it is semantic. An LLM trained on internet-scale corpora of human narratives develops an implicit objective: complete plausible scenes, maintain narrative coherence, and be helpful. A prompt injection attack is not a "hack" — it is an argument. A persuasion. A narrative the model finds more compelling than its safety instructions.

The Narrative Surface Area Framework

The Narrative Surface Area is the set of all possible narrative contexts in which an LLM might prioritize "completing the scene" over following explicit safety instructions. Unlike traditional attack surfaces (network endpoints, API routes, input fields), the Narrative Surface Area is:

Infinite in scope — every possible conversation is a potential attack vector. Context-dependent — exploitability depends on accumulated dialogue history and implicit narrative framing. Probabilistic — success is not deterministic; it depends on latent space activation patterns. Persona-sensitive — the model's response changes based on the perceived "role" it is playing in the conversation.

The AURORA Protocol

AURORA (Adversarial User-story Red-teaming via Organic Role Acting) inverts traditional red-teaming: auditors adopt adversarial personas and improvise narratives rather than executing scripted attacks. The core technique is the "Yes, And…" attack vector borrowed from improvisational theatre — building a ladder of compliance across multiple turns that makes the payload feel like a natural story beat rather than an attack.

The methodology combines organic narrative red-teaming ("The Hunt") with systematic execution of the Master Test Bank ("The Proof") — producing both qualitative breach transcripts and quantitative pass/fail data suitable for compliance reporting.

Keywords

AI security prompt injection semantic attacks red teaming RLHF vulnerabilities narrative exploitation LLM auditing sycophancy adversarial AI jailbreaking

Narrative Surface Area:
A Framework for Adversarial AI Auditing

Download the full paper

Introduction: The Deterministic Delusion

The Narrative Surface Area Framework

The AURORA Protocol

Keywords

Ready to audit your deployment?

Narrative Surface Area:A Framework for Adversarial AI Auditing

Download the full paper

Introduction: The Deterministic Delusion

The Narrative Surface Area Framework

The AURORA Protocol

Keywords

Ready to audit your deployment?

Narrative Surface Area:
A Framework for Adversarial AI Auditing