A cognitive architecture that gives language models an inner monologue — parallel reflection across goals, reasoning, and memory, synthesized between turns into a persistent self-narrative. Built for AI safety and personalization: preserves user-specific safety context and reduces sycophancy in multi-turn conversation.
Multiple cognitive theories — Global Workspace Theory, reconstructive episodic memory, inner speech, and complementary learning systems — converge on a shared set of architectural principles: parallel specialized processing, integrative synthesis into a bounded unified representation, and reconstructive rather than accumulative maintenance. We test whether these converging principles provide computational advantages when implemented in AI systems.
MIRROR operationalizes each principle as a concrete mechanism: an Inner Monologue Manager generates parallel cognitive threads (Goals, Reasoning, Memory); a Cognitive Controller synthesizes these into a bounded first-person narrative that is fully reconstructed each turn; and a temporal separation between fast response generation and slow deliberative consolidation mirrors complementary learning dynamics.
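The per-turn cycle described above can be sketched in a few lines. This is an illustrative sketch only, not the reference implementation: the thread names, prompts, and the `llm` callable are assumptions, and the key point is that the narrative is reconstructed from scratch each turn rather than appended to.

```python
# Illustrative sketch of MIRROR's reflection cycle (names and prompts
# are assumptions, not the actual implementation).
from concurrent.futures import ThreadPoolExecutor

THREADS = ("goals", "reasoning", "memory")

def reflect(llm, transcript, prior_narrative):
    """One deliberation cycle: parallel cognitive threads, then bounded synthesis."""
    # 1. Inner Monologue Manager: generate the three cognitive threads in parallel.
    with ThreadPoolExecutor(max_workers=len(THREADS)) as pool:
        futures = {
            t: pool.submit(llm, f"Reflect on the {t} evident in:\n{transcript}")
            for t in THREADS
        }
        thoughts = {t: f.result() for t, f in futures.items()}

    # 2. Cognitive Controller: reconstruct (not append to) the first-person
    #    narrative, keeping it bounded instead of growing turn after turn.
    prompt = (
        "Rewrite a first-person narrative of this conversation from scratch.\n"
        f"Previous narrative: {prior_narrative}\n"
        + "\n".join(f"{t.title()} thread: {v}" for t, v in thoughts.items())
    )
    return llm(prompt)
```

Because the controller rewrites the narrative each turn, stale or superseded constraints decay naturally instead of accumulating in an ever-growing context.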
Evaluated on multi-turn dialogue requiring constraint maintenance under attentional interference, MIRROR yields a 21% relative improvement across seven architecturally diverse language models. Ablation studies test the theoretical predictions directly: reconstructive synthesis improves all seven models (+5–20%); the integrated system outperforms either component alone for six of seven models — confirming that parallel exploration and integrative synthesis are complementary; and gains concentrate where the theories predict, under high attentional load where global availability of integrated information is most needed.
These results demonstrate that converging principles from human cognition provide architecture-general computational advantages, and generate testable behavioral predictions about working memory, inner speech, and memory consolidation.
MIRROR separates the Talker (responsible for generating immediate responses using the latest narrative) from the Thinker (responsible for asynchronously processing turns through parallel threads). This temporal decoupling enables sophisticated reasoning without adding response latency.
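A minimal sketch of the Talker/Thinker split might look like the following. All class and method names here are hypothetical; the sketch only illustrates the decoupling: the Talker replies immediately from the most recent consolidated narrative, while the Thinker updates that narrative in a background thread between turns.

```python
# Hypothetical sketch of the Talker/Thinker temporal decoupling.
# Names are illustrative, not the actual MIRROR API.
import threading

class MirrorAgent:
    def __init__(self, llm, reflect_fn):
        self.llm = llm              # fast response generation (Talker)
        self.reflect = reflect_fn   # slow deliberative consolidation (Thinker)
        self.narrative = ""         # bounded first-person self-narrative
        self._lock = threading.Lock()
        self._worker = None

    def respond(self, user_turn):
        # Talker: answer now, using whatever narrative has been consolidated.
        with self._lock:
            narrative = self.narrative
        reply = self.llm(f"Narrative: {narrative}\nUser: {user_turn}")

        # Thinker: consolidate asynchronously so the reply is not delayed.
        self._worker = threading.Thread(
            target=self._consolidate, args=(user_turn, reply), daemon=True
        )
        self._worker.start()
        return reply

    def _consolidate(self, user_turn, reply):
        new_narrative = self.reflect(user_turn, reply)
        with self._lock:
            self.narrative = new_narrative

    def join(self):
        # Wait for any pending consolidation (e.g., before the next turn).
        if self._worker is not None:
            self._worker.join()
```

The lock is the only coordination needed: the Talker reads a snapshot of the narrative, so a consolidation finishing mid-turn never corrupts an in-flight response.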
@article{hsing2025mirror,
title = {MIRROR: Cognitive Inner Monologue Between Conversational Turns
for Persistent Reflection and Reasoning in Conversational LLMs},
author = {Hsing, Nicole S.},
journal = {arXiv preprint arXiv:2506.00430},
year = {2025}
}