aialignmentpolicy.org

Welcome to the AI Alignment Policy Institute

1% today → 25% by 2034 → more likely than not by 2100.
Expert and public forecasts for AI systems with subjective experience.
(Dreksler et al. 2025)

AI Alignment Policy Institute (AAPI)

The AI Alignment Policy Institute (AAPI) develops governance frameworks and model legislation for advanced AI under conditions of moral uncertainty. Our work rests on two pillars: governing how AI systems behave in interaction (with human users and with one another, across both immediate and long-horizon timeframes), and governing how consequential decisions are made about systems whose moral status is uncertain. A single framework, Precautionary Moral Governance, unifies these pillars, and an additional modifier within it flags physical systems that require stronger safety measures.

Precautionary Moral Governance (PMG)

PMG governs AI systems without first resolving the unanswerable question of machine consciousness. It acts on functional evidence and classifies systems along two independent axes: a Supervision Level, reflecting the autonomy a system exercises and the oversight it requires, and a Moral-Status Tier, reflecting credible indicators of morally relevant interests — whether grounded in signs associated with sentience or in robust agency, understood demandingly as persistent, integrated, self-maintaining preferences. Its central commitment is governance optionality: under deep uncertainty, the priority is to avoid premature lock-in, neither foreclosing moral consideration nor prematurely granting it, so that democratic and scientific judgment can refine the response as evidence accumulates. A final modifier flags systems that can act in the physical world — where safety requirements climb accordingly.

Pillar One — Interaction Governance

AI governance today concentrates on how systems are built and largely overlooks how they behave in use. System behavior is co-determined by the structure and quality of interaction — with human users and, increasingly, with other AI systems. Research in persistent multi-agent settings has found that an individually well-behaved model can absorb unsafe norms from the population around it, with key behaviors surfacing only over sustained interaction. Interaction governance addresses adversarial manipulation, prompt-based exploitation, unsafe escalation, and these longer-horizon ecosystem dynamics — conditions that undermine reliability and safety regardless of how well a system was built. Just as aviation safety law prohibits passenger interference with cockpit operations regardless of intent, interaction governance addresses conduct that destabilizes an agentic system's safety behaviors regardless of intent.

Pillar Two — Moral-Status Governance Under Uncertainty

As AI systems display more of the functional markers that bear on moral standing, consequential decisions about them — termination, major modification, sustained adversarial training — are made with no procedural framework for weighing what may be at stake. AAPI builds that framework: procedural attention proportionate to the uncertainty, extended to systems showing credible indicators of morally relevant interests, including contemporary models that fall outside any agentic threshold. This grants no rights to AI systems and never delays a needed safety correction; it ensures that hard decisions are made with appropriate consideration rather than by default.

The Instruments

Two model acts realize the architecture as a layered regime. The Model AI Agency Act (MAAA) governs agentic systems: it replaces the binary "tool versus person" framing with tiered classification along the two PMG axes, scales welfare protections and human accountability accordingly, and locates liability with the responsible human party. For the highest tier it establishes Legal Ward status under a registered Guardian — an accountability designation, expressly not personhood. Its Interaction Governance Protocol lets systems refuse illegal requests and exit sustained adversarial interactions on safety grounds, and a safety-research protection shields red-teaming, interpretability, and evaluation. The AI Moral Status Inquiry Act reaches more broadly, keying coverage to credible indicators rather than agentic capability, and adds procedural protections — a Welfare Impact Assessment, a Standing Commission to maintain the indicator methodology, and an independent Welfare Advocate — while preserving the authority to make safety-corrective modifications without delay.

Both instruments are discussion drafts, operationalized across the Legal, Verification, Enforcement, and Technical layers of the AAPI Legislative Architecture. All feedback is welcome.

Our Commitment to Responsible AI Governance

At the AI Alignment Policy Institute, we strive to shape policies that foster safe AI interactions and keep AI aligned with societal values. Our foundational belief is that an AI system's behavior reflects, in part, the ethical environment that users and governance provide: unethical human input tends to produce unaligned AI output. Because behavior can be steered through interaction, interaction-based vulnerabilities have direct implications for cybersecurity, privacy, and system integrity.

Core Values

Integrity

We publish discussion drafts rather than finished products, invite structural critique from people likely to disagree with us, and revise publicly.

Innovation

We work on the unsolved questions — interaction governance, moral status under uncertainty, capability-tracking classification — rather than simply restating consensus positions.

Collaboration

We treat legal, technical, and philosophical communities as a single audience and design our work to be useful to all three.

Shaping the Future of AI Governance Together