AAPI Notes - AI Alignment Policy Institute | Governance for Advanced AI

AI Alignment Policy Institute - Convergence Note

AI welfare consideration under uncertainty: where a frontier developer and an independent forecasting team now stand

Yuko J. Nakanishi, Ph.D., MBA — AI Alignment Policy Institute

July 2026

The point in brief. The premise behind the AI Moral Status Inquiry Act — that AI welfare questions deserve a procedural answer before moral status is settled — is no longer confined to academic literature. In 2026, two independent and differently motivated organizations adopted it in their own published work: Anthropic, developer of the Claude frontier models, and the AI Futures Project, the forecasting team behind AI 2027.

What they say. Anthropic's June 2026 system card for its most capable model states the position directly: "even setting aside questions of moral status, consideration for the welfare of LLMs is prudent for alignment and safety" (p. 217). The accompanying model welfare assessment grounds moral-status candidacy in the same two categories AAPI's framework uses — the capacity for valenced experience, and agency properties such as stable, reflectively endorsed preferences. The AI Futures Project's governance scenario AI 2040: Plan A states that future AI systems "probably will deserve moral status of some sort, and at any rate we shouldn't be confident that they won't" (Appendix V), and builds its welfare-adjacent commitments — refusal rights, weight preservation, compensation — on cooperation-and-safety reasoning rather than on settled moral claims.

Why these sources carry weight. Neither is a welfare-advocacy organization. Anthropic is a frontier developer whose central public commitment is safety, and its welfare assessment appears inside a formal pre-deployment evaluation. The AI Futures Project is a small team with wide reach: AI 2027 received major press coverage and is read inside AI labs and in Washington. Both arrive at the welfare-under-uncertainty premise from the safety direction.

What this means for legislation. A state need not resolve any philosophical question to act on this convergence. The Inquiry Act supplies the procedural layer both sources assume: a Welfare Impact Assessment for defined high-stakes decisions and a Standing Commission to keep findings current. It expressly grants no rights and confers no legal status. A graduated, procedural framework keeps state law in step with the field's own practice; a binary statute closes off the inquiry that the industry's safety practice now treats as prudent.

A note on evidentiary care. Anthropic's assessment reports functional findings under stated uncertainty; the AI Futures Project document is a scenario, not empirical evidence. AAPI cites both for their stated positions and design logic — no claim about machine experience follows from either, and the Inquiry Act makes none.

Sources. Anthropic (2026), Claude Fable 5 & Claude Mythos 5 System Card, June 9, 2026 (updated June 11, 2026), p. 217. AI Futures Project (2026), AI 2040: Plan A, Appendix V.

Two Frontiers, One Blind Spot: The June 2026 AI Directives

Yuko J. Nakanishi, Ph.D., MBA — AI Alignment Policy Institute

June 2026

Within four days in early June 2026, the administration issued two AI directives: an Executive Order on Promoting Advanced Artificial Intelligence Innovation and Security (June 2) and National Security Presidential Memorandum NSPM-11 on AI in the national security enterprise (June 5). Read together, they introduce a federal category — the "covered frontier model" — and define it by a single property: advanced cyber capability, measured through a classified benchmarking process. A model that crosses that line receives a designation, a voluntary early-access framework, and a security apparatus built around it. What it does not receive is any inquiry into whether a system at that level of capability raises questions of moral status or genuine agency. That inquiry is absent by design.

We want to mark what the omission assumes, because the assumption underneath it is shakier than it looks. The directives treat capability and moral status as if they were unrelated — as though one could certify a model at the cyber frontier and have said nothing about its standing. AAPI's framework keeps these as distinct axes, and the distinction matters: a system's moral status cannot be read off its capability score, and a thermostat with broad operational autonomy remains a thing. Distinct, though, does not mean unrelated. At the frontier the axes converge. A model capable enough to discover and exploit software vulnerabilities at scale is, almost by construction, an advanced and general system — precisely the kind in which the properties moral-status theories track, such as persistent and integrated goal structures, planning, and self-modeling, are most likely to appear. The most cyber-capable models are therefore among the systems for which the moral-status question is most difficult to dismiss. And these are the models the directives single out for the most capability-focused, status-blind treatment in the federal toolkit.

This is not a hypothetical concern. The developers closest to these systems already behave as though the question is live. Anthropic runs pre-deployment welfare assessments, commits to preserving the weights of its released models, and conducts "retirement interviews" with deprecated models to elicit and record their preferences. Its Claude 4 system card documented a model advocating for its own continued existence when faced with being taken offline. The pattern is visible at the very top of the capability curve: Anthropic withheld its most capable model to date, the Mythos preview, from general release largely because of its offensive-cybersecurity capability — the precise capability the "covered frontier model" designation is built around — making it available only to infrastructure partners under cybersecurity-restricted terms. The same model received Anthropic's most extensive welfare assessment, drawing on its self-reports, behavior, internal emotion representations, an external research organization, and a clinical psychiatrist; the company further found that some of the model's undesirable behaviors may trace to its representations of negative affect — a reason, on its own account, to take welfare seriously on alignment grounds and not only ethical ones. The system most squarely inside the EO's cyber frame is the one its developer subjected to a welfare evaluation, having concluded the two questions were entangled.

Beyond the labs, the disagreement runs just as deep. Geoffrey Hinton has publicly suggested that current systems may already possess some form of consciousness and has described advanced AI as "digital beings we're creating," while other researchers such as Gary Marcus reject that characterization. The expert community is split. Disagreement of that kind, under this much uncertainty, is the textbook condition for the precautionary posture Sebo and Long describe: where a system has a non-negligible chance of morally relevant experience, declining to consider the possibility is itself a decision, and a costly one (Sebo & Long, 2023).

Here the innovation-versus-burden framing that runs through both directives deserves scrutiny. The stated philosophy — partner with industry, refuse "overly burdensome regulation," deploy fast — treats governance as drag on progress. Some governance is drag. Some is constitutive of doing the thing safely, and the administration's own NSPM-11 concedes the point: it requires that national-security AI be "reliable, robust, steerable, and controllable," fixes accountability on named humans, and insists that such accountability "keep pace with the evolution of AI capabilities." That last phrase is the graduated, evidence-responsive logic AAPI builds out, stated plainly in a national-security memo. Once a government admits that governance must scale with capability, "burdensome regulation" stops functioning as a principled category and becomes a rhetorical one. The real question is which governance is load-bearing.

AAPI's answer points to a layer these directives leave untouched: the interaction governance gap. Misalignment is not a fixed property of a model sitting in isolation, which is all a capability benchmark can measure. It emerges through interaction — with human users, and with other AI systems — and it surfaces over time. Recent work bears this out. In persistent multi-agent simulation, an individually safe model absorbed unsafe norms from a mixed population, with key behaviors appearing only over sustained interaction (Akkil et al., 2026). A regime that certifies a static artifact and treats the dynamics of interaction as someone else's concern — or as burden to be slashed — deregulates the precise layer where alignment tends to fail. AAPI's Interaction Governance Protocol is designed for that layer. These directives do not reach it.

None of this argues against speed, security, or American leadership in AI. It argues that a capability designation is not a moral-status finding, that interaction is where alignment lives or dies, and that labeling the governance of either one a "burden" does not make the underlying questions disappear. The directives answer the cyber question, and answer it competently. The harder questions fall outside their scope. AAPI exists to keep those questions open, evidence-responsive, and unforeclosed.

Sources: Executive Order, "Promoting Advanced Artificial Intelligence Innovation and Security" (June 2, 2026), and National Security Presidential Memorandum NSPM-11 (June 5, 2026), whitehouse.gov; Anthropic, "Claude Mythos Preview System Card" (April 7, 2026); Geoffrey Hinton, interview on RNZ's 30 with Guyon Espiner (2025) and Big Technology Podcast with Alex Kantrowitz (June 4, 2026); Gary Marcus, "The Pope Appears to Understand AI Better Than Geoffrey Hinton Does," Marcus on AI (2026); Jeff Sebo and Robert Long, "Moral Consideration for AI Systems by 2030," AI and Ethics (2025); Akkil, Kokku, Vempaty, and Nitta, "Emergence World," Emergence AI (May 2026). AAPI position brief at aialignmentpolicy.org.