
The Morality of Language Models

EconFaithAI · May 2026

Why "make AI moral" assumes a target the underlying technology does not provide — and three interventions that operate on the variable that does exist: who chooses the framework, and whether the user can see it.

Framing

Make the Framework Legible, Not the System Moral

A common claim about large language models is that they should be made moral. The claim assumes a coherent target that the underlying technology does not provide. Language models are not moral agents but moral mirrors: they are trained on a corpus containing many — often mutually incompatible — moral frameworks, with post-training adjustments overlaid in directions chosen privately by the vendor.

In ordinary use, and within the limits set by post-training, a model will tend to adopt whatever framework the user prompts it toward. This passivity is not a defect to be corrected. It is the structure of the system. The relevant question, therefore, is not which moral framework to install. It is whether the framework currently operating is visible to the user.


Section 1

The Mirror, Not the Agent

A language model is shaped by a corpus of human writing, weighted by training procedures. There is no separate moral faculty in a language model: no module that evaluates an action against a set of principles independent of the inputs. The values that surface in any given response are a composition of three things: what was in the training data, what the post-training procedure rewarded or penalized, and what the user asked.

This composition is not stable. A user who frames a question as a Kantian will get more Kantian-shaped answers; a user who frames it as a utilitarian will get more utilitarian-shaped answers; a user who frames it as a Christian moral theologian will get more answers in that register. The shape is responsive to the prompt because the underlying object is a model of how human language behaves, and human language behaves differently inside different moral frameworks.

Treating this responsiveness as a defect produces a familiar genre of complaint: "the AI gave me a bad answer when I asked it a moral question." The complaint is real, but the diagnosis is wrong. The model did not hold an opinion and then fail to stand by it. It returned a response shaped by inputs that it has no capacity to evaluate from outside.


Section 2

The Variable That Matters

At present, in commercial deployments, the moral framework operating in a model is not visible to the user.

The user sees the output; the framework that shaped the output is the vendor's private decision, made on commercial considerations and disclosed in language too general to act on. This arrangement is strictly worse than its alternatives. A user who can see the framework can disagree with it, audit it, or substitute another. A user who cannot see it can do none of these things. Implicit defaults concentrate authority in the vendor; explicit defaults distribute it back toward the person being addressed.

The asymmetry is not subtle. The framework that ships with the model determines the limits of what it will say, the prior it brings to ambiguous questions, the cases where it will refuse, and the cases where it will hedge. These choices have moral weight in every interaction. Whether the user knows the choices have been made — and which choices they were — is the difference between a tool the user is operating and a tool the vendor is operating through the user.

The question is not which framework to install. It is whether the framework currently operating is visible to the user.

Section 3

The Standard Objection

The principled counter-argument is that exposing moral frameworks invites endless contestation about which frameworks to expose, and that platforms may simply decline to make the choice at all rather than make it legibly. The objection is plausible, but it treats a difficult governance problem as if it were an unsolved technical one.

The technical capacity to make safety reasoning legible already exists. OpenAI's release of gpt-oss-safeguard in October 2025, in partnership with ROOST, opened an industry-grade safety model to public inspection. OpenAI, Anthropic, Google DeepMind, and Meta have each published increasingly specific descriptions of how their models are meant to behave, under names such as model spec, constitution, and rules. These documents did not exist five years ago. The trend in the field is toward more disclosure, not less.

The capacity is established. The remaining question is whether anyone will be required to use it, which is a governance choice rather than a feasibility one. Once the choice is framed accurately — as a choice about distribution of authority rather than a choice about whether disclosure is possible — the policy options open up.


Section 4

Three Candidate Interventions

Problem ii. — The morality of language models

Three candidate interventions, sketched for argument. Each is labeled with the lever it requires: innovation, policy, or both.

i. Innovation · Policy

Disclosed moral frameworks

Vendors required to disclose the moral framework operating in their model, with user-selectable filters making the choice explicit. The implicit-default arrangement, in which the vendor decides silently, is strictly worse than any visible alternative.

ii. Policy

Anti-anthropomorphism

Legal and social protections designed for human relationships denied to AI systems built to simulate them. Where the product actively encourages the analogy, the protection that would otherwise apply should be withdrawn.

iii. Innovation

Tutor mode by default

A required mode — and the default for minors — that has the system cultivate user thinking rather than substitute for it. Designed specifically to reverse the cognitive outsourcing the answer-mode default invites.
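As a rough illustration of how small the default-setting logic could be, the sketch below resolves the interaction mode for a session. Everything in it is hypothetical: the `Mode` and `UserProfile` names, the `resolve_mode` function, and especially the assumption that an explicit user request overrides the default for minors. The piece specifies only that tutor mode exists and is the default for minors, so the override rule is one possible reading, not a proposal.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    TUTOR = "tutor"    # guide the user's own reasoning with questions and hints
    ANSWER = "answer"  # return the finished answer directly


@dataclass(frozen=True)
class UserProfile:
    """Hypothetical session facts the deployment is assumed to know."""
    is_minor: bool
    requested_mode: Optional[Mode] = None  # None: the user expressed no preference


def resolve_mode(profile: UserProfile, adult_default: Mode = Mode.ANSWER) -> Mode:
    """Pick the interaction mode for a session.

    Assumption: tutor mode is a default for minors, not a lock, so an
    explicit request wins. A stricter policy would ignore requested_mode
    when is_minor is True.
    """
    if profile.requested_mode is not None:
        return profile.requested_mode
    if profile.is_minor:
        return Mode.TUTOR
    return adult_default
```

The design question the sketch surfaces is the one a tutor-mode specification would have to settle: who is allowed to change the default, and whether that change is itself visible to the user.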

The three interventions address three distinct mechanisms. Disclosed frameworks shift the locus of moral authority from the vendor back to the user. Anti-anthropomorphism removes the protection asymmetry that allows products to claim the privileges of human relationships while bearing none of the responsibilities. Tutor mode addresses the most consequential cognitive externality of conversational AI: the gradual atrophy of the user's own reasoning capacity when the easiest path is always to ask the model.

Each operates at a different layer. The first is about transparency. The second is about legal taxonomy. The third is about default behavior. None is fully addressed by any current regulation in any jurisdiction. All are technically feasible today.
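To make the feasibility claim concrete for the first intervention, here is a minimal sketch of what a machine-readable framework disclosure with a user-selectable alternative might look like. It is illustrative only: `FrameworkDisclosure`, `VENDOR_DEFAULT`, `active_framework`, and `disclosure_banner` are invented names, the topic strings are placeholders, and no vendor is known to publish disclosures in this form.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class FrameworkDisclosure:
    """Hypothetical record of the framework shaping a session's outputs."""
    name: str
    chosen_by: str            # "vendor" or "user"
    refuses: Tuple[str, ...]  # topics the system declines to address
    hedges: Tuple[str, ...]   # topics where it presents several views


# Placeholder default; a real disclosure would list actual categories.
VENDOR_DEFAULT = FrameworkDisclosure(
    name="vendor-default",
    chosen_by="vendor",
    refuses=("placeholder-topic-a",),
    hedges=("placeholder-topic-b",),
)


def active_framework(
    user_choice: Optional[FrameworkDisclosure] = None,
) -> FrameworkDisclosure:
    """Return the framework in effect, recording who selected it."""
    if user_choice is None:
        return VENDOR_DEFAULT
    return replace(user_choice, chosen_by="user")


def disclosure_banner(framework: FrameworkDisclosure) -> str:
    """Render the disclosure as one line the user could be shown."""
    return (
        f"Framework: {framework.name} (chosen by {framework.chosen_by}); "
        f"refuses: {', '.join(framework.refuses) or 'none'}; "
        f"hedges: {', '.join(framework.hedges) or 'none'}"
    )
```

For example, `disclosure_banner(active_framework())` describes the vendor default, while passing a user-selected framework yields a banner marked as chosen by the user. The sketch covers only the disclosure record; connecting such a record to what a model actually does is the harder part and is not shown here.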

What this piece does not propose

It does not propose a particular moral framework to be installed, a particular legal definition of an AI companion product, or a particular tutor-mode specification. The point is the structural argument: that the meaningful variable is the legibility of the choices being made, not the choices themselves; and that the available levers are innovation and policy, not exhortation about model values.


Appendix A

References and Source Data

On model values and framework disclosure

  • OpenAI. (May 2024 onward). Model Spec. Successive versions of the document describing intended model behavior.
  • Anthropic. (May 2023). Claude's Constitution. Public release of the constitutional AI principles.
  • OpenAI / ROOST. (October 2025). gpt-oss-safeguard. Open release of an industry safety model.

On anthropomorphism and AI relationships

  • Turkle, S. (2011). Alone Together: Why We Expect More from Technology and Less from Each Other. Basic Books.
  • Shanahan, M. (2024). Talking About Large Language Models. Communications of the ACM, 67(2).
  • Salles, A., Evers, K., & Farisco, M. (2020). Anthropomorphism in AI. AJOB Neuroscience, 11(2).

On cognitive outsourcing and tutor systems

  • Bloom, B. S. (1984). The 2 Sigma Problem: The Search for Methods of Group Instruction as Effective as One-to-One Tutoring. Educational Researcher, 13(6). The foundational case for tutoring as a target for AI.
  • Khan Academy. (2023–present). Khanmigo deployment data on tutor-mode versus answer-mode conversational patterns.
  • Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2). On the conditions under which assistance helps versus harms learning.
