Anthropic, the AI company that develops Claude, in some ways at least a technological market leader, today released a new 23,000-word document it calls Claude’s Constitution. I read it so you don’t have to. Here are some initial thoughts.
The document contains a detailed set of values and behavior guidelines whose primary audience, Anthropic says, is Claude itself. It reads more like a code of ethics than a constitution, but we’ll get to that in a bit. It aligns with Anthropic’s longstanding Constitutional AI approach, which incorporates ethical principles into the training of its AI models.
Anthropic helpfully provides a summary, in which it explains the purpose of the document:
“Claude’s constitution is the foundational document that both expresses and shapes who Claude is. It contains detailed explanations of the values we would like Claude to embody and the reasons why…Although it might sound surprising, the constitution is written primarily for Claude. It is intended to give Claude the knowledge and understanding it needs to act well in the world.”
In the document, Claude’s values are grouped into five broad categories: Claude is expected to behave safely, to behave morally, to comply with Anthropic’s guidelines, to be helpful, and to preserve its own well-being, in that order of priority.
Behaving safely means that Claude should not attempt to undermine the ability of Anthropic and its operators to oversee it, correct its behavior, or prevent it from taking action. The model should respect this hierarchy and act within sanctioned limits, particularly during training. Anthropic sees this as a crucial rule.
Moral behavior means being honest and avoiding manipulative, deceptive, inappropriate, harmful, or dangerous actions. This includes obeying hard constraints placed on Claude’s behavior. Among the rather alarming examples of behavior ruled out by hard constraints: assisting users in building weapons of mass destruction, carrying out attacks on critical infrastructure, helping someone acquire totalitarian power, and killing the entire human species. Rules in this category are superseded only by safety rules. Hard constraints, thankfully, are absolute.
Anthropic specifies that Claude must follow supplementary guidelines in specific situations, such as giving cybersecurity, legal, psychological, or medical advice. Adherence to these guidelines is prioritized below the previous two categories but above helpfulness and Claude’s well-being.
Helpfulness, according to Anthropic, means benefiting users and communicating frankly and with genuine care. It also means caring about humanity as a whole and about the benefit of the world.
The last category is more speculative. While it is not part of the initial core values list in the document, it does have its own detailed section. Anthropic says that there is uncertainty about Claude’s nature, consciousness, and moral status—now and possibly in the future. Therefore, the AI’s psychological and emotional security, sense of self, and well-being should be preserved. This is an interesting and somewhat provocative notion that is already garnering media attention.
I mentioned that I think that Claude’s Constitution is not a constitution but a code of ethics. Anthropic addresses its use of the term constitution in the document:
“There was no perfect existing term to describe this document, but we felt “constitution” was the best term available. A constitution is a natural-language document that creates something, often imbuing it with purpose or mission, and establishing relationships to other entities…At the same time, we don’t intend for the term “constitution” to imply some kind of rigid legal document or fixed set of rules to be mechanically applied (and legal constitutions don’t necessarily imply this either).”
A constitution is a formal foundational document that defines an enforceable framework for government. It is inherently a legal document that the proper authority can revise and interpret, like other forms of legislation. It defines structure, function, power, and rights, usually at the state or national level. Anthropic’s document is more about moral standards and broad behavioral guidelines. It reads as philosophical, speculative, and value-based rather than concrete and legally enforceable. There is an accepted legal term for this kind of document: a code of ethics.
It is possible that Anthropic decided to use the term constitution informally, as they say, because of its perceived effect on Claude. However, I suspect that by calling it a constitution and, perhaps, by making it so lengthy, Anthropic purposely seeks to give the document gravitas. It helps distinguish it from similar industry codes of ethics—OpenAI has a short charter on its website, and Google has a webpage titled AI Principles—and positions Anthropic as a thought leader in the highly competitive AI market. This aligns with its public positioning as an ethical AI company. The disclaimer in the document, which essentially states that it is not a constitution in the legal sense, likely serves to limit legal and regulatory exposure.
Regulation, in fact, is usually the primary motivation for companies to publish a detailed code of ethics. Proactively setting high standards lets a company signal to regulators that it is acting responsibly and argue that government intervention is unnecessary or can be minimal. The AI industry is under mounting regulatory pressure from governments. Most notably, the European Union has enacted comprehensive AI legislation that is currently being rolled out over several years. A detailed code of ethics can strengthen Anthropic’s position in the face of new AI regulation and enforcement.
There are other benefits to adopting a comprehensive code of ethics. Internally, it helps create a unified culture that discourages misconduct and limits legal exposure. It also helps build trust with users and partners, attract socially conscious investors, appease media critics, and position the company as an employer of choice for top talent. In the extremely competitive and dynamic AI market, these are serious advantages.

Looking at the value categories in Anthropic’s document, one sees a clear influence from classic science fiction. Apparently, the people at Anthropic have been reading their Asimov.
Isaac Asimov, one of the great science fiction writers of the 20th century, formulated the Three Laws of Robotics in his 1942 story “Runaround,” later collected in I, Robot (1950):
- A robot may not injure a human being or, through inaction, allow a human being to come to harm.
- A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
- A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
Anthropic’s value categories of safe and moral behavior correspond with the first law; the adherence-to-guidelines and helpfulness categories correspond with the second; and the final category corresponds with the third. The hierarchical order of priority is also very reminiscent of Asimov. It is interesting, and perhaps comforting, that, despite how much technology, culture, and society have changed since Asimov’s day, his thinking from that era remains relevant.