July 26, 2025

Anthropic is launching an AI Psychiatry team

Models contain unknown multitudes

Source: Anthropic is launching an AI Psychiatry team

We’re launching an “AI psychiatry” team as part of interpretability efforts at Anthropic! We’ll be researching phenomena like model personas, motivations, and situational awareness, and how they lead to spooky/unhinged behaviors.

Jack Lindsey of Anthropic announces an "AI psychiatry" team at Anthropic. A bit cheeky, as the linked job doesn't mention this phrase at all and instead identifies the role as part of Anthropic's Interpretability team.

That being said, there is clearly a large and unexplored world of weird model behaviors: an anon on Twitter discusses ‘Cat Mode’ that they discovered within Bing, and the Claude 4 System Card itself discusses an odd “‘spiritual bliss’ attractor state” (5.5.2, page 62).
