July 26, 2025

Anthropic is launching an AI Psychiatry team

Models contain unknown multitudes

We’re launching an “AI psychiatry” team as part of interpretability efforts at Anthropic! We’ll be researching phenomena like model personas, motivations, and situational awareness, and how they lead to spooky/unhinged behaviors.

Jack Lindsey of Anthropic announces an "AI psychiatry" team. A bit cheeky, as the linked job posting doesn't mention this phrase at all and instead identifies the role as part of Anthropic's Interpretability team.

That being said, there is clearly a large and unexplored world of weird model behaviors: an anon on Twitter discusses a ‘Cat Mode’ that they discovered within Bing, and the Claude 4 System Card itself discusses an odd “‘spiritual bliss’ attractor state” (5.5.2, page 62).
