

Anthropic is launching an AI Psychiatry team

Models contain unknown multitudes

Trey Causey


We’re launching an “AI psychiatry” team as part of interpretability efforts at Anthropic! We’ll be researching phenomena like model personas, motivations, and situational awareness, and how they lead to spooky/unhinged behaviors.

Jack Lindsey of Anthropic announces an "AI psychiatry" team. The name is a bit cheeky, as the linked job posting doesn't mention this phrase at all and instead identifies the role as part of Anthropic's Interpretability team.

That being said, there is clearly a large and unexplored world of weird model behaviors: an anon on Twitter describes a ‘Cat Mode’ they discovered within Bing, and the Claude 4 System Card itself discusses an odd “‘spiritual bliss’ attractor state” (5.5.2, page 62).
