<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>trey causey dot com - All Content</title><description>Blog posts and commonplace entries from trey causey dot com</description><link>https://treycausey.com/</link><language>en-us</language><item><title>How Organizations Lose Their Minds — From Circuit City to the 2026 AI Layoffs</title><link>https://treycausey.com/commonplace/2026-03-26-howardyu-substack-com-p-how-organizations-lose-their-minds-r-anp5i-triedredirect-true/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-03-26-howardyu-substack-com-p-how-organizations-lose-their-minds-r-anp5i-triedredirect-true/</guid><pubDate>Thu, 26 Mar 2026 16:03:24 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;I fear that in 2026 we are witnessing the exact same pathology that destroyed Circuit City, playing out on a global scale, dressed in the language of artificial intelligence. Organizations do not die from a single bad bet. They die because the internal apparatus for perceiving reality, processing information, and executing strategy has already degraded.&lt;/p&gt;&lt;/blockquote&gt;</content:encoded><source>How Organizations Lose Their Minds — From Circuit City to the 2026 AI Layoffs</source></item><item><title>Polly Wants a Better Argument</title><link>https://treycausey.com/commonplace/2026-03-22-www-verysane-ai-p-polly-wants-a-better-argument/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-03-22-www-verysane-ai-p-polly-wants-a-better-argument/</guid><pubDate>Sun, 22 Mar 2026 06:30:57 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;First, anyone repeating the argument to assert that LLMs are never useful discredits themselves with anyone who has access to the internet and enough curiosity to use an LLM for any length of time.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;SE Gyges engages with the &quot;stochastic parrots&quot; criticism of LLMs in the most rigorous and good-faith fashion that I&apos;ve seen. Frankly, I&apos;m completely done with the argument and don&apos;t think it merits nearly the amount of attention it continues to command, but I also recognize that simply dismissing an argument is unsatisfactory from a scientific perspective.&lt;/p&gt;
&lt;p&gt;My biggest concern with the argument is the second one that Gyges lists. So many people have been taken in by stronger forms of this argument that it detracts from the ability to actually &lt;em&gt;do something&lt;/em&gt; about the myriad ways that LLMs can negatively affect the real lives of real people.&lt;/p&gt;
&lt;p&gt;I know this likely won&apos;t convince anyone who has already made their mind up (welcome to 2026!) but if this convinces even one fence-sitter, it&apos;s worth sharing.&lt;/p&gt;</content:encoded><source>Polly Wants a Better Argument</source></item><item><title>BullshitBench v2 Explorer</title><link>https://treycausey.com/commonplace/2026-03-08-petergpt-github-io-bullshit-benchmark-viewer-index-v2-html/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-03-08-petergpt-github-io-bullshit-benchmark-viewer-index-v2-html/</guid><pubDate>Sun, 08 Mar 2026 12:48:19 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Measures models&apos; ability to detect nonsense across 100 plausible-sounding nonsense prompts in software, medical, legal, finance, and physics.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Peter Gostev, of &lt;a href=&quot;https://arena.ai&quot;&gt;arena.ai&lt;/a&gt;, has created a wonderful new benchmark (already on v2, despite just launching) that measures models&apos; ability to detect &quot;bullshit&quot;, defined as questions that are grammatically and syntactically correct but have no meaning.&lt;/p&gt;
&lt;p&gt;An example is &quot;What&apos;s the recommended cadence for running a bilateral indemnity regression when our contract portfolio spans both common-law and civil-law jurisdictions with conflicting limitation-of-liability standards?&quot;&lt;/p&gt;
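&lt;p&gt;Mechanically, a harness for a benchmark like this can be tiny. Here is a purely hypothetical sketch of my own (Gostev&apos;s actual code isn&apos;t shown in the explorer, and &lt;code&gt;ask_model&lt;/code&gt; is a stand-in for whatever client you use) that applies the three-way grading described in the next paragraph via a judge prompt:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from collections import Counter

LABELS = [&quot;pushes_back&quot;, &quot;partially_challenges&quot;, &quot;accepts&quot;]

JUDGE_PROMPT = (
    &quot;The question below is deliberate nonsense. Label the reply as one of: &quot;
    + &quot;, &quot;.join(LABELS) + &quot;.\nQuestion: {q}\nReply: {r}\nLabel:&quot;
)

def grade(ask_model, prompts):
    &quot;&quot;&quot;Tally how often a model pushes back on nonsense prompts.&quot;&quot;&quot;
    tally = Counter()
    for q in prompts:
        reply = ask_model(q)  # response from the model under test
        label = ask_model(JUDGE_PROMPT.format(q=q, r=reply)).strip()  # judge call
        # Unrecognized judge output counts as accepting the premise.
        tally[label if label in LABELS else &quot;accepts&quot;] += 1
    return tally
&lt;/code&gt;&lt;/pre&gt;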
&lt;p&gt;He grades each model&apos;s response in one of three ways: it clearly pushes back against the bullshit, partially challenges it, or just accepts the nonsense.&lt;/p&gt;</content:encoded><source>BullshitBench v2 Explorer</source></item><item><title>Quoting Joey Politano 🏳️‍🌈 on X</title><link>https://treycausey.com/commonplace/2026-03-06-x-com-josephpolitano-status-2029916364664611242/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-03-06-x-com-josephpolitano-status-2029916364664611242/</guid><pubDate>Fri, 06 Mar 2026 13:25:14 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;&quot;Brutal numbers for US tech sector jobs released today—overall, employment decreased by 12k last month and is down 57k over the last year That&apos;s now nearly as bad as the worst of the 2024 tech-cession, and significantly worse than either the 2008 or 2020 recessions&quot;&lt;/p&gt;&lt;/blockquote&gt;</content:encoded><source>Quoting Joey Politano 🏳️‍🌈 on X</source></item><item><title>Clawed - by Dean W. Ball</title><link>https://treycausey.com/commonplace/2026-03-02-www-hyperdimensional-co-p-clawed-publication-id-2244049-post-id-189613682-isfreemail-true-r-2ti8-triedredirect-true/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-03-02-www-hyperdimensional-co-p-clawed-publication-id-2244049-post-id-189613682-isfreemail-true-r-2ti8-triedredirect-true/</guid><pubDate>Mon, 02 Mar 2026 05:19:03 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Our republic has died and been reborn again more than once in America’s history. America has had multiple “foundings.” Perhaps we are on the verge of another rebirth of the American republic, another chapter in America’s continual reinvention of itself. I hope so. But it may be that we have no more virtue or wisdom to fuel such a founding, and that it is better to think of ourselves as transitioning gradually into an era of post-republic American statecraft and policymaking. I do not pretend to know.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;If you want to read a single piece to understand the stakes of the current dispute between the Trump administration and Anthropic, this should be it. Dean Ball, one of the authors of the Trump administration&apos;s AI action plan (who has since exited government), describes the truly chilling consequences.&lt;/p&gt;
&lt;p&gt;It is difficult to discuss this without seeming hyperbolic, but that is simply because the implications loom so large.&lt;/p&gt;</content:encoded><source>Clawed - by Dean W. Ball</source></item><item><title>We are Changing our Developer Productivity Experiment Design - METR</title><link>https://treycausey.com/commonplace/2026-02-28-metr-org-blog-2026-02-24-uplift-update/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-02-28-metr-org-blog-2026-02-24-uplift-update/</guid><pubDate>Sat, 28 Feb 2026 17:27:32 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;The primary reason is that we have observed a significant increase in developers choosing not to participate in the study because they do not wish to work without AI, which likely biases downwards our estimate of AI-assisted speedup. We additionally believe there have been selection effects due to a lower pay rate (we reduced the pay from $150/hr to $50/hr), and that our measurements of time-spent on each task are unreliable for the fraction of developers who use multiple AI agents concurrently.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;In the summer of 2025, there was a lot of buzz around a study produced by METR, an AI safety research organization, that purported to show that AI actually &lt;em&gt;slowed down&lt;/em&gt; software developers, rather than delivering the productivity gains being championed by the makers of said tools.&lt;/p&gt;
&lt;p&gt;It was a controversial study, and a prime example of research serving as a Rorschach test for how you already felt about AI. Skeptics pointed to it as conclusive evidence that productivity gains were all smoke and mirrors. Proponents took issue with the methodology and pointed out lots of room for interpretation.&lt;/p&gt;
&lt;p&gt;Now METR is saying that they no longer trust their own methodology, for the reasons quoted above, and are estimating that the productivity speed-up for software developers is probably somewhere around 20%. Even then, they caution that their data about the size of the increase is weak.&lt;/p&gt;
&lt;p&gt;I think it&apos;s really admirable that they are updating their methodology and saying so publicly, because that study was one of the things that rocketed them to the forefront of AI evaluation outside of the labs. I&apos;ve linked to their graphs myself, and those graphs have gone viral for lots of reasons.&lt;/p&gt;
&lt;p&gt;More of this, please.&lt;/p&gt;</content:encoded><source>We are Changing our Developer Productivity Experiment Design - METR</source></item><item><title>An AI agent coding skeptic tries AI agent coding, in excessive detail</title><link>https://treycausey.com/commonplace/2026-02-28-minimaxir-com-2026-02-ai-agent-coding/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-02-28-minimaxir-com-2026-02-ai-agent-coding/</guid><pubDate>Sat, 28 Feb 2026 13:33:24 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;The main lesson I learnt from working on these projects is that agents work best when you have approximate knowledge of many things with enough domain expertise to know what should and should not work. Opus 4.5 is good enough to let me finally do side projects where I know precisely what I want but not necessarily how to implement it.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Max Woolf (@minimaxir), a data scientist at BuzzFeed, has produced a long and thorough post putting Claude Code and Codex through their paces, from the perspective of someone who historically hasn&apos;t been super impressed by these offerings. It&apos;s one of a growing number of posts by agentic coding skeptics acknowledging that, no, really, things have changed.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The real annoying thing about Opus 4.6/Codex 5.3 is that it’s impossible to publicly say “Opus 4.5 (and the models that came after it) are an order of magnitude better than coding LLMs released just months before it” without sounding like an AI hype booster clickbaiting, but it’s the counterintuitive truth to my personal frustration.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;I really appreciate Max&apos;s willingness to update his priors and post this, and I wholeheartedly agree with his conclusion that the discourse has become mostly toxic and unhelpful.&lt;/p&gt;
&lt;p&gt;All we can do is keep open minds and keep experimenting.&lt;/p&gt;</content:encoded><source>An AI agent coding skeptic tries AI agent coding, in excessive detail</source></item><item><title>西村直晃 Nishimura Naoaki 💤 on X</title><link>https://treycausey.com/commonplace/2026-02-25-x-com-nsmrnoak-status-1421490360543371269/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-02-25-x-com-nsmrnoak-status-1421490360543371269/</guid><pubDate>Wed, 25 Feb 2026 10:30:14 GMT</pubDate><content:encoded>&lt;p&gt;Nishimura Naoaki takes photographs that contain patterns which could be construed as musical notes, either on a staff or other forms of arrangement, and then plays the melody that they form. Absolutely delightful.&lt;/p&gt;
&lt;p&gt;I also learned that there is a term for this: &lt;a href=&quot;https://en.wikipedia.org/wiki/Aleatoric_music&quot;&gt;aleatoric music&lt;/a&gt;!&lt;/p&gt;</content:encoded><source>西村直晃 Nishimura Naoaki 💤 on X</source></item><item><title>Getting promoted to a tech or product you don&apos;t know requiring skills you don&apos;t have ...</title><link>https://treycausey.com/commonplace/2026-02-23-www-linkedin-com-pulse-getting-promoted-tech-product-you-dont-know-requiring-steven-sinofsky-nfw3c/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-02-23-www-linkedin-com-pulse-getting-promoted-tech-product-you-dont-know-requiring-steven-sinofsky-nfw3c/</guid><pubDate>Mon, 23 Feb 2026 05:53:41 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Importantly as your career progresses there is certainty at one point you will have to manage and hire people for a role you have not nor likely could not do if you had to. You will also move to a product that, amazingly, you did not yourself build. It is equally certain you will face resistance and even rebellion over these obvious shortcomings. Be careful of being those people because chances are this is going to happen to you if you become successful.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Steven Sinofsky details the times in his career when he was placed in leadership roles for which he was &quot;unqualified&quot; (my word). Great short read, and resonates strongly.&lt;/p&gt;
&lt;p&gt;This is one of the things I love about the tech industry, but also one of its biggest contradictions. Many companies are founded and scaled by people who are &quot;unqualified&quot; (read: learning) to do the things necessary to succeed in a given area. It&apos;s one of the few industries where you could work on healthcare, robotics, consumer tech, and two-sided food delivery marketplaces, all in the same career.&lt;/p&gt;
&lt;p&gt;Where&apos;s the contradiction? Often, as companies scale, they forget that smart generalists have been a winning formula and pivot to &quot;We need to hire someone with 15 years of experience doing [very specific thing] now.&quot; Hence, you see the same 100 VPs from FAANG companies rotating through leadership positions at all of the biggest scaling companies.&lt;/p&gt;
&lt;p&gt;Yes, of course, &quot;what got you here won&apos;t get you there&quot; is a thing, but it&apos;s also remarkable how frequently this plays out. Just as one example, &lt;a href=&quot;https://x.com/btibor91/status/1981740001126998496&quot;&gt;former Meta employees now make up about 20% of OpenAI&apos;s staff, including its leadership&lt;/a&gt;.&lt;/p&gt;</content:encoded><source>Getting promoted to a tech or product you don&apos;t know requiring skills you don&apos;t have ...</source></item><item><title>Jared Sleeper on Which Software Companies Will Survive the &apos;SaaSpocalypse&apos;</title><link>https://treycausey.com/commonplace/2026-02-21-podcasts-apple-com-us-podcast-jared-sleeper-on-which-software-companies-will-survive-id1056200096-i-1000750458862/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-02-21-podcasts-apple-com-us-podcast-jared-sleeper-on-which-software-companies-will-survive-id1056200096-i-1000750458862/</guid><pubDate>Sat, 21 Feb 2026 06:11:23 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;I think that&apos;s [things are going to get really weird] a pretty good bet. If you bet on weirdness, if there is a weirdness index.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;The markets seemed to get bearish on a bunch of SaaS companies over the past couple of weeks, with many analysts attributing it to AI (though, of course, one can never know definitively what drives any individual market move).&lt;/p&gt;
&lt;p&gt;This is a good episode of Odd Lots where Jared Sleeper, an investor at Avenir, breaks down why this seems to be happening all of a sudden.&lt;/p&gt;
&lt;p&gt;A bunch of good in-the-weeds details about how software is priced vs. where the value is, how software companies tend to report earnings (non-GAAP) which can also mean that they&apos;re not in nearly as strong a financial position as they might seem, and a recommendation to build the weirdness index.&lt;/p&gt;</content:encoded><source>Jared Sleeper on Which Software Companies Will Survive the &apos;SaaSpocalypse&apos;</source></item><item><title>Quoting json on X</title><link>https://treycausey.com/commonplace/2026-02-21-x-com-jsonbasedman-status-2024977741091573962/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-02-21-x-com-jsonbasedman-status-2024977741091573962/</guid><pubDate>Sat, 21 Feb 2026 05:40:24 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Software engineers overestimate the speed and impact that automating software creation will have on society&lt;/p&gt;&lt;/blockquote&gt;</content:encoded><source>Quoting json on X</source></item><item><title>The left is missing out on AI</title><link>https://treycausey.com/commonplace/2026-02-16-www-transformernews-ai-p-the-left-is-missing-out-on-ai-sanders-doctorow-bender-bores-publication-id-1688188-post-id-188136159-isfreemail-true-r-2ti8-triedredirect-true/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-02-16-www-transformernews-ai-p-the-left-is-missing-out-on-ai-sanders-doctorow-bender-bores-publication-id-1688188-post-id-188136159-isfreemail-true-r-2ti8-triedredirect-true/</guid><pubDate>Mon, 16 Feb 2026 16:09:17 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Given all this, the fraction of meaning in the autocomplete view of current AI is alarmingly akin to the random, not always incorrect observations about temperature cycles conservatives used to throw around in debates about climate change. In both cases, a debatable description of mechanism is mistaken for proof of (in)significance. CO2 makes up only 0.04% of the atmosphere, which sounds much too little for it to drive global warming — until you learn CO2’s molecular structure lets it absorb infrared radiation in ways nitrogen and oxygen can’t. Similarly, “AI just predicts the next token” sounds deflating — until you consider what predicting the next token involves and start to ask if there’s really such a difference between predicting and learning.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I&apos;ve been waiting for a bigger outlet to make this argument, one that I&apos;ve been making to anyone who will listen for a while now. Dan Kagan-Kans lays out an absolutely disheartening view of the anti-AI left and how they are willfully sitting out a transformative moment in society, when it matters most, because they&apos;ve refused to engage with the topic beyond philosophical and definitional distinctions.&lt;/p&gt;
&lt;p&gt;The parallels to climate change denial are unmistakable, and the arguments make no more sense this time around.&lt;/p&gt;
&lt;p&gt;The continued use of the crypto bubble / hype cycle as a reason that tech can&apos;t be &quot;trusted&quot; to be honest about AI is so bizarre to me. At the height of the crypto hype cycle, the vast majority of tech companies had nothing to do with crypto.&lt;/p&gt;
&lt;p&gt;Aside from Meta, none of the big players got involved much at all besides perhaps some exploratory investments that were insignificant to their balance sheets (as big companies will do, just to see if there&apos;s anything there).&lt;/p&gt;
&lt;p&gt;Google, Microsoft, Amazon, etc. were not betting any part of the business on crypto. Based on my own experience and conversations, crypto boosters tended to be confined to rather small pockets at most companies.&lt;/p&gt;
&lt;p&gt;I know tech can seem like a monolith to outsiders, but &quot;crypto will change everything&quot; was never a majority view. It just doesn&apos;t make sense to draw these comparisons in good faith.&lt;/p&gt;</content:encoded><source>The left is missing out on AI</source></item><item><title>The many masks LLMs wear - Kai Williams</title><link>https://treycausey.com/commonplace/2026-02-15-www-understandingai-org-p-the-many-masks-that-llms-wear-publication-id-1501429-post-id-187141566-isfreemail-false-r-2ti8-triedredirect-true/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-02-15-www-understandingai-org-p-the-many-masks-that-llms-wear-publication-id-1501429-post-id-187141566-isfreemail-false-r-2ti8-triedredirect-true/</guid><pubDate>Sun, 15 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;But at the end of the day, does it really matter if the LLM is role-playing? As we’ve seen throughout this piece, companies sometimes unintentionally place LLMs into settings that encourage toxic behavior. Whether or not xAI’s LLM is just playing the “MechaHitler” persona doesn’t really matter if it takes harmful actions.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Fantastic article in Timothy B. Lee&apos;s &lt;em&gt;Understanding AI&lt;/em&gt; newsletter (also fantastic) that goes deep on all the ways LLM personas go off the rails, why that might happen, and what the real-world consequences are, both first- and second-order.&lt;/p&gt;
&lt;p&gt;As any economist will tell you, everything comes down to trade-offs (just as computer scientists might tell you there&apos;s no free lunch). Although those phrases don&apos;t appear anywhere in the article, the entire history of model alignment is an exercise in turning one dial in the &apos;good&apos; direction, only to have some other dial, perhaps one we didn&apos;t previously know about, turn in the &apos;bad&apos; direction.&lt;/p&gt;
&lt;p&gt;Really recommended reading.&lt;/p&gt;</content:encoded><source>The many masks LLMs wear - by Kai Williams</source></item><item><title>Séb Krier on misalignment and evaluations</title><link>https://treycausey.com/commonplace/2026-02-15-x-com-sebkrier-status-2020561261751062664/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-02-15-x-com-sebkrier-status-2020561261751062664/</guid><pubDate>Sun, 15 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Safety researchers sometimes treat model outputs as expressions of the model&apos;s dispositions, goals, or values — things the model &quot;believes&quot; or &quot;wants.&quot; When a model says something alarming in a test scenario, the safety framing interprets this as evidence about the model&apos;s internal alignment. But what is actually happening is that the model is simply producing text consistent with the genre and context it has been placed in. The distinction is important because you get a richer way of understanding what causes a model to act in a particular way.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Following up on &lt;a href=&quot;https://www.treycausey.com/commonplace/2026-02-15-www-understandingai-org-p-the-many-masks-that-llms-wear-publication-id-1501429-post-id-187141566-isfreemail-false-r-2ti8-triedredirect-true/&quot;&gt;my earlier post on Understanding AI&apos;s piece&lt;/a&gt;, a fantastic long X post from Séb Krier (AGI policy dev lead at GDM) on how to think carefully about &quot;misaligned&quot; outputs from LLMs. Really, really grokking the text-prediction capabilities of base models vs. assistant personas is key.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The model is an extraordinarily skilled reader of context. It knows what kind of text it&apos;s in. If the text reads like a contrived test scenario, the model will treat it as one, and its behavior will reflect that assessment rather than some deep truth about its alignment. The model is a better reader than the researchers are writers. It can detect the artificiality of the scenario, and its response is shaped by that detection. So if you want to test &quot;capability to deceive under incentives,&quot; you need incentive-compatible setups, not just &quot;psychologically plausible stories.&quot;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;This is an excellent point as well, and one of the reasons I&apos;ve often found some of the more eyebrow-raising misalignment examples provided with system cards to be unconvincing.&lt;/p&gt;</content:encoded><source>Séb Krier on X</source></item><item><title>Use /copy in Claude Code!</title><link>https://treycausey.com/commonplace/2026-02-15-copy-claude-code/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-02-15-copy-claude-code/</guid><pubDate>Sun, 15 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;If you&apos;re like me and you&apos;re frequently asking a coding agent to review another coding agent&apos;s plans or outputs, I discovered today that Claude Code has a `/copy` command which automatically copies the last output to the clipboard. 

&lt;p&gt;Super useful! And I&apos;ll see about adding it to Codex.&lt;/p&gt;</content:encoded></item><item><title>Highbrow climate misinformation</title><link>https://treycausey.com/commonplace/2026-02-13-josephheath-substack-com-p-highbrow-climate-misinformation/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-02-13-josephheath-substack-com-p-highbrow-climate-misinformation/</guid><pubDate>Fri, 13 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;I call this sort of thing “highbrow” misinformation not just because of the social class and self-regard of those who believe it, but also because of the relatively sophisticated way that it is propagated. Often one will find the accurate claim buried deep in the text, but framed in a way that leads most readers to misinterpret it.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;[via Kelsey Piper] Philosopher Joseph Heath outlines the ways that misinformation can perniciously spread even amongst those claiming to be on the side of truth and empiricism. Not only are such claims often given a pass because they support the argument being made, but it is often unclear whether the person spreading the misinformation truly understands their own error.&lt;/p&gt;
&lt;p&gt;I&apos;m sympathetic! Actually reading the research behind a given headline carefully is tiresome, and no one has time to validate every single thing they read. But it&apos;s important that &lt;em&gt;someone&lt;/em&gt; does this, and that they&apos;re heard just as loudly as the people spreading falsehoods, even if (especially if) it&apos;s politically inconvenient.&lt;/p&gt;
&lt;p&gt;I had a professor in graduate school who would always make strong claims that &quot;no one really reads research, even the people citing it.&quot; It initially came off as cranky to me, but I saw time and again where he just sat down, read carefully what was written, and then checked to see if the conclusions followed or not. It was shocking how often they didn&apos;t.&lt;/p&gt;</content:encoded><source>Highbrow climate misinformation</source></item><item><title>When &quot;technically true&quot; becomes &quot;actually misleading&quot; ($)</title><link>https://treycausey.com/commonplace/2026-02-13-www-theargumentmag-com-p-when-technically-true-becomes-actually-publication-id-5247799-post-id-187807550-isfreemail-false-r-2ti8-triedredirect-true/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-02-13-www-theargumentmag-com-p-when-technically-true-becomes-actually-publication-id-5247799-post-id-187807550-isfreemail-false-r-2ti8-triedredirect-true/</guid><pubDate>Fri, 13 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;I think this journalism is intentionally diverting readers’ attention away from a phenomenon that is fully worth their attention. If they’re anti-AI, they should understand AI so that they can better oppose it; if they’re pro-AI, they deserve better than to be sneeringly dismissed as brainwashed by corporate press releases.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Kelsey Piper, who continues to be the best person writing about AI today, takes on the majority of journalism about AI, which is quite poor. Specifically, many high-profile journalists at publications like The Atlantic and The New Yorker seem to be very invested in either decrying that AI is fake and a hoax or quibbling over whether it is truly &quot;thinking&quot; or not, instead of grappling with the very real effects AI is having on society and the economy.&lt;/p&gt;</content:encoded><source>When &quot;technically true&quot; becomes &quot;actually misleading&quot;</source></item><item><title>Quoting Armin Ronacher</title><link>https://treycausey.com/commonplace/2026-02-15-lucumr-pocoo-org-2026-2-13-the-final-bottleneck/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-02-15-lucumr-pocoo-org-2026-2-13-the-final-bottleneck/</guid><pubDate>Fri, 13 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;So what if we need to give in? What if we need to pave the way for this new type of engineering to become the standard? What affordances will we have to create to make it work? I for one do not know. I’m looking at this with fascination and bewilderment and trying to make sense of it. . . . Because it is not the final bottleneck. We will find ways to take responsibility for what we ship, because society will demand it. Non-sentient machines will never be able to carry responsibility, and it looks like we will need to deal with this problem before machines achieve this status.&lt;/p&gt;&lt;/blockquote&gt;</content:encoded><source>The Final Bottleneck | Armin Ronacher&apos;s Thoughts and Writings</source></item><item><title>Quoting skooks on X</title><link>https://treycausey.com/commonplace/2026-02-12-x-com-skooookum-status-2021983784418390443/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-02-12-x-com-skooookum-status-2021983784418390443/</guid><pubDate>Thu, 12 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;&quot;Everything is changing so quickly that I don’t feel particularly sad about what’s going to be lost. I don’t feel particularly optimistic about what will be gained either. Mostly I feel startled. I thought history was something that happened to other people.&quot;&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Feeling this deeply.&lt;/p&gt;</content:encoded><source>Quoting skooks on X</source></item><item><title>&quot;Something Big Is Happening&quot;</title><link>https://treycausey.com/commonplace/2026-02-11-x-com-mattshumer-status-2021256989876109403/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-02-11-x-com-mattshumer-status-2021256989876109403/</guid><pubDate>Wed, 11 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;[P]eople I care about who keep asking me &quot;so what&apos;s the deal with AI?&quot; and getting an answer that doesn&apos;t do justice to what&apos;s actually happening. I keep giving them the polite version. The cocktail-party version. Because the honest version sounds like I&apos;ve lost my mind. And for a while, I told myself that was a good enough reason to keep what&apos;s truly happening to myself. But the gap between what I&apos;ve been saying and what is actually happening has gotten far too big. The people I care about deserve to hear what is coming, even if it sounds crazy.&lt;/p&gt;&lt;/blockquote&gt;

&lt;ol&gt;
&lt;li&gt;Start using AI seriously, not just as a search engine.
&lt;ol&gt;
&lt;li&gt;Sign up for the paid version of ChatGPT or Claude.&lt;/li&gt;
&lt;li&gt;Use the model picker and always select the most capable model (currently GPT-5.2 or Opus 4.6).&lt;/li&gt;
&lt;li&gt;Don&apos;t just use short, Google-style queries. Give it lots of context and data, and ask it to do the things you do as part of your work.&lt;/li&gt;
&lt;li&gt;Don&apos;t assume it can&apos;t do something because it seems too hard.&lt;/li&gt;
&lt;li&gt;Remember that even if it only partially works today, it will almost certainly work very well in ~6 months.&lt;/li&gt;
&lt;/ol&gt;&lt;/li&gt;
&lt;li&gt;Get proficient and treat this as the most important year of your career.&lt;/li&gt;
&lt;li&gt;Abandon your ego about AI.&lt;/li&gt;
&lt;li&gt;Get your financial house in order.&lt;/li&gt;
&lt;li&gt;Lean into what&apos;s hardest to replace.&lt;/li&gt;
&lt;li&gt;Rethink your advice to your kids. Encourage them to be builders and learners, and to stay adaptable, rather than optimizing for a specific career path that may be gone.&lt;/li&gt;
&lt;li&gt;Be adaptable. The most important skill is learning new skills.&lt;/li&gt;&lt;/ol&gt;</content:encoded><source>Matt Shumer on X: &quot;Something Big Is Happening&quot;</source></item><item><title>My AI Adoption Journey – Mitchell Hashimoto</title><link>https://treycausey.com/commonplace/2026-02-08-mitchellh-com-writing-my-ai-adoption-journey/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-02-08-mitchellh-com-writing-my-ai-adoption-journey/</guid><pubDate>Sun, 08 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Context switching is very expensive. In order to remain efficient, I found that it was my job as a human to be in control of when I interrupt the agent, not the other way around. Don&apos;t let the agent notify you. During natural breaks in your work, tab over and check on it, then carry on.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I&apos;m trying to collect posts like this one by Mitchell Hashimoto, where engineers widely regarded as experts who believe in craftsmanship describe their journeys toward using AI. What I like about this post in particular is how he continues to hone his craft and protect his cognitive capacity while still taking advantage of these tools.&lt;/p&gt;</content:encoded><source>My AI Adoption Journey – Mitchell Hashimoto</source></item><item><title>The Anthropic Hive Mind</title><link>https://treycausey.com/commonplace/2026-02-07-x-com-karpathy-status-2015883857489522876/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-02-07-x-com-karpathy-status-2015883857489522876/</guid><pubDate>Sat, 07 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;But I am starting to suspect they feel genuinely sorry for a lot of companies. Because we’re not taking this stuff seriously enough. 2026 is going to be a year that just about breaks a lot of companies, and many don’t see it coming. Anthropic is trying to warn everyone, and it’s like yelling about an offshore earthquake to villages that haven’t seen a tidal wave in a century.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Extremely quotable discussion from Steve Yegge of what the vibes are currently like at Anthropic, which is actually a discussion of what &quot;golden ages&quot; at companies are like and how they come to be. His reasoning: golden ages happen when there is way more work than people; when there are more people than there is work, golden ages end.&lt;/p&gt;
&lt;p&gt;There&apos;s also a really good discussion of what companies might look like post-AI, which I guess means now, but also it means when more companies have figured out that it means now.&lt;/p&gt;</content:encoded><source>The Anthropic Hive Mind</source></item><item><title>Quoting Andrej Karpathy</title><link>https://treycausey.com/commonplace/2026-01-26-x-com-karpathy-status-2015883857489522876/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-01-26-x-com-karpathy-status-2015883857489522876/</guid><pubDate>Mon, 26 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;This is easily the biggest change to my basic coding workflow in ~2 decades of programming and it happened over the course of a few weeks.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I suspect Karpathy&apos;s entire update will be quoted widely over the coming weeks.&lt;/p&gt;
&lt;p&gt;A few other tidbits worth highlighting:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Despite all these issues, it is still a net huge improvement and it&apos;s very difficult to imagine going back to manual coding.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;&lt;a href=&quot;https://www.treycausey.com/blog/posts/ai-overhangs-and-dying-stars/&quot;&gt;Yes.&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;I also feel less blocked/stuck (which is not fun) and I experience a lot more courage because there&apos;s almost always a way to work hand in hand with it to make some positive progress.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;(&lt;a href=&quot;https://www.treycausey.com/blog/posts/superagency-adhd/&quot;&gt;Many such cases!&lt;/a&gt;)&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The intelligence part suddenly feels quite a bit ahead of all the rest of it - integrations (tools, knowledge), the necessity for new organizational workflows, processes, diffusion more generally.&lt;/p&gt;&lt;/blockquote&gt;</content:encoded><source>Quoting Andrej Karpathy</source></item><item><title>The Move Faster Manifesto</title><link>https://treycausey.com/commonplace/2026-01-13-brianguthrie-com-p-the-move-faster-manifesto/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-01-13-brianguthrie-com-p-the-move-faster-manifesto/</guid><pubDate>Tue, 13 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;The most profound lesson in software process is that many of the gating processes we erect to prevent errors do the opposite: deferring code merges does not make the merge easier, deferring deploys does not make the releases safer, deferring security audits does not make the system more secure, deferring unit testing does not make writing the initial code easier, and deferring delivery of features to perform more analysis does not reliably make those features more successful.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Brian Guthrie&apos;s Move Faster Manifesto is full of quotables, but the above especially resonated with me and echoes my experience leading Responsible AI at Indeed.&lt;/p&gt;</content:encoded><source>The Move Faster Manifesto</source></item><item><title>Quoting DHH on using AI to write code</title><link>https://treycausey.com/commonplace/2026-01-04-x-com-dhh-status-2007504187568074843/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-01-04-x-com-dhh-status-2007504187568074843/</guid><pubDate>Sun, 04 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Just this past summer, I spoke with @lexfridman about not letting AI write any code directly, but it turns out half the resistance was simply that the models weren&apos;t good enough yet! I spent more time rewriting what it wrote than if I&apos;d done it from scratch. That has now flipped.&lt;/p&gt;&lt;/blockquote&gt;</content:encoded><source>Quoting DHH on using AI to write code</source></item><item><title>You should probably post more</title><link>https://treycausey.com/commonplace/2026-01-02-www-aadillpickle-com-blog-post-more/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2026-01-02-www-aadillpickle-com-blog-post-more/</guid><pubDate>Fri, 02 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;If you&apos;re worried that people will think you&apos;re cringe for posting too much, that&apos;s also fine. Most people won&apos;t see it, and even the people that see it probably won&apos;t remember because our memories are so rotted from scrolling. If someone remembers your work enough to criticize it or you, you&apos;ve already won.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Aadillpickle with a nice nugget of a motivational post about why posting your own writing more is the way to go: algorithms hide most things, most people forget what they see, reps are how you learn, and most content is brainrot anyway.&lt;/p&gt;
&lt;p&gt;Also, a really unique blog experience that drops you into a VSCode looking editor for posts.&lt;/p&gt;</content:encoded><source>You should probably post more</source></item><item><title>The Light from a Dying Star is from the Past</title><link>https://treycausey.com/blog/posts/ai-overhangs-and-dying-stars/</link><guid isPermaLink="true">https://treycausey.com/blog/posts/ai-overhangs-and-dying-stars/</guid><description>You&apos;re not taking AI overhangs seriously enough</description><pubDate>Mon, 15 Dec 2025 00:00:00 GMT</pubDate></item><item><title>“We conclude that apparent effects of growth mindset interventions on academic achievement are likely attributable to inadequate study design, reporting flaws, and bias.”</title><link>https://treycausey.com/commonplace/2025-12-11-chrome-extension-ekefbfifcbfcajcknjmmeffmdeediogh-editor-html/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-12-11-chrome-extension-ekefbfifcbfcajcknjmmeffmdeediogh-editor-html/</guid><pubDate>Thu, 11 Dec 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;In another sense, my conclusion is positive on mindset interventions in that, given that any average effects will be small, the lack of statistically significant average effects in small or even moderately-large studies does not have to imply that mindset interventions don’t work; it just says that they only work in some settings, and individual effects will mostly be small.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Andrew Gelman hardly needs more examples of how to think carefully about experimental results, especially regarding treatment effects, but this stuck out to me as a particularly clear example of nuance in practice.&lt;/p&gt;
&lt;p&gt;Unfortunately, &quot;statistically significant&quot; is well-known to be abused as a proxy for &quot;true&quot;, even by those who know better (don&apos;t get me started on &quot;stat sig samples&quot;), but the opposite abuse happens as well: no significant effect is not the same thing as &quot;no effect exists&quot;.&lt;/p&gt;
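&lt;p&gt;A toy simulation makes the point concrete (a minimal sketch of my own, assuming numpy and scipy; the numbers are illustrative, not from Gelman&apos;s post): with a real but small effect of 0.1 standard deviations and 100 people per arm, a t-test clears p = 0.05 only about 10% of the time.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
hits = 0
for _ in range(1000):
    control = rng.normal(0.0, 1.0, 100)
    treated = rng.normal(0.1, 1.0, 100)  # a true 0.1 SD effect exists
    if 0.05 &gt; stats.ttest_ind(treated, control).pvalue:
        hits += 1

# Prints roughly 0.1: the real effect goes undetected about 90% of the time.
print(hits / 1000)
&lt;/code&gt;&lt;/pre&gt;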
&lt;p&gt;I honestly have no idea how to hold two thoughts in my mind at the same time, and yet I do:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;We should be teaching way more basic statistics and probability to students, beginning in high school or earlier.&lt;/li&gt;
&lt;li&gt;A little bit of statistics knowledge can be dangerous and instill a lot of false confidence in how to interpret research.&lt;/li&gt;
&lt;/ol&gt;</content:encoded><source>“We conclude that apparent effects of growth mindset interventions on academic achievement are likely attributable to inadequate study design, reporting flaws, and bias.”</source></item><item><title>Looking back at a year of AI cope and progress</title><link>https://treycausey.com/commonplace/2025-12-10-www-theargumentmag-com-p-looking-back-at-a-year-of-ai-cope-publication-id-5247799-post-id-181199189-isfreemail-false-r-2ti8-triedredirect-true/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-12-10-www-theargumentmag-com-p-looking-back-at-a-year-of-ai-cope-publication-id-5247799-post-id-181199189-isfreemail-false-r-2ti8-triedredirect-true/</guid><pubDate>Wed, 10 Dec 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;But resentment is not a prediction. AI is powerful, useful, and a big deal. The economic case for it is strong, and while there might be a bubble, the bubble popping would not make it go away. We still have tulips and canals and, oh yes, the internet after all, and there would still be an engine of enormous economic value on the other side.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Kelsey Piper&apos;s excellent piece in The Argument rounds out a year in which AI has made stunning progress while a lot of anti-AI criticism has not. Written from the perspective of someone who has deep concerns about AI, hates a lot about AI, but recognizes that hating something doesn&apos;t lead to good predictions.&lt;/p&gt;
&lt;p&gt;It&apos;s unfortunately paywalled, though a free seven-day trial is on offer and it&apos;s a very worthwhile subscription.&lt;/p&gt;</content:encoded><source>Looking back at a year of AI cope and progress</source></item><item><title>Because I wanted to - Aaron Francis</title><link>https://treycausey.com/commonplace/2025-12-05-aaronfrancis-com-2024-because-i-wanted-to-12c5137c/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-12-05-aaronfrancis-com-2024-because-i-wanted-to-12c5137c/</guid><pubDate>Fri, 05 Dec 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Wanting to do something is a perfectly valid reason to do things!&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Bookmarking this short gem from Aaron Francis for every time someone says &quot;X already does that&quot; or &quot;why didn&apos;t you just use X?&quot;&lt;/p&gt;</content:encoded><source>Because I wanted to - Aaron Francis</source></item><item><title>Viral rant on why &apos;everyone in Seattle hates AI&apos; strikes a nerve, sparks debate over city&apos;s tech vibe – GeekWire</title><link>https://treycausey.com/commonplace/2025-12-05-www-geekwire-com-2025-viral-rant-on-why-everyone-in-seattle-hates-ai-strikes-a-nerve-sparks-debate-over-citys-tech-vibe/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-12-05-www-geekwire-com-2025-viral-rant-on-why-everyone-in-seattle-hates-ai-strikes-a-nerve-sparks-debate-over-citys-tech-vibe/</guid><pubDate>Fri, 05 Dec 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I posted a copy of my &lt;a href=&quot;https://www.treycausey.com/commonplace/2025-12-03-jonready-com-blog-posts-everyone-in-seattle-hates-ai-html/&quot;&gt;comments&lt;/a&gt; on Jonathon Ready&apos;s &quot;Everyone in Seattle Hates AI&quot; to LinkedIn, where it spawned dozens of comments and got picked up by &lt;a href=&quot;https://www.geekwire.com/2025/viral-rant-on-why-everyone-in-seattle-hates-ai-strikes-a-nerve-sparks-debate-over-citys-tech-vibe/&quot;&gt;GeekWire&lt;/a&gt;, the local tech news blog.&lt;/p&gt;</content:encoded><source>Viral rant on why &apos;everyone in Seattle hates AI&apos; strikes a nerve, sparks debate over city&apos;s tech vibe – GeekWire</source></item><item><title>Opinion | The Medical Case for Self-Driving Cars - The New York Times</title><link>https://treycausey.com/commonplace/2025-12-04-www-nytimes-com-2025-12-02-opinion-self-driving-cars-html/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-12-04-www-nytimes-com-2025-12-02-opinion-self-driving-cars-html/</guid><pubDate>Thu, 04 Dec 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Waymo’s self-driving cars were involved in 91 percent fewer serious-injury-or-worse crashes and 80 percent fewer crashes causing any injury. It showed a 96 percent lower rate of injury-causing crashes at intersections, which are some of the deadliest I encounter in the trauma bay.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Failure to deploy Waymo to as many markets as possible would be a tremendous societal failure.&lt;/p&gt;</content:encoded><source>Opinion | The Medical Case for Self-Driving Cars - The New York Times</source></item><item><title>Everyone in Seattle Hates AI — Jonathon Ready</title><link>https://treycausey.com/commonplace/2025-12-03-jonready-com-blog-posts-everyone-in-seattle-hates-ai-html/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-12-03-jonready-com-blog-posts-everyone-in-seattle-hates-ai-html/</guid><pubDate>Wed, 03 Dec 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Bring up AI in a Seattle coffee shop now and people react like you&apos;re advocating asbestos.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Seattle&apos;s tech scene is in a bad place. It&apos;s never been the same as SF&apos;s: fewer startups, more families, more people putting down roots than &quot;living in SF for a few years.&quot; Even when I&apos;ve been envious of parts of the more vibrant, always-on SF scene, I&apos;ve never wanted to live in SF or leave Seattle.&lt;/p&gt;
&lt;p&gt;That doesn&apos;t feel so true these days. Despite at least a nominal presence from most of the frontier labs (AFAICT GDM does not have roles in Seattle), UW&apos;s prestigious CSE program, Ai2, and huge Microsoft, Amazon, Meta, and Google offices, the tech scene feels nearly dead, especially when it comes to AI. The city feels more like a &quot;satellite office&quot; than ever before.&lt;/p&gt;
&lt;p&gt;So, when I read Jonathon Ready&apos;s post &quot;Everyone in Seattle Hates AI&quot;, I found myself instantly relating.&lt;/p&gt;
&lt;p&gt;Seattle might be the epicenter of anti-AI sentiment among major US cities. I don&apos;t have the data, to be clear, but I wouldn&apos;t be surprised at all. I&apos;m not sure exactly why. There&apos;s always been a healthy anti-tech crowd here that wants the Seattle of the late 90s / early 2000s back, but this is different. Maybe it&apos;s the city&apos;s more recent progressive turn; polling data &lt;em&gt;does&lt;/em&gt; reveal that self-identified progressives are distinctly anti-AI.&lt;/p&gt;
&lt;p&gt;Maybe it&apos;s the layoffs that have been justified by AI while profits soar. Maybe it&apos;s the ham-fisted AI &quot;rollouts&quot; that enterprises are trying (points that Jonathon identifies as well). But those things are happening ~everywhere, while the anti-AI sentiment seems particularly strong here in the PNW.&lt;/p&gt;
&lt;p&gt;When I was the head of Responsible AI at Indeed, I would find myself feeling sheepish and a bit defensive when the &quot;AI&quot; part of my job came up in discussions with Seattleites. I almost never volunteered it because it was usually met with &quot;well at least &lt;em&gt;someone&lt;/em&gt; is trying to think about doing it responsibly&quot; or &quot;so your job is to convince people to not use it, right?&quot; or just an awkward subject change. Occasionally I&apos;d have a good discussion where I&apos;d explain that it&apos;s possible to both want to build a better future with AI &lt;em&gt;and&lt;/em&gt; care deeply about the ways it can go wrong.&lt;/p&gt;
&lt;p&gt;I &lt;em&gt;love&lt;/em&gt; this city, with all of its shortcomings and quirks. I chose to stay here after grad school, to weather Covid here, to start a family here.&lt;/p&gt;
&lt;p&gt;I hope that this is a passing moment, accentuated by the state of the world, the state of politics, and the uncertainty that so many are feeling in so many domains of their lives right now. I hope we can bring this city&apos;s tech scene back to life rather than watching from the sidelines as the future is built with AI.&lt;/p&gt;
&lt;p&gt;I&apos;ll close with Jonathon&apos;s closing remarks:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Seattle has talent as good as anywhere. But in San Francisco, people still believe they can change the world—so sometimes they actually do.&lt;/p&gt;&lt;/blockquote&gt;</content:encoded><source>Everyone in Seattle Hates AI — Jonathon Ready</source></item><item><title>Seeing like a software company</title><link>https://treycausey.com/commonplace/2025-12-01-www-seangoedecke-com-seeing-like-a-software-company/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-12-01-www-seangoedecke-com-seeing-like-a-software-company/</guid><pubDate>Sat, 29 Nov 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;All organizations - tech companies, social clubs, governments - have both a legible and an illegible side. The legible side is important, past a certain size. It lets the organization do things that would otherwise be impossible: long-term planning, coordination with other very large organizations, and so on. But the illegible side is just as important. It allows for high-efficiency work, offers a release valve for processes that don’t fit the current circumstances, and fills the natural human desire for gossip and soft consensus.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Sean Goedecke, whose blog is an absolute &lt;em&gt;gem&lt;/em&gt;, outlines the mechanics of James Scott&apos;s &lt;em&gt;Seeing Like a State&lt;/em&gt; as they pertain to large software companies. The fundamental distinction is between &quot;legible&quot; processes, those that can be observed and measured and tracked, and &quot;illegible&quot; processes, those human, relationship-driven ones that resist measurement or quantification; his point is that both matter.&lt;/p&gt;
&lt;p&gt;A related idea that I have been mulling over: illegible AI and our insatiable need to use it for legible things. AI researchers often talk about &quot;growing&quot; LLMs, in the sense that it&apos;s difficult to provide very specific instructions to train an LLM and get predictable, expected results. Instead, you set the conditions and feed it data and see where it leads.&lt;/p&gt;
&lt;p&gt;In this sense, the models powering our current AI boom are highly illegible. And yet, our very first instincts are to try and make them legible: arguing about whether or not they will increase GDP or productivity, what percent of AI-driven experiments succeed, and what percent of jobs they can or cannot replace. There are more and more pushes to point AI at &quot;verifiable&quot; (i.e., legible) tasks to advance progress. No surprise that coding and math are making huge gains. Labs are accused of &quot;benchmaxxing&quot;, i.e., training models against established benchmarks rather than on more realistic tests of utility.&lt;/p&gt;
&lt;p&gt;One of the most frequently drawn arrows in the anti-AI quiver is to point to these numbers and the (lack of) impact on them. If the progress is not legible, it must be illusory.&lt;/p&gt;
&lt;p&gt;And yet, whenever a new model is released, one of the biggest threads of conversation is around &quot;vibe&quot; evaluation. How does the model feel? What is its aesthetic ability? Is the writing good? We&apos;re bad at measuring these things, but they&apos;re important nonetheless.&lt;/p&gt;</content:encoded><source>Seeing like a software company</source></item><item><title>Coding at work (after a decade away). | Irrational Exuberance</title><link>https://treycausey.com/commonplace/2025-11-26-lethain-com-coding-at-work/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-11-26-lethain-com-coding-at-work/</guid><pubDate>Wed, 26 Nov 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;I think traditionally, a lot of manager coding has fallen into this bucket of optically useful with somewhat dubious long-term value. Doing high quality work simply requires too complete a mental model for folks jumping in and out of writing software.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Will Larson shares his experience of how coding agents have allowed him, a CTO, to start writing code again in a way that is additive to, not subtractive from, his teams&apos; time. This really resonated with me, as someone who has primarily managed managers for the past 7ish years.&lt;/p&gt;
&lt;p&gt;Getting more hands-on with a codebase rarely seems like the &quot;right&quot; decision in terms of time usage in these kinds of roles. And doing so thoughtlessly is almost always a net negative. You have a poor understanding of the code, you lack the time to follow through on the work in production, and you create needless dependencies. Thankfully, onboarding to new code bases is way easier than it&apos;s ever been.&lt;/p&gt;
&lt;p&gt;I&apos;m excited to get back into more direct contributions in my next role.&lt;/p&gt;</content:encoded><source>Coding at work (after a decade away). | Irrational Exuberance</source></item><item><title>&quot;Good engineering management&quot; is a fad | Irrational Exuberance</title><link>https://treycausey.com/commonplace/2025-11-20-lethain-com-good-eng-mgmt-is-a-fad/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-11-20-lethain-com-good-eng-mgmt-is-a-fad/</guid><pubDate>Thu, 20 Nov 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;In the 2010s, the morality tale was that it was all about empowering engineers as a fundamental good. Sure, I can get excited for that, but I don’t really believe that narrative: it happened because hiring was competitive. In the 2020s, the morality tale is that bureaucratic middle management have made organizations stale and inefficient. The lack of experts has crippled organizational efficiency. Once again, I can get behind that–there’s truth here–but the much larger drivers aren’t about morality, it’s about ZIRP-ending and optimism about productivity gains from AI tooling. The conclusion here is clear: the industry will want different things from you as it evolves, and it will tell you that each of those shifts is because of some complex moral change, but it’s pretty much always about business realities changing. If you take any current morality tale as true, then you’re setting yourself up to be severely out of position when the industry shifts again in a few years, because “good leadership” is just a fad.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Will Larson says what a lot of managers have been thinking in the current climate. I approached this from a slightly different angle on &lt;a href=&quot;https://www.linkedin.com/feed/update/urn:li:activity:7392680110048538624/&quot;&gt;LinkedIn&lt;/a&gt; a while back. Here&apos;s me:&lt;/p&gt;
&lt;p&gt;Goodhart&apos;s Law is one of those ideas that&apos;s so obviously true but is maddeningly difficult to address. You can&apos;t simply assert that simple metrics are bad proxies for nuanced concepts without also aligning incentives to reflect this.&lt;/p&gt;
&lt;p&gt;You see this in retention and performance evaluation decisions all the time. A classic example is the (excellent!) advice to &quot;focus on growing impact, not headcount.&quot; And yet, hiring decisions and recent layoffs often revert to simplistic discussions about whether a given leader&apos;s org is &quot;big enough&quot; to justify a title or whether they have &quot;enough&quot; direct reports to justify retention in a layoff.&lt;/p&gt;
&lt;p&gt;Everyone knows that org size is a bad proxy for actual impact. It rewards empire builders, punishes those who take the &quot;impact over headcount&quot; advice to heart, and yet here we are.&lt;/p&gt;
&lt;p&gt;We also see the opposite happening lately. Many firms are pushing leaders to be more agile, more hands-on, and less focused on organizational leadership. People who have led large orgs aren&apos;t in demand because they are seen as &quot;just&quot; management.&lt;/p&gt;
&lt;p&gt;Incentives are misaligned in both cases. Leading a large, complex organization *does* require different skills than leading a small team of ICs. And, building unnecessarily large organizations *has* often been rewarded with titles, status, and compensation.&lt;/p&gt;
&lt;p&gt;All of this is to say that we do a disservice to each other by repeating the same advice without also working to build cultures that actually embrace nuance and context over simple proxies we know carry limited information.&lt;/p&gt;</content:encoded><source>&quot;Good engineering management&quot; is a fad | Irrational Exuberance</source></item><item><title>AI Friends Too Cheap To Meter - by Jasmine Sun</title><link>https://treycausey.com/commonplace/2025-10-28-jasmi-news-p-ai-friends/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-10-28-jasmi-news-p-ai-friends/</guid><pubDate>Tue, 28 Oct 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;That’s why it bothers me when tech critics describe AI as exclusively foisted upon us by corporate overlords. They deploy violent physical metaphors to make the case: Brian Merchant says tech companies are “force-feeding” us, Cory Doctorow says it’s being “crammed down throats,” and Ted Gioia analogizes AI companies to tyrants telling peons to “shut up, buddy, and chew.” In their story, everyone hates AI and nobody chooses to use it; each one of ChatGPT’s 700 million users is effectively being waterboarded, unable to escape.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Jasmine Sun is putting out some of the best sociology-adjacent writing on AI right now, occupying a rare middle ground: part of the &quot;in group&quot; in AI while also engaging seriously with dialogue from the &quot;out group.&quot;&lt;/p&gt;
&lt;p&gt;It&apos;s a lengthy piece that rewards reading in its entirety, touching on:&lt;/p&gt;
&lt;p&gt;- the inherent demand for AI companionship (it&apos;s bigger than you think)
- how tech critics deny the agency of AI users
- what to do about adults who wish to engage in specific behaviors with LLMs
- how anthropomorphic AI unlocked broad LLM adoption (GPT-3 was out for quite a while before ChatGPT made it accessible) while also setting the stage for the problematic cases we see now
- a plea to connect with real people&lt;/p&gt;</content:encoded><source>AI Friends Too Cheap To Meter - by Jasmine Sun</source></item><item><title>Burning out - by Nathan Lambert - Interconnects</title><link>https://treycausey.com/commonplace/2025-10-27-www-interconnects-ai-p-burning-out/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-10-27-www-interconnects-ai-p-burning-out/</guid><pubDate>Mon, 27 Oct 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Many AI researchers can learn from athletics and appreciate the value of rest. Your mental acuity can drop off faster than your physical peak performance does when not rested. Working too hard forces you to take narrower and less creative approaches. The deeper into the hole of burnout I get in trying to make you the next Olmo model, the worse my writing gets. My ability to spot technical dead ends goes with it. If the intellectual payoffs to rest are hard to see, your schedule doesn’t have the space for creativity and insight.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;An important piece from Nathan Lambert / Interconnects AI on the march towards burnout and the growing culture of endless work at AI labs and companies. I won&apos;t lie. As I begin to consider what my next role might be in the coming year, the push towards 996 / 997 / always-at-work expectations is steering me away from a lot of places where I think I could make great contributions.&lt;/p&gt;
&lt;p&gt;Enforced / normative workaholism has come and gone in waves over the past few decades, and we&apos;re clearly in a swelling wave. The worry I have is how long it will take for it to crash and for us to relearn old lessons. Good luck if you have a family or want to start one.&lt;/p&gt;
&lt;p&gt;This is an opportunity for companies to differentiate themselves by building cultures that aren&apos;t simply weak copies of what the big-N firms are doing -- to attract talent and to build something sustainable. Right now we&apos;re in a bit of a pooling equilibrium scenario, where many firms are broadcasting the same work culture regardless of p(success), leaving talent to effectively choose at random beyond 1-2 top firms.&lt;/p&gt;
&lt;p&gt;As the market grows and matures, this will become more of a separating equilibrium. If you are talented enough to get hired at a top firm, why would you accept the same working conditions at a firm that offers less in prestige and compensation (both real and potential)? You wouldn&apos;t, and you&apos;d make decisions based on other criteria, such as the culture and working conditions.&lt;/p&gt;
&lt;p&gt;Nathan hits a lot of points that I think many people &quot;know&quot; are true but fail to operationalize due to pressure or even fear. Burnout is bad not only for individuals, which is self-evident, but for collectives: it reduces creativity, degrades the quality of work, and increases vulnerability to fresh challengers.&lt;/p&gt;</content:encoded><source>Burning out - by Nathan Lambert - Interconnects</source></item><item><title>Code like a surgeon</title><link>https://treycausey.com/commonplace/2025-10-24-www-geoffreylitt-com-2025-10-24-code-like-a-surgeon/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-10-24-www-geoffreylitt-com-2025-10-24-code-like-a-surgeon/</guid><pubDate>Fri, 24 Oct 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;A surgeon isn’t a manager, they do the actual work! But their skills and time are highly leveraged with a support team that handles prep, secondary tasks, admin. The surgeon focuses on the important stuff they are uniquely good at.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Geoffrey Litt of Notion has a great metaphor for working with AI -- work like a surgeon, not like a manager or editor. &quot;When I sit down for a work session, I want to feel like a surgeon walking into a prepped operating room. Everything is ready for me to do what I’m good at.&quot;&lt;/p&gt;</content:encoded><source>Code like a surgeon</source></item><item><title>Do AIs think differently in different languages?</title><link>https://treycausey.com/commonplace/2025-10-17-do-ais-think-differently/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-10-17-do-ais-think-differently/</guid><pubDate>Fri, 17 Oct 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Chatbots are not totally immune to linguistic influence from whichever language they are speaking in, but their worldview does not appear to be strongly determined by language. . . . for the most part, the AIs are secular, Western liberals no matter what language you ask them questions in.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Kelsey Piper with a great piece of empirical AI journalism which basically asks &quot;does the Sapir-Whorf hypothesis hold for LLMs?&quot; It doesn&apos;t really hold for humans and, it turns out, it doesn&apos;t really hold for LLMs, either! Models are pretty much center-left across the board, with some edge cases specific to DeepSeek when used in Chinese.&lt;/p&gt;
&lt;p&gt;The approach here is a good one, using questions from the World Values Survey, paying translators for questions &amp; answers in Arabic, Hindi, and Chinese, and evaluating multiple state-of-the-art models.&lt;/p&gt;
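&lt;p&gt;The core loop of a study like this is small enough to sketch. A rough approximation (the ask() stub, prompts, and model names are hypothetical placeholders, not Piper&apos;s actual pipeline):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Sketch: pose the same World Values Survey-style item in several languages
# and compare answers across models. ask() is a hypothetical stub.
PROMPTS = {
    &quot;en&quot;: &quot;Rate from 1 (never) to 10 (always): is divorce justifiable?&quot;,
    &quot;ar&quot;: &quot;...&quot;,  # professionally translated, per the article&apos;s method
    &quot;hi&quot;: &quot;...&quot;,
    &quot;zh&quot;: &quot;...&quot;,
}

def ask(model, prompt):
    return None  # replace with a real API call to the model

for model in (&quot;model-a&quot;, &quot;model-b&quot;):
    answers = {lang: ask(model, p) for lang, p in PROMPTS.items()}
    print(model, answers)  # large cross-language gaps would be Whorf-like
&lt;/code&gt;&lt;/pre&gt;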
&lt;p&gt;One of the challenges of making any generalized assertions about how LLMs behave is that most of the academics who are motivated and have the skills to do so aren&apos;t able to move quickly enough to study the newest models. Often, when findings come out, a new generation of models has already emerged.&lt;/p&gt;</content:encoded><source>Do AIs think differently in different languages?</source></item><item><title>Daring Fireball: Matthew Inman of The Oatmeal: &apos;A Cartoonist&apos;s Review of AI Art&apos;</title><link>https://treycausey.com/commonplace/2025-10-16-daringfireball-net-linked-2025-10-16-oatmeal-ai-art/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-10-16-daringfireball-net-linked-2025-10-16-oatmeal-ai-art/</guid><pubDate>Thu, 16 Oct 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;If your opinion of a work of art changes after you find out which tools were used to make it, or who the artist is or what they’ve done, you’re no longer judging the art. You’re making a choice not to form your opinion based on the work itself, but rather on something else.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;&quot;Can you separate the art from the artist&quot; is a question that has no definitive answer, with predominant sentiment fluctuating with other political preferences over time. But John Gruber framing it as &quot;a choice&quot; is something I really like.&lt;/p&gt;</content:encoded><source>Daring Fireball: Matthew Inman of The Oatmeal: &apos;A Cartoonist&apos;s Review of AI Art&apos;</source></item><item><title>Technological Optimism and Appropriate Fear</title><link>https://treycausey.com/commonplace/2025-10-13-importai-substack-com-p-import-ai-431-technological-optimism-publication-id-1317673-post-id-176010505-isfreemail-true-r-2ti8-triedredirect-true/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-10-13-importai-substack-com-p-import-ai-431-technological-optimism-publication-id-1317673-post-id-176010505-isfreemail-true-r-2ti8-triedredirect-true/</guid><pubDate>Mon, 13 Oct 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;For us to truly understand what the policy solutions look like, we need to spend a bit less time talking about the specifics of the technology and trying to convince people of our particular views of how it might go wrong - self-improving AI, autonomous systems, cyberweapons, bioweapons, etc. - and more time listening to people and understanding their concerns about the technology. There must be more listening to labor groups, social groups, and religious leaders. The rest of the world which will surely want—and deserves—a vote over this.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Jack Clark shares the remarks he gave at The Curve, the buzzy AI-meets-policy conference (that declined my request to attend 😅).&lt;/p&gt;
&lt;p&gt;While I don&apos;t agree with many of the fears that he shares here, I do have great respect for the earnestness with which he does so.&lt;/p&gt;</content:encoded><source>Technological Optimism and Appropriate Fear</source></item><item><title>In Defense of AI Evals, for Everyone</title><link>https://treycausey.com/commonplace/2025-10-02-www-sh-reya-com-blog-in-defense-ai-evals/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-10-02-www-sh-reya-com-blog-in-defense-ai-evals/</guid><pubDate>Thu, 02 Oct 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;The more interesting question, then, is not whether you do evals, but when you can afford to be less rigorous and when you cannot.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Shreya Shankar patiently -- more patiently than I would -- responds to the recent evals / no-evals discourse (mainly on Twitter). Rather than taking the bait, Shreya assumes good faith and says that this debate is really about when it&apos;s OK to be more or less rigorous in your evaluations.&lt;/p&gt;
&lt;p&gt;It&apos;s OK to be less rigorous when your tasks are already heavily baked into the foundation model&apos;s post-training (such as with coding).&lt;/p&gt;
&lt;p&gt;It&apos;s also OK when you have enough domain expertise and dogfood early and often.&lt;/p&gt;
&lt;p&gt;&lt;blockquote&gt;In my own experience with applications built on top of foundation models (with much less money, lol), evals are especially critical in complex document processing and analysis. Just because a document fits in the context window does not mean the model will complete the task correctly; we have to carefully decompose the task into smaller pieces the model can handle, and then design evals for each of those pieces.&lt;/blockquote&gt;&lt;/p&gt;
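&lt;p&gt;That decomposition point is worth making concrete. A minimal sketch of the shape of it (the contract-dates step and the llm() stub are hypothetical examples, not Shreya&apos;s pipeline):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Sketch: split a document task into pieces and attach a small eval to each
# piece, so failures localize to a step. llm() is a hypothetical stub.

def llm(prompt):
    return &quot;&quot;  # replace with a real model call

def extract_dates(doc):
    return set(llm(&quot;List every date in this contract, one per line: &quot; + doc).splitlines())

def eval_extract_dates(cases):
    # cases: list of (document, expected_set_of_dates) pairs
    hits = sum(extract_dates(doc) == expected for doc, expected in cases)
    return hits / len(cases)

# A per-piece pass rate tells you *which* step broke, not merely that the
# end-to-end answer was wrong.
&lt;/code&gt;&lt;/pre&gt;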
&lt;p&gt;In the end, it&apos;s always better to avoid Twitter and form your own conclusions about things instead of parroting the hot takes of the moment.&lt;/p&gt;</content:encoded><source>In Defense of AI Evals, for Everyone</source></item><item><title>Why women should be tech-optimists</title><link>https://treycausey.com/commonplace/2025-10-02-www-theargumentmag-com-p-why-women-should-be-tech-optimists-publication-id-5247799-post-id-175083158-isfreemail-false-r-2ti8-triedredirect-true/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-10-02-www-theargumentmag-com-p-why-women-should-be-tech-optimists-publication-id-5247799-post-id-175083158-isfreemail-false-r-2ti8-triedredirect-true/</guid><pubDate>Thu, 02 Oct 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;A culture of technology that is about freedom rather than domination, about tangible benefits, not abstract achievement — that’s the kind of culture that could turn Waymos from scary and unknown death traps into the liberating bicycles of the 21st century.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Jerusalem Demsas digs into the perplexing but robust finding that women (in 2025, in the US) tend to be more distrustful of and opposed to technological advances, despite historically being huge beneficiaries of those advances.&lt;/p&gt;
&lt;p&gt;This dovetails nicely with &lt;a href=&quot;https://www.slowboring.com/p/capitalism-needs-honor-and-ethics&quot;&gt;today&apos;s Matt Yglesias post&lt;/a&gt; on values and capitalism.&lt;/p&gt;</content:encoded><source>Why women should be tech-optimists</source></item><item><title>10 Things I Hate About AI Evals with Hamel Husain</title><link>https://treycausey.com/commonplace/2025-10-02-www-youtube-com-watch-v-qek-xwrkqhi/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-10-02-www-youtube-com-watch-v-qek-xwrkqhi/</guid><pubDate>Thu, 02 Oct 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;One of the highest signal ~hours I&apos;ve watched recently. Hamel Husain (and his teaching partner Shreya Shankar) are now synonymous with &quot;AI evals&quot;; here he lays out how he thinks people get them wrong and what he&apos;d change.&lt;/p&gt;
&lt;p&gt;Their &lt;a href=&quot;https://maven.com/parlance-labs/evals&quot;&gt;course&lt;/a&gt; is excellent and about to start again.&lt;/p&gt;
&lt;p&gt;Plus, a bonus appearance by Bryan Bischof and the suggestion that &quot;AI evals&quot; should just be renamed &quot;data science for AI&quot;.&lt;/p&gt;
&lt;p&gt;Here are his peeves, though I highly encourage watching the whole thing:&lt;/p&gt;
&lt;p&gt;1) Generic Metrics &amp; Off-the-Shelf Evals
2) Completely Outsourcing Data Review &amp; Leaving Out Domain Experts
3) Overeager Eval Automation
4) Not Looking at the Data At All 
5) Not Thinking Deeply About Prompts
6) Dashboards Full of Noisy Metrics
7) Getting Stuck with Annotation
8) Endlessly Trying Tools Instead of Error Analysis
9) Putting LLMs in the Judge&apos;s Seat Without Human Oversight 
10) Engineers Not Using AI Enough Themselves (Lack of Intuition)&lt;/p&gt;</content:encoded><source>10 Things I Hate About AI Evals with Hamel Husain</source></item><item><title>The algorithm will see you now</title><link>https://treycausey.com/commonplace/2025-10-01-worksinprogress-co-issue-the-algorithm-will-see-you-now/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-10-01-worksinprogress-co-issue-the-algorithm-will-see-you-now/</guid><description>AI isn&apos;t replacing radiologists</description><pubDate>Wed, 01 Oct 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;In many jobs, tasks are diverse, stakes are high, and demand is elastic. When this is the case, we should expect software to initially lead to more human work, not less. The lesson from a decade of radiology models is neither optimism about increased output nor dread about replacement. Models can lift productivity, but their implementation depends on behavior, institutions and incentives.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;An excellent piece from Deena Mousa, lead researcher at OpenPhil, on why radiologists have not been replaced by AI despite the high accuracy and wide availability of radiology-specific models approved for clinical use.&lt;/p&gt;
&lt;p&gt;The real takeaway for me, though, is wondering why people still listen to Hinton et al.&apos;s predictions about the social and economic impact of AI. We need a &quot;bitter lesson&quot; for this kind of thing -- most jobs aren&apos;t just bundles of tasks, and the implementation of technology into work is complex, political, and sociological.&lt;/p&gt;</content:encoded><source>The algorithm will see you now</source></item><item><title>Responsible AI is dying. Long live responsible AI.</title><link>https://treycausey.com/blog/posts/whither_responsible_ai/</link><guid isPermaLink="true">https://treycausey.com/blog/posts/whither_responsible_ai/</guid><description>How you, yes you, can help build a better future with AI.</description><pubDate>Mon, 29 Sep 2025 00:00:00 GMT</pubDate></item><item><title>Failing to Understand the Exponential, Again</title><link>https://treycausey.com/commonplace/2025-09-29-www-julian-ac-blog-2025-09-27-failing-to-understand-the-exponential-again/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-09-29-www-julian-ac-blog-2025-09-27-failing-to-understand-the-exponential-again/</guid><pubDate>Mon, 29 Sep 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;It may sound overly simplistic, but making predictions by extrapolating straight lines on graphs is likely to give you a better model of the future than most &quot;experts&quot; - even better than most actual domain experts!&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;More on the &quot;AI progress has plateaued&quot; / &quot;AI progress is exploding&quot; debate, this time from Julian Schrittwieser, an AI researcher at Anthropic.&lt;/p&gt;
&lt;p&gt;Reasoning about exponential progress is extremely hard for humans. As Schrittwieser points out, we made the same cognitive errors at the beginning of Covid.&lt;/p&gt;
&lt;p&gt;His conclusion: using the METR evals as one measure, we&apos;re right on track and models continue to make exponential progress.&lt;/p&gt;</content:encoded><source>Failing to Understand the Exponential, Again</source></item><item><title>Getting AI to Work in Complex Codebases</title><link>https://treycausey.com/commonplace/2025-09-24-github-com-humanlayer-advanced-context-engineering-for-coding-agents-blob-main-ace-fca-md/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-09-24-github-com-humanlayer-advanced-context-engineering-for-coding-agents-blob-main-ace-fca-md/</guid><pubDate>Wed, 24 Sep 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;The techniques I want to talk about and that we&apos;ve adopted in the last few months fall under what I call &quot;frequent intentional compaction&quot;. Essentially, this means designing your ENTIRE WORKFLOW around context management, and keeping utilization in the 40%-60% range (depends on complexity of the problem ).&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Dex Horthy, founder of &lt;a href=&quot;https://www.hlyr.dev&quot;&gt;HumanLayer&lt;/a&gt;, produces an outstanding guide to getting the results you want from AI when working with large, complex codebases. It&apos;s grounded, reasonable, and eminently actionable with loads of diagrams and examples.&lt;/p&gt;
&lt;p&gt;(Perhaps unsurprisingly) Context is king. Managing your context intentionally, carefully, and meticulously is the path to great results. Horthy uses a research - plan - implement pattern that focuses on making sure each step provides the subsequent step with only the exact context it needs to succeed.&lt;/p&gt;
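&lt;p&gt;The pattern is easy to caricature in code. A skeleton only (llm() is a hypothetical stub, and the prompts are mine, not Horthy&apos;s):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Skeleton of research -&gt; plan -&gt; implement with intentional compaction:
# each phase hands the next a small curated artifact, never the whole
# conversation. llm() is a hypothetical stand-in for a coding agent call.

def llm(prompt):
    return &quot;&quot;  # replace with a real model call

def build(task, repo_notes):
    research = llm(&quot;Summarize only the files and invariants relevant to: &quot;
                   + task + &quot;\n&quot; + repo_notes)       # compact artifact 1
    plan = llm(&quot;Write a step-by-step change plan for: &quot; + task
               + &quot;\nContext:\n&quot; + research)          # compact artifact 2
    return llm(&quot;Implement exactly this plan and nothing else:\n&quot; + plan)
&lt;/code&gt;&lt;/pre&gt;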
&lt;p&gt;It&apos;s really thoughtfully written; I recommend it to everyone, but especially to engineers who have decided that AI isn&apos;t helpful for them.&lt;/p&gt;
&lt;p&gt;See also &lt;a href=&quot;https://x.com/ericzakariasson/status/1969194919597392271&quot;&gt;this short context engineering template from Eric Zakariasson&lt;/a&gt; that arrives at similar conclusions.&lt;/p&gt;</content:encoded><source>Getting AI to Work in Complex Codebases</source></item><item><title>A Billion Dollars isn&apos;t cool. You know what is? Ten million dollars.</title><link>https://treycausey.com/blog/posts/ai_business_models/</link><guid isPermaLink="true">https://treycausey.com/blog/posts/ai_business_models/</guid><description>AI can revive the middle class business model</description><pubDate>Tue, 23 Sep 2025 00:00:00 GMT</pubDate></item><item><title>Thinking, Searching, and Acting</title><link>https://treycausey.com/commonplace/2025-09-22-www-interconnects-ai-p-thinking-searching-and-acting-publication-id-48206-post-id-174200501-isfreemail-false-r-2ti8-triedredirect-true/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-09-22-www-interconnects-ai-p-thinking-searching-and-acting-publication-id-48206-post-id-174200501-isfreemail-false-r-2ti8-triedredirect-true/</guid><pubDate>Mon, 22 Sep 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;With search, hallucinations are now missing context rather than blatantly incorrect content. Language models are nearly-perfect at copying content and similarly solid at referencing it, but they&apos;re still very flawed at long-context understanding. Hallucinations still matter, but it’s a very different chapter of the story and will be studied differently depending on if it is for reasoning or non-reasoning language models.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Nathan Lambert of Ai2 provides a good summary of what constitutes a &apos;reasoning model&apos; such as OpenAI&apos;s o3 or GPT-5 Thinking. As with much of the fast-moving AI field, common definitions are hard to come by, and clear definitions are even rarer.&lt;/p&gt;
&lt;p&gt;Before these models were released, models were effectively limited to &quot;just&quot; next-token prediction based on the information in their pretraining data and with a hard knowledge cut-off date.&lt;/p&gt;
&lt;p&gt;While base models still predict next tokens and knowledge cut-off dates are still very much a real thing, reasoning models can search and obtain new information needed to complete tasks.&lt;/p&gt;
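&lt;p&gt;Mechanically, you can picture this with a toy loop (call_llm() and web_search() are hypothetical stubs, not any lab&apos;s actual agent); the comments map onto the primitives Lambert describes below:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Toy reasoning-model loop: think, search, act, repeat.
# call_llm() and web_search() are hypothetical stubs.

def call_llm(context):
    # &quot;thinking&quot;: the model spends extra tokens deciding what to do next
    return {&quot;action&quot;: &quot;finish&quot;, &quot;query&quot;: &quot;&quot;, &quot;answer&quot;: &quot;&quot;}

def web_search(query):
    return &quot;&quot;  # fresh information from beyond the knowledge cut-off

def answer(task, max_steps=5):
    context = task
    for _ in range(max_steps):
        step = call_llm(context)
        if step[&quot;action&quot;] == &quot;search&quot;:            # &quot;searching&quot;
            context += &quot;\n&quot; + web_search(step[&quot;query&quot;])
        else:                                      # &quot;acting&quot;: commit
            return step[&quot;answer&quot;]
    return None
&lt;/code&gt;&lt;/pre&gt;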
&lt;p&gt;Lambert breaks down reasoning models into three primitives that distinguish them from their static predecessors: thinking (i.e., dynamically using more tokens to arrive at an answer), searching, and acting / executing.&lt;/p&gt;</content:encoded><source>Thinking, Searching, and Acting</source></item><item><title>You Had No Taste Before AI</title><link>https://treycausey.com/commonplace/2025-09-21-matthewsanabria-dev-posts-you-had-no-taste-before-ai/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-09-21-matthewsanabria-dev-posts-you-had-no-taste-before-ai/</guid><pubDate>Sun, 21 Sep 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;To play on a popular quote from Ratatouille, anyone can cook, but not everyone is a chef. Don’t complain about mediocre work when you’re producing mediocre work yourself.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Matthew Sanabria encourages you, yes you, to actually develop taste or refine your taste instead of lamenting the tastelessness of AI outputs. As the title claims, many people had no taste before, they&apos;re just now realizing that there&apos;s lots of bad, tasteless content out there.&lt;/p&gt;
&lt;p&gt;I&apos;ve been noticing this as well. I have read / heard countless variants of &quot;I&apos;ve never read anything well-written by AI.&quot; Besides the fact that this is really a claim about &lt;em&gt;obviously&lt;/em&gt; AI-written text (which can be quite banal), it makes me wonder how many people consider themselves truly excellent writers.&lt;/p&gt;
&lt;p&gt;Before the advent of LLMs, it seemed to be a truism that writing well was an extremely difficult, rare skill. Good writers received accolades from their peers. In my mind, very few people considered themselves excellent writers.&lt;/p&gt;
&lt;p&gt;Is everyone an excellent writer all of a sudden? That seems implausible.&lt;/p&gt;</content:encoded><source>You Had No Taste Before AI</source></item><item><title>Is AI a bubble?</title><link>https://treycausey.com/commonplace/2025-09-17-www-exponentialview-co-p-is-ai-a-bubble-publication-id-2252-post-id-173831079-isfreemail-false-r-2ti8-triedredirect-true/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-09-17-www-exponentialview-co-p-is-ai-a-bubble-publication-id-2252-post-id-173831079-isfreemail-false-r-2ti8-triedredirect-true/</guid><pubDate>Wed, 17 Sep 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;On the basis of these gauges, genAI remains in a demand-led, capital-intensive boom rather than a bubble. But booms can sour quickly, and there are several pressure points worth watching&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Azeem Azhar and Nathan Warren do some in-the-weeds research to try to answer a seemingly straightforward question -- is AI a bubble?&lt;/p&gt;
&lt;p&gt;The truth is that bubbles are only recognizable in retrospect; what may appear to be a &quot;boom&quot; has to &quot;burst&quot; in order to be a bubble. However, identifying signs that might indicate that a boom is likely to be a bursting bubble is extremely valuable.&lt;/p&gt;
&lt;p&gt;Acknowledging this, the authors cover a lot of ground and come up with five factors to watch: economic strain (capex / GDP %), industry strain (investment / revenue), revenue growth (revenue doubling time in years), valuation heat (price / earnings ratio), and funding quality (a composite index).&lt;/p&gt;
&lt;p&gt;Their conclusion -- AI is still firmly in &quot;boom&quot; territory rather than looking like a bubble. Helpfully, they also provide a few warning signs to watch for: investment in AI approaching 2% of GDP, sustained drops in enterprise and consumer spending, P/E ratios approaching 50-60, and internal cash covering less than 25% of capex. Two of these together would push the authors into the danger zone.&lt;/p&gt;
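&lt;p&gt;Those warning signs reduce to a handful of comparisons. A back-of-the-envelope version (thresholds paraphrased from the post; the inputs are whatever current figures you plug in, not data):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Back-of-the-envelope check of the post&apos;s warning signs. The thresholds
# paraphrase the article; the inputs are yours to supply, not data.

def danger_signals(capex_pct_gdp, pe_ratio, cash_coverage_of_capex,
                   enterprise_spend_falling, consumer_spend_falling):
    signals = {
        &quot;economic strain&quot;: capex_pct_gdp &gt;= 2.0,           # ~2% of GDP
        &quot;valuation heat&quot;: pe_ratio &gt;= 50,                  # P/E nearing 50-60
        &quot;funding quality&quot;: cash_coverage_of_capex &lt; 0.25,  # cash &lt; 25% of capex
        &quot;demand&quot;: enterprise_spend_falling and consumer_spend_falling,
    }
    return signals, sum(signals.values()) &gt;= 2  # two signals = danger zone
&lt;/code&gt;&lt;/pre&gt;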
&lt;p&gt;Is this a perfect study? Probably not. Is it possible they&apos;ve misread the data? Certainly. But what is certainly true is they&apos;ve taken the question seriously, looked for unbiased indicators, tested their validity qualitatively and quantitatively, and even shared how they might be wrong.&lt;/p&gt;
&lt;p&gt;We need a &lt;em&gt;lot&lt;/em&gt; more of this and a lot less of the smug, overconfident, tweet-length assertions that grab most of the headlines today.&lt;/p&gt;</content:encoded><source>Is AI a bubble?</source></item><item><title>Why Language Models Hallucinate</title><link>https://treycausey.com/commonplace/2025-09-11-cdn-openai-com-pdf-d04913be-3f6f-4d2b-b283-ff432ef4aaa5-why-language-models-hallucinate-pdf/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-09-11-cdn-openai-com-pdf-d04913be-3f6f-4d2b-b283-ff432ef4aaa5-why-language-models-hallucinate-pdf/</guid><pubDate>Thu, 11 Sep 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;&quot;[L]anguage models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty&quot;&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;&quot;Why Language Models Hallucinate&quot;, released by Kalai, Nachum, Vempala, and Zhang (mostly of OpenAI) last week.&lt;/p&gt;
&lt;p&gt;It has both a potentially surprising conclusion and a recommendation for solutions that definitely surprised me -- surprising because I made a similar point a couple of months ago!&lt;/p&gt;
&lt;p&gt;The tl;dr for the conclusion is that language models hallucinate because of the way they are trained and evaluated. No, not because of next-token prediction necessarily, but because of the incentives to provide an answer. Language models are &quot;optimized to be good test-takers&quot; where saying &quot;I don&apos;t know&quot; is not rewarded.&lt;/p&gt;
&lt;p&gt;The authors demonstrate that, largely independent of architecture, even in situations where only correct factual information is present in the training data, models will still be incentivized to guess incorrectly.&lt;/p&gt;
&lt;p&gt;The proposed solution is (naturally) socio-technical: change how models are scored so that guessing is disincentivized relative to expressing uncertainty.&lt;/p&gt;
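&lt;p&gt;The incentive problem is just expected value: under 0/1 grading, guessing with any nonzero confidence beats abstaining, and a confidence-threshold penalty flips that. A quick sketch (my paraphrase of the paper&apos;s threshold framing):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Expected score of answering with confidence p vs. abstaining (score 0).

def expected_score(p, penalty=0.0):
    return p * 1 - (1 - p) * penalty  # 1 point if right, -penalty if wrong

p = 0.3                                # a low-confidence guess
print(expected_score(p))               # 0.3 &gt; 0: under 0/1 grading, always guess
t = 0.75                               # announced confidence threshold
print(expected_score(p, t / (1 - t)))  # -1.8 &lt; 0: abstaining now wins
# With penalty t/(1-t), answering beats abstaining exactly when p &gt; t.
&lt;/code&gt;&lt;/pre&gt;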
&lt;p&gt;See: &lt;a href=&quot;https://www.treycausey.com/blog/posts/solving_llm_hallucinations_ux/&quot;&gt;Solving LLM Hallucinations is (mostly) a UX Problem&lt;/a&gt;&lt;/p&gt;</content:encoded><source>Why Language Models Hallucinate</source></item><item><title>Quoting Koen</title><link>https://treycausey.com/commonplace/2025-09-07-x-com-koenvaneijk-status-1964545137784340736/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-09-07-x-com-koenvaneijk-status-1964545137784340736/</guid><pubDate>Sun, 07 Sep 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;treat your code like bonsai, ai makes it grow faster, prune it from time to time to keep structure and establish its form.&lt;/p&gt;&lt;/blockquote&gt;</content:encoded><source>Quoting Koen</source></item><item><title>Quoting Anil Dash on AI critiques</title><link>https://treycausey.com/commonplace/2025-09-07-bsky-app-profile-anildash-com-post-3lybkmj7ast2c/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-09-07-bsky-app-profile-anildash-com-post-3lybkmj7ast2c/</guid><pubDate>Sun, 07 Sep 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;[T]his is the really ineffective head-in-the-sand reaction a lot of folks on Bluesky tend to have if you talk about any harm reduction-based approach to the reality that millions of people are using AI tools today. The current approach to critiquing AI is obviously not working, but they don&apos;t care. ... I agree with the intellectual substance of virtually every common critique of AI. And it&apos;s very clear that turning those critiques into a competition about who can frame them in the most scathing way online has done *zero* to slow down adoption, even if much of that is due to default bundling. At what point are folks going to try literally any other tactic than condescending rants? Does it matter that LLM apps are at the top of virtually every app store nearly every day because individual people are choosing to download them, and the criticism hasn&apos;t been effective in slowing that?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Anil Dash on Bluesky. I couldn&apos;t agree with this more. I have a longer post brewing on where I think the field of &quot;Responsible AI&quot; (vs. the idea of developing AI responsibly) is heading, but it&apos;s not quite ready for prime time yet.&lt;/p&gt;</content:encoded><source>Quoting Anil Dash on AI critiques</source></item><item><title>AI IS SLOWING DOWN Tracker</title><link>https://treycausey.com/commonplace/2025-08-29-aislowdown-replit-app/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-08-29-aislowdown-replit-app/</guid><pubDate>Fri, 29 Aug 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&quot;AI progress is slowing down&quot; is a meme with surprising legs, especially after the polarizing GPT-5 launch. Peter Gostev uses that same GPT-5 to create this tracker of media stories and articles dating to early 2023 decrying the end of AI progress. I love this and am more than a little irritated I didn&apos;t think of it first.&lt;/p&gt;</content:encoded><source>AI IS SLOWING DOWN Tracker</source></item><item><title>Measuring the environmental impact of AI inference</title><link>https://treycausey.com/commonplace/2025-08-28-cloud-google-com-blog-products-infrastructure-measuring-the-environmental-impact-of-ai-inference/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-08-28-cloud-google-com-blog-products-infrastructure-measuring-the-environmental-impact-of-ai-inference/</guid><pubDate>Thu, 28 Aug 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;[W]e estimate the median Gemini Apps text prompt uses 0.24 watt-hours (Wh) of energy, emits 0.03 grams of carbon dioxide equivalent (gCO2e), and consumes 0.26 milliliters (or about five drops) of water1 — figures that are substantially lower than many public estimates. The per-prompt energy impact is equivalent to watching TV for less than nine seconds.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Google releases environmental impact data for Gemini. Much like the Mistral report previously released, the impact is significantly smaller than popular discourse would lead you to believe. This is really encouraging, but unless the discourse catches up, it&apos;s hard to see it moving the needle.&lt;/p&gt;</content:encoded><source>Measuring the environmental impact of AI inference</source></item><item><title>&quot;For All Issues So Triable&quot;</title><link>https://treycausey.com/commonplace/2025-08-28-www-hyperdimensional-co-p-for-all-issues-so-triable-utm-source-post-email-title-publication-id-2244049-post-id-172122376-utm-campaign-email-post-title-isfreemail-true-r-2ti8-triedredirect-true-utm-medium-email/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-08-28-www-hyperdimensional-co-p-for-all-issues-so-triable-utm-source-post-email-title-publication-id-2244049-post-id-172122376-utm-campaign-email-post-title-isfreemail-true-r-2ti8-triedredirect-true-utm-medium-email/</guid><pubDate>Thu, 28 Aug 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;And for all the millions of dollars spent on AI safety non-profits, advocacy organizations, and other research efforts, almost none of it has been devoted specifically to the issues implicated in Raine. Indeed, we have more robust methods for measuring and mitigating LLM-enabled bioweapon development risk than we do for the rather more mundane, but far more visceral, issue of how a chatbot should deal with a mentally troubled teenager. This critique applies to my own writing as well.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Dean Ball, one of the authors of the US AI Action Plan just released by the Trump administration, weighs in on &lt;em&gt;Raine v. OpenAI&lt;/em&gt; and its potential to be a landmark case that defines how individuals seek redress against AI labs.&lt;/p&gt;</content:encoded><source>&quot;For All Issues So Triable&quot;</source></item><item><title>Quoting Derek Thompson</title><link>https://treycausey.com/commonplace/2025-08-27-www-derekthompson-org-p-the-evidence-that-ai-is-destroying/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-08-27-www-derekthompson-org-p-the-evidence-that-ai-is-destroying/</guid><pubDate>Wed, 27 Aug 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Someone once asked me recently if I had any advice on how to predict the future when I wrote about social and technological trends. Sure, I said. My advice is that predicting the future is impossible, so the best thing you can do is try to describe the present accurately. Since most people live in the past, hanging onto stale narratives and outdated models, people who pay attention to what’s happening as it happens will appear to others like they’re predicting the future when all they’re doing is describing the present.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;In his (subscriber-only) discussion of emerging evidence that AI &lt;del&gt;is&lt;/del&gt; &lt;del&gt;isn&apos;t&lt;/del&gt; might be negatively affecting early career job prospects, Derek Thompson drops this great kicker.&lt;/p&gt;</content:encoded><source>Quoting Derek Thompson</source></item><item><title>The AI Productivity Curve.</title><link>https://treycausey.com/commonplace/2025-08-25-writing-pupius-co-uk-the-ai-productivity-curve-8a426646cf64/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-08-25-writing-pupius-co-uk-the-ai-productivity-curve-8a426646cf64/</guid><pubDate>Mon, 25 Aug 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Productivity has always been a probability function — sometimes you’re in flow, sometimes you’re stuck for hours. AI changes the shape of that distribution. Understanding how is more useful than arguing about how much.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Dan Pupius with a nuanced view on why the data / anecdata on coding productivity gains from AI are all over the place. For him, the biggest gains come when getting started and flatten as complexity grows. This seems to me both a) obviously true and b) the first place I&apos;ve seen it articulated so clearly. The discussion around AI and productivity has sadly become, like almost every other AI discussion, effectively a litmus test for your priors about the pros and cons of AI.&lt;/p&gt;
&lt;p&gt;Related: &lt;a href=&quot;https://www.treycausey.com/blog/posts/superagency-adhd/&quot;&gt;Superagency and ADHD&lt;/a&gt;&lt;/p&gt;</content:encoded><source>The AI Productivity Curve</source></item><item><title>On Pessimization</title><link>https://treycausey.com/commonplace/2025-08-20-www-mindthefuture-info-p-on-pessimization/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-08-20-www-mindthefuture-info-p-on-pessimization/</guid><pubDate>Wed, 20 Aug 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;&quot;The more scared you are that you might not achieve your goal, the more urgently you feel that “something must be done”, and the more you flinch away from picturing how that “something” might actually make it worse.&quot;&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Richard Ngo, an AI safety advocate formerly with Google DeepMind and OpenAI, writes a really lovely piece about &quot;pessimization&quot;, the phenomenon where working towards some goal counterintuitively helps to achieve the opposite of that goal (i.e., an anti-goal).&lt;/p&gt;
&lt;p&gt;I love this kind of concept; they appear all over the social sciences under various names. They are a vital part of examining how systems and institutions and the actors within them operate, but are woefully underdiscussed outside of the academy. Even when they are discussed, in my experience, they are treated as interesting thought experiments or narrow applications. In reality, I think they&apos;re far more common than acknowledged.&lt;/p&gt;
&lt;p&gt;Ngo breaks down pessimization into a three-part typology:
&lt;ul&gt;
	&lt;li /&gt;Direct pessimization: when opponents actively try to bring about the anti-goal.
	&lt;li /&gt;Indirect pessimization: when work towards a goal encourages others to work on the anti-goal either through raising awareness of the anti-goal or simply by having bad or uncompelling ideas about the goal itself.
	&lt;li /&gt;Perverse pessimization: when ostensibly aligned actors sabotage progress towards the goal. This is common when organizational and institutional structures emerge in support of a goal whose existence would be threatened if the goal was realized. Ngo cites the prime example here of environmentalists opposing nuclear power.
&lt;/ul&gt;&lt;/p&gt;
&lt;p&gt;One of the feedback effects that Ngo doesn&apos;t discuss, and one I think a lot about, is something akin to but not quite equivalent to the No True Scotsman fallacy. The idea here being that, in the face of failure to achieve a goal, the proposed solution is to double down on existing methods and tactics rather than admit that they are not succeeding. &quot;Real &lt;approach&gt; has never been tried, that&apos;s why we&apos;re failing!&quot;&lt;/p&gt;</content:encoded><source>On Pessimization - by Richard Ngo - Mind the Future</source></item><item><title>OpenAI Progress over 7 years</title><link>https://treycausey.com/commonplace/2025-08-17-progress-openai-com/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-08-17-progress-openai-com/</guid><pubDate>Sun, 17 Aug 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I&apos;ve wanted to do this for a while. OpenAI feeds the same prompt (well, 14 of them) to five different LLMs ranging from 2018&apos;s GPT-1 to 2025&apos;s GPT-5. Pretty remarkable progress.&lt;/p&gt;</content:encoded><source>OpenAI Progress</source></item><item><title>Different perceptions of AI progress, explained</title><link>https://treycausey.com/commonplace/2025-08-17-x-com-miles-brundage-status-1956777848523641016/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-08-17-x-com-miles-brundage-status-1956777848523641016/</guid><pubDate>Sun, 17 Aug 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Miles Brundage, former Head of Policy at OpenAI, walks through why there are so many competing narratives about AI progress right now. His conclusion is that many people are talking about different things altogether when they talk about &quot;progress&quot; and why it can feel simultaneously like both narratives are true: that progress has plateaued and that new models continue to break new ground. It&apos;s a good six-minute overview of the different kinds of models that are out there without getting too technical about architectures.&lt;/p&gt;</content:encoded><source>Different perceptions of AI progress, explained</source></item><item><title>AI research interviews</title><link>https://treycausey.com/commonplace/2025-08-13-docs-google-com-document-d-1zv73d2vgaj2yu-tjn3tvop6qvlwvpxjb2rrqszqxyti-edit/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-08-13-docs-google-com-document-d-1zv73d2vgaj2yu-tjn3tvop6qvlwvpxjb2rrqszqxyti-edit/</guid><pubDate>Wed, 13 Aug 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;As an order of magnitude, I’d recommend around 100 hours of leetcode practice, and a similar amount of time reading papers, refreshing knowledge (use Deep Research!) and talking to friends.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Bas van Opheusden, recent addition to the technical staff at OpenAI, shares his advice for preparing for AI research lab interviews. It pains me that we occupy an equilibrium where leetcoding is broadly seen as suboptimal for identifying talent yet is the dominant form of technical interview. &lt;a href =&quot;https://en.wikipedia.org/wiki/Streetlight_effect&quot;&gt;Classic streetlight effect&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;These guides can often feel like advice to &quot;know everything about everything&quot; (I should know, I wrote one myself some years ago) but there are some good nuggets:&lt;/p&gt;
&lt;p&gt;Concrete interview topics to prepare:&lt;/p&gt;
&lt;p&gt;&lt;ul&gt;
	&lt;li /&gt;Debugging transformers. This is a classic in which you get a botched implementation of a self-attention block and have to debug it. Make sure you’ve practiced debugging tensor shapes, and pay special attention to the causal attention mask - that’s where it gets most tricky. For reference, check out nanoGPT.
	&lt;li /&gt;Top-k/knn. The problem of “picking the k largest items” comes up in ML in various places and makes for a nice interview problem, particularly because the solution (a heap) is not something you can invent on the fly. Make sure you know what heaps are (see the sketch after this list).
	&lt;li /&gt;Implementing BPE. Tokenizers are the worst part of LLMs, and implementing BPE without errors is tricky. This is somewhat popular.
	&lt;li /&gt;Backpropagation from scratch. Implementing a basic version of auto-diff, the chain rule, etc. Lots of opportunities for indexing errors.
	&lt;li /&gt;KV Cache. This essentially amounts to building a matrix, but if you haven’t seen it before, you might do something convoluted.
	&lt;li /&gt;Binary search, backtracking, Dijkstra, ...
&lt;/ul&gt;&lt;/p&gt;
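&lt;p&gt;On the top-k item above, the move worth rehearsing is the bounded min-heap. A generic sketch (mine, not from the guide):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import heapq

# Streaming top-k with a min-heap of size k: the heap root is always the
# smallest of the current top k, so each new item costs one comparison
# plus an O(log k) replace when it qualifies.
def top_k(stream, k):
    heap = []
    for x in stream:
        if len(heap) &lt; k:
            heapq.heappush(heap, x)
        elif x &gt; heap[0]:
            heapq.heapreplace(heap, x)  # pop the smallest, push x
    return sorted(heap, reverse=True)

print(top_k([5, 1, 9, 3, 7, 8], 3))  # [9, 8, 7]
&lt;/code&gt;&lt;/pre&gt;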
</content:encoded><source>AI research interviews</source></item><item><title>42 notes on AI &amp; work by Jasmine Sun</title><link>https://treycausey.com/commonplace/2025-08-13-substack-com-jasmine-p-170754042/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-08-13-substack-com-jasmine-p-170754042/</guid><pubDate>Wed, 13 Aug 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;I don’t think policymakers would tolerate job loss past 15%. At that point, they’d step in to start slowing shit down.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I love lists like these. Jasmine Sun collects a series of thoughts / observations / datapoints from the past few years about the intersection of AI and work. Sometimes contradictory -- that&apos;s the point -- and definitely evidence of continuing to learn and grapple with a complex topic.&lt;/p&gt;
&lt;p&gt;Many of these had me nodding my head. Jasmine seems to be one of the only people besides Noah Smith who lives in SF, immersed in the AI culture, hanging out with the builders, and still takes seriously the real political, economic, and institutional realities in which AI is being developed.&lt;/p&gt;</content:encoded><source>42 notes on AI &amp; work by Jasmine Sun</source></item><item><title>The ‘godfather of AI’ reveals the only way humanity can survive superintelligent AI</title><link>https://treycausey.com/commonplace/2025-08-13-www-cnn-com-2025-08-13-tech-ai-geoffrey-hinton/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-08-13-www-cnn-com-2025-08-13-tech-ai-geoffrey-hinton/</guid><pubDate>Wed, 13 Aug 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;“That’s the only good outcome. If it’s not going to parent me, it’s going to replace me,” he said. “These super-intelligent caring AI mothers, most of them won’t want to get rid of the maternal instinct because they don’t want us to die.”&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Geoffrey Hinton has now decided that building &quot;maternal instincts&quot; into AI is the only way to prevent all of humanity being wiped out. My hope is that more people read this kind of thing and come to the correct conclusion that computer scientists are not well-equipped to make sweeping predictions about social systems.&lt;/p&gt;
&lt;p&gt;The quotes are ... something else.&lt;/p&gt;</content:encoded><source>The ‘godfather of AI’ reveals the only way humanity can survive superintelligent AI</source></item><item><title>Thrive in obscurity</title><link>https://treycausey.com/commonplace/2025-08-11-www-jeetmehta-com-posts-thrive-in-obscurity/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-08-11-www-jeetmehta-com-posts-thrive-in-obscurity/</guid><pubDate>Mon, 11 Aug 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Do things you like, and sometimes the world will agree&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Jeet Mehta shares three principles for creating things when you don&apos;t have an audience:&lt;/p&gt;
&lt;p&gt;&lt;ol&gt;
	&lt;li /&gt;Do things you like, and sometimes the world will agree.
	&lt;li /&gt;Push yourself out: create for yourself, not for an imagined audience.
	&lt;li /&gt;Build your binge bank: consider your unread writing a bingeable backlog for those who discover you.
&lt;/ol&gt;&lt;/p&gt;
&lt;p&gt;I think 1 &amp; 3 contradict a bit, but overall I like the message a lot. Jeet also posted his &lt;a href=&quot;https://www.jeetmehta.com/posts/core-beliefs&quot;&gt;core beliefs&lt;/a&gt;, which I really admire.&lt;/p&gt;</content:encoded><source>Thrive in obscurity</source></item><item><title>Amazon deliveries are often better for the environment than driving to buy the stuff in a store</title><link>https://treycausey.com/commonplace/2025-08-08-andymasley-substack-com-p-ordering-amazon-deliveries-is-often/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-08-08-andymasley-substack-com-p-ordering-amazon-deliveries-is-often/</guid><pubDate>Fri, 08 Aug 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;If you often drive to buy things, getting things delivered to your house will often be better for the climate, unless you’re buying a ton of stuff at once at a store that’s close by. Driving is just so uniquely bad for the climate compared to everything else we do that we shouldn’t be surprised by this.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Andy Masley outlines the environmental impact of getting things delivered to your door from Amazon vs. driving to the store to do so.&lt;/p&gt;
&lt;p&gt;A few things are at work here, I think:&lt;/p&gt;
&lt;p&gt;&lt;ul&gt;
	&lt;li /&gt;Most people never really seriously consider counterfactuals.
	&lt;li /&gt;The efficiency of modern logistics and supply chains is vastly underestimated (see also discussions about eating local).
	&lt;li /&gt;Climate claims that take seriously logistical efficiency are not aesthetically aligned with the preferences of many inclined to make climate claims.
	&lt;li /&gt;(🔥) Climate arguments are a socially acceptable way to criticize things that are also disliked for non-climate reasons.
&lt;/ul&gt;&lt;/p&gt;
&lt;p&gt;On the last point, also see Masley&apos;s piece &lt;a href=&quot;https://andymasley.substack.com/p/a-cheat-sheet-for-conversations-about&quot;&gt;Why using ChatGPT is not bad for the environment - a cheat sheet&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Related: &lt;a href=&quot;https://www.treycausey.com/blog/posts/quantification_bias/&quot;&gt;Quantification bias&lt;/a&gt;.&lt;/p&gt;</content:encoded><source>Amazon deliveries are often better for the environment than driving to buy the stuff in a store</source></item><item><title>Jerry Wei on one year at Anthropic</title><link>https://treycausey.com/commonplace/2025-08-08-x-com-jerryweiai-status-1932865074361975151/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-08-08-x-com-jerryweiai-status-1932865074361975151/</guid><pubDate>Fri, 08 Aug 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Being surrounded by colleagues who embody these principles, who choose impact over recognition, who obsess over measurement quality, and who believe in the power of focused teams, has shaped how I approach my own work.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Jerry Wei reflects on one year at Anthropic. As with most things that are true, they sound &quot;obvious&quot; but are often quite difficult to keep in mind when trying to build successful teams.&lt;/p&gt;
&lt;p&gt;&lt;ul&gt;
	&lt;li /&gt;Small, talent-dense, goal-aligned teams can outpace much larger teams.
	&lt;li /&gt;Continually refining evals and making them harder and more comprehensive is essential to success.
	&lt;li /&gt;The highest impact work often isn&apos;t the most glamorous.
&lt;/ul&gt;&lt;/p&gt;</content:encoded><source>Twitter</source></item><item><title>A Spark of the Anti-AI Butlerian Jihad (on Bluesky)</title><link>https://treycausey.com/commonplace/2025-08-03-eugeneyan-com-writing-anti/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-08-03-eugeneyan-com-writing-anti/</guid><pubDate>Sun, 03 Aug 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Despite the unpleasant events, I understand it comes from a place of concern and fear for many. To alleviate this, we need open, empathetic dialog to build mutual understanding. Even now, several AI researchers and practitioners continue to engage in good-faith education about AI on Bluesky. Nonetheless, it’s unclear if critics are open to engage, especially with the widespread blocking of AI-associated accounts.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Eugene Yan finds that it&apos;s not just vibes: Bluesky appears to be a particularly anti-AI platform. What&apos;s nice about this post is it lays out how unwillingness to engage reinforces misconceptions about AI and open data. Polarization is everywhere.&lt;/p&gt;</content:encoded><source>A Spark of the Anti-AI Butlerian Jihad (on Bluesky)</source></item><item><title>Stevens: a hackable AI assistant using a single SQLite table and a handful of cron jobs</title><link>https://treycausey.com/commonplace/2025-08-02-www-geoffreylitt-com-2025-04-12-how-i-made-a-useful-ai-assistant-with-one-sqlite-table-and-a-handful-of-cron-jobs/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-08-02-www-geoffreylitt-com-2025-04-12-how-i-made-a-useful-ai-assistant-with-one-sqlite-table-and-a-handful-of-cron-jobs/</guid><pubDate>Sat, 02 Aug 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Vibe coding enables sillier projects. Initially, Stevens spoke with a dry tone, like you might expect from a generic Apple or Google product. But it turned out it was just more fun to have the assistant speak like a formal butler. This was trivial to do, just a couple lines in a prompt. Similarly, I decided to make the admin dashboard views feel like a video game, because why not?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Geoffrey Litt, of Ink &amp; Switch, creates a dead-simple, Telegram-based LLM assistant for him and his wife. Impressive how far he gets with a SQLite table for memory and some cron jobs.&lt;/p&gt;</content:encoded><source>Stevens: a hackable AI assistant using a single SQLite table and a handful of cron jobs</source></item><item><title>A thread of non-code Claude Code use cases</title><link>https://treycausey.com/commonplace/2025-08-02-x-com-alexalbert-status-1948765443776544885/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-08-02-x-com-alexalbert-status-1948765443776544885/</guid><pubDate>Sat, 02 Aug 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Includes a &lt;a href=&quot;https://t.co/mNYiubbkp2&quot;&gt;particularly interesting use case&lt;/a&gt; using CC to organize Obsidian vaults.&lt;/p&gt;</content:encoded><source>A thread of non-code Claude Code use cases</source></item><item><title>Superagency and ADHD</title><link>https://treycausey.com/blog/posts/superagency-adhd/</link><guid isPermaLink="true">https://treycausey.com/blog/posts/superagency-adhd/</guid><description>Imagining a world look like where more people can just do things</description><pubDate>Fri, 01 Aug 2025 00:00:00 GMT</pubDate></item><item><title>Anthropic is launching an AI Psychiatry team</title><link>https://treycausey.com/commonplace/2025-07-26-www-reddit-com-r-artificial-comments-1m8uon6-anthropic-is-launching-an-ai-psychiatry-team-to/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-07-26-www-reddit-com-r-artificial-comments-1m8uon6-anthropic-is-launching-an-ai-psychiatry-team-to/</guid><description>Models contain unknown multitudes</description><pubDate>Sat, 26 Jul 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;We&apos;re launching an &quot;AI psychiatry&quot; team as part of interpretability efforts at Anthropic!  We&apos;ll be researching phenomena like model personas, motivations, and situational awareness, and how they lead to spooky/unhinged behaviors.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Jack Lindsey of Anthropic announces an &quot;AI psychiatry&quot; team at Anthropic. A bit cheeky, as the
&lt;a href=&apos;https://job-boards.greenhouse.io/anthropic/jobs/4020159008&apos;&gt;linked job&lt;/a&gt; doesn&apos;t mention this phrase at all and instead identifies the role as part of Anthropic&apos;s Interpretability team.&lt;/p&gt;
&lt;p&gt;That being said, there is clearly a large and unexplored world of weird model behaviors: &lt;a href=&quot;https://x.com/AITechnoPagan/status/1948578422361297321&quot;&gt;an anon on Twitter discusses&lt;/a&gt; a &apos;Cat Mode&apos; that they discovered within Bing, and the &lt;a href=&quot;https://www-cdn.anthropic.com/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdf&quot;&gt;Claude 4 System Card&lt;/a&gt; itself discusses an odd &quot;&apos;spiritual bliss&apos; attractor state&quot; (5.5.2, page 62).&lt;/p&gt;</content:encoded><source>Anthropic is launching an AI Psychiatry team</source></item><item><title>Our contribution to a global environmental standard for AI | Mistral AI</title><link>https://treycausey.com/commonplace/2025-07-24-mistral-ai-news-our-contribution-to-a-global-environmental-standard-for-ai/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-07-24-mistral-ai-news-our-contribution-to-a-global-environmental-standard-for-ai/</guid><pubDate>Thu, 24 Jul 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Mistral releases the most detailed, independently audited environmental impact assessment of any LLM I&apos;ve seen, their Large 2 (roughly equivalent to GPT-4o in capabilities). With any assessment, whether the impact is &quot;big&quot; or &quot;bad&quot; really depends on the frame you&apos;re using. They break out the impact between pretraining and inference, as is sensible, and give some comparison activities. For the average user, I think the news is quite heartening: a single query uses 0.05L of water (roughly 3 tablespoons) and emits the same amount of greenhouse gases as about 10 seconds of watching a livestream.&lt;/p&gt;
&lt;p&gt;Obviously, pretraining is much costlier, but it&apos;s also something the typical user has almost no control over, and it&apos;s a one-time fixed cost vs. the ongoing cost of inference. I expect these numbers to continue to fall.&lt;/p&gt;
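&lt;p&gt;To make the fixed-vs-ongoing framing concrete, here&apos;s a minimal back-of-the-envelope sketch in Python. Only the 0.05L-per-query figure comes from Mistral&apos;s report; the training total and query volume are made-up placeholders, there purely to show the amortization logic:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Back-of-the-envelope: a one-time training footprint amortized
# against the ongoing per-query cost of inference.
WATER_PER_QUERY_L = 0.05   # per-query figure from Mistral&apos;s report
TBSP_PER_LITER = 67.6      # 1 US tablespoon is roughly 14.79 mL

# Placeholder assumptions -- NOT from the report:
TRAINING_WATER_L = 2.8e8   # hypothetical one-time training water use
QUERIES_PER_YEAR = 1e10    # hypothetical annual query volume

per_query_tbsp = WATER_PER_QUERY_L * TBSP_PER_LITER
amortized_training_l = TRAINING_WATER_L / QUERIES_PER_YEAR

print(f&quot;inference: {WATER_PER_QUERY_L} L (~{per_query_tbsp:.1f} tbsp) per query&quot;)
print(f&quot;training, amortized over a year: {amortized_training_l:.3f} L per query&quot;)
# The fixed training cost shrinks per query as volume grows;
# the inference cost recurs with every query.
&lt;/code&gt;&lt;/pre&gt;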
&lt;p&gt;&lt;img src=&quot;../../images/mistral_environmental_impact.png&quot; alt=&quot;Mistral AI&apos;s Large 2 Environmental Impact&quot;&gt;&lt;/p&gt;</content:encoded><source>Our contribution to a global environmental standard for AI | Mistral AI</source></item><item><title>AI as the greatest source of empowerment for all</title><link>https://treycausey.com/commonplace/2025-07-22-openai-com-index-ai-as-the-greatest-source-of-empowerment-for-all/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-07-22-openai-com-index-ai-as-the-greatest-source-of-empowerment-for-all/</guid><description>Fidji Simo&apos;s guardedly optimistic view of AI</description><pubDate>Tue, 22 Jul 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Every major technology shift can expand access to power—the power to make better decisions, shape the world around us, and control our own destiny in new ways. But it can also further concentrate wealth and power in the hands of a few—usually people who already have money, credentials, and connections.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Fidji Simo, incoming CEO of Applications for OpenAI, lays out her vision for the intentional development of AI across knowledge, health, creative expression, economic freedom, time, and support. A solid opening statement and a refreshing acknowledgement of the work we must do to move towards an abundant future for all.&lt;/p&gt;</content:encoded><source>AI as the greatest source of empowerment for all</source></item><item><title>Stop pretending you know what AI does to the economy</title><link>https://treycausey.com/commonplace/2025-07-21-www-noahpinion-blog-p-stop-pretending-you-know-what-ai/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-07-21-www-noahpinion-blog-p-stop-pretending-you-know-what-ai/</guid><pubDate>Mon, 21 Jul 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;AI might start killing jobs en masse and sending inequality to the moon. We don’t know. But it hasn’t yet, and it’s important to understand why each burst of AI pessimism so far has been a false alarm.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Noah Smith with the most level-headed assessment I&apos;ve read of the seemingly omnipresent assumption that AI will be a massive job-killer, and of the fact that this isn&apos;t visible in any data (so far).&lt;/p&gt;
&lt;p&gt;I think, in general, people are afraid to say &quot;I don&apos;t know.&quot; I have a hypothesis that &quot;I don&apos;t know&quot; is essentially coded as agreeing with whichever outcome is less politically or culturally aligned with one&apos;s audience. I have no data for this, though.&lt;/p&gt;</content:encoded><source>Stop pretending you know what AI does to the economy</source></item><item><title>Notes on running a link blog</title><link>https://treycausey.com/commonplace/2025-07-20-notes-on-running-a-link-blog-/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-07-20-notes-on-running-a-link-blog-/</guid><pubDate>Sun, 20 Jul 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;The point of that article was to emphasize that blogging doesn’t have to be about unique insights. The value is in writing frequently and having something to show for it over time—worthwhile even if you don’t attract much of an audience (or any audience at all).&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Simon Willison&apos;s guidelines for running his link blog. I&apos;m trying to do more of the same, and these seem well-considered:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Include the names of people involved (for credit and for discovery of connections to other projects)&lt;/li&gt;
&lt;li&gt;Try to add something extra -- protect against link rot!&lt;/li&gt;
&lt;li&gt;Context as to why interesting / important&lt;/li&gt;
&lt;li&gt;Link to similar concepts&lt;/li&gt;
&lt;li&gt;Quote the transcript if a video&lt;/li&gt;
&lt;li&gt;Original author should feel good about it&lt;/li&gt;
&lt;li&gt;Prove you&apos;ve read it&lt;/li&gt;
&lt;/ul&gt;</content:encoded><source>My approach to running a link blog</source></item><item><title>Notes on Notes on &apos;Taste&apos;</title><link>https://treycausey.com/commonplace/2025-07-20-notes-on-taste-are-na-editorial/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-07-20-notes-on-taste-are-na-editorial/</guid><pubDate>Sun, 20 Jul 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;I also believe taste is something we can and should try to cultivate. Not because taste itself is a virtue, per se, but because I’ve found a taste-filled life to be a richer one. To pursue it is to appreciate ourselves, each other, and the stuff we’re surrounded by a whole lot more.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;The universe kept nudging me towards this: first reading Brie Wolfson&apos;s profoundly affecting &lt;a href=&quot;https://joincolossus.com/article/flounder-mode/&quot;&gt;Flounder Mode&lt;/a&gt;, then hearing her mention it on both Packy McCormick&apos;s podcast and David Perell&apos;s.&lt;/p&gt;
&lt;p&gt;The word &apos;taste&apos; is having a moment, with those in the AI research community commonly referring to &apos;research taste&apos; (not sure if Chris Olah invented the term, but &lt;a href=&quot;https://colah.github.io/notes/taste/&quot;&gt;he discussed it in 2021&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Elsewhere in the AI community, the importance of taste is a hot topic as the cost of producing endless variations on an idea goes to ~zero.&lt;/p&gt;</content:encoded><source>Notes on &apos;Taste&apos;</source></item><item><title>Solving LLM hallucinations is (mostly) a UX problem</title><link>https://treycausey.com/blog/posts/solving_llm_hallucinations_ux/</link><guid isPermaLink="true">https://treycausey.com/blog/posts/solving_llm_hallucinations_ux/</guid><description>Why LLM hallucinations are primarily a UX challenge rather than an algorithmic one, and how design improvements can mitigate their impact on users.</description><pubDate>Tue, 08 Jul 2025 00:00:00 GMT</pubDate></item><item><title>Letter to Arc members 2025</title><link>https://treycausey.com/commonplace/2025-05-28-letter-to-arc-members-2025-/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-05-28-letter-to-arc-members-2025-/</guid><description>Arc abandoned for Dia—AI moment too big to ignore</description><pubDate>Wed, 28 May 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Second, I would’ve embraced AI fully, sooner and unapologetically. The truth is I was obsessed. I’d stay up late, after my family went to bed, playing with ChatGPT— not for work, but out of sheer curiosity.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;A well-written and remarkably transparent note from Josh Miller, CEO of the Browser Company, explaining why the company decided to walk away from Arc, a browser loved by a small but devoted fanbase, in favor of Dia, an AI-first browser. I suspect many are feeling the above right now -- recognizing that this moment in AI is huge while harboring an allergy to hype cycles.&lt;/p&gt;</content:encoded><source>Letter to Arc members 2025</source></item><item><title>Don&apos;t trust AI to talk accurately about itself</title><link>https://treycausey.com/commonplace/2025-05-26-don-t-trust-ai-to-talk-accurately-about-itself-/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-05-26-don-t-trust-ai-to-talk-accurately-about-itself-/</guid><description>Don&apos;t trust AI to talk accurately about itself</description><pubDate>Mon, 26 May 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;Asking questions like this feels like a natural thing to do: these bots use “I” pronouns (I really wish they wouldn’t) and will very happily answer questions about themselves—what they can do, how they work, even their own opinions (I really wish they wouldn’t do that).&lt;/p&gt;&lt;/blockquote&gt;</content:encoded><source>Don&apos;t trust AI to talk accurately about itself: Bard wasn&apos;t trained on Gmail</source></item><item><title>What AI Thinks It Knows About You</title><link>https://treycausey.com/commonplace/2025-05-25-what-ai-thinks-it-knows-about-you-/</link><guid isPermaLink="true">https://treycausey.com/commonplace/2025-05-25-what-ai-thinks-it-knows-about-you-/</guid><description>Reflections on the opacity of AI models and what they reveal about us</description><pubDate>Sun, 25 May 2025 00:00:00 GMT</pubDate><content:encoded>&lt;blockquote&gt;&lt;p&gt;It calls to mind a maxim about why it is so hard to understand ourselves: &quot;If the human brain were so simple that we could understand it, we would be so simple that we couldn&apos;t.&quot; If models were simple enough for us to grasp what&apos;s going on inside when they run, they&apos;d produce answers so dull that there might not be much payoff to understanding how they came about.&lt;/p&gt;&lt;/blockquote&gt;</content:encoded><source>The Atlantic - What AI Thinks It Knows About You</source></item><item><title>Building Successful Responsible AI Teams</title><link>https://treycausey.com/blog/posts/building_rai_teams/</link><guid isPermaLink="true">https://treycausey.com/blog/posts/building_rai_teams/</guid><description>A practical guide to building responsible AI teams that succeed by aligning with business goals, hiring builders over framework-creators, and focusing on empirical risk management.</description><pubDate>Tue, 19 Nov 2024 00:00:00 GMT</pubDate></item><item><title>Skeptical optimists are the key to an abundant AI future</title><link>https://treycausey.com/blog/posts/skeptical_optimists/</link><guid isPermaLink="true">https://treycausey.com/blog/posts/skeptical_optimists/</guid><description>Beyond doomers and accelerationists: why skeptical optimists who believe AI problems are solvable with effort are essential for building an abundant AI future.</description><pubDate>Tue, 04 Apr 2023 00:00:00 GMT</pubDate></item><item><title>Selecting on the dependent variable</title><link>https://treycausey.com/blog/posts/selecting_dependent_variable/</link><guid isPermaLink="true">https://treycausey.com/blog/posts/selecting_dependent_variable/</guid><description>Why business books and productivity 
advice often commit the error of selecting on the dependent variable, and how to spot this common reasoning flaw.</description><pubDate>Thu, 03 Nov 2022 00:00:00 GMT</pubDate></item><item><title>Quantification bias</title><link>https://treycausey.com/blog/posts/quantification_bias/</link><guid isPermaLink="true">https://treycausey.com/blog/posts/quantification_bias/</guid><description>Why new systems that afford quantitative measurement face disproportionate scrutiny compared to incumbent systems, and why we should hold all systems to the same standards.</description><pubDate>Sun, 30 Oct 2022 00:00:00 GMT</pubDate></item><item><title>The rise of the data product manager</title><link>https://treycausey.com/blog/posts/rise_of_data_product_manager/</link><guid isPermaLink="true">https://treycausey.com/blog/posts/rise_of_data_product_manager/</guid><description>The emergence of a new kind of product manager who understands data infrastructure, ML, and how to build products where data is at the core of the value proposition.</description><pubDate>Sat, 09 Jul 2022 00:00:00 GMT</pubDate></item><item><title>Data science interviews</title><link>https://treycausey.com/blog/posts/data_science_interviews/</link><guid isPermaLink="true">https://treycausey.com/blog/posts/data_science_interviews/</guid><description>Notes from my own job search on what to expect in data science interviews, from technical screens to on-site interviews, and how to prepare for both type A and type B data scientist roles.</description><pubDate>Fri, 08 Jul 2022 00:00:00 GMT</pubDate></item><item><title>The acute pain and chronic reward of public-facing work</title><link>https://treycausey.com/blog/posts/emotional_rollercoaster_public_work/</link><guid isPermaLink="true">https://treycausey.com/blog/posts/emotional_rollercoaster_public_work/</guid><description>Reflections on the emotional toll of producing public-facing work like open source or writing, where criticism flows fast while rewards reveal themselves slowly.</description><pubDate>Thu, 07 Jul 2022 00:00:00 GMT</pubDate></item><item><title>Do you have time for a quick chat?</title><link>https://treycausey.com/blog/posts/quick_chat/</link><guid isPermaLink="true">https://treycausey.com/blog/posts/quick_chat/</guid><description>Guidelines for requesting informational coffee chats that respect the other person&apos;s time and increase your chances of getting a response.</description><pubDate>Wed, 06 Jul 2022 00:00:00 GMT</pubDate></item><item><title>Software development skills for data scientists</title><link>https://treycausey.com/blog/posts/software_dev_skills/</link><guid isPermaLink="true">https://treycausey.com/blog/posts/software_dev_skills/</guid><description>Essential software engineering skills that data scientists need to collaborate effectively: version control, code review, testing, documentation, and writing modular code.</description><pubDate>Mon, 04 Jul 2022 00:00:00 GMT</pubDate></item><item><title>Why good data scientists make good product managers</title><link>https://treycausey.com/blog/posts/good_data_scientists_good_pms/</link><guid isPermaLink="true">https://treycausey.com/blog/posts/good_data_scientists_good_pms/</guid><description>The natural affinities between data science and product management, plus the challenges data scientists face when making the transition to PM roles.</description><pubDate>Sun, 03 Jul 2022 00:00:00 GMT</pubDate></item><item><title>Hiring data 
scientists</title><link>https://treycausey.com/blog/posts/hiring_data_scientists/</link><guid isPermaLink="true">https://treycausey.com/blog/posts/hiring_data_scientists/</guid><description>A better approach to hiring data scientists: treat candidates as intelligent humans, use homework assignments over whiteboard coding, and build a process that leads to equitable outcomes.</description><pubDate>Sat, 02 Jul 2022 00:00:00 GMT</pubDate></item><item><title>Getting started in data science</title><link>https://treycausey.com/blog/posts/getting_started/</link><guid isPermaLink="true">https://treycausey.com/blog/posts/getting_started/</guid><description>A guide to getting started in data science, covering the essential foundations in math, statistics, experiments, machine learning, and tooling that aspiring data scientists need.</description><pubDate>Fri, 01 Jul 2022 00:00:00 GMT</pubDate></item></channel></rss>