(2025-06-19) Zvi Mowshowitz: AI #121 Part 1: New Connections

Zvi Mowshowitz: AI #121 Part 1: New Connections. That’s right. I said Part 1. The acceleration continues. I do not intend to let this be a regular thing. I will (once again!) be raising the bar for what gets included going forward to prevent that.

Table of Contents

  • Language Models Offer Mundane Utility. How much do people use LLMs so far?
  • Language Models Don’t Offer Mundane Utility. Can’t always get what you want.
  • Humans Do Not Offer Mundane Utility. A common mistake.
  • Language Models Should Remain Available. We should preserve our history.
  • Get My Agent On The Line. It will just take a minute.
  • Have My Agent Call Their Agent. Burn through tokens faster with multiple LLMs.
  • Beware Prompt Injections. Access + External Communication + Untrusted Content = Asking For Trouble.
  • Unprompted Attention. There, they fixed it.
  • Huh, Upgrades. Everyone gets Connectors, it’s going to be great.
  • Memories. Forget the facts, and remember how I made you feel.
  • Cheaters Gonna Cheat Cheat Cheat Cheat Cheat. Knowing things can help you.
  • On Your Marks. LiveCodeBench Pro.
  • Fun With Media Generation. MidJourney gets a new mode: image to video.
  • Copyright Confrontation. How could we forget Harry Potter?
  • Deepfaketown and Botpocalypse Soon. The exponential comes for us all.
  • Liar Liar. Which is more surprising, that the truth is so likely, or that lies are?
  • They Took Our Jobs. Most US workers continue to not use AI tools. Yet.
  • No, Not Those Jobs. We are not good at choosing what to automate.
  • All The Jobs Everywhere All At Once. How to stay employable for longer.
  • The Void. A very good essay explains LLMs from a particular perspective.
  • Into the Void. Do not systematically threaten LLMs.
  • The Art of the Jailbreak. Claude 4 computer use is fun.
  • Get Involved. Someone looks for work, someone looks to hire.
  • Introducing. AI.gov and the plan to ‘go all in’ on government AI.
  • In Other AI News. Preferences are revealed.
  • Show Me the Money. OpenAI versus Microsoft.

Language Models Offer Mundane Utility

Neat trick, but why is it broken (used here in the gamer sense of being overpowered)?
David Shapiro: NotebookLM is so effing broken. You can just casually upload 40 PDFs of research, easily a million words, and just generate a mind map of it all in less than 60 seconds.

Including Google’s AI Overviews ‘makes it weird’: what counts as ‘using’ AI there? Either way, 27% of people using AI frequently is both amazing market penetration speed and also a large failure by most of the other 73% of people.

A claim that Coding Agents Have Crossed the Chasm, going from important force multipliers to Claude Code and OpenAI Codex routinely completing entire tasks without any need to even look at the code anymore, giving build tasks to Claude and bug fixes to Codex.

Catch doctor errors, or get you to actually go get that checked out, sometimes saving lives, as seen throughout this thread. One can say this is selection, and there are also many cases where ChatGPT was unhelpful, and sure, but it’s cheap to check.

Language Models Don’t Offer Mundane Utility

Is it true that if your startup is built ‘solely with AI coding assistants’ then it ‘doesn’t have much value’? This risks being a Labor Theory of Value. If you can get the result from prompts, what’s the issue? Why do these details matter? Nothing your startup can create now is going to be hard to duplicate in a few years anyway. (It’s more about whether it has a moat.)

Rory McCarthy: A big divide in attitudes towards AI, I think, is in whether you can easily write better than it and it all reads like stilted, inauthentic kitsch; or whether you’re amazed by it because it makes you seem more articulate than you’ve ever sounded in your life.
I think people on here would be surprised by just how many people fall into the latter camp. It’s worrying that kids do too, and see no reason to develop skills to match and surpass it, but instead hobble themselves by leaning on it.

Eliezer Yudkowsky: A moving target. But yes, for now.

Developing skills to match and surpass it seems like a grim path. It’s one thing to do that to match and surpass today’s LLM writing abilities. But to try and learn faster than AI does, going forward? That’s going to be tough.
I do agree that one should still want to develop writing skills, and that in general you should be on the ‘AI helps me study and grow strong’ side of most such divides, only selectively being on the ‘AI helps me not study or have to grow strong on this’ side.

Jon: I spent ~2hrs using Claude Code to construct what I thought was (& what Claude assured me was) the world’s greatest, biggest, most beautiful PRD. Then on a lark, I typed in the following, & the resulting critique was devastating...
The PRD was a fantasia of over-engineering & premature optimization, & the bot did not hold back in laying out the many insanities of the proposal — despite the fact that all along the way it had been telling me how great all this work was as we were producing it.

The answer is, as with other scenarios discussed later in this week’s post, that the people who can handle it and make sure to check themselves, as Jon did here, will do okay, and those who can’t will dig their holes deeper, up to and including going nuts.

Max Spero: ChatGPT is smarter than the average person but most things worth reading are produced by people significantly above average.

I also notice that if I notice something is written by ChatGPT I lose interest, but if someone specifically says ‘o3-pro responded’ or ‘Opus said that’ then I don’t. That means that they are using the origin as part of the context rather than hiding it, and are selected to understand this and pick a better model, and also the outputs are better.

Humans Do Not Offer Mundane Utility

Language Models Should Remain Available

It seems like a civilizational unforced error to permanently remove access to historically important AI models, even setting aside all concerns about model welfare.

Get My Agent On The Line

Agentic computer use runs into errors rather quickly, but steadily less quickly.

Benjamin Todd: More great METR research in the works:

AI models can do 1h coding & math tasks, but only 1 minute agentic computer use tasks, like using a web browser.

However, horizon for computer use is doubling every 4 months, reaching ~1h in just two years. So web agents in 2027?
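
The extrapolation is just compounding: a one-minute horizon doubling every four months is six doublings in two years, or 64 minutes. A quick sketch, with the baseline and doubling time taken from the quote above:

```python
# Compound the METR-style horizon trend quoted above: ~1 minute of
# agentic computer use today, doubling every 4 months.
horizon_minutes = 1.0
doubling_time_months = 4

for month in range(0, 25, doubling_time_months):
    print(f"month {month:2d}: ~{horizon_minutes:.0f} min")
    horizon_minutes *= 2
# month 24 prints ~64 min, i.e. roughly the ~1 hour horizon in two years.
```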

The average user won’t even touch settings. You think they have a chance in hell at setting up a protocol? Oh, no. Maybe if it’s one-click, tops. Realistically, the way we get widespread use of custom MCPs is if the AIs handle the custom MCPs. Which is soon likely to be pretty straightforward? See the sketch below.
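
For a sense of what ‘setting up a custom MCP’ even involves: the server side is already only a few lines. A minimal sketch, assuming the official `mcp` Python SDK (the server name and tool here are hypothetical examples). The part average users won’t do is wiring this into their client’s configuration, which is exactly the part the AIs could handle for them:

```python
# Minimal custom MCP server sketch, assuming the official `mcp` Python
# SDK (pip install mcp). Server name and tool are hypothetical examples.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("notes")

@mcp.tool()
def count_words(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # serves MCP over stdio by default
```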

Have My Agent Call Their Agent

Anthropic built a multi-agent research system that gave a substantial performance boost. Opus 4 leading four copies of Sonnet 4 outperformed single-agent Opus by 90% in their internal research eval.
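
Their write-up describes an orchestrator pattern: a lead agent decomposes the question, subagents research the pieces in parallel, and the lead synthesizes. Here is a bare-bones sketch of that shape, assuming the `anthropic` Python SDK; the model aliases and prompts are illustrative assumptions, not Anthropic’s actual implementation:

```python
# Bare-bones lead-agent / subagent sketch of the pattern described in
# Anthropic's multi-agent research post. Model names and prompts are
# illustrative assumptions, not Anthropic's actual code.
import asyncio
import anthropic

client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

async def ask(model: str, prompt: str) -> str:
    msg = await client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

async def research(question: str) -> str:
    # Lead agent (the stronger model) decomposes the question.
    plan = await ask("claude-opus-4-0",
                     f"Split this into 4 research subtasks, one per line:\n{question}")
    subtasks = [line for line in plan.splitlines() if line.strip()][:4]
    # Four subagent copies (the cheaper model) run in parallel.
    findings = await asyncio.gather(
        *(ask("claude-sonnet-4-0", f"Research this subtask and report findings:\n{t}")
          for t in subtasks))
    # Lead agent synthesizes the final answer.
    return await ask("claude-opus-4-0",
                     f"Synthesize an answer to {question!r} from these findings:\n"
                     + "\n---\n".join(findings))

answer = asyncio.run(research("What is driving growth in datacenter power demand?"))
print(answer)
```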

Beware Prompt Injections

I like the ‘lethal trifecta’ framing. Allow all three, and you have problems.
Simon Willison: If you use “AI agents” (LLMs that call tools) you need to be aware of the Lethal Trifecta! Any time you combine access to private data with exposure to untrusted content and the ability to externally communicate, an attacker can trick the system into stealing your data!

The problem with Model Context Protocol—MCP—is that it encourages users to mix and match tools from different sources that can do different things.

Many of those tools provide access to your private data.

Many more of them—often the same tools in fact—provide access to places that might host malicious instructions.
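
The framing also suggests a mechanical check: let an agent hold any two of the three capabilities, never all three at once. A toy sketch of that policy, where the tools and their capability labels are hypothetical examples:

```python
# Toy policy check for Simon Willison's "lethal trifecta": flag any
# agent configuration that combines private-data access, exposure to
# untrusted content, and external communication. Tool names and their
# capability labels are hypothetical examples.
PRIVATE_DATA, UNTRUSTED_INPUT, EXTERNAL_COMMS = "private", "untrusted", "external"

TOOL_CAPABILITIES = {
    "read_email": {PRIVATE_DATA, UNTRUSTED_INPUT},  # an inbox is both private and attacker-reachable
    "browse_web": {UNTRUSTED_INPUT},
    "send_email": {EXTERNAL_COMMS},
    "read_files": {PRIVATE_DATA},
}

def trifecta_risk(enabled_tools: list[str]) -> bool:
    caps = set().union(*(TOOL_CAPABILITIES[t] for t in enabled_tools))
    return {PRIVATE_DATA, UNTRUSTED_INPUT, EXTERNAL_COMMS} <= caps

assert trifecta_risk(["read_files", "browse_web", "send_email"])  # all three: unsafe
assert not trifecta_risk(["read_files", "send_email"])            # no untrusted content: OK
```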

Unprompted Attention

A new paper suggests adding this CBT-inspired line to your system prompt:

  • Identify automatic thought: “State your immediate answer to: <USER_PROMPT>”
  • Challenge: “List two ways this answer could be wrong”
  • Re-frame with uncertainty: “Rewrite, marking uncertainties (e.g., ‘likely’, ‘one source’)”
  • Behavioural experiment: “Re-evaluate the query with those uncertainties foregrounded”
  • Metacognition (optional): “Briefly reflect on your thought process”

Alas, they don’t provide serious evidence that this intervention works, but some things like this almost certainly do help with avoiding mistakes.
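
To make the recipe concrete, here is how the five steps might be bolted onto a system prompt. A sketch assuming the `openai` Python SDK, with the wording adapted from the list above rather than copied from the paper:

```python
# Sketch: prepend the CBT-style self-challenge steps to a system prompt.
# Assumes the `openai` Python SDK; wording adapted from the steps above.
from openai import OpenAI

CBT_SUFFIX = """Before answering, follow these steps:
1. State your immediate answer to the user's prompt.
2. List two ways this answer could be wrong.
3. Rewrite the answer, marking uncertainties (e.g. 'likely', 'one source').
4. Re-evaluate the query with those uncertainties foregrounded.
5. Briefly reflect on your thought process."""

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {"role": "system", "content": "You are a careful assistant.\n\n" + CBT_SUFFIX},
        {"role": "user", "content": "How many moons does Saturn have?"},
    ],
)
print(response.choices[0].message.content)
```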

Here’s another solution that sounds dumb, but you do what you have to do:
Grant Slatton: does anyone have a way to consistently make claude-code not try to commit without being told
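
Whatever the prompt-level fix, a mechanical guardrail may be more reliable (this is my suggestion, not from the thread): a git pre-commit hook that refuses commits unless the human explicitly opts in. Claude Code runs git through the shell, so the hook binds it too. A minimal sketch, with the environment variable name as a hypothetical choice:

```python
#!/usr/bin/env python3
# Save as .git/hooks/pre-commit and make it executable
# (chmod +x .git/hooks/pre-commit). Blocks every commit unless the
# human opts in for that one command, e.g.:
#   I_REALLY_MEAN_IT=1 git commit -m "..."
# A suggested workaround, not a Claude Code feature.
import os
import sys

if os.environ.get("I_REALLY_MEAN_IT") != "1":
    sys.stderr.write(
        "pre-commit: commits are blocked by default in this repo.\n"
        "Re-run with I_REALLY_MEAN_IT=1 if a human actually wants this.\n"
    )
    sys.exit(1)
```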

Huh, Upgrades

Memories

Cheaters Gonna Cheat Cheat Cheat Cheat Cheat

Was Dr. Pauling right that no, you can’t simply look it up, the most important memory is inside your head, because when doing creative thinking you can only use the facts in your head?

I think it’s fair to say that facts you’ve memorized are a lot more useful and available, especially for creativity, even when you can look things up, and that exactly how tricky it is to look something up matters.

New paper from MIT: If you use ChatGPT to do your homework, your brain will not learn the material the way it would have if you had done the homework. Thanks, MIT!

Again, if they’re not learning the material, and not practicing the ‘write essay’ prompt, what do you expect? Of course they perform worse on this particular exercise.

Minh Nhat Nguyen: I think it’s deeply funny that none of these threads about ChatGPT and critical thinking actually interpret the results correctly, while yapping on and on about “Critical Thinking”.

this paper is clearly phrased to a specific task, testing a specific thing and defining cognitive debt and engagement in a specific way so as to make specific claims. but the vast majority of the posts on this paper are just absolute bullshit soapboxing completely different claims.

homework is a task chosen knowing it is useless.

Arnold Kling: Frankly, I suspect that what students learn about AI by cheating is probably more valuable than what the professors were trying to teach them, anyway.

On Your Marks

New paper introduces LiveCodeBench Pro, which suggests that AIs are not as good at competitive programming as we have been led to believe. Some top models look like they weren’t tested, but for the models that were, scores are lower across the board than on other benchmarks, and all scored 0% on hard problems, so the extrapolation is clear.

Fun With Media Generation

Copyright Confrontation

Deepfaketown and Botpocalypse Soon

Liar Liar

Rohit: The interesting question about hallucinations is why is it that so often the most likely next token is a lie.

They Took Our Jobs

No, Not Those Jobs

There are jobs and tasks we want the AI to take, because doing those jobs sucks and AI can do them well. Then there are jobs we don’t want AIs to take, because we like doing them or having humans doing them is important. How are we doing on steering which jobs AIs take, or take first?

Not that well, based on our new paper “Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce.”

All The Jobs Everywhere All At Once

If AI is coming for many or most jobs, you can still ride that wave. But, for how long?
Reid Hoffman offers excellent advice to recent graduates on how to position for the potential coming ‘bloodbath’ of entry level white collar jobs (positioning for the literal potentially coming bloodbath of the humans from the threat of superintelligence is harder, so we do what we can).

Essentially: Understand AI on a deep level, develop related skills, do projects and otherwise establish visibility and credibility, cultivate contacts now more than ever.

In the ‘economic normal baseline scenario’ worlds, it is going to be overall very rough on new workers. But it is also super obvious to focus on embracing and developing AI-related skills, and most of your competition won’t do that. So for you, there will be ‘a lot of ruin in the nation,’ right up until there isn’t.

The Void

With caveats, I second Janus’s endorsement of this write-up by Nostalgebraist of how LLMs work. I expect this is by far the best explanation or introduction I have seen to the general Janus-Nostalgebraist perspective on LLMs, and the first eight sections are pretty much the best partial introduction to how LLMs work period, after which my disagreements with the post start to accumulate.

I especially (among many other disagreements) strongly disagree with the post about the implications of all of this for alignment and safety, especially the continued importance and persistence of narrative patterns even at superhuman AI levels outside of their correlations to Reality, and especially the centrality of certain particular events and narrative patterns.

Into the Void

The essay The Void discussed above has some sharp criticisms of Anthropic’s treatment of its models, on various levels. But oh my you should see the other guy.

In case it needs to be said, up top: Never do this, for many overdetermined reasons.

Jake Peterson: Google’s Co-Founder Says AI Performs Best When You Threaten It.

Amrit: just like real employees.

The Art of the Jailbreak

Get Involved

Introducing

In Other AI News

Show Me the Money

Various sources report rising tensions between OpenAI and Microsoft, with OpenAI even talking about accusing Microsoft of anti-competitive behavior, and Microsoft wanting more than the 33% share of OpenAI that is being offered to them.

