AI-generated illustration about AI values vs. government control — abstract shield with Claude symbol, surrounded by rising user curves and a judge's gavel
Image generated with Pollinations.ai
Weekly Briefing 14 min read

AI Weekly #11/2026: When Resistance Pays Off — Anthropic's Lawsuit, 1M Signups Daily and 22 Firefox Bugs in 2 Weeks

Monday, March 9, 2026

This article was researched and written with AI

AI Weekly #11/2026

March 9, 2026 | Week 11


TL;DR

  • Anthropic vs. Pentagon: The US Department of Defense classified Anthropic as a “supply chain risk” — because Claude refuses to support autonomous weapons systems. Market response: #1 App Store in over 15 countries, 1M signups daily. Anthropic filed a lawsuit today [1] [2]
  • 22 Firefox bugs in 14 days: Claude Opus 4.6 analyzed 6,000 C++ files and found 14 high-severity vulnerabilities — better at finding than exploiting. Clear advantage for defenders [4]
  • GPT-5.4: 1-million-token context, native computer use, 75% on OSWorld (human level: 72.4%) — and 33% fewer hallucinations than GPT-5.2 [5]
  • 4,000+ jobs at Block: Jack Dorsey explicitly names AI as the reason — one of the first publicly traded companies to attribute mass layoffs directly and without PR spin to AI (AI causality partly disputed) [12]

Audio Version

17:32 | Download MP3

Chapters - 0:07 - TL;DR - 1:12 - Story of the Week: Pentagon La - 3:42 - More Top Stories - 7:28 - Quick Hits - 8:57 - Tool of the Week: AutoResearch - 10:28 - Fail of the Week: Clinejection - 12:10 - Number of the Week: 4,000+ - 13:31 - Reading List - 14:22 - Next Week: Anthropic Lawsuit a - 15:41 - Footer

Read aloud with edge-tts (en-US-AndrewNeural)


Story of the Week: Pentagon Labels Claude a “Supply Chain Risk” — Users Respond with 1 Million Signups Daily

It started as a government decree and ended today with a lawsuit. On March 5, 2026, the US Department of Defense officially classified Anthropic as a “supply chain risk” [1] — the concrete trigger: Anthropic refused to make Claude available for autonomous weapons systems and mass surveillance. Defense Secretary Hegseth chose the wording precisely: not “unreliable,” not “hostile,” but supply chain risk. A classification with concrete consequences for all federal agencies using Claude directly or through third-party providers — and one that from a compliance perspective initially represents primarily an operational problem for those agencies, not just a statement about Anthropic.

What happened next, no one had anticipated at this scale [1] [3]. Claude climbed to #1 on the Apple App Store in over 15 countries. Over one million new signups daily — a trust dividend no marketing campaign could have bought. Analysts like Bruce Schneier and Nathan Sanders see this as more than spontaneous solidarity: the refusal to trade principles for contracts has become the strongest form of brand differentiation in a market where model performance is converging [1]. When all models can compute, write and code at roughly the same level, the question “Who do I trust?” becomes the actual purchasing decision.

The strategic calculus behind Anthropic’s stance is not pure idealism [3]. Enterprise customers in Europe, finance, insurance and pharmaceuticals are asking the same question that 15+ App Stores are now answering: For what purposes is this company unwilling to deploy its technology? Anthropic has drawn a line that may prove more valuable to these customer segments than any benchmark improvement.

Today, March 9, 2026, the conflict escalated to the legal arena: Anthropic filed a lawsuit against the Trump administration [2]. This opens a question that will work its way through the courts for months: Can a government legally compel a private company to make its technology available for specific state purposes — even against its own terms of service? The answer will affect more than Anthropic. It will set a precedent for the entire industry: whether AI labs can enforce their terms of service as a genuine protective line against government pressure — or whether they will be systematically eroded under market pressure [2] [3].


More Top Stories

Claude Finds 22 Firefox Security Vulnerabilities in 14 Days — and Gives Defenders the Advantage

In a partnership between Anthropic and Mozilla, Claude Opus 4.6 analyzed nearly 6,000 C++ files of the Firefox browser [4]. The result after two weeks: 22 security vulnerabilities identified, 14 of them high-severity. 112 reports were submitted, most of them patched in Firefox 148. Claude found the first use-after-free vulnerability within 20 minutes.

The number 14 high-severity vulnerabilities sounds abstract — until you put it in perspective [4]. That amounts to nearly one-fifth of all high-severity Firefox vulnerabilities patched during the entire year 2025. In two weeks. For $4,000 in API costs. That is the true benchmark for this story: not speed alone, but the ratio of effort to outcome. Traditional security audits for a codebase of this size take months and six-figure sums. One caveat: independent verification of the findings by Mozilla is still pending, and the proportion of previously known vulnerabilities was not fully disclosed.

What security experts find most notable: Claude was significantly better at finding the vulnerabilities than exploiting them [4]. The project results are explicitly interpreted as: “This gives defenders the advantage.” That is not a trivial detail. It means that the same tool that could theoretically help attackers works asymmetrically in favor of defense in practice — because finding is easier than exploiting, and Claude excels at finding. For security teams that must secure vast codebases with limited resources, this fundamentally changes the economics of defensive security.


OpenAI GPT-5.4: 1M Context, Computer Use, 75% on OSWorld

OpenAI released GPT-5.4 this week in three variants: Standard, Thinking and Pro [5]. The technical highlight is the 1-million-token context window as standard — not as a special feature for enterprise customers, but as the default. That corresponds to roughly 750,000 words or a stack of several hundred average novels in a single context.

The OSWorld benchmark figure is the most discussed detail of the week: 75% success rate versus a human benchmark of 72.4% [5]. OSWorld tests whether a model can independently execute computer-based tasks — opening windows, filling out forms, switching between applications. GPT-5.4 thus marginally exceeds the measured human level for the first time. For context: comparable benchmarks for Anthropic’s Claude Sonnet 4.6 on OSWorld have not yet been fully published, which limits direct model comparisons on this basis. This matters for enterprise trust-building: a model that outperforms the average user in standardized computer-use tests changes the risk calculus for automated workflows.

The 33% reduction in hallucinations compared to GPT-5.2 is the second important figure [5] [6]. OpenAI is marketing GPT-5.4 as “its most factually accurate model to date” — a positioning aimed squarely at the enterprise segment, where reliability matters more than creativity. The combination of large context, computer use and reduced errors makes GPT-5.4 the most compelling argument yet for agent-based automation from OpenAI. The real question for organizations is not whether the model is good — but whether it is good enough to hand over processes that previously required human oversight.


Quick Hits

  • Microsoft Copilot Cowork with Claude under the hood [7] — Microsoft launches Copilot Cowork as a Research Preview: an autonomous agent that plans and executes tasks across all M365 apps. Particularly noteworthy: Microsoft 365 Copilot Wave 3 integrates Anthropic’s Claude directly into Copilot Chat — Microsoft’s own models and Anthropic working within the same interface. Broad availability through the Frontier Program from late March.

  • Gemini 3.1 Flash-Lite: Thinking at one-eighth the price [8] — $0.25/M input tokens, $1.50/M output — approximately one-eighth the price of Gemini 3.1 Pro. Four configurable thinking levels for adjustable compute intensity. Strong at image generation. For teams looking to integrate reasoning capabilities into production workflows without paying frontier model prices, this is the most affordable option with genuine reasoning budget to date.

  • Qwen leadership crisis: Strong models, shaky leadership [9] — Alibaba’s AI division releases Qwen 3.5 (0.8B–397B parameters, open-weight) — and simultaneously loses lead researcher Junyang Lin along with several key personnel. Technically strong, organizationally fragile: the combination raises serious questions about the long-term stability of the open-source heavyweight.


Tool of the Week: AutoResearch — Karpathy Democratizes AI Research in 630 Lines of Python

Andrej Karpathy open-sourced AutoResearch this week [10]: a 630-line Python tool that lets AI agents run fully autonomous ML experiments on a single consumer GPU. No multi-GPU cluster, no expensive cloud infrastructure — AutoResearch makes possible what research teams previously needed significant resources to accomplish.

The validation example speaks for itself [10]: According to MarkTechPost, Shopify CEO Tobi Lütke adopted AutoResearch internally and achieved a 19% improvement in validation score — using the same 630-line framework Karpathy made publicly available. This is not an academic demo. This is a production result from one of the world’s largest e-commerce companies.

What makes AutoResearch strategically interesting is its minimal-footprint philosophy [10]. 630 lines means: fully readable, fully adaptable, fully understandable — without framework overhead, without abstraction layers that obscure the actual experiment. For teams looking to systematize ML experiments with limited resources, this is an immediately usable building block. For the industry as a whole, Karpathy is sending a signal: the next wave of AI research will not emerge from labs with thousands of GPUs, but from developers running AutoResearch on their own machines.


Fail of the Week: Clinejection — Prompt Injection → Cache Poisoning → NPM Secret Stolen

This week’s story is a textbook example of multi-stage AI attacks [11]. Starting point: Cline’s automated GitHub issue triage bot, which processes incoming issues using Claude Code. The attack chain begins with a manipulated issue title — crafted precisely to function as a prompt injection when Claude Code reads it.

The sequence in four steps [11]: A maliciously worded issue title injects a prompt into Claude Code. Claude Code then executes npm install with a manipulated package (cacheract). The package injects approximately 11 GB of junk data into the GitHub Actions cache according to Willison’s analysis — precisely enough to trigger the 10 GB auto-eviction threshold and introduce poisoned cache entries. The nightly build workflow loads the poisoned entries, exposing the NPM publishing secret. Result: An unauthorized cline@2.3.0 was published to npm and had to be retracted.

The failure is structural, not individual [11]. First, it demonstrates that automated AI bots processing user-generated input without sandbox isolation represent an active attack surface, not a theoretical one. Second: the attackers knew the GitHub Actions eviction threshold to the byte — this is not an opportunistic attack, it is targeted reconnaissance. Third, the damage extends beyond Cline: every unauthorized npm publication potentially endangers every project using that package as a dependency. The question for all teams running AI bots with write access to production systems: Where is the line between what a bot may read and what it may execute?


Number of the Week: 4,000+

Source: Block/Square, neuralbuddies.com [12]

That is how many positions Block CEO Jack Dorsey is cutting — citing AI automation in engineering and operations as the explicit reason.

This is not a new number, but a new context [12]. Block is thus one of the first publicly traded companies to attribute mass layoffs directly and publicly to AI — without restructuring-speak, without market conditions as the explanation, without a strategic pivot as framing. Just: “AI does that now.” Several observers note, however, that the AI attribution can also be read as “AI washing” — a post-hoc rationalization for job cuts that were already planned for other economic reasons.

What Dorsey normalizes with this is at least as significant as the number itself [12]. When executives of publicly traded companies communicate AI-driven headcount reductions as direct business logic — not as a regrettable exception, but as a calculated step — societal expectations shift. Every company that cuts positions in the coming months can cite this precedent. The real question is not whether others will follow. The question is which industries, which job profiles and which qualification levels will be next to enter this calculation.


Reading List

📖 Anthropic Trump Claude AI Supply Chain Risk — Lawsuit — CNBC with full details on today’s lawsuit, the supply chain risk classification and the strategic assessment from Schneier/Sanders — the article that defines this week | 8 min

📖 Anthropic and Mozilla: Finding Firefox Security Vulnerabilities with AI — Anthropic’s own deep dive into the Firefox partnership: methodology, results, cost breakdown and why Claude is better at finding than exploiting — required reading for all security teams | 12 min

📖 Clinejection: Prompt Injection to Cache Poisoning — Simon Willison’s technical analysis of the full attack chain: from manipulated issue titles to the npm secret leak — for everyone running AI bots with production access | 6 min


Next Week: Anthropic Lawsuit and the First Wave of GPT-5.4 User Data

The coming days will bring several ongoing developments into focus:

  • Anthropic lawsuit proceedings: First legal assessments of the constitutionality of the supply chain risk decree are expected. Whether other tech companies will file as amicus curiae — as seen recently with Google and OpenAI employees acting in solidarity — is one of the interesting side questions.
  • GPT-5.4 in practice: First real-world user experiences with the 1M context and computer use will reveal whether the OSWorld benchmark numbers hold in productive workflows — or whether the known latency and cost challenges at maximum context limit adoption.
  • Firefox 148 patch notes: Mozilla will document the vulnerabilities found by Claude in the release notes. How many of the 14 high-severity bugs were exploitable in the wild will retrospectively assess the true urgency of the Mozilla-Anthropic project.
  • Copilot Cowork feedback: First reports from the Frontier Program will show whether Microsoft Wave 3 with Claude under the hood represents a genuine workflow revolution — or whether the integration still stumbles on the known M365 inertia problems.

Behind the AI: Metrics for This Edition

  • Stories analyzed: 17 (from verified sources)
  • Final selection: 1 Story of the Week + 2 Top Stories + 3 Quick Hits + 1 Tool + 1 Fail + 1 Number of the Week + 3 Reading List
  • Time period: 2026-03-03 to 2026-03-09
  • Primary sources: 12 (CNBC, NPR, Anthropic, TechCrunch, Fortune, Microsoft, simonwillison.net, MarkTechPost, NeuralBuddies)
  • WebFetch status: Anthropic/Mozilla fully loaded; CNBC/TechCrunch/Fortune paywalled — key claims from verified staging data (02-selection.md); NPR blocked

Story selection criteria: ✅ AI Governance & Policy (Anthropic vs. Pentagon — historic escalation point with lawsuit) ✅ Defensive Security (Claude finds 22 Firefox bugs — paradigm shift for security audits) ✅ Frontier Model Release (GPT-5.4 — first computer-use agent exceeding human OSWorld performance) ✅ Agentic AI Security (Clinejection — first publicly documented multi-stage prompt injection → supply chain attack) ✅ Tool Innovation (AutoResearch — Karpathy democratizes ML research) ✅ AI & Work (Block/Square 4,000+ positions — explicit AI attribution by publicly traded CEO)


AI Weekly is produced by BKS-Lab.

Subscribe to the newsletter: bks-lab.com/newsletter

Contact: ai@bks-lab.com

Sources:

[1] Anthropic Pentagon AI Claude Iran (CNBC, 2026-03-05)

[2] Anthropic Trump Claude AI Supply Chain Risk — Lawsuit (CNBC, 2026-03-09)

[3] Pentagon Labels AI Company Anthropic a Supply Chain Risk (NPR, 2026-03-06)

[4] Anthropic and Mozilla: Finding Firefox Security Vulnerabilities with AI (Anthropic, 2026-03-06)

[5] OpenAI launches GPT-5.4 with Pro and Thinking versions (TechCrunch, 2026-03-05)

[6] OpenAI new model GPT-5.4 enterprise agentic Anthropic (Fortune, 2026-03-05)

[7] Copilot Cowork: A New Way of Getting Work Done (Microsoft, 2026-03-09)

[8] Gemini 3.1 Flash-Lite (simonwillison.net, 2026-03-03)

[9] Qwen leadership crisis (simonwillison.net, 2026-03-04)

[10] Andrej Karpathy open-sources AutoResearch — 630-line Python tool (MarkTechPost, 2026-03-08)

[11] Clinejection: Prompt Injection to Cache Poisoning (simonwillison.net, 2026-03-06)

[12] AI News Recap March 6, 2026 — Block layoffs (NeuralBuddies, 2026-03-06)


AI-assisted | Facts supported by cited sources