AI-generated illustration: Abstract geometric shapes and mathematical formulas in blue-purple tones, symbolizing a breakthrough in combinatorics
Image generated with Pollinations.ai
Weekly Briefing | 12 min read

AI Weekly #21/2026: AI Cracks 80-Year-Old Math Problem

Sunday, May 24, 2026

This article was researched and written with AI

TL;DR

This week in 30 seconds:

  • Math milestone: OpenAI’s reasoning model disproves the long-held conjecture behind the Erdős Unit Distance Problem after 80 years. The result was verified by three independent top mathematicians and is not just a marketing claim
  • Security shift: Anthropic’s Project Glasswing finds 10,000+ critical vulnerabilities in open-source software — the new bottleneck isn’t finding them, it’s patching them
  • Google all-in: Gemini 3.5 Flash outperforms Gemini 3.1 Pro at lower cost, 1 billion monthly AI Search users, and the search bar looks different for the first time in 25 years
  • Music licensing rewritten: Spotify and Universal Music license AI covers and remixes as a premium add-on, with revenue share for artists and an explicit consent model

Audio Version

14:15 | Download MP3

Chapters:

  • 0:00 TL;DR
  • 0:50 Story of the Week
  • 3:11 More Top Stories
  • 8:19 Quick Hits
  • 9:05 Tool of the Week
  • 10:23 Fail of the Week
  • 12:23 Reading List
  • 13:32 Next Week

Read aloud with edge-tts (en-US-AndrewNeural)


Story of the Week

OpenAI Model Disproves an 80-Year-Old Math Conjecture, and This Time It’s Real

For nearly 80 years, mathematicians widely believed that square lattice arrangements were essentially the optimal solution to the Erdős Unit Distance Problem. On May 20, 2026, an OpenAI reasoning model proved that assumption wrong [1].

The Erdős Unit Distance Problem, posed in 1946, asks: among n points in the plane, what is the maximum number of pairs that can lie exactly one unit apart? The model found new geometric constructions that systematically outperform classical lattice arrangements, drawing on an unexpected connection between combinatorics and algebraic number theory [2]. OpenAI describes it as “the first time AI has autonomously solved a prominent open problem central to a field of mathematics” [2].
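To make the problem statement concrete, here is a minimal Python sketch that counts how many point pairs in a square grid sit at one chosen distance. It is not the OpenAI construction, which the sources cited here do not spell out. The helper names `count_pairs_at_distance` and `square_grid` are illustrative, and the code compares squared integer distances to avoid floating-point issues.

```python
from itertools import combinations


def count_pairs_at_distance(points, dist_sq):
    """Count unordered pairs of integer points whose squared distance equals dist_sq."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == dist_sq
    )


def square_grid(k):
    """Return the k x k grid of integer points (n = k * k points in total)."""
    return [(x, y) for x in range(k) for y in range(k)]


grid = square_grid(10)  # n = 100 points

# Distance 1: only horizontal and vertical neighbours qualify.
neighbour_pairs = count_pairs_at_distance(grid, 1)

# Squared distance 25 = 5^2 + 0^2 = 3^2 + 4^2, so more directions contribute.
richer_pairs = count_pairs_at_distance(grid, 25)

print(neighbour_pairs, richer_pairs)
```

On the 10 × 10 grid this gives 180 pairs at distance 1 but 268 pairs at distance 5, because 25 can be written as a sum of two squares in more than one way. Rescaling the grid so that such a "popular" distance becomes the unit is the idea behind the classical lattice arrangements.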

“AI is helping us to more fully explore the cathedral of mathematics we have built over the centuries.” [1]

— Thomas Bloom, Erdős Problems Website

What sets this announcement apart from earlier ones is external verification by three independent top mathematicians: Noga Alon, Melanie Wood, and Thomas Bloom (also the source of the quote above) [1]. This is a deliberate contrast to the October 2025 debacle, when OpenAI executive Kevin Weil claimed GPT-5 had solved ten Erdős problems, a claim that proved baseless and damaged trust in AI research announcements [1].

Who should care? Anyone using or planning to use AI as a research tool: the breakthrough shows that AI models can not only verify existing proofs but find genuinely new mathematical constructions — using approaches humans hadn’t discovered in 80 years. This has direct implications for cryptography, algorithm design, and theoretical computer science.

Critical context: OpenAI has shown transparency problems with previous research claims. External verification by three named mathematicians is the decisive quality marker this time. Open questions remain about how far the model’s reasoning extends beyond this specific task and what it tells us about transfer to other mathematical domains.

Bottom line: According to OpenAI, an AI model has for the first time autonomously solved a prominent open mathematical problem, and external mathematicians have verified the result, 80 years after Erdős posed the question.


More Top Stories

Anthropic Project Glasswing: 10,000+ Vulnerabilities — And That’s Just the Beginning (Company Announcement)

The bottleneck in software security has fundamentally shifted. Anthropic’s Project Glasswing, deploying Claude Mythos Preview together with around 50 partner organizations, identified over 10,000 highly critical security vulnerabilities in its first weeks, according to Anthropic [3]. In over 1,000 open-source projects alone, the system found 6,202 critical vulnerabilities per Anthropic’s report. 90.6% of AI-found vulnerabilities were externally confirmed as valid, with 62.4% rated as high or critical severity [3].

For context: classical static analysis tools (SAST) typically validate only 10–30% of their findings as genuine vulnerabilities — the reported rate of 90.6% would be a significant improvement, though this comparison figure does not come from the Anthropic report itself.

Numbers from individual partners are impressive: Cloudflare discovered 2,000 bugs (400 of them critical or high severity) per the report; Mozilla found 271 vulnerabilities in Firefox 150 — ten times more than in previous testing [3]. The UK AI Security Institute confirmed Mythos Preview as the first model ever to fully solve both of their cyber simulations [3].

Anthropic articulates the new problem clearly:

“Progress on software security is now limited by how quickly we can verify, disclose, and patch.” [3]

Of 10,000+ vulnerabilities found, only 75 patches are available so far, with an average patching time of around two weeks [3].
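The validity rates and patch counts quoted above are easier to compare as plain arithmetic. The short Python sketch below uses only the figures from this section; the sample size of 1,000 findings and the function name `expected_valid_findings` are illustrative assumptions, and the 10–30% SAST range is the comparison figure that does not come from the Anthropic report.

```python
def expected_valid_findings(findings, validity_rate):
    """Number of findings expected to be genuine at a given validity rate."""
    return round(findings * validity_rate)


SAMPLE = 1_000  # hypothetical batch of reported findings

glasswing_valid = expected_valid_findings(SAMPLE, 0.906)  # rate reported by Anthropic
sast_valid_low = expected_valid_findings(SAMPLE, 0.10)    # lower end of the SAST range
sast_valid_high = expected_valid_findings(SAMPLE, 0.30)   # upper end of the SAST range

# Patch coverage: 75 patches against a lower bound of 10,000 vulnerabilities.
patch_coverage = 75 / 10_000

print(glasswing_valid, sast_valid_low, sast_valid_high)
print(f"{patch_coverage:.2%} of found vulnerabilities patched so far")
```

Per 1,000 findings, that is 906 genuine issues versus 100 to 300 for a typical static analyzer, while patch coverage sits at or below 0.75%, which is the bottleneck Anthropic describes.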

Bottom line: AI has democratized and scaled vulnerability discovery — the next critical challenge is whether human patch capacity can keep up before attackers exploit published findings.


Google I/O 2026: Everything Becomes AI — And the Search Bar Looks New After 25 Years

Google delivered a week of announcements at I/O 2026 that together paint a clear picture: the entire product portfolio is being rebuilt AI-first [4]. Gemini 3.5 Flash outperforms Gemini 3.1 Pro on coding and agent benchmarks (76.2% on Terminal-Bench 2.1) — at lower cost [4]. Note: all cited benchmark figures compare Gemini models against each other; comparisons against Anthropic’s Claude or OpenAI’s models are absent from Google’s official communications. More relevant for developers is Google Antigravity: a new agent-first development platform with a desktop app, CLI, and SDK for multi-agent orchestration, directly competing with Anthropic’s Claude SDK and OpenAI’s Agents platform [4].

AI Mode in Google Search has surpassed one billion monthly users, with queries doubling every quarter [4]. The new search bar accepts image, video, and file uploads for the first time — the first fundamental redesign in over 25 years [4] [8]. All generated content receives imperceptible SynthID watermarks, which have been verified 50 million times globally to date [4].

Android XR glasses in partnership with Gentle Monster, Warby Parker, and Samsung are slated for fall 2026 [4].

Bottom line: Google demonstrates that the battle for the AI platform won’t be decided by model benchmarks alone — but by integration into existing products with a billion users.


Spotify and Universal Music: Licensed AI Covers and Remixes as a Premium Add-On

The first major licensing deal between a record label and a streaming service for AI-generated music has arrived: Spotify and Universal Music Group have agreed to let fans officially create AI covers and remixes of UMG artists as a paid add-on for Spotify Premium [5]. Participating artists receive a revenue share, and participation requires their explicit consent (“Artist-First” approach) [5].

“What we’re building is grounded in consent, credit, and compensation for the artists.” [5]

— Spotify Co-CEO

The contrast with Suno and Udio is intentional: instead of litigation over existing catalogs, this model relies on upfront agreements [5]. UMG is the world’s largest music label — whatever standard is set here will influence the entire industry. The add-on’s price and specific launch date have not yet been announced [5].

Critical voices come from indie artists and professional associations like the UK Musicians’ Union: the consent model only applies to artists contracted with UMG. Independent musicians and smaller labels are left out — facing a choice of signing similar deals or fighting against a new de-facto industry practice.

Bottom line: The music industry has found a workable path to monetize AI creativity without bypassing artists — a model that puts pressure on other labels but leaves important questions open for indie artists.


Quick Hits

Briefly noted:

  • KPMG + Claude (Company Announcement): One of the Big Four is rolling out Claude to 276,000 employees — not a pilot, enterprise-wide integration into core business [6]
  • Hark $700M: Brett Adcock (Figure AI, Archer Aviation) raises $700 million Series A at a $6 billion valuation for a still-secret “Universal AI Interface” with 70 employees and its own B200 GPU data center [7]
  • Google Search Redesign: The iconic Google search bar gets its first fundamental redesign in over 25 years — now with text, voice, and image support as an AI-powered conversational interface [8]

Tool of the Week

NVIDIA Nemotron-Labs Diffusion — Text generation in parallel blocks instead of token by token

While most LLMs output text sequentially, one token at a time, NVIDIA’s new Diffusion Language Models (DLM) generate 32 tokens simultaneously, achieving a 6.4× speedup over standard autoregression [9]. On NVIDIA B200, that’s ~865 tokens/second (4× faster than the autoregressive baseline) [9].
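A back-of-the-envelope Python sketch shows how block-parallel generation can translate into fewer forward passes. This is a toy cost model, not NVIDIA's implementation: it assumes every forward pass costs the same, and the value `PASSES_PER_BLOCK = 5` is an assumption chosen so that the toy model reproduces the reported 6.4× figure. Only the block size of 32, the 865 tokens/second, and the 4× factor come from the text above.

```python
import math

BLOCK_SIZE = 32        # tokens generated together, as reported
PASSES_PER_BLOCK = 5   # ASSUMPTION: refinement passes per block, not from NVIDIA


def passes_autoregressive(num_tokens):
    """One forward pass per generated token."""
    return num_tokens


def passes_block_diffusion(num_tokens, block_size, passes_per_block):
    """A fixed number of passes per block of tokens (toy model)."""
    blocks = math.ceil(num_tokens / block_size)
    return blocks * passes_per_block


num_tokens = 1024
ar_passes = passes_autoregressive(num_tokens)
dlm_passes = passes_block_diffusion(num_tokens, BLOCK_SIZE, PASSES_PER_BLOCK)
toy_speedup = ar_passes / dlm_passes

# Baseline throughput implied by "865 tokens/second, 4x faster than the baseline".
implied_baseline_tps = 865 / 4

print(ar_passes, dlm_passes, toy_speedup, implied_baseline_tps)
```

Under these assumptions, 1,024 tokens need 160 passes instead of 1,024, and the B200 figure implies an autoregressive baseline of roughly 216 tokens/second. Measured speedups depend on hardware and workload, which may be why the two reported factors (6.4× and 4×) differ.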

What’s special: a single model supports three modes without code changes — autoregressive (classic), diffusion (blocks), and self-speculation (hybrid) [9]. NVIDIA describes the key advantage:

“Not only can these models better leverage the computational model of modern GPUs, they can also revise generated tokens, making them more suitable for revising existing text.” [9]

The models are available as Open Models (3B, 8B, 14B text; 8B vision-language) via HuggingFace [9]. Particularly relevant for teams optimizing inference costs or working on text-revision workflows — the ability to revise already-generated tokens is a genuine differentiator versus classical autoregressive models.

→ Nemotron-Labs Diffusion on HuggingFace


Fail of the Week

“$50M ARR — $42M of it actually active”

A systemic problem runs through the AI startup scene: startups and their investors present “Contracted ARR” (CARR) as regular ARR — counting contracts not yet activated, free pilot phases, and multi-year deals as ongoing revenue [10]. A concrete example from the report: one startup reported $50M ARR, but only $42M was active; another counted year-long free pilot phases as ARR — and the board knew it [10].

“The biggest funds support this and mislead journalists.” [10]

— Scott Stevenson, Spellbook CEO

“Some investors look the other way when their companies inflate numbers.” [10]

— Jack Newton, Clio CEO

Root cause: Structural incentive conflict. VCs benefit from inflated paper valuations for easier fundraising and talent acquisition. Anyone who breaks the convention looks worse than competitors who play along — a classic race to the bottom [10].

What we learn: When evaluating AI startups, always ask for the distinction between CARR (Contracted ARR) and active, paying ARR. Anyone who won’t answer that question has something to hide.
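The distinction can be sketched in a few lines of Python. The contract list below is entirely hypothetical, with values chosen only to mirror the $50M versus $42M example from the report; the `Contract` fields and function names are illustrative assumptions, not a standard accounting schema.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    customer: str
    annual_value: int  # contracted annual value in USD
    activated: bool    # has the customer actually gone live?
    paying: bool       # is the customer billed today (not a free pilot)?


def contracted_arr(contracts):
    """CARR: every signed contract counts, live or not."""
    return sum(c.annual_value for c in contracts)


def active_arr(contracts):
    """Active ARR: only activated contracts that are paying today."""
    return sum(c.annual_value for c in contracts if c.activated and c.paying)


# Hypothetical portfolio for illustration only.
contracts = [
    Contract("Customer A", 30_000_000, activated=True, paying=True),
    Contract("Customer B", 12_000_000, activated=True, paying=True),
    Contract("Customer C", 5_000_000, activated=False, paying=False),  # signed, not live
    Contract("Customer D", 3_000_000, activated=True, paying=False),   # free pilot
]

carr = contracted_arr(contracts)
arr = active_arr(contracts)
overstatement = (carr - arr) / carr

print(f"CARR ${carr:,} vs active ARR ${arr:,} ({overstatement:.0%} overstated)")
```

In this illustration the headline number overstates active revenue by 16%. Asking a startup for both figures, plus the activation and billing status behind them, makes the gap visible.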


Number of the Week

276,000

That’s how many KPMG employees now have access to Claude [6]. This is not a pilot program in one department: it is an enterprise-wide deployment at one of the Big Four global auditing firms. For comparison: OpenAI’s entire workforce is around 3,000 people, so KPMG now has roughly 92× as many Claude users as OpenAI has employees. Enterprise AI has left the “strategic experiment” phase and become operational infrastructure.

Bonus number: $6 billion — Hark’s valuation with just 70 employees [7]. That’s ~$86 million per person. The AI hype multiplier at early-stage investments remains very much real.
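For readers who want to check the two ratios in this section, here is the arithmetic as a short Python snippet. All inputs are the figures quoted above; the variable names are illustrative, and the OpenAI headcount is the approximate figure used in this newsletter.

```python
kpmg_claude_users = 276_000
openai_employees = 3_000           # approximate figure quoted above

user_to_staff_ratio = kpmg_claude_users / openai_employees

hark_valuation_usd = 6_000_000_000
hark_employees = 70

valuation_per_employee = hark_valuation_usd / hark_employees

print(f"KPMG Claude users per OpenAI employee: {user_to_staff_ratio:.0f}x")
print(f"Hark valuation per employee: ${valuation_per_employee / 1e6:.0f}M")
```

The first ratio comes out at 92×, and the second at roughly $86 million per employee.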


Reading List

For the weekend:

  1. OpenAI: Model disproves discrete geometry conjecture — OpenAI’s original blog post with technical details on the method: how the model connected combinatorics and algebraic number theory, exactly what the new construction achieves, and why the problem remained unsolved for 80 years. For anyone who wants to understand what actually happened here beyond the headline claim. (12 min)

  2. Anthropic Project Glasswing: Initial Update — Anthropic’s detailed report on methodology, partner findings, and the new bottleneck of “verification over detection.” With concrete numbers from real deployments at Cloudflare and Mozilla. Required reading for anyone working in software security or dealing with open-source projects. (15 min)

  3. How VCs and founders use inflated ARR to kingmake AI startups — One of the most candid articles on structural misaligned incentives in the AI startup ecosystem, with named quotes from CEOs indicting the system while operating within it. Helps you read funding announcements more critically. (10 min)


Next Week

What’s coming:

  • AI math follow-up: The Erdős community is expected to submit OpenAI’s result for peer-reviewed publication, with first reactions from the broader mathematics community to follow
  • Google Antigravity Early Access: Following the I/O announcement, Google is expected to open first developer access to the new multi-agent platform — direct competition with Anthropic and OpenAI SDKs begins
  • Hark stealth reveal: Brett Adcock’s “Universal AI Interface” has raised $700M without disclosing a single product detail — first information from the company is expected

🤖 Behind This Newsletter

Generated in: ~35 minutes
Sources scanned: 18 articles from 7 feeds
Stories found: 18 → 10 selected (7 main stories, 3 quick hits)
Validation: 4 agents (Fact-Check, Devil’s Advocate, Quality Editor, Legal Compliance)
Model: Claude Sonnet 4.6 + Haiku (Validation)
Images: Pollinations.ai (5 planned: Hero + 4 story images)

Full metrics
Phase | Metric | Value
Source collection | RSS feeds | 7
Source collection | WebSearch queries | 6
Selection | Stories presented | 18
Selection | Stories selected | 10
Draft | Words | ~1,400
Draft | Sources cited | 10
Validation | Fact-Check issues | 4 (methodology issue, no hard stop)
Validation | Balance issues | 7 (4 MAJOR, 3 MINOR)
Validation | Quality issues | 5 (2 MAJOR, 3 MINOR)
Validation | Legal issues | 0
