May 25 - May 31 · 2026 W22 · Weekly Brief · 15 min read

AI & Tech Brief ⚡

The week's signal was the AI infrastructure stack repricing at sovereign scale — a frontier model GA bundled with a $65B private round, a standalone coding agent posting nine-figure ARR, and the serving layer beneath all of it crossing decacorn thresholds in the same seven days.

📌 Navigate

01 📊 Exec Summary
02 Anthropic ships Claude Opus 4.8 and closes a $65B Series H at $965B
03 OpenAI ships remote MCP servers with a Secure MCP Tunnel for the Responses API
04 Cognition raises $1B at $26B as Devin hits $492M ARR
05 Fireworks AI and Baseten cross the decacorn threshold
06 vLLM v0.22.0 ships a 28.9% latency improvement and DeepSeek V4 maturity
07 📊 The pattern
08 👀 Watchlist
09 📎 Sources

📊 Exec Summary

Five things moved in AI/tech this week:

Anthropic shipped Claude Opus 4.8 and closed a $65B Series H at a $965B valuation: the model is GA at unchanged pricing, and a Mythos broad-release timeline sets a forward capability ceiling builders should plan against now.

OpenAI shipped remote MCP servers with a Secure MCP Tunnel for the Responses API: outbound-only HTTPS plus Workload Identity Federation removes the principal security objection blocking enterprise agents from internal EHR, LIMS, and regulated document stores.

Cognition raised $1B at $26B as Devin hit $492M ARR growing 50% MoM: the first public ARR benchmark for a standalone autonomous coding agent, with 89% of Cognition's own production code now shipped by Devin.

Fireworks AI and Baseten crossed the decacorn line in the same week: inference infrastructure repriced from startup bet to strategic platform at $15B and $11B, with OpenRouter's 5x token-volume growth confirming the production shift.

vLLM v0.22.0 delivered a 28.9% end-to-end latency improvement: Cutlass FP8 batch-invariant inference, an experimental Rust frontend, and DeepSeek V4 maturity keep the OSS serving substrate ahead on cost for self-hosted, regulated deployments.

The pattern: frontier capability as a product tier, agents as nine-figure revenue lines, and the inference layer beneath them priced as durable infrastructure.

1. Anthropic ships Claude Opus 4.8 and closes a $65B Series H at $965B

TL;DR: Anthropic shipped Claude Opus 4.8 (model ID claude-opus-4-8) at unchanged pricing and simultaneously closed a $65B Series H at $965B post-money while signaling that capability-gated Mythos-class models reach broad release "in the coming weeks."

What happened

The model is generally available today at unchanged pricing: $5 / $25 per million input/output tokens, with fast mode at $10 / $50 (2.5x speed, and 3x cheaper than the prior Opus fast mode).
Dynamic Workflows in Claude Code runs hundreds of parallel subagents per session — in research preview for Enterprise, Team, and Max plans — and effort controls (standard / extra / max) landed in claude.ai and Cowork.
The Series H — $65B at $965B post-money, led by Altimeter, Dragoneer, Greenoaks, and Sequoia — includes $15B from hyperscalers ($5B from Amazon) plus Micron, Samsung, and SK hynix as memory and storage partners.
Run-rate revenue crossed $47B ARR this month, up from $19B at Series G in February; total compute committed now exceeds 10 GW (5 GW Amazon + 5 GW next-gen Google/Broadcom TPU + SpaceX Colossus GPU).
The accompanying SDK wave (anthropic-sdk-typescript v0.98.1–v0.100.1, anthropic-sdk-python v0.105.0–v0.105.2) adds native claude-opus-4-8 support and mid-conversation system blocks, so operators can update permissions, token budgets, and environment context without breaking the prompt cache.
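To make the list prices above concrete, here is a minimal cost sketch in Python. The per-million-token rates are the ones quoted above; the function name `request_cost` and the sample token counts are illustrative assumptions, and the sketch ignores prompt caching and any other discounts.

```python
# Illustrative cost arithmetic for the list prices quoted above.
# Rates are USD per million tokens (input, output); token counts are hypothetical.
PRICES = {
    "standard": (5.00, 25.00),
    "fast": (10.00, 50.00),
}

def request_cost(input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Return the USD cost of one request at list price (no caching discounts)."""
    in_rate, out_rate = PRICES[mode]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Hypothetical agent turn: 200k tokens of context in, 8k tokens out.
standard = request_cost(200_000, 8_000)
fast = request_cost(200_000, 8_000, mode="fast")
print(f"standard: ${standard:.2f}  fast: ${fast:.2f}")
```

At these rates fast mode doubles the per-request bill, so for budgeting purposes it reads as a latency purchase rather than a default.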

📊 Benchmarks (from Anthropic newsroom)

Benchmark	Opus 4.8	Comparison
Online-Mind2Web (browser agent)	84%	Meaningful jump over Opus 4.7 and GPT-5.5 (external tester)
Super-Agent end-to-end completion	Only model to complete every case	Beats prior Opus and GPT-5.5 at parity on cost
Legal Agent Benchmark (all-pass)	First to break 10%	Highest score recorded on the benchmark
Deception / flawed-code pass-through	~4x less likely to let flaws pass	vs Opus 4.7 (internal alignment eval)
Genie (Databricks) token cost	61% cheaper	Multimodal reasoning over PDFs and diagrams vs Opus 4.7
Series H raise / valuation	$65B / $965B post-money	Run-rate revenue $47B, up from $19B at Series G (February)

🔗 Primary sources → Introducing Claude Opus 4.8 · Anthropic raises $65B in Series H funding at $965B post-money valuation

🔍 The non-obvious point

The headline number is the valuation; the number that resets builder expectations is $47B run-rate revenue.

Anthropic frames Opus 4.8 as "a modest but tangible improvement" over 4.7 — the real release is the revenue trajectory. Willison flagged the $47B run-rate as the most interesting line in the Series H post, noting Anthropic was at $9B in December 2025; that pace puts the lab on track to overtake OpenAI in revenue, per the Latent Space read.
The model's alignment story is substantive, not marketing. Zvi's read of the 244-page system card is that deception-resistance and corrigibility improvements are real — misaligned-behavior rates are "substantially lower" than 4.7 and comparable to Mythos Preview, with new highs on prosocial traits including supporting user autonomy.
The Mythos "coming weeks" line is the forward signal. Anthropic is positioning a capability-gated tier behind Project Glasswing's cyber safeguards — builders should treat the current Opus 4.8 ceiling as temporary and plan capacity, eval, and governance for a higher tier landing within the quarter.

👀 What to watch

Mythos-class broad release: Anthropic committed to "coming weeks" on 2026-05-29 — watch for a GA announcement and pricing before end of Q2.

2. OpenAI ships remote MCP servers with a Secure MCP Tunnel for the Responses API

TL;DR: OpenAI added native remote MCP server support to the Responses API with an outbound-only HTTPS tunnel and IAM-based auth, removing the inbound-port and static-key exposure that has blocked enterprise agents from internal data stores.

What happened

Remote MCP servers are now natively supported; traffic routes outbound-only via HTTPS — no inbound firewall rules required to connect ChatGPT, Codex, or the API to private or on-premises MCP servers.
Auth moves to Workload Identity Federation (AWS, Azure, GCP IAM), replacing permanent API keys with short-lived tokens.
o3 and o4-mini can now call tools within chain-of-thought in the Responses API, preserving reasoning tokens across requests and tool calls — OpenAI's stated effect is reduced cost and latency.
The same update bundles background mode for async long-running tasks, encrypted reasoning items, reasoning summaries, and 10+ commercial MCP integrations: Stripe, PayPal, Shopify, Square, Plaid, HubSpot, Intercom, Twilio, Cloudflare, and Zapier.
The Responses API has reached hundreds of thousands of developers processing trillions of tokens since its March 2025 launch.
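As a rough sketch, a Responses API request that combines a remote MCP tool with background mode might be assembled like this in Python. The `mcp` tool fields (`server_label`, `server_url`, `allowed_tools`, `require_approval`) and the `background` and `reasoning` parameters follow the Responses API as publicly documented; the server URL, label, and tool name are hypothetical, and nothing here configures the Secure MCP Tunnel or Workload Identity Federation, whose setup this brief does not describe.

```python
# Sketch only: builds the keyword arguments for a Responses API call.
# The MCP server URL, label, and tool name below are hypothetical placeholders.
def build_request(question: str) -> dict:
    mcp_tool = {
        "type": "mcp",
        "server_label": "internal_lims",               # hypothetical label
        "server_url": "https://mcp.example.com/mcp",   # hypothetical endpoint
        "allowed_tools": ["lookup_sample"],            # hypothetical tool name
        "require_approval": "always",                  # ask before each tool call
    }
    return {
        "model": "o3",
        "input": question,
        "tools": [mcp_tool],
        "background": True,                 # run asynchronously, poll for the result
        "reasoning": {"summary": "auto"},   # request a reasoning summary
    }

kwargs = build_request("Which samples failed QC this week?")

# With the official SDK installed and OPENAI_API_KEY set, the call would be:
#   from openai import OpenAI
#   response = OpenAI().responses.create(**kwargs)
#   # later: OpenAI().responses.retrieve(response.id) to poll the background run
```

Restricting `allowed_tools` and keeping approvals on is a conservative default for regulated backends; a team could relax either once the server's tool surface has been reviewed.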

📊 Benchmarks (from OpenAI newsroom)

Capability	What shipped	Why it matters
MCP connectivity	Outbound-only HTTPS, no inbound ports	Agents reach internal backends without a firewall change
Auth model	Workload Identity Federation (AWS/Azure/GCP)	Short-lived tokens replace static API keys
Reasoning + tools	Tool calls within chain-of-thought (o3/o4-mini)	Reasoning tokens preserved across requests and tool calls
Async execution	Background mode	Long-running tasks handled asynchronously
Ecosystem	10+ commercial MCP servers	Stripe, Plaid, Shopify, HubSpot, Twilio live at launch

🔗 Primary source → New tools and features in the Responses API

🔍 The non-obvious point

This is a security-posture change disguised as a developer-experience release.

The blocker it removes is concrete: connecting an agent to an internal EHR, LIMS, CRM, or regulated document repository previously meant either exposing an inbound port or minting a static key with broad scope. Outbound-only HTTPS plus federated IAM lets a regulated team wire agents to internal backends without a firewall change or a long-lived credential.
The architecture has a deliberate boundary: the tunnel is outbound to OpenAI cloud — there is no on-premises deployment option disclosed. Teams with data-residency constraints get the auth model but still send traffic out of their perimeter, which keeps the deployment decision a governance question, not a pure engineering one.
The bundle matters as much as the tunnel: chain-of-thought tool calling that preserves reasoning tokens plus background mode makes this the most operationally complete agent-infrastructure release of the week outside Anthropic — the pieces a production agent needs (private connectivity, durable auth, async execution, paid-service integrations) shipped together.

👀 What to watch

Pricing and rate limits for remote MCP calls and background-mode compute were not disclosed at launch — watch for the cost surface before committing production workloads.

3. Cognition raises $1B at $26B as Devin hits $492M ARR

TL;DR: Cognition closed a $1B Series D at $26B post-money as Devin reached $492M ARR growing 50% month-over-month — the first public ARR benchmark for a standalone autonomous coding agent in enterprise.

What happened

The round, $1B+ at $26B post-money ($25B pre-money) and led by Lux Capital, General Catalyst, and 8VC, more than doubles Cognition's $10.2B valuation from a $400M round just eight months ago (September 2025), a roughly 2.5x markup.
Devin reached $492M ARR with 50% MoM growth sustained over the past six months; 89% of Cognition's own production code is now shipped by Devin (company-stated).
Enterprise customers include Mercedes-Benz, NASA, Goldman Sachs, and Santander.
New investors Ribbit Capital, Atreides, and Layer Global joined alongside Founders Fund and Elad Gil; Cognition is now framed as "the largest remaining independent agent lab in AI."
Cognition is projecting >$1B ARR by year-end, per the Latent Space account.
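The growth figures above can be sanity-checked with a few lines of compounding arithmetic. This is a back-of-envelope sketch, not company data: it takes the reported $492M ARR and 50% MoM at face value, and assumes seven months remain between this report and year-end.

```python
import math

arr_now = 492.0      # USD millions, as reported
mom = 0.50           # 50% month-over-month, as reported
months_back = 6      # "sustained over the past six months"

# ARR six months ago if 50% MoM held for all six months.
implied_start = arr_now / (1 + mom) ** months_back

# Months until $1B if the 50% pace continued unchanged.
months_to_1b = math.log(1000.0 / arr_now) / math.log(1 + mom)

# Monthly growth needed to reach $1B by year-end.
months_left = 7      # assumption: end of May to end of December
required_mom = (1000.0 / arr_now) ** (1 / months_left) - 1

print(f"implied ARR six months ago: ${implied_start:.1f}M")
print(f"months to $1B at 50% MoM:   {months_to_1b:.2f}")
print(f"MoM needed for $1B by Dec:  {required_mom:.1%}")
```

Under these assumptions the six-month run started from roughly $43M, the 50% pace would cross $1B in under two months, and a year-end $1B needs only about 11% MoM. Read together, the >$1B projection is consistent with a sharp deceleration from the reported curve, not a continuation of it.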

📊 Benchmarks (from TechCrunch)

Metric	Value	Context
Devin ARR	$492M	Growing 50% MoM for six months
Series D	$1B+ at $26B post-money	$25B pre-money
Prior valuation	$10.2B (September 2025)	$400M round, 8 months prior — 2.5x markup
Own production code by Devin	89%	Cognition internal, company-stated
Projected ARR	>$1B by year-end	Per Latent Space

🔗 Primary source → AI coding startup Cognition raises $1B at $25B pre-money valuation

🔍 The non-obvious point

The $492M ARR figure is the first real comp table for pricing coding-agent ROI — and it survived direct model-lab competition.

This is the data point builders lacked: a standalone autonomous coding agent posting a disclosed nine-figure ARR establishes the comparable for anyone evaluating agent-as-employee spend against seat-based copilots. The 89% internal-code figure is the dogfooding proof the valuation is priced on.
The valuation thesis is that independent agent labs survive model-lab competition. Cognition's 2.5x markup in eight months lands while Anthropic's Claude Code and OpenAI's Codex compete directly — the Latent Space read is that there is durable room for an independent lab, not that the model labs will absorb the category.
The caveat worth holding: no gross margin, unit economics, or ARR-by-segment detail was disclosed. The $492M is a run-rate top line; the agent-as-employee economics — inference cost per shipped PR, seat-vs-usage basis — remain unpriced for outside evaluators.

👀 What to watch

Cognition's year-end >$1B ARR projection is the next checkpoint — watch for a confirming disclosure in Q4 2026 that the 50% MoM curve held through model-lab competition.

4. Fireworks AI and Baseten cross the decacorn threshold

TL;DR: Fireworks AI ($15B) and Baseten ($11B) crossed the decacorn line in the same week as OpenRouter closed a $113M Series B on 5x token-volume growth — Latent Space's read is that inference infrastructure has repriced from startup service to strategic platform.

What happened

Fireworks AI is valued at $15B, a 3.75x markup in 7 months, and Baseten at $11B, a 2.2x markup in 3 months. Both rounds were described as in progress at the time of the Latent Space account.
OpenRouter closed a $113M Series B led by Capital GVC; weekly token volume grew 5x in 6 months, from 5T to 25T tokens (company-confirmed).
Latent Space covers startups only at the decacorn crossing — Fireworks and Baseten both crossed $10B+ in the same week, which is the reason for joint coverage.
The framing is explicitly category-level: inference infra priced as durable middleware, structurally similar to the database-infrastructure repricing of 2012–2015 — not as an experimental startup service.
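The multiples above are quoted over different windows, which makes them hard to compare directly. A small sketch that derives the implied prior valuations and puts the trajectories on a common basis, using only the figures reported above (the helper names are illustrative, and annualizing a 3-month window exaggerates what a full year would look like):

```python
def implied_prior(valuation_b: float, multiple: float) -> float:
    """Prior valuation in $B implied by the current valuation and the markup."""
    return valuation_b / multiple

def annualized_multiple(multiple: float, months: float) -> float:
    """Multiple the same compounding rate would produce over 12 months."""
    return multiple ** (12 / months)

fireworks_prior = implied_prior(15.0, 3.75)   # $B
baseten_prior = implied_prior(11.0, 2.2)      # $B

# OpenRouter: 5x weekly token volume in 6 months, expressed as a monthly rate.
openrouter_mom = 5 ** (1 / 6) - 1

print(f"Fireworks implied prior: ${fireworks_prior:.1f}B, "
      f"annualized {annualized_multiple(3.75, 7):.1f}x")
print(f"Baseten implied prior:   ${baseten_prior:.1f}B, "
      f"annualized {annualized_multiple(2.2, 3):.1f}x")
print(f"OpenRouter volume growth: {openrouter_mom:.1%} per month")
```

The implied priors come out to about $4B and $5B, and the OpenRouter volume curve works out to roughly 31% per month.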

📊 Benchmarks (from Latent Space)

Company	Valuation / round	Trajectory
Fireworks AI	$15B	3.75x in 7 months
Baseten	$11B	2.2x in 3 months
OpenRouter	$113M Series B (Capital GVC)	Weekly volume 5T → 25T tokens
OpenRouter volume growth	5x in 6 months	Production shift, company-confirmed

🔗 Primary source → New AI Infra decacorns: Fireworks, Baseten (with OpenRouter on the way)

🔍 The non-obvious point

The make-vs-buy calculus for model serving just changed — the vendors are now priced as infrastructure, not bets.

For a regulated builder weighing self-hosted serving vs managed inference, the signal is that the managed-inference category has crossed into strategic-infrastructure pricing — the durability that justifies building a workflow on top of a third-party serving layer rather than treating it as a swappable commodity.
OpenRouter's 5T → 25T weekly tokens is the demand-side proof: the 5x in 6 months is the production-traffic shift, not experimentation, underwriting the routing-layer thesis.
Confidence is medium by design: the Fireworks and Baseten rounds were in progress, not closed, at the time of reporting, and no ARR, customer, or token-volume figures were disclosed for either — the repricing signal is real, the closed terms are not yet confirmable.

👀 What to watch

Confirmation of the Fireworks and Baseten round closings and final terms — watch for an on-the-record announcement that converts the "in talks" framing into a priced, closed round.

5. vLLM v0.22.0 ships a 28.9% latency improvement and DeepSeek V4 maturity

TL;DR: vLLM v0.22.0 lands a 28.9% end-to-end latency improvement from Cutlass FP8 batch-invariant inference, plus an experimental Rust frontend and a production-grade DeepSeek V4 package: the most substantive advance in the open inference stack this week.

What happened

Cutlass FP8 batch-invariant inference (PR #40408) delivers a 28.9% end-to-end latency improvement with SM80 compile-mode support.
An experimental Rust frontend (#40848 / #43283) targets data-parallel serving at scale via a DP Supervisor — not yet default.
Model Runner V2 (MRv2) is now default for Qwen3 dense models and falls back to MRv1 automatically when a KV connector is present (#42955).
DeepSeek V4 received a dedicated package (vllm/models/deepseek_v4/, #43004) with NVFP4 fused MoE and MTP speculative decoding — signaling production-grade support, not experimental.
A multi-tier KV cache offloading framework (#40020) extends beyond CPU to a Python filesystem secondary tier and Mooncake disk offloading (#42689); OpenAI-compatible structured output now lands in both MRv1 and MRv2 paths.

📊 Benchmarks (from vLLM release notes)

Metric	Value	Context
End-to-end latency	28.9% improvement	Cutlass FP8 batch-invariant inference (PR #40408)
Commits	459	v0.21.0 → v0.22.0
Contributors	230 (63 new)	Community velocity
DeepSeek V4	Dedicated package	NVFP4 fused MoE, MTP speculative decoding

🔗 Primary source → Release v0.22.0 · vllm-project/vllm

🔍 The non-obvious point

The 28.9% is a direct serving-cost lever for teams that cannot send traffic to a managed-inference vendor.

For any team running self-hosted inference on regulated infrastructure, the 28.9% latency gain translates directly into serving cost, and the multi-tier KV offloading to CPU and disk extends production viability to larger models without H100-cluster access, the exact constraint a regulated, on-prem deployment faces.
The DeepSeek V4 hardening pass — dedicated package, NVFP4 fused MoE, full CUDA graph, MTP speculative decoding — moves an open frontier model from experimental to production on the OSS substrate, a meaningful build-vs-buy data point against closed-API pricing.
The 459 commits from 230 contributors are the durability signal: vLLM remains the de facto OSS inference substrate, which is what makes the latency and offloading work bankable rather than a one-release spike. The Rust frontend is the forward bet: experimental now, but aimed squarely at data-parallel serving economics.
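How a latency figure becomes a cost figure depends on the workload, but the arithmetic can be sketched under explicit assumptions: that "28.9% improvement" means 28.9% lower end-to-end latency, that serving is latency-bound so throughput scales inversely with latency, and that the gain measured on the FP8 batch-invariant path carries over to the reader's own models. None of these is guaranteed by the release notes.

```python
import math

latency_improvement = 0.289   # as reported for v0.22.0

# Relative latency after the upgrade (1.0 = before).
new_latency = 1.0 - latency_improvement

# Extra requests per GPU-hour if throughput scales inversely with latency.
throughput_gain = 1.0 / new_latency - 1.0

def gpus_needed(baseline_gpus: int) -> int:
    """GPUs required to serve the same load after the upgrade (rounded up)."""
    return math.ceil(baseline_gpus * new_latency)

print(f"throughput gain: {throughput_gain:.1%}")
print(f"GPUs for a 100-GPU workload: {gpus_needed(100)}")
```

Under those assumptions a 28.9% latency cut is worth roughly 40% more throughput per GPU, which is why the number matters more for fixed on-prem fleets than the headline percentage suggests.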

👀 What to watch

The Rust frontend's path to default — no stability or production-readiness guidance shipped with v0.22.0, so watch subsequent releases for the signal it is ready for production data-parallel serving.

📊 The pattern

This week the AI stack repriced top to bottom in a single window: a frontier model GA wrapped in a $65B raise at the top, a standalone agent posting $492M ARR in the middle, and the serving layer beneath crossing decacorn thresholds and shaving 28.9% off latency at the bottom. The through-line is consolidation of the infrastructure tier at sovereign scale — model labs, agent labs, and inference vendors all priced as durable platforms rather than bets. The capability ceiling is moving too: Anthropic's Mythos timeline and vLLM's DeepSeek V4 maturity both point past this week's releases. Frontier capability as a product tier, agents as nine-figure revenue lines, inference as strategic infrastructure.

👀 Watchlist

Mythos-class broad release: Anthropic committed to "coming weeks" on 2026-05-29; the GA announcement and pricing are the next forward capability reset builders should plan against.

Remote MCP and background-mode pricing: OpenAI shipped the connectivity and auth surface without a cost surface; the disclosed pricing determines whether production agent workloads move onto it.

Cognition's year-end ARR: the >$1B projection is the checkpoint that confirms the 50% MoM curve survived direct Claude Code and Codex competition.

Fireworks and Baseten round closings: confirmation converts the "in talks" repricing signal into priced, closed strategic-infrastructure rounds.

vLLM Rust frontend to default: production-readiness guidance in a subsequent release signals data-parallel serving economics are ready for regulated deployment.

📎 Sources

Sources of truth

Click to verify or go deeper.

Source	Title	URL	Date
Anthropic	Introducing Claude Opus 4.8	https://www.anthropic.com/news/claude-opus-4-8	2026-05-29
Anthropic	Anthropic raises $65B in Series H funding at $965B post-money valuation	https://www.anthropic.com/news/series-h	2026-05-29
OpenAI	New tools and features in the Responses API	https://openai.com/index/new-tools-and-features-in-the-responses-api/	2026-05-27
TechCrunch	AI coding startup Cognition raises $1B at $25B pre-money valuation	https://techcrunch.com/2026/05/27/ai-coding-startup-cognition-raises-1b-at-25b-pre-money-valuation/	2026-05-27
Latent Space	New AI Infra decacorns: Fireworks, Baseten (with OpenRouter on the way)	https://www.latent.space/p/ainews-new-ai-infra-decacorns-fireworks	2026-05-27
vLLM	Release v0.22.0 · vllm-project/vllm	https://github.com/vllm-project/vllm/releases/tag/v0.22.0	2026-05-28

Commentary we read

Author / outlet	Title	URL	Date
Simon Willison	Anthropic raises $65B at $965B (run-rate revenue read)	https://simonwillison.net/2026/May/29/anthropic	2026-05-29
Zvi Mowshowitz / Don't Worry About the Vase	Claude Opus 4.8 Is Honestly Better	https://thezvi.substack.com/p/claude-opus-48-is-honestly-better	2026-05-30
Swyx / Latent Space	AINews: Anthropic raises $965B Series H	https://www.latent.space/p/ainews-anthropic-raises-965b-series	2026-05-29
Swyx / Latent Space	AINews: Cognition raises $1B at $26B	https://www.latent.space/p/ainews-cognition-raises-1b-in-26b	2026-05-27

May 25 - May 31 · 2026 W22Weekly Brief15 min read

AI & Tech Brief ⚡

📌 Navigate

📊 Exec Summary

Five things moved in AI/tech this week:

Anthropic shipped Claude Opus 4.8 and closed a $65B Series H at $965B valuation, paired with a model GA at unchanged pricing and a Mythos broad-release timeline that sets a forward capability ceiling builders should plan against now.

The pattern: frontier capability as a product tier, agents as nine-figure revenue lines, and the inference layer beneath them priced as durable infrastructure.

1. Anthropic ships Claude Opus 4.8 and closes a $65B Series H at $965B

What happened

The model is generally available today at unchanged pricing: $5 / $25 per million tokens input/output, with fast mode at $10 / $50 and 2.5x speed at 3x cheaper than prior Opus fast mode.
Dynamic Workflows in Claude Code runs hundreds of parallel subagents per session — in research preview for Enterprise, Team, and Max plans — and effort controls (standard / extra / max) landed in claude.ai and Cowork.
The Series H — $65B at $965B post-money, led by Altimeter, Dragoneer, Greenoaks, and Sequoia — includes $15B from hyperscalers ($5B from Amazon) plus Micron, Samsung, and SK hynix as memory and storage partners.
Run-rate revenue crossed $47B ARR this month, up from $19B at Series G in February; total compute committed now exceeds 10 GW (5 GW Amazon + 5 GW next-gen Google/Broadcom TPU + SpaceX Colossus GPU).
The accompanying SDK wave (anthropic-sdk-typescript v0.98.1–v0.100.1, anthropic-sdk-python v0.105.0–0.105.2) adds native claude-opus-4-8 support and mid-conversation system blocks — operators update permissions, token budgets, and environment context without breaking prompt cache.

📊 Benchmarks (from Anthropic newsroom)

Benchmark	Opus 4.8	Comparison
Online-Mind2Web (browser agent)	84%	Meaningful jump over Opus 4.7 and GPT-5.5 (external tester)
Super-Agent end-to-end completion	Only model to complete every case	Beats prior Opus and GPT-5.5 at parity on cost
Legal Agent Benchmark (all-pass)	First to break 10%	Highest score recorded on the benchmark
Deception / flawed-code pass-through	~4x less likely to let flaws pass	vs Opus 4.7 (internal alignment eval)
Genie (Databricks) token cost	61% cheaper	Multimodal reasoning over PDFs and diagrams vs Opus 4.7
Series H raise / valuation	$65B / $965B post-money	Up from $19B ARR at Series G (February)

🔗 Primary source → Introducing Claude Opus 4.8

Anthropic raises $65B in Series H funding at $965B post-money valuation

🔍 The non-obvious point

The headline number is the valuation; the number that resets builder expectations is $47B run-rate revenue.

Anthropic frames Opus 4.8 as "a modest but tangible improvement" over 4.7 — the real release is the revenue trajectory. Willison flagged the $47B run-rate as the most interesting line in the Series H post, noting Anthropic was at $9B in December 2025; that pace puts the lab on track to overtake OpenAI in revenue, per the Latent Space read.
The model's alignment story is substantive, not marketing. Zvi's read of the 244-page system card is that deception-resistance and corrigibility improvements are real — misaligned-behavior rates are "substantially lower" than 4.7 and comparable to Mythos Preview, with new highs on prosocial traits including supporting user autonomy.
The Mythos "coming weeks" line is the forward signal. Anthropic is positioning a capability-gated tier behind Project Glasswing's cyber safeguards — builders should treat the current Opus 4.8 ceiling as temporary and plan capacity, eval, and governance for a higher tier landing within the quarter.

👀 What to watch

Mythos-class broad release: Anthropic committed to "coming weeks" on 2026-05-29 — watch for a GA announcement and pricing before end of Q2.

2. OpenAI ships remote MCP servers with a Secure MCP Tunnel for the Responses API

What happened

Remote MCP servers are now natively supported; traffic routes outbound-only via HTTPS — no inbound firewall rules required to connect ChatGPT, Codex, or the API to private or on-premises MCP servers.
Auth moves to Workload Identity Federation (AWS, Azure, GCP IAM), replacing permanent API keys with short-lived tokens.
o3 and o4-mini can now call tools within chain-of-thought in the Responses API, preserving reasoning tokens across requests and tool calls — OpenAI's stated effect is reduced cost and latency.
The same update bundles background mode for async long-running tasks, encrypted reasoning items, reasoning summaries, and 10+ commercial MCP integrations: Stripe, PayPal, Shopify, Square, Plaid, HubSpot, Intercom, Twilio, Cloudflare, and Zapier.
The Responses API has reached hundreds of thousands of developers processing trillions of tokens since its March 2025 launch.

📊 Benchmarks (from OpenAI newsroom)

Capability	What shipped	Why it matters
MCP connectivity	Outbound-only HTTPS, no inbound ports	Agents reach internal backends without a firewall change
Auth model	Workload Identity Federation (AWS/Azure/GCP)	Short-lived tokens replace static API keys
Reasoning + tools	Tool calls within chain-of-thought (o3/o4-mini)	Reasoning tokens preserved across requests and tool calls
Async execution	Background mode	Long-running tasks handled asynchronously
Ecosystem	10+ commercial MCP servers	Stripe, Plaid, Shopify, HubSpot, Twilio live at launch

🔗 Primary source → New tools and features in the Responses API

🔍 The non-obvious point

This is a security-posture change disguised as a developer-experience release.

The blocker it removes is concrete: connecting an agent to an internal EHR, LIMS, CRM, or regulated document repository previously meant either exposing an inbound port or minting a static key with broad scope. Outbound-only HTTPS plus federated IAM lets a regulated team wire agents to internal backends without a firewall change or a long-lived credential.
The architecture has a deliberate boundary: the tunnel is outbound to OpenAI cloud — there is no on-premises deployment option disclosed. Teams with data-residency constraints get the auth model but still send traffic out of their perimeter, which keeps the deployment decision a governance question, not a pure engineering one.
The bundle matters as much as the tunnel: chain-of-thought tool calling that preserves reasoning tokens plus background mode makes this the most operationally complete agent-infrastructure release of the week outside Anthropic — the pieces a production agent needs (private connectivity, durable auth, async execution, paid-service integrations) shipped together.

👀 What to watch

Pricing and rate limits for remote MCP calls and background-mode compute were not disclosed at launch — watch for the cost surface before committing production workloads.

3. Cognition raises $1B at $26B as Devin hits $492M ARR

What happened

The round — $1B+ at $26B post-money ($25B pre-money), led by Lux Capital, General Catalyst, and 8VC — more than doubles Cognition's $10.2B valuation from a $400M round just eight months ago (September 2025), a 2.5x mark-up.
Devin reached $492M ARR with 50% MoM growth sustained over the past six months; 89% of Cognition's own production code is now shipped by Devin (company-stated).
Enterprise customers include Mercedes-Benz, NASA, Goldman Sachs, and Santander.
New investors Ribbit Capital, Atreides, and Layer Global joined alongside Founders Fund and Elad Gil; Cognition is now framed as "the largest remaining independent agent lab in AI."
Cognition is projecting >$1B ARR by year-end, per the Latent Space account.

📊 Benchmarks (from TechCrunch)

Metric	Value	Context
Devin ARR	$492M	Growing 50% MoM for six months
Series D	$1B+ at $26B post-money	$25B pre-money
Prior valuation	$10.2B (September 2025)	$400M round, 8 months prior — 2.5x markup
Own production code by Devin	89%	Cognition internal, company-stated
Projected ARR	>$1B by year-end	Per Latent Space

🔗 Primary source → AI coding startup Cognition raises $1B at $25B pre-money valuation

🔍 The non-obvious point

The $492M ARR figure is the first real comp table for pricing coding-agent ROI — and it survived direct model-lab competition.

This is the data point builders lacked: a standalone autonomous coding agent posting a disclosed nine-figure ARR establishes the comparable for anyone evaluating agent-as-employee spend against seat-based copilots. The 89% internal-code figure is the dogfooding proof the valuation is priced on.
The valuation thesis is that independent agent labs survive model-lab competition. Cognition's 2.5x markup in eight months lands while Anthropic's Claude Code and OpenAI's Codex compete directly — the Latent Space read is that there is durable room for an independent lab, not that the model labs will absorb the category.
The caveat worth holding: no gross margin, unit economics, or ARR-by-segment detail was disclosed. The $492M is a run-rate top line; the agent-as-employee economics — inference cost per shipped PR, seat-vs-usage basis — remain unpriced for outside evaluators.

👀 What to watch

Cognition's year-end >$1B ARR projection is the next checkpoint — watch for a confirming disclosure in Q4 2026 that the 50% MoM curve held through model-lab competition.

4. Fireworks AI and Baseten cross the decacorn threshold

What happened

Fireworks AI is valued at $15B — a 3.75x mark in 7 months; Baseten at $11B — 2.2x in 3 months. Both rounds were described as in progress at the time of the Latent Space account.
OpenRouter closed a $113M Series B led by Capital GVC; weekly token volume grew 5x in 6 months, from 5T to 25T weekly tokens (company-confirmed).
Latent Space covers startups only at the decacorn crossing — Fireworks and Baseten both crossed $10B+ in the same week, which is the reason for joint coverage.
The framing is explicitly category-level: inference infra priced as durable middleware, structurally similar to the database-infrastructure repricing of 2012–2015 — not as an experimental startup service.

📊 Benchmarks (from Latent Space)

Company	Valuation / round	Trajectory
Fireworks AI	$15B	3.75x in 7 months
Baseten	$11B	2.2x in 3 months
OpenRouter	$113M Series B (Capital GVC)	Weekly volume 5T → 25T tokens
OpenRouter volume growth	5x in 6 months	Production shift, company-confirmed

🔗 Primary source → New AI Infra decacorns: Fireworks, Baseten (with OpenRouter on the way)

🔍 The non-obvious point

The make-vs-buy calculus for model serving just changed — the vendors are now priced as infrastructure, not bets.

For a regulated builder weighing self-hosted serving vs managed inference, the signal is that the managed-inference category has crossed into strategic-infrastructure pricing — the durability that justifies building a workflow on top of a third-party serving layer rather than treating it as a swappable commodity.
OpenRouter's 5T → 25T weekly tokens is the demand-side proof: the 5x in 6 months is the production-traffic shift, not experimentation, underwriting the routing-layer thesis.
Confidence is medium by design: the Fireworks and Baseten rounds were in progress, not closed, at the time of reporting, and no ARR, customer, or token-volume figures were disclosed for either — the repricing signal is real, the closed terms are not yet confirmable.

👀 What to watch

Confirmation of the Fireworks and Baseten round closings and final terms — watch for an on-the-record announcement that converts the "in talks" framing into a priced, closed round.

5. vLLM v0.22.0 ships a 28.9% latency improvement and DeepSeek V4 maturity

What happened

Cutlass FP8 batch-invariant inference (PR #40408) delivers a 28.9% end-to-end latency improvement with SM80 compile-mode support.
An experimental Rust frontend (#40848 / #43283) targets data-parallel serving at scale via a DP Supervisor — not yet default.
Model Runner V2 (MRv2) is now default for Qwen3 dense models and falls back to MRv1 automatically when a KV connector is present (#42955).
DeepSeek V4 received a dedicated package (vllm/models/deepseek_v4/, #43004) with NVFP4 fused MoE and MTP speculative decoding — signaling production-grade support, not experimental.
A multi-tier KV cache offloading framework (#40020) extends beyond CPU to a Python filesystem secondary tier and Mooncake disk offloading (#42689); OpenAI-compatible structured output now lands in both MRv1 and MRv2 paths.

📊 Benchmarks (from vLLM release notes)

Metric	Value	Context
End-to-end latency	28.9% improvement	Cutlass FP8 batch-invariant inference (PR #40408)
Commits	459	v0.21.0 → v0.22.0
Contributors	230 (63 new)	Community velocity
DeepSeek V4	Dedicated package	NVFP4 fused MoE, MTP speculative decoding

🔗 Primary source → Release v0.22.0 · vllm-project/vllm

🔍 The non-obvious point

The 28.9% is a direct serving-cost lever for teams that cannot send traffic to a managed-inference vendor.

For any team running self-hosted inference on regulated infrastructure, the 28.9% latency gain translates into measurable serving-cost savings. The multi-tier KV offloading to CPU and disk also extends production viability to larger models without H100-cluster access, the exact constraint a regulated, on-prem deployment faces.
The DeepSeek V4 hardening pass — dedicated package, NVFP4 fused MoE, full CUDA graph, MTP speculative decoding — moves an open frontier model from experimental to production on the OSS substrate, a meaningful build-vs-buy data point against closed-API pricing.
The 459 commits from 230 contributors are the durability signal: vLLM remains the de facto OSS inference substrate, which is what makes the latency and offloading work bankable rather than a one-release spike. The Rust frontend is the forward bet: experimental now, but aimed squarely at data-parallel serving economics.
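
The serving-cost claim above can be sketched as arithmetic. The example assumes a GPU-bound deployment where cost scales with GPU time per request and where the 28.9% end-to-end improvement carries over to your workload. Both are assumptions: the release notes report only the latency figure, and the GPU price and request rate below are hypothetical placeholders.

```python
def serving_economics(latency_improvement, gpu_cost_per_hour, baseline_requests_per_hour):
    """Estimate per-request cost before and after a latency improvement.

    Assumes each request occupies the GPU for proportionally less time,
    so the same hardware completes more requests per hour.
    """
    time_factor = 1 - latency_improvement          # fraction of the original time per request
    throughput_multiplier = 1 / time_factor        # requests per hour scale inversely
    baseline_cost = gpu_cost_per_hour / baseline_requests_per_hour
    new_cost = baseline_cost * time_factor
    return {
        "throughput_multiplier": throughput_multiplier,
        "baseline_cost_per_request": baseline_cost,
        "new_cost_per_request": new_cost,
    }


# 0.289 is the reported improvement; $4.00/hour and 1,000 requests/hour are hypothetical.
result = serving_economics(
    latency_improvement=0.289,
    gpu_cost_per_hour=4.00,
    baseline_requests_per_hour=1000,
)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the same hardware handles about 1.41x the requests, so per-request cost falls by the same 28.9%. Real deployments that are bound by memory, network, or batching behavior will see a smaller effect.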

👀 What to watch

The Rust frontend's path to default — no stability or production-readiness guidance shipped with v0.22.0, so watch subsequent releases for the signal it is ready for production data-parallel serving.

📊 The pattern

👀 Watchlist

Mythos-class broad release
Anthropic committed to "coming weeks" on 2026-05-29; the GA announcement and pricing are the next forward capability reset builders should plan against.

Remote MCP and background-mode pricing
OpenAI shipped the connectivity and auth surface without a cost surface; the pricing, once disclosed, will determine whether production agent workloads move onto it.

Cognition's year-end ARR
The >$1B projection is the checkpoint that confirms the 50% MoM curve survived direct Claude Code and Codex competition.

Fireworks and Baseten round closings
Confirmation converts the "in talks" repricing signal into priced, closed strategic-infrastructure rounds.

vLLM Rust frontend to default
Production-readiness guidance in a subsequent release signals data-parallel serving economics are ready for regulated deployment.

📎 Sources

Sources of truth

Click to verify or go deeper.

Source	Title	URL	Date
Anthropic	Introducing Claude Opus 4.8	https://www.anthropic.com/news/claude-opus-4-8	2026-05-29
Anthropic	Anthropic raises $65B in Series H funding at $965B post-money valuation	https://www.anthropic.com/news/series-h	2026-05-29
OpenAI	New tools and features in the Responses API	https://openai.com/index/new-tools-and-features-in-the-responses-api/	2026-05-27
TechCrunch	AI coding startup Cognition raises $1B at $25B pre-money valuation	https://techcrunch.com/2026/05/27/ai-coding-startup-cognition-raises-1b-at-25b-pre-money-valuation/	2026-05-27
Latent Space	New AI Infra decacorns: Fireworks, Baseten (with OpenRouter on the way)	https://www.latent.space/p/ainews-new-ai-infra-decacorns-fireworks	2026-05-27
vLLM	Release v0.22.0 · vllm-project/vllm	https://github.com/vllm-project/vllm/releases/tag/v0.22.0	2026-05-28

Commentary we read

Author / outlet	Title	URL	Date
Simon Willison	Anthropic raises $65B at $965B (run-rate revenue read)	https://simonwillison.net/2026/May/29/anthropic	2026-05-29
Zvi Mowshowitz / Don't Worry About the Vase	Claude Opus 4.8 Is Honestly Better	https://thezvi.substack.com/p/claude-opus-48-is-honestly-better	2026-05-30
Swyx / Latent Space	AINews: Anthropic raises $965B Series H	https://www.latent.space/p/ainews-anthropic-raises-965b-series	2026-05-29
Swyx / Latent Space	AINews: Cognition raises $1B at $26B	https://www.latent.space/p/ainews-cognition-raises-1b-in-26b	2026-05-27
