Feb 2 - Feb 8 · 2026 W06 · Weekly Brief · 9 min read

AI & Tech Brief ⚡

Agentic coding became the primary competitive arena in AI this week: Anthropic and OpenAI shipped flagship models within minutes of each other on February 5, xAI's Grok Imagine 1.0 claimed the #1 video leaderboard slot on launch day, and Google opened real-time interactive world generation to its top subscribers. Taken together, the week made clear that the 2026 AI race is about autonomous execution, not chat.

📊 Exec Summary

Four things moved in AI/tech this week:

Anthropic drops Claude Opus 4.6 — 1M context, adaptive controls
first Opus-class model with 1M tokens and long-running autonomous task support

OpenAI ships GPT-5.3-Codex minutes later — new SWE-Bench Pro SOTA
56.8% on SWE-Bench Pro, 64.7% on OSWorld (+26.5pp), the first model instrumental in creating itself

xAI Grok Imagine 1.0 goes to #1 on video leaderboards at launch
10-second 720p video with native audio; pricing and usage figures are reported in secondary coverage

Google opens Project Genie (Genie 3) — real-time world generation for AI Ultra
photorealistic interactive environments from text prompts at 20-24fps, auto-regressive and frame-by-frame

The pattern: Every major lab shipped a flagship product this week, all targeting autonomous execution — writing code, generating video, building interactive worlds — not answering questions.

1️⃣ Claude Opus 4.6: 1M context, adaptive controls, agent teams

TL;DR: Anthropic's most capable model lands with 1M token context and adaptive reasoning controls — shipped in the same Feb. 5 launch window as OpenAI's Codex release.

What happened

Released February 5, 2026; Anthropic moved its release ahead of OpenAI's same-day launch window
Context window: 1 million tokens in beta — first Opus-class model to hit 1M
Long-running agent sessions are a key selling point
New adaptive reasoning controls let the model self-allocate compute to the hardest subproblems rather than applying uniform effort across a task
Agent teams mode (research preview in Claude Code): multiple Claude agents coordinate in parallel, designed for read-heavy tasks like large codebase reviews
Expanded safety tooling with behavioral controls for enterprise operators
Ranks #1 on Finance Agent benchmark; strong across agentic coding per Anthropic internal evals

🔍 The non-obvious point

The long-running task horizon is the number to watch, not the benchmark scores. Benchmarks measure performance at a moment; task horizon measures how long the model can stay useful before it drifts, loses context, or requires human re-engagement.

A 14.5-hour task-horizon ceiling means Claude Opus 4.6 can plausibly run overnight on a codebase audit, a compliance document review, or a multi-step research task without a human in the loop
The 1M token context is what makes that horizon credible: the model can hold an entire large codebase, regulatory dossier, or clinical dataset in context for the full session
Agent teams mode signals Anthropic is building toward multi-agent parallelism as a product primitive, not just a capability demonstration
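
As a rough way to sanity-check whether a given codebase would fit in a 1M-token window, the sketch below estimates token count from character count. The 4-characters-per-token ratio is a common rule of thumb, not Anthropic's tokenizer, and `estimate_tokens`, `fits_in_context`, `count_chars`, and the 20% headroom are hypothetical names and values chosen for illustration.

```python
from pathlib import Path

CONTEXT_WINDOW_TOKENS = 1_000_000  # 1M-token beta window cited above
CHARS_PER_TOKEN = 4                # rule-of-thumb assumption, not a real tokenizer


def estimate_tokens(num_chars: int, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Estimate a token count from a character count (ceiling division)."""
    return -(-num_chars // chars_per_token)


def fits_in_context(num_chars: int, headroom_pct: int = 20) -> bool:
    """True if the estimate leaves `headroom_pct` percent of the window free."""
    budget = CONTEXT_WINDOW_TOKENS * (100 - headroom_pct) // 100
    return estimate_tokens(num_chars) <= budget


def count_chars(root: str, suffixes: tuple = (".py", ".md")) -> int:
    """Sum the character counts of matching text files under `root`."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += len(path.read_text(encoding="utf-8", errors="ignore"))
    return total
```

Calling `fits_in_context(count_chars("path/to/repo"))` gives a first-pass answer; a real tokenizer count would be needed before relying on it.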

👀 What to watch

Anthropic's agent teams mode exits research preview — milestone to watch for teams building multi-agent production pipelines in regulated and technical domains

🔗 Primary source → MarkTechPost: Anthropic Releases Claude Opus 4.6

2️⃣ GPT-5.3-Codex: New SWE-Bench Pro SOTA, self-developed, 25% faster

TL;DR: OpenAI ships the first model that was instrumental in building itself, setting new records on SWE-Bench Pro and OSWorld while running 25% faster than its predecessor — released within minutes of Anthropic's Claude Opus 4.6 drop.

What happened

Released February 5, 2026, minutes after Anthropic's Claude Opus 4.6 — both labs had planned a synchronized 10am PST launch
Sets new SWE-Bench Pro public score: 56.8% (GPT-5.2-Codex was 56.4%; GPT-5.2 was 55.6%)
Terminal-Bench 2.0: 77.3%
OSWorld-Verified: 64.7% — a +26.5 percentage point jump vs GPT-5.2-Codex
Achieves SOTA results with fewer tokens than prior models
25% faster than GPT-5.2-Codex due to infrastructure and inference stack improvements
First model that was instrumental in creating itself: Codex team used early builds to debug training, manage deployment, and diagnose evaluations
Extends Codex's scope from code agent to full professional computer-use agent

📊 Benchmarks

Benchmark	GPT-5.3-Codex	GPT-5.2-Codex
SWE-Bench Pro (public)	56.8%	56.4%
Terminal-Bench 2.0	77.3%	—
OSWorld-Verified	64.7%	~38.2% (est.)
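
The prior-model OSWorld figure in the table is marked as an estimate; it is consistent with being back-calculated from the new score and the stated gain. A minimal sketch of that arithmetic, with the SWE-Bench Pro deltas for comparison (`pp_delta` and `implied_baseline` are illustrative names, not from the source):

```python
def pp_delta(new_pct: float, old_pct: float) -> float:
    """Difference between two percentage scores, in percentage points."""
    return round(new_pct - old_pct, 1)


def implied_baseline(new_pct: float, gain_pp: float) -> float:
    """Back out the prior score from a new score and a stated pp gain."""
    return round(new_pct - gain_pp, 1)


osworld_prev = implied_baseline(64.7, 26.5)  # OSWorld-Verified, GPT-5.2-Codex
swe_vs_codex = pp_delta(56.8, 56.4)          # SWE-Bench Pro vs GPT-5.2-Codex
swe_vs_base = pp_delta(56.8, 55.6)           # SWE-Bench Pro vs GPT-5.2

print(osworld_prev, swe_vs_codex, swe_vs_base)
```

The SWE-Bench Pro gain over GPT-5.2-Codex is 0.4pp, against 26.5pp on OSWorld-Verified.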

🔗 Primary source → OpenAI: Introducing GPT-5.3-Codex

🔍 The non-obvious point

The self-development claim is the headline, but the OSWorld jump is the operative number.

OSWorld measures performance on real computer tasks — browser use, file management, OS-level operations — not synthetic code problems; a +26.5pp jump suggests genuine capability expansion, not benchmark optimization
The "instrumental in creating itself" framing positions Codex as the first model in a recursive improvement loop at production scale — not in a lab
Combined with the launch timing, this is OpenAI signaling that agentic coding is now the primary competitive surface, and they intend to fight for it benchmark by benchmark

👀 What to watch

GPT-5.3-Codex-Spark (Cerebras variant at 1,000+ tokens/sec) is the next product signal — watch for broader access announcement

3️⃣ Grok Imagine 1.0: #1 Video Leaderboard at Launch, $4.20/min Native Audio

TL;DR: xAI ships Grok Imagine 1.0 on February 2, with secondary reporting placing it at the top of the Artificial Analysis video and image-to-video leaderboards, and with pricing aggressive enough to undercut Sora and Veo on day one.

What happened

Grok Imagine API launched January 28; Imagine 1.0 shipped February 2 with audio and extended video
Video: 10 seconds at 720p (up from 8 seconds), native audio with synchronized dialogue, ambience, and sound effects
Capabilities: text-to-image, text-to-video, image-to-video, video editing (restyle, add/remove objects, motion control)
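Pricing: reported $4.20/minute including audio via API, significantly below Sora and Veo; a $0.05/second rate also appears in coverage, which works out to $3.00/minute, so the two reported figures do not reconcile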
Ranked #1 on Artificial Analysis overall video generation and image-to-video leaderboards in secondary reporting
Reported 1.245 billion videos generated in prior 30 days as of February 2
First availability outside X platform — now accessible via partner API integrations
xAI (SpaceX+xAI entity) reportedly valued at $1.1T combined; IPO signals accumulating alongside Anthropic and OpenAI

🔗 Primary source → xAI: Grok Imagine API (official API launch; pricing and usage figures are from secondary reporting)

🔍 The non-obvious point

The pricing is the strategic move, not the leaderboard rank.

Reported $4.20/min with audio pulls video generation into the cost range where product builders will run it in production — Sora and Veo sit above the threshold where most apps do real volume
Reaching #1 on Artificial Analysis on launch day removes the quality objection; the residual question is latency and uptime at scale, which API partners will stress-test in Q1
Reported 1.245B videos in 30 days is consumer adoption, not enterprise adoption — the API launch is xAI's attempt to convert consumer reach into developer infrastructure before the IPO narrative solidifies
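
To make the reported rates concrete, the sketch below converts between per-second and per-minute pricing and costs out a single 10-second clip. It uses only the two figures reported above; the function names are illustrative, and Sora and Veo prices are left out because this brief does not state them.

```python
from decimal import Decimal

CLIP_SECONDS = 10  # Imagine 1.0 clip length cited above


def per_minute(rate_per_second: Decimal) -> Decimal:
    """Convert a per-second rate to a per-minute rate."""
    return rate_per_second * 60


def per_second(rate_per_minute: Decimal) -> Decimal:
    """Convert a per-minute rate to a per-second rate."""
    return rate_per_minute / 60


def clip_cost(rate_per_second: Decimal, seconds: int = CLIP_SECONDS) -> Decimal:
    """Cost of one clip of the given length at a per-second rate."""
    return rate_per_second * seconds


reported_per_minute = Decimal("4.20")  # reported, secondary coverage
reported_per_second = Decimal("0.05")  # reported, secondary coverage

print(per_minute(reported_per_second))
print(per_second(reported_per_minute))
print(clip_cost(reported_per_second))
print(clip_cost(per_second(reported_per_minute)))
```

At $0.05/second a minute costs $3.00, while $4.20/minute corresponds to $0.07/second, so a 10-second clip costs either $0.50 or $0.70 depending on which reported rate holds.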

👀 What to watch

A major Grok Imagine update is reported for March 2026; watch for longer duration, higher resolution, or enterprise SLA announcements

4️⃣ Google Project Genie: Real-Time Interactive World Generation, AI Ultra Gated

TL;DR: Google DeepMind rolls out Genie 3-powered Project Genie to AI Ultra subscribers on January 29 — text or image prompts generate photorealistic interactive environments navigable in real time, frame-by-frame at 20-24fps.

What happened

Available to Google AI Ultra subscribers ($250/month) in the US (18+) starting January 29, 2026
Genie 3 is an auto-regressive model that generates interactive environments frame-by-frame from world descriptions and user actions
Resolution: 720p photorealistic worlds; interaction rate: 20-24fps
Three generation modes: world sketching, exploration, remixing
Current exploration limit: 60 seconds per session (compute-intensive auto-regressive architecture)
Memory: environments stay consistent for several minutes, with specific interactions recalled for up to one minute
Not available outside Google AI Ultra in US
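
The stated frame rates and session limit imply a per-frame latency budget and a frame count per session. A quick sketch of that arithmetic follows; the names are illustrative, and the model's actual per-frame compute cost is not given in this brief.

```python
SESSION_SECONDS = 60   # current exploration limit
FPS_RANGE = (20, 24)   # stated interaction rate


def frame_budget_ms(fps: int) -> float:
    """Maximum time available to generate one frame at a given frame rate."""
    return 1000 / fps


def frames_per_session(fps: int, seconds: int = SESSION_SECONDS) -> int:
    """Number of frames generated sequentially in one session."""
    return fps * seconds


for fps in FPS_RANGE:
    print(fps, round(frame_budget_ms(fps), 1), frames_per_session(fps))
```

Because generation is auto-regressive, each of the 1,200 to 1,440 frames in a session has to be produced in roughly 42 to 50 ms before the next one can start.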

🔗 Primary source → Google DeepMind Blog: Project Genie

🔍 The non-obvious point

Project Genie isn't a game engine — it's a real-time generative world model, and the constraint is compute, not capability.

The 60-second exploration limit and $250/month paywall are not product decisions; they are signals that Genie 3's inference cost at 20-24fps is still prohibitive for broader deployment
When inference cost drops (or Google builds dedicated hardware), the same model becomes a real-time simulation platform for training data generation, robotics environments, and interactive media — not just a consumer demo
Google releasing this while simultaneously building Gemini 3.1 suggests the Genie architecture lives in a separate research-to-product track — worth watching if it converges with Gemini's multimodal roadmap

👀 What to watch

Broader access expansion beyond AI Ultra — any announcement removing the $250/month gate would signal Google is ready to scale inference

📊 The pattern

This was the week agentic autonomy became the explicit product, not a feature. Anthropic and OpenAI both shipped models designed to run unsupervised for hours on professional tasks; xAI shipped video generation at API pricing low enough to put it in production workflows; Google previewed real-time interactive environments that only need cheaper inference to become simulation infrastructure. Every major lab signaled the same direction: the next year of competition is about what the model can do without you, not what it can tell you when you ask.

👀 Watchlist

GPT-5.3-Codex-Spark / Cerebras partnership
1,000+ tokens/sec variant; watch for broader developer access announcement indicating when OpenAI expects high-speed agentic coding to reach production

Claude Opus 4.6 agent teams GA
exit from research preview will be the signal that Anthropic believes multi-agent coordination is stable enough for production regulated use cases

Grok Imagine API SLA + enterprise terms
March 2026 update reported; watch for duration extension, higher resolution, and uptime guarantees that would make it viable for content production at scale

Google Project Genie compute costs
any access tier below $250/month signals inference cost is dropping fast enough to open the generative world model to developers

📎 Sources

Sources of truth

Source	Title
MarkTechPost	Anthropic Releases Claude Opus 4.6 with 1M Context, Agentic Coding, Adaptive Reasoning Controls
OpenAI	Introducing GPT-5.3-Codex
xAI	Grok Imagine API
Google DeepMind	Project Genie

Also consider reading

Author / Outlet	Title
Artificial Analysis	Video Generation and Image-to-Video Leaderboards
Anthropic	Claude Code: Agent Teams Mode (Research Preview)
OpenAI	SWE-Bench Pro and Terminal-Bench 2.0 Results
