AI & Tech Review ⚡
📋 Exec Summary
Q2 2024 made multimodal AI the baseline with GPT-4o shipping free-tier vision and audio at as low as 232 ms latency (320 ms average), while Claude 3.5 Sonnet topped coding benchmarks at 1/5 the cost of Opus. Apple entered the race with on-device AI on a limited set of devices, NVIDIA crossed $3T market cap, and the Superalignment team dissolution at OpenAI signaled that safety research competes with product velocity for resources.
📊 What Moved
Multimodal AI became a shipping product
GPT-4o (May 13) collapsed text, vision, and audio into a single model served free to hundreds of millions. Audio response latency dropped to as low as 232 ms, with 320 ms average; API costs fell 50% vs. GPT-4 Turbo.
Claude 3.5 Sonnet set new benchmarks
Anthropic shipped it on June 20, topping GPQA, MMLU, and HumanEval at 2x the speed and 1/5 the cost of Opus. Artifacts (inline rendered code/docs/designs) signaled a shift from chat to collaborative workspace. For many builders it became the default coding assistant almost overnight.
Gemini 1.5 Pro pushed context to 1M tokens, with 2M in private preview
Google I/O (May 14) delivered a 1M-token public-preview context window, plus Flash for latency-sensitive workloads and Project Astra for real-time multimodal agents. The 2M-token context was available via waitlist/private preview. AI Overviews began rolling into US search results.
Apple entered the race
WWDC (June 10) introduced Apple Intelligence: on-device AI across iPhone/iPad/Mac, blending local inference with Private Cloud Compute and an OpenAI/Siri partnership. On-device inference became a first-class distribution channel on supported devices.
NVIDIA crossed $3T market cap
Briefly the world's most valuable company (June 18, $3.3T). Q1 FY25 revenue hit $26B, more than triple YoY (up 262%). The AI capex cycle showed no deceleration.
📈 Trend Arcs
Arc 1: The Multimodal Default
Velocity: Accelerating
Q1 introduced multimodal models as premium features. Q2 made them baseline. GPT-4o shipped vision, audio, and text natively at free-tier pricing. Gemini 1.5 Pro paired multimodality with a 1M-token public-preview window and 2M-token waitlist access. Claude 3.5 Sonnet advanced vision capabilities while dominating code benchmarks. By quarter close, any new model lacking multimodal input was positioned as incomplete.
Where it stands at quarter close: Multimodal is table stakes for frontier models. The competition has shifted from "can it do multimodal" to "how fast and how cheap."
Arc 2: The On-Device Inference Push
Velocity: Accelerating
Microsoft announced Copilot+ PCs on May 20 requiring 40+ TOPS NPUs — a hardware floor that excludes most existing PCs. Apple Intelligence, announced June 10, tied AI features to Apple silicon (iPhone 15 Pro/Pro Max and M1-or-later iPad and Mac). Qualcomm's Snapdragon X Elite became the reference chip for Windows AI PCs. The implication: on-device AI is now a hardware upgrade cycle driver, not a software-only play.
Where it stands at quarter close: Two platform incumbents (Apple, Microsoft) have committed to shipping AI at the OS level. The install base is small but the directional bet is clear — edge inference is a distribution moat.
Arc 3: Safety vs. Speed — The Governance Fracture
Velocity: Accelerating
OpenAI dissolved its Superalignment team in May after co-founder Ilya Sutskever and safety lead Jan Leike both resigned. Leike publicly stated that "safety culture and processes have taken a backseat to shiny products." The EU AI Act moved toward enforcement timelines while U.S. governance remained fragmented. Mistral continued its open-weights strategy, arguing that European data sovereignty and transparency provide a different safety paradigm.
Where it stands at quarter close: The industry's largest lab has signaled, through personnel loss and team restructuring, that safety research competes with product velocity for resources. No regulatory framework has filled the gap.
🗺️ Landscape Shift
| Player | Quarter open | Quarter close | What changed |
|---|---|---|---|
| OpenAI | GPT-4 Turbo dominant; board drama unresolved | GPT-4o shipped; free-tier multimodal; Superalignment team dissolved; Sutskever and Leike departed | Product velocity up, safety credibility down |
| Anthropic | Claude 3 family (Opus/Sonnet/Haiku) | Claude 3.5 Sonnet tops benchmarks; Artifacts launched; coding use case captured | Established as the builder's preferred model |
| Google | Gemini 1.0 Ultra in limited access | Gemini 1.5 Pro public preview with 1M context; 2M via waitlist; Flash for cost; AI Overviews in Search; Project Astra demo | Regained competitive positioning with context length advantage |
| Apple | No AI strategy visible | Apple Intelligence announced; limited-device on-device + cloud architecture; OpenAI partnership | Entered the race with hardware-integrated distribution |
| Microsoft | Copilot integrated in Office | Copilot+ PCs announced; NPU hardware floor set; Recall feature previewed | Shifted AI from cloud service to PC hardware requirement |
| NVIDIA | ~$2.2T market cap | $3.3T market cap; world's most valuable company (briefly) | AI capex cycle validated at unprecedented scale |
| Mistral | Mixtral 8x7B; Series A momentum | Mistral Large competing on benchmarks; EU AI sovereignty narrative strengthened | Positioned as Europe's frontier lab alternative |
💰 Funding & Deal Pattern
NVIDIA's revenue trajectory
$26B quarterly (more than triple YoY) served as the clearest demand signal. Hyperscalers committed multi-year capex exceeding $100B collectively; every major cloud provider announced expanded GPU cluster availability.
Mistral AI
Continued raising at escalating valuations, positioning European sovereign AI as a fundable thesis. Open-weights strategy attracted both commercial partnerships and French government backing.
Microsoft Copilot+ PCs
OEM hardware partnerships are becoming a funding and go-to-market channel for on-device AI startups. The 40+ TOPS NPU requirement creates a hardware upgrade cycle benefiting chip designers and device manufacturers.
Application-layer deals
Increasingly demanded proof of retention and revenue, not just model access. "Wrapper" startups raising on demo alone showed strain; investors shifted toward full-stack AI companies with data moats or workflow integration.
Open-source momentum
Mixtral and Llama ecosystems shifted VC attention toward fine-tuning, deployment tooling, and inference optimization. Investable surface expanded from "build a foundation model" to "make existing models deployable."
🔍 Counter-Narrative
- The consensus: GPT-4o's free tier democratizes AI and expands the builder ecosystem. The reality: When the incumbent gives away the capability startups charge for, AI wrapper addressable markets shrink. Margin compression hits faster than expected; the real opportunity shifts to workflow integration the horizontal platform cannot replicate.
- The consensus: The Superalignment team dissolution was internal politics. The reality: Losing both Sutskever (co-founder) and Leike (safety lead) within days is structural, not personal. Leike's account that his team had been "sailing against the wind" and at times struggling for compute suggests systemic under-investment. If EU AI Act enforcement tightens in 2025, labs that skipped safety infrastructure face compliance debt that cannot be resolved quickly.
📐 Builder's Benchmark
| Metric | Q1 2024 | Q2 2024 | Delta |
|---|---|---|---|
| GPT-4-class API cost (per 1M input tokens) | $10.00 (GPT-4 Turbo) | $5.00 (GPT-4o) | -50% |
| Claude cost (per 1M input tokens, best model) | $15.00 (Opus) | $3.00 (3.5 Sonnet) | -80% |
| Max context window (publicly available) | 128K (GPT-4 Turbo) | 1M public preview / 2M waitlist (Gemini 1.5 Pro) | ~7.8x (1M vs. 128K) |
| HumanEval top score (code gen) | ~86% (GPT-4 Turbo) | ~92% (Claude 3.5 Sonnet) | +6 pts |
| Audio response latency (GPT-4o) | N/A | 232 ms best case / 320 ms avg | New capability |
| NPU floor for "AI PC" | None defined | 40+ TOPS (Copilot+) | New hardware bar |
| NVIDIA quarterly revenue | $22.1B (Q4 FY24) | $26.0B (Q1 FY25) | +18% QoQ |
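The Delta column can be reproduced from the Q1 and Q2 figures in the table. The sketch below is only a sanity check: the helper names `pct_change` and `ratio` are illustrative, and it assumes decimal token counts (128K = 128,000, 1M = 1,000,000) and the 1M public-preview window rather than the 2M waitlist tier.

```python
# Sanity check of the Delta column, using figures copied from the table above.
# Helper names are illustrative; token counts are assumed to be decimal
# (128K = 128,000 and 1M = 1,000,000).

def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def ratio(old: float, new: float) -> float:
    """Multiplicative change from old to new."""
    return new / old


gpt4_class_cost = pct_change(10.00, 5.00)   # USD per 1M input tokens
claude_cost = pct_change(15.00, 3.00)       # USD per 1M input tokens
context_ratio = ratio(128_000, 1_000_000)   # tokens, 1M public preview
humaneval_points = 92 - 86                  # approximate scores, in points
nvidia_qoq = pct_change(22.1, 26.0)         # USD billions, quarter over quarter

print(f"GPT-4-class input cost: {gpt4_class_cost:+.0f}%")
print(f"Claude best-model input cost: {claude_cost:+.0f}%")
print(f"Max context window: {context_ratio:.1f}x")
print(f"HumanEval top score: {humaneval_points:+d} pts")
print(f"NVIDIA quarterly revenue: {nvidia_qoq:+.0f}% QoQ")
```

Under those assumptions the results line up with the table: -50%, -80%, roughly 7.8x, +6 pts, and about +18% QoQ. Counting tokens in binary units (131,072 and 1,048,576) would give exactly 8x for the context window instead.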
👀 What to Watch
July-August 2024
Apple Intelligence developer beta drops; first real benchmarks of on-device inference quality and latency
August 2024
Recursion-Exscientia merger announcement expected; signals AI-bio consolidation wave
September 2024
EU AI Act compliance deadlines begin phased enforcement; watch for lab responses on model deployment in EU
Q3 2024
OpenAI expected to reveal next-generation model; rumored reasoning/agent capabilities beyond GPT-4o
Ongoing
NVIDIA earnings (August) will confirm whether the $26B quarterly revenue pace sustains or was a one-time pull-forward