2024 Q3 · Quarterly Review · 8 min read

AI & Tech Review ⚡

Q3 2024 delivered a decisive answer to the open-vs-closed debate: both sides escalated simultaneously. Meta shipped Llama 3.1 405B, which it billed as the largest openly available foundation model to date, showing that open-weight models can compete at the frontier. Seven weeks later OpenAI responded with o1-preview, the first production "reasoning" model that spends inference-time compute on chain-of-thought before answering. Meanwhile Anthropic kept Claude 3.5 Sonnet competitive on coding benchmarks. The quarter's subtext: raw parameter counts matter less than how models allocate compute, and the application layer -- coding agents, inference hardware, agentic frameworks -- is where value is consolidating fastest.

📊 What Moved

Open-weight models reach frontier scale
Meta released Llama 3.1 405B (July 23), trained on more than 15T tokens across more than 16K H100 GPUs. Ranked first in instruction-following and second in math/reasoning on Scale AI's SEAL leaderboard. A license change allowing output-based training catalyzed a derivative ecosystem within weeks.

Reasoning becomes a model primitive
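OpenAI's o1-preview (September 12) introduced inference-time chain-of-thought, trading latency for accuracy. OpenAI reported performance comparable to PhD students on physics, chemistry, and biology benchmarks, 83% on an International Mathematics Olympiad qualifying exam, and the 89th percentile on Codeforces. o1-mini offered 80% lower cost than o1-preview for coding. Scaling now has a second axis: accuracy improves with inference compute, not just training compute.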

Tool use pressure builds
Anthropic's Claude 3.5 Sonnet stayed competitive on coding benchmarks while the industry moved toward deeper software interaction. Its SWE-bench Verified score rose from 33.4% to 49.0% with the updated model Anthropic announced on October 22, just after quarter close. Replit, Canva, and Asana began integrating it for multi-step workflow automation.

Coding agents become the default developer surface
Cursor raised a $60M Series A at a $400M valuation (August). GitHub's Copilot Workspace, in technical preview since April, turned issues into specs, plans, and PRs. The coding agent category (Cursor, Copilot, Replit, Codeium) became the most commercially visible AI application of the quarter.

Inference hardware competition intensifies
Cerebras filed for IPO. SambaNova shipped SN40L (520 MB SRAM, 64 GB HBM3, 1.5 TB DDR5). Groq scaled LPU clusters. NVIDIA's Blackwell B200 was reportedly delayed from October to December on yield issues. Inference cost, not training cost, is increasingly the binding constraint.

📈 Trend Arcs

1. Open-Weight Models as Infrastructure

Velocity: Accelerating

Llama 3.1 405B showed that open-weight models can match or approach closed frontier performance on many benchmarks. The license change permitting output-based training catalyzed a derivative ecosystem. Enterprise interest in open models for on-premise and sovereign deployments grew, particularly in regulated industries.

Key enablers:

Meta's updated license allows Llama outputs to be used to train other models -- a notable loosening for a frontier-scale release
128K context window matches closed-model capabilities
Community fine-tunes appeared within days, targeting coding, instruction-following, and multilingual tasks

Where it stands: Open-weight models are no longer "catching up" -- they are a parallel frontier. The gap is narrowing to specialized capabilities (reasoning, tool use) where closed labs still lead, but the commodity layer of text generation is effectively open.

2. Inference-Time Compute as the New Scaling Axis

Velocity: Emerging rapidly

OpenAI o1 demonstrated that spending more compute at inference (chain-of-thought reasoning) can substitute for larger training runs. This created a new cost curve: higher per-query cost for dramatically better accuracy on hard problems.
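
A rough way to see the new cost curve is to blend the two model types by traffic share. The sketch below is illustrative only: `blended_cost` and its parameters are hypothetical names, the baseline cost is normalized to 1.0, and the 5x and 10x multipliers are assumptions borrowed from the range quoted in the Counter-Narrative section, not measured prices.

```python
def blended_cost(hard_fraction: float, base_cost: float, reasoning_multiplier: float) -> float:
    """Average per-query cost when only `hard_fraction` of queries go to the reasoning model."""
    if not 0.0 <= hard_fraction <= 1.0:
        raise ValueError("hard_fraction must be between 0 and 1")
    single_pass = (1.0 - hard_fraction) * base_cost
    reasoning = hard_fraction * base_cost * reasoning_multiplier
    return single_pass + reasoning


# Hypothetical inputs: baseline cost normalized to 1.0; multipliers are assumptions.
for multiplier in (5.0, 10.0):
    for hard_fraction in (0.1, 0.5, 1.0):
        cost = blended_cost(hard_fraction, base_cost=1.0, reasoning_multiplier=multiplier)
        print(f"multiplier={multiplier:4.1f} hard_fraction={hard_fraction:.1f} -> {cost:.2f}x baseline")
```

Under these assumptions, routing 10% of queries to a model that costs 10x as much lifts the average to 1.9x baseline, while sending everything there costs the full 10x. That is why the tradeoff reads as a routing decision rather than a wholesale replacement.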

Hardware implications:

Inference chips from Cerebras, Groq, and SambaNova become strategically important as the inference-to-training compute ratio shifts
Memory bandwidth, not raw FLOPs, becomes the binding constraint for reasoning workloads
The inference-as-a-service market bifurcates: speed-optimized (Groq LPU) vs. accuracy-optimized (SambaNova SN40L)
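
The memory-bandwidth point can be made concrete with back-of-envelope arithmetic. For single-stream decoding of a dense model, each generated token requires reading every weight once, so tokens per second is bounded by bandwidth divided by model size. The sketch assumes a hypothetical 3 TB/s of aggregate memory bandwidth (not a vendor figure), ignores KV-cache traffic and batching, and uses Llama 3.1 405B's parameter count; the function names are illustrative.

```python
def weight_bytes(params: float, bytes_per_param: float) -> float:
    """Total bytes needed to hold the model weights."""
    return params * bytes_per_param


def max_decode_tokens_per_s(params: float, bytes_per_param: float,
                            bandwidth_bytes_per_s: float) -> float:
    """Upper bound on single-stream decode speed: every token reads all weights once."""
    return bandwidth_bytes_per_s / weight_bytes(params, bytes_per_param)


PARAMS = 405e9      # Llama 3.1 405B (dense)
BANDWIDTH = 3.0e12  # ASSUMPTION: hypothetical 3 TB/s aggregate bandwidth, not a vendor spec

for label, bytes_per_param in (("16-bit", 2.0), ("8-bit", 1.0)):
    size_gb = weight_bytes(PARAMS, bytes_per_param) / 1e9
    bound = max_decode_tokens_per_s(PARAMS, bytes_per_param, BANDWIDTH)
    print(f"{label}: {size_gb:.0f} GB of weights, at most {bound:.1f} tokens/s per stream")
```

Adding FLOPs does not move this bound; only more bandwidth, smaller weights, or batching does. Reasoning workloads emit long chains of tokens per query, so they spend most of their time in exactly this regime.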

Where it stands: Early. o1-preview and o1-mini are the only production reasoning models. But the approach validates a second scaling axis, and every major lab is expected to ship reasoning variants by mid-2025.

3. Agentic Coding as First Killer App

Velocity: Accelerating

Cursor, GitHub Copilot Workspace, and the CrewAI/AutoGen/LangGraph framework explosion all converged on the same thesis: developers are the first users willing to cede control to AI agents for multi-step tasks. The economics are compelling -- developer time is expensive, code is verifiable, and feedback loops are tight.

Market signals:

Cursor's path from launch to $400M valuation -- fastest AI-native IDE trajectory
GitHub Copilot Workspace converts issues to specs, plans, and PRs -- full agentic loop
CrewAI, AutoGen, LangGraph all reached production-ready status -- orchestration layer commoditizing

Where it stands: Product-market fit is real. The question is whether coding agents commoditize or whether network effects (codebase context, team memory) create defensibility. The framework layer is commoditizing fast, suggesting value accrues to the application layer and the model layer, not middleware.

🗺️ Landscape Shift

Dimension	Start of Q3	End of Q3	Direction
Largest open-weight model	Llama 3 70B	Llama 3.1 405B	Frontier-competitive open models
Inference paradigm	Single-pass generation	Chain-of-thought reasoning (o1)	Latency-accuracy tradeoff as product choice
Model-environment interaction	Text/code output only	Tool-use pressure building; GUI-level computer use still ahead	Models operating inside software
Developer tooling	Copilot autocomplete	Agentic IDEs (Cursor, Copilot Workspace)	Multi-step autonomous coding
Inference hardware	NVIDIA GPU monopoly	Cerebras IPO, Groq/SambaNova scaling	Specialized inference silicon emerging
AI agent frameworks	LangChain dominant	CrewAI, AutoGen, LangGraph proliferation	Multi-agent orchestration standardizing
Anthropic funding	~$4B cumulative from Amazon (completed March 2024)	Unchanged at ~$4B by quarter close	Hyperscaler-lab integration deepening

💰 Funding & Deal Pattern

Anthropic
Cumulative $4B from Amazon ($1.25B Sep 2023, $2.75B Mar 2024) by quarter close. AWS became primary training partner. Investment-for-cloud-commitment became the template for lab-hyperscaler partnerships.

Cursor AI
$60M Series A at $400M valuation (August). Fastest path from coding tool to unicorn trajectory in the AI-native IDE space.

Cerebras
Filed for IPO, seeking public-market validation of the inference-chip thesis. Revenue growth strong but customer concentration a risk factor.

NVIDIA
Blackwell production delayed but projected 450K B200 units in Q4 2024 (~$10B potential revenue from a single product line).

AI agent frameworks
CrewAI, AutoGen (Microsoft), LangGraph (LangChain) all reached production-ready status. No dominant funding round, but collective activity confirmed multi-agent orchestration becoming standard infrastructure.

Inference-as-a-service
Groq's LPU-based API gained traction for latency-sensitive apps; SambaNova's SN40L targeted enterprises needing full-precision inference without quantization accuracy loss.

Signal: capital is flowing to both foundation-model providers and the pick-and-shovel layer (chips, dev tools), while pure "wrapper" applications face increasing skepticism.

🔍 Counter-Narrative

The consensus: o1's reasoning approach is the next frontier capability. The reality: 90% of production use cases need fast, cheap, "good enough" responses where single-pass models remain superior. Enterprises discovered 5-10x higher latency and cost for tasks that don't need PhD-level reasoning. Risk: reasoning models become impressive demos while open-weight commodity models eat the volume market.
The consensus: CrewAI, AutoGen, and LangGraph prove multi-agent orchestration is production-ready. The reality: Error rates compound across agent steps, context windows overflow on complex tasks, and debugging multi-agent systems is qualitatively harder than single-model pipelines. Most production deployments still use single-model, single-turn architectures with deterministic orchestration -- because reliability at scale demands it.
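
The compounding claim is simple arithmetic. If each agent step succeeds independently with probability p, an n-step chain succeeds with probability p^n. The sketch below assumes independent steps and uses hypothetical per-step success rates; real pipelines with retries or validation steps will behave differently, and the function names are illustrative.

```python
def chain_success(per_step_success: float, steps: int) -> float:
    """End-to-end success probability, ASSUMING steps fail independently."""
    if not 0.0 < per_step_success <= 1.0:
        raise ValueError("per_step_success must be in (0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


def max_steps(per_step_success: float, target: float) -> int:
    """Largest step count whose end-to-end success stays at or above `target`."""
    if not 0.0 < per_step_success < 1.0:
        raise ValueError("per_step_success must be strictly between 0 and 1")
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    steps = 0
    while chain_success(per_step_success, steps + 1) >= target:
        steps += 1
    return steps


# Hypothetical per-step success rates, chosen only for illustration.
for p in (0.90, 0.95, 0.99):
    for n in (5, 10, 20):
        print(f"p={p:.2f}, steps={n:2d} -> end-to-end success {chain_success(p, n):.1%}")
    print(f"p={p:.2f}: at most {max_steps(p, 0.5)} steps before success drops below 50%")
```

Even a 95% per-step success rate falls to roughly 60% over ten steps under the independence assumption, which is the arithmetic behind keeping orchestration short and deterministic.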

📐 Builder's Benchmark

Metric	Q2 2024	Q3 2024	Delta
Largest open-weight model (params)	70B (Llama 3)	405B (Llama 3.1)	5.8x
SWE-bench Verified (best public)	33.4% (Claude 3.5 Sonnet)	49.0% (updated Claude 3.5 Sonnet, announced Oct 22)	+15.6 pp
Coding agent IDE valuation ceiling	Seed stage	$400M (Cursor Series A)	New category
Inference chip IPO candidates	0	1 (Cerebras filed)	Market validation
AI agent frameworks (major)	LangChain + early others	CrewAI, AutoGen, LangGraph mature	3+ production-ready
o1 reasoning benchmark (Codeforces)	N/A	89th percentile	New capability class
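
The Delta column mixes two kinds of change: a multiplicative ratio for parameter count and a percentage-point difference for SWE-bench Verified. A minimal sketch that recomputes both from the table's own figures and adds the relative change for comparison (helper names are illustrative):

```python
def ratio(before: float, after: float) -> float:
    """Multiplicative change, e.g. 70 -> 405 is about 5.8x."""
    return after / before


def point_delta(before_pct: float, after_pct: float) -> float:
    """Absolute change between two percentages, in percentage points (pp)."""
    return after_pct - before_pct


def relative_change_pct(before: float, after: float) -> float:
    """Change expressed as a percentage of the starting value."""
    return (after - before) / before * 100.0


params_q2, params_q3 = 70.0, 405.0   # billions of parameters, from the table
swe_q2, swe_q3 = 33.4, 49.0          # SWE-bench Verified scores (%), from the table

print(f"Parameter count: {ratio(params_q2, params_q3):.1f}x")
print(f"SWE-bench Verified: +{point_delta(swe_q2, swe_q3):.1f} pp "
      f"({relative_change_pct(swe_q2, swe_q3):.1f}% relative)")
```

The 15.6 pp gain corresponds to a relative improvement of about 47%, a reminder that the two units are not interchangeable when comparing rows.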

👀 What to Watch

o1 full release and pricing
the gap between preview and production will determine whether reasoning models are a research curiosity or a commercial category; the full o1 model is expected in December

Llama 3.1 derivative ecosystem
whether open-weight fine-tunes can match closed-model quality on specialized tasks (coding, reasoning, tool use) within one quarter; the license change makes this structurally possible for the first time

Cursor vs. Copilot vs. Claude Code
three distinct approaches to agentic coding (IDE-native, platform-integrated, model-native); Claude Code did not launch until February 2025, so Q4 market share mainly tests Cursor against Copilot, with the model-native approach joining the contest later

Blackwell volume availability
delayed GPU shipments create a window for Cerebras/Groq/SambaNova to win inference workloads; first Blackwell servers reportedly shipping to Microsoft in early December

EU AI Act first enforcement provisions
the Act entered into force August 1; February 2025 brings the first bans on unacceptable-risk AI systems, with fines up to EUR 35M or 7% of global turnover
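
The penalty ceiling can be written as a one-line rule. This sketch assumes the "whichever is higher" reading that applies to the prohibited-practices tier for most companies; the function name and the turnover figures are hypothetical.

```python
FIXED_CAP_EUR = 35_000_000.0   # fixed ceiling quoted above
TURNOVER_SHARE = 0.07          # 7% of global annual turnover


def max_fine_eur(global_turnover_eur: float) -> float:
    """Upper bound on the fine, ASSUMING the higher of the two ceilings applies."""
    if global_turnover_eur < 0:
        raise ValueError("turnover must be non-negative")
    return max(FIXED_CAP_EUR, TURNOVER_SHARE * global_turnover_eur)


# Hypothetical turnover figures, for illustration only.
for turnover in (100e6, 500e6, 1e9, 50e9):
    print(f"turnover EUR {turnover:,.0f} -> ceiling EUR {max_fine_eur(turnover):,.0f}")
```

The two ceilings cross at EUR 500M of turnover; above that, the percentage term dominates, so exposure scales with company size.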

📎 Sources

Source	URL
Meta -- Introducing Llama 3.1	https://ai.meta.com/blog/meta-llama-3-1/
OpenAI -- Introducing o1	https://openai.com/index/introducing-openai-o1-preview/
Anthropic -- Claude 3.5 Sonnet and Computer Use	https://www.anthropic.com/news/3-5-models-and-computer-use
GitHub Blog -- Copilot Workspace	https://github.blog/news-insights/product-news/github-copilot-workspace/
TechCrunch -- Copilot Workspace Preview	https://techcrunch.com/2024/04/29/copilot-workspace-is-githubs-take-on-ai-powered-software-engineering/
Cursor -- Sacra Revenue Profile	https://sacra.com/c/cursor/
CNBC -- Amazon $4B Anthropic Investment	https://www.cnbc.com/2024/11/22/amazon-to-invest-another-4-billion-in-anthropic-openais-biggest-rival.html
TechCrunch -- Anthropic $4B from Amazon	https://techcrunch.com/2024/11/22/anthropic-raises-an-additional-4b-from-amazon-makes-aws-its-primary-cloud-partner/
SambaNova -- SN40L Inference Chip	https://sambanova.ai/blog/sn40l-chip-best-inference-solution
NVIDIA Newsroom -- Blackwell Platform	https://nvidianews.nvidia.com/news/nvidia-blackwell-platform-arrives-to-power-a-new-era-of-computing
CNBC -- NVIDIA Blackwell B200	https://www.cnbc.com/2024/03/18/nvidia-announces-gb200-blackwell-ai-chip-launching-later-this-year.html
Yahoo Finance -- Blackwell 450K Units Q4	https://finance.yahoo.com/news/nvidia-expected-produce-450-000-145414205.html
DataCamp -- CrewAI vs LangGraph vs AutoGen	https://www.datacamp.com/tutorial/crewai-vs-langgraph-vs-autogen
EU AI Act -- Implementation Timeline	https://artificialintelligenceact.eu/implementation-timeline/
InfoQ -- Meta Llama 3.1 405B	https://www.infoq.com/news/2024/07/meta-releases-llama31-405b/
OpenAI o1 -- Wikipedia	https://en.wikipedia.org/wiki/OpenAI_o1
