AI & Tech Review ⚡
📋 Exec Summary
Q1 2020 is the quarter the scaling hypothesis went from fringe to foundation. The Kaplan et al. paper quantified power-law relationships between compute, parameters, and language model performance, giving OpenAI the confidence to train GPT-3 at 175B parameters. Microsoft entered the parameter race with Turing-NLG, while Hugging Face consolidated the practitioner ecosystem around BERT-class tools. The gap between frontier labs and the open-source community was widening at the top and narrowing at the application layer.
📊 What Moved
The Kaplan paper rewrote the rules of the game — and almost no one noticed
On January 23, 2020, Jared Kaplan, Sam McCandlish, Tom Henighan, and their colleagues at OpenAI posted "Scaling Laws for Neural Language Models" to arXiv. The paper was not loud.
BERT had conquered NLP; the question was what came next
By January 2020, Google's BERT — published in late 2018 — was the dominant paradigm across NLP tasks. It had beaten human baselines on SQuAD and set state-of-the-art on eleven standard benchmarks.
Microsoft announced the largest published language model — and it was already about to be lapped
On February 13, 2020, Microsoft Research published Turing-NLG, a 17-billion-parameter autoregressive language model trained on 16 NVIDIA DGX-2 systems using its new DeepSpeed library and ZeRO optimizer. Microsoft called it "the largest model ever published" and demonstrated it on summarization and question-answering tasks, where it substantially outperformed the 1.5-billion-parameter GPT-2.
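For readers wondering why a 17-billion-parameter model needed a new optimizer library, the memory arithmetic described in the ZeRO paper can be sketched roughly as follows. The sketch assumes mixed-precision Adam, where each parameter costs 2 bytes of fp16 weights, 2 bytes of fp16 gradients, and 12 bytes of fp32 optimizer state, and it ignores activations and temporary buffers. The 256-GPU figure is simply 16 DGX-2 systems at 16 GPUs each; which ZeRO stage and how much model parallelism Turing-NLG actually combined is not stated here, so the numbers are illustrative only. All function and constant names are hypothetical.

```python
# Per-parameter memory under mixed-precision Adam (assumed setup, see lead-in).
BYTES_FP16_PARAM = 2
BYTES_FP16_GRAD = 2
BYTES_OPTIMIZER_STATE = 12  # fp32 weight copy + Adam momentum + Adam variance


def model_state_bytes_per_gpu(n_params: float, n_gpus: int, stage: int) -> float:
    """Model-state memory per GPU for a given ZeRO partitioning stage.

    stage 0: plain data parallelism, everything replicated on every GPU
    stage 1: optimizer states partitioned across GPUs
    stage 2: optimizer states and gradients partitioned
    stage 3: optimizer states, gradients, and parameters partitioned
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    params = BYTES_FP16_PARAM * n_params
    grads = BYTES_FP16_GRAD * n_params
    optim = BYTES_OPTIMIZER_STATE * n_params
    if stage == 0:
        return params + grads + optim
    if stage == 1:
        return params + grads + optim / n_gpus
    if stage == 2:
        return params + (grads + optim) / n_gpus
    if stage == 3:
        return (params + grads + optim) / n_gpus
    raise ValueError("stage must be 0, 1, 2, or 3")


if __name__ == "__main__":
    n_params = 17e9   # Turing-NLG parameter count from the text
    n_gpus = 16 * 16  # 16 DGX-2 systems, 16 GPUs each
    for stage in range(4):
        gb = model_state_bytes_per_gpu(n_params, n_gpus, stage) / 1e9
        print(f"ZeRO stage {stage}: {gb:,.1f} GB of model state per GPU")
```

Without any partitioning, the model state alone is far larger than the 32 GB available on a single V100, which is the gap that partitioning schemes like ZeRO, alone or combined with model parallelism, are meant to close.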
The compute infrastructure for the coming decade was being laid in Q1 2020
NVIDIA's A100 GPU was not yet released (it would come in May 2020 with the DGX A100 announcement), meaning the dominant training hardware was still the V100. Cloud providers — AWS, Google Cloud, and Azure — were the primary delivery vehicles for ML compute.
The open-source / closed gap was narrowing at the application layer but widening at the frontier
Hugging Face's Transformers library, which had been released under its current name in late 2019, was becoming the dominant open-source vehicle for deploying BERT-class models. PyTorch had overtaken TensorFlow as the research-preferred framework.
📈 Trend Arcs
Arc 1: The Scaling Hypothesis Goes From Fringe to Foundation
Velocity: Accelerating
The idea that raw scale in compute and parameters was the primary driver of capability had been argued informally for several years, most prominently in Richard Sutton's 2019 essay "The Bitter Lesson," which argued that general methods that leverage computation, namely search and learning, ultimately outperform approaches built on hand-encoded human knowledge. The Kaplan et al. paper in January 2020 transformed this philosophical position into a quantitative claim with actionable engineering implications. Across January and February, the paper circulated among researchers at OpenAI, DeepMind, Google Brain, and leading academic groups. It was not widely covered in the AI media ecosystem: no TechCrunch piece, no viral Twitter moment at the time of publication. The reaction was concentrated among the people who understood what power laws spanning seven orders of magnitude implied: that scaling was the strategy, not a strategy.
By March, the Kaplan paper had become a planning document inside OpenAI for GPT-3. The decision to train at 175 billion parameters — a scale that would have seemed implausible eighteen months earlier — was downstream of the confidence this paper gave the team that the relationship would hold. At the same time, other organizations began internalizing the same logic. Google Brain's team was already operating at multi-hundred-billion-parameter scales internally. The quarter ended with a small but consequential group of researchers converging on the same insight: the next two to three years would be won by whoever could marshal the most compute with the best data pipelines.
Where it stands at quarter close: The scaling hypothesis has moved from contested claim to working assumption among frontier researchers. It has not yet leaked into mainstream AI discourse, leaving a significant insight gap between the frontier and the broader community.
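To make the arc's central claim concrete, here is a minimal Python sketch of the parameter-limited power law from Kaplan et al., L(N) = (N_c / N)^α_N, using the fitted constants the paper reports (α_N ≈ 0.076, N_c ≈ 8.8 × 10^13 non-embedding parameters). Treating headline parameter counts as non-embedding counts is a simplifying assumption made here, and the fit applies to the paper's own data and tokenization, so the outputs illustrate the shape of the curve rather than predict any specific model's loss. The function names are hypothetical.

```python
# Constants for the parameter-limited regime as reported by Kaplan et al. (2020).
ALPHA_N = 0.076  # power-law exponent
N_C = 8.8e13     # critical scale, in non-embedding parameters


def predicted_loss(n_params: float) -> float:
    """Predicted test loss (nats per token) from L(N) = (N_c / N) ** alpha_N.

    Assumes the model is limited only by parameter count, that is,
    trained with enough data and compute for the fit to apply.
    """
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return (N_C / n_params) ** ALPHA_N


def loss_ratio(scale_factor: float) -> float:
    """Factor by which predicted loss changes when N is multiplied by scale_factor."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return scale_factor ** (-ALPHA_N)


# Headline parameter counts from the text, used here as if they were
# non-embedding counts (an assumption of this sketch).
MODELS = {"GPT-2": 1.5e9, "Turing-NLG": 17e9, "GPT-3": 175e9}

if __name__ == "__main__":
    for name, n in MODELS.items():
        print(f"{name:>11}: N = {n:.2e}, predicted loss = {predicted_loss(n):.3f}")
    print(f"10x parameters multiplies predicted loss by {loss_ratio(10):.3f}")
```

Under this fit, each tenfold increase in parameters multiplies the predicted loss by about 0.84. The gain per step is modest, but it does not flatten anywhere in the fitted range, which is what made the curve usable as a planning tool.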
Arc 2: The Language Model Arms Race Reshapes Lab Rankings
Velocity: Accelerating
In January 2020, the conventional wisdom in AI was that DeepMind and Google Brain were the dominant research organizations. DeepMind had placed first in protein structure prediction at CASP13 in 2018 with AlphaFold, was publishing at a high rate in Nature and Science, and had the backing of Alphabet's resources. Google Brain had the largest internal compute budget of any research organization and had produced the transformer architecture itself (2017) and BERT (2018). OpenAI, by contrast, was a company of approximately 100–130 people, had made headlines for the GPT-2 "too dangerous to release" story in February 2019, and was widely seen as a second-tier research organization that was better at communications than at science.
The Turing-NLG announcement in February 2020 showed that Microsoft was entering the parameter race directly. Crucially, however, it also surfaced the new battlefield: generative, autoregressive language models rather than the BERT-style bidirectional models that had dominated NLP benchmarks for two years. The shift in the arms race was subtle but real: from "who can get the best fine-tuned benchmark score" to "who can train the largest coherent generative system." This was a contest that favored compute scale over research elegance, which is precisely what the Kaplan paper had quantified. By March 2020, OpenAI's trajectory, not yet publicly visible, was positioned to displace Google and DeepMind from their perceived leadership position within six months.
Where it stands at quarter close: Google Brain and DeepMind are the publicly perceived leaders. OpenAI is building the model that will invert this perception by June. Microsoft has revealed enough of its infrastructure ambitions to signal it is a serious third actor.
Arc 3: The Developer Ecosystem Bifurcates
Velocity: Steady
Q1 2020 saw a maturing bifurcation in how developers accessed and used AI capabilities. The Hugging Face Transformers library had become the de facto standard for loading, fine-tuning, and running BERT-class models, giving the practitioner layer of the market (the developers building applications rather than training foundation models) access to state-of-the-art NLP capabilities without requiring frontier-scale infrastructure. Meanwhile, the companies closest to the frontier — OpenAI, Google, Microsoft — were operating at scales entirely out of reach for the practitioner layer.
This bifurcation created two distinct markets that were not yet well understood as separate. The practitioner market (fine-tuning BERT for document classification, named entity recognition, and sentiment analysis) was growing rapidly and democratizing. The frontier market (training from scratch at billion-parameter scales) was consolidating around three to five organizations. The practitioner market drove Hugging Face's growth; the frontier market drove Microsoft's $1 billion investment in OpenAI and the internal compute buildouts at Google and Microsoft. Throughout Q1, the API model for delivering frontier capabilities to practitioners did not yet exist: GPT-3's API, which would create a third layer between the two, would not arrive until mid-2020.
Where it stands at quarter close: The practitioner ecosystem is healthy and growing on BERT-class tools. The frontier is concentrating rapidly. The bridge between them — large-model APIs — does not yet exist.
🗺️ Landscape Shift
| Player | Position at quarter open | Position at quarter close | What changed |
|---|---|---|---|
| OpenAI | Second-tier research lab, ~100–130 people, known for GPT-2 and safety framing | Same public position, but internally training GPT-3 guided by the Kaplan paper's scaling predictions | Scaled compute commitment dramatically; Kaplan paper published; training trajectory locked in |
| Google Brain | Perceived dominant player; originator of the transformer architecture; massive internal compute | Same public position | Turing-NLG showed a competitor racing on parameters; Brain's internal work not publicly visible |
| DeepMind | Most prestigious research org; AlphaFold/game-playing strength | Same | Protein folding strength not yet translatable to language model leadership |
| Microsoft | Strategic AI investor (OpenAI, $1B July 2019); Azure ML platform | Published Turing-NLG; revealed DeepSpeed infrastructure; positioned as serious third lab | Showed willingness to compete on parameters and infrastructure, not just invest |
| Hugging Face | Growing open-source library; Transformers gaining practitioner adoption | Solidifying as default practitioner toolkit | BERT variants proliferating through their platform; practitioner market accelerating |
| NVIDIA | GPU provider; V100 dominant training hardware | V100 still dominant; A100 not yet released | Beneficiary of scaling race regardless of winner; compute demand rising |
💰 Funding & Deal Pattern
Q1 2020 venture data for AI is complicated by two overlapping signals: the general market euphoria that prevailed through January and February (the S&P 500 peaked on February 19), followed by the collapse that began February 20 as COVID spread beyond China. Total VC deployment in AI for Q1 2020 ran at the elevated pace set in 2019, with OECD data showing AI commanded approximately 21% of global venture investment in 2020 overall — roughly $75 billion for the full year.
The most significant AI-adjacent capital event of the quarter was Schrödinger's IPO on February 6, 2020, raising $232 million at a $1.1 billion market cap with a 68% first-day pop. Schrödinger is primarily a computational chemistry company (physics-based molecular simulation) that had been incorporating machine learning into its platform.
The COVID-related market crash beginning in late February created a bifurcated environment: late-stage deals that had been in process closed under pressure (some at reduced valuations or with extended timelines), while early-stage AI investments continued at pre-pandemic pacing. The market had not yet processed that software and AI companies would recover faster and more completely than most other sectors.
Capital concentration in this period heavily favored infrastructure (cloud GPU access, MLOps tooling) and application-layer AI (enterprise search, NLP APIs, computer vision vendors). Foundation model training as a standalone investment thesis did not exist yet — OpenAI's frontier work was funded through Microsoft's strategic relationship, not through traditional venture.
Round sizes for AI application-layer companies were compressing as the COVID shock reduced near-term revenue visibility. Companies that had planned Q2 raises either pulled them forward or delayed them.
🔍 The Counter-Narrative
The consensus: Google Brain and DeepMind were the AI leaders because they published the most impressive research. The reality: Research publication and product deployment are different optimization targets. The organization doing less impressive science but more aggressive deployment, OpenAI via its API strategy, was building the entity that would define commercial AI. The Kaplan paper was not just a research contribution; it was an engineering specification for a decade of development.
The consensus: COVID was not yet relevant to AI strategy in Q1 2020. The reality: The pandemic would force remote work, accelerate cloud adoption by 3-5 years, and create acute need for the kind of information retrieval and synthesis that LLMs would provide. Q1 2020 was the last quarter AI strategy could be planned without the pandemic variable. Almost no AI company's planning documents mentioned COVID.
📐 Builder's Benchmark
Compute costs (Q1 2020 baseline):
- Training a BERT-base model (~110M parameters) from scratch: approximately $7,000–$15,000 on cloud V100s
- Fine-tuning BERT-base on a downstream task with a few thousand labeled examples: $50–$200 on a single V100 instance
- Training a GPT-2-scale model (1.5B parameters) from scratch: estimated $40,000–$80,000 on V100s
- Training at Turing-NLG scale (17B parameters): approximately $1–3 million on V100s
- Training at GPT-3 scale (175B parameters, extrapolated): $4–12 million, not feasible for most organizations
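The arithmetic behind estimates like these can be sketched with the common approximation that training takes about 6 FLOPs per parameter per token. The sketch below is a back-of-the-envelope illustration, not the method used to produce the figures above. The 300-billion-token count for GPT-3 and the 125 TFLOPS peak tensor throughput of a V100 are published figures; the 30% sustained utilization and the $3.00 per GPU-hour price are placeholder assumptions, and all function names are hypothetical.

```python
V100_PEAK_FLOPS = 125e12  # V100 peak mixed-precision tensor throughput, FLOP/s


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute: 6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens


def gpu_hours(flops: float, peak_flops: float, utilization: float) -> float:
    """GPU-hours needed at a given sustained fraction of peak throughput."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return flops / (peak_flops * utilization) / 3600.0


def training_cost_usd(
    n_params: float,
    n_tokens: float,
    utilization: float = 0.30,       # assumed sustained utilization
    usd_per_gpu_hour: float = 3.00,  # assumed cloud price per V100-hour
    peak_flops: float = V100_PEAK_FLOPS,
) -> float:
    """Rough training cost in USD under the stated assumptions."""
    hours = gpu_hours(training_flops(n_params, n_tokens), peak_flops, utilization)
    return hours * usd_per_gpu_hour


if __name__ == "__main__":
    n_params, n_tokens = 175e9, 300e9  # GPT-3 scale
    flops = training_flops(n_params, n_tokens)
    print(f"Training compute: {flops:.2e} FLOPs")
    print(f"GPU-hours: {gpu_hours(flops, V100_PEAK_FLOPS, 0.30):,.0f}")
    print(f"Estimated cost: ${training_cost_usd(n_params, n_tokens):,.0f}")
```

With these placeholder inputs the GPT-3-scale estimate comes out at roughly $7 million, inside the range quoted above. The result scales linearly with price and inversely with utilization, so plausible changes to either assumption move it across most of that range. The other rows would each need their own token counts, which this review does not give.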
Performance benchmarks (state of the art at quarter close):
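- GLUE benchmark: 90.9 for the top leaderboard entries (human baseline ~87), achieved by BERT successors and fine-tuned variants rather than the original BERT-large, which scored 80.5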
- SuperGLUE benchmark: ~75–80 range for top models; human baseline ~89 (gap still significant)
- SQuAD v2 F1: ~90 for best BERT variants
- Machine translation (WMT): approaching human parity on high-resource language pairs
Adoption curves:
- Hugging Face Transformers library: 5,000+ GitHub stars added in Q1; becoming default NLP toolkit
- PyTorch adoption in research: estimated 75%+ of new ML papers using PyTorch over TensorFlow by Q1 2020
- Cloud ML service adoption: growing 40–60% year-over-year across AWS SageMaker, Google AI Platform, Azure ML
Time-to-ship metrics:
- BERT fine-tuning to production-ready NLP feature: 2–4 weeks for an experienced ML engineer
- Custom computer vision model from labeled data to API endpoint: 3–8 weeks
- Standalone AI product from idea to first external users: 6–18 months (infrastructure complexity still high)
Open-source vs. closed gap:
- Application layer: open-source tools (Hugging Face, PyTorch) nearly at parity with commercial offerings for BERT-class tasks
- Frontier layer: closed — only organizations with direct access to frontier models (OpenAI, Google, Microsoft internal) can access capabilities beyond BERT scale
- The gap would widen dramatically once the GPT-3 paper appeared in May 2020; Q1 is the last quarter in which the open/closed capability gap was manageable
👀 What to Watch
OpenAI model announcement (expected Q2 2020): GPT-3 development is running on the Kaplan paper's scale predictions. Watch for either a paper, a product announcement, or API access signals. The benchmark to watch is whether it crosses the 100B parameter threshold that no public model has reached. Timeframe: April–June 2020.
COVID-driven cloud adoption acceleration: Track AWS, GCP, and Azure Q1 earnings calls (April 2020) for signals that enterprise cloud migration timelines are compressing. The AI workload question is whether forced remote work converts into permanent ML infrastructure migration. The AI-specific signal: look for AI-as-a-service revenue line items in earnings commentary.
DeepMind AlphaFold 2 announcement: CASP14 protein structure prediction competition is scheduled for late 2020. DeepMind's trajectory from AlphaFold1 at CASP13 strongly suggests a major improvement is in development. A dominant showing at CASP14 would crystallize DeepMind's lead in scientific AI while OpenAI leads commercial AI — the two-track reality of the field.
Hugging Face's fundraising and ecosystem expansion: The company is growing rapidly as the practitioner ecosystem's default toolkit. Watch for a Series A/B announcement that prices the practitioner layer of the AI market. This would establish a market reference for open-source ML tooling and could attract competition from cloud providers who may prefer to control this layer.
US regulatory response to COVID tech: Federal agencies will face pressure to accelerate or bypass normal procurement timelines for AI-driven public health tools (contact tracing, epidemiological modeling, diagnostic support). The regulatory and procurement norms established in this emergency will set precedents for AI deployment in regulated contexts.
📎 Sources
Key references for this quarter. Links provided where available; historical entries may reference publications by title and date.
| Source | Reference | Link |
|---|---|---|
| Kaplan et al. | "Scaling Laws for Neural Language Models" (arXiv, January 2020) | https://arxiv.org/abs/2001.08361 |
| Devlin et al. | "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (2018) | https://arxiv.org/abs/1810.04805 |
| Microsoft Research | Turing-NLG 17B parameter model announcement (February 2020) | https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/ |
| Rajpurkar et al. | SQuAD 2.0 benchmark and leaderboard | https://rajpurkar.github.io/SQuAD-explorer/ |
| Wang et al. | SuperGLUE benchmark | https://super.gluebenchmark.com/ |
| Richard Sutton | "The Bitter Lesson" (March 2019) | http://www.incompleteideas.net/IncIdeas/BitterLesson.html |
| Hugging Face | Transformers library (GitHub) | https://github.com/huggingface/transformers |
| Microsoft-OpenAI | $1B strategic investment announcement (July 2019) | https://openai.com/index/microsoft-invests-in-and-partners-with-openai/ |
| NVIDIA | V100 GPU specifications and DGX systems | https://www.nvidia.com/en-us/data-center/v100/ |
| Schrödinger | IPO filing (February 6, 2020, Nasdaq: SDGR) | https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&company=schrodinger&CIK=&type=S-1&dateb=&owner=include&count=40&search_text=&action=getcompany |
| Liu et al. | "RoBERTa: A Robustly Optimized BERT Pretraining Approach" (2019) | https://arxiv.org/abs/1907.11692 |
| Lan et al. | "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations" (2019) | https://arxiv.org/abs/1909.11942 |
| Sanh et al. | "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter" (2019) | https://arxiv.org/abs/1910.01108 |
| OECD | AI venture capital investment data (2020) | https://oecd.ai/en/data |
| Rajbhandari et al. | "ZeRO: Memory Optimizations Toward Training Trillion Parameter Models" (arXiv, October 2019) | https://arxiv.org/abs/1910.02054 |