AI & Tech Brief ⚡
The week's signal was AI agents entering regulated, domain-specific work — not as demos, but as tools that shipped lab-credible outputs, survived a one-month build constraint, and reached alpha with a plugin architecture. DeepMind published five peer-institution case studies of Co-Scientist generating validated hypotheses across aging, liver disease, and infectious disease; AWS documented a two-developer team building an ADME-Tox prediction app in 60 hours using Kiro; Simon Willison shipped Datasette Agent with ten ecosystem releases in one week; and Google I/O 2026 made Gemini 3.5 Flash generally available while launching Antigravity 2.0 as the agent-first development platform.
📊 Exec Summary
Four things moved in AI/tech this week:
- DeepMind Co-Scientist ships five case studies with lab-validated results: peer institutions (Calico, Stanford, Cambridge) used Co-Scientist to propose >20 novel genetic factors for cellular aging reversal, with multiple factors validated in the lab and post-screen analysis collapsed from six months to days.
- AWS Kiro builds an ADME-Tox prediction app in 60 hours with 3 people: a spec-driven agentic IDE plus Bedrock AgentCore produced a working drug-discovery tool that returns head-to-head compound comparisons in under one minute, with Amazon Nova Act now HIPAA eligible.
- Simon Willison launches Datasette Agent with 10 ecosystem releases in one week: a natural-language-to-SQL agent with a plugin architecture, local model support, and a live demo running on Gemini 3.1 Flash-Lite; in effect a low-ops structured-data query layer that skips custom RAG.
- Google I/O 2026 ships Gemini 3.5 Flash GA and Antigravity 2.0: the cheapest capable model in its speed tier goes generally available, persistent background agents arrive via Gemini Spark, and Antigravity 2.0 positions Google as a direct competitor in the agentic development platform category at 900M MAU and a $200/month Ultra subscription.
The pattern: AI agents as lab instruments, agentic IDEs as regulated-domain builders, natural-language SQL as a data primitive, and platform-scale pricing as adoption leverage.
1. DeepMind Co-Scientist ships five case studies with lab-validated results
TL;DR: Google DeepMind published five simultaneous case studies showing Co-Scientist deployed at Calico Life Sciences, Stanford, Cambridge, and other peer institutions — the first cohort-scale evidence of an AI research agent producing lab-validated hypotheses across multiple disease areas in a single release window.
What happened
- Five independent research groups published results in the same week, spanning cellular aging (Abudayyeh/Gootenberg), aging leads (Calico Life Sciences), liver fibrosis drug repurposing (Stanford geneticist), liver disease mechanisms (Filippo Menolascina), and infectious disease genetic triggers (Clare Bryant, Cambridge).
- Co-Scientist scanned tens of thousands of papers and proposed >20 novel genetic factors targeting cellular senescence reversal in skin, hair, and muscle tissue.
- Multiple proposed factors were lab-validated — cells entered a younger state with improved overall function.
- Post-screen literature analysis — connecting experimental results to scattered scientific literature — collapsed from up to six months to a few days.
- Co-Scientist is now accessible via Gemini for Science at labs.google/science and integrates with Google Antigravity for deployment.
📊 Benchmarks (from DeepMind blog cluster)
| Metric | Value | Context |
|---|---|---|
| Literature scan depth | Tens of thousands of papers | Per primary case study — hypothesis generation input |
| Novel genetic factors proposed | >20 | Targeting cellular senescence reversal |
| Lab validation | Multiple factors validated | Cells driven into younger state with improved function |
| Post-screen analysis time | 6 months → a few days | Researcher estimate for literature-to-results interpretation |
| Disease areas covered | 5 | Aging, liver fibrosis, liver disease, infectious disease, cellular aging |
| Research institutions | Calico, Stanford, Cambridge, and others | Five independent groups, one coordinated release |
🔗 Primary source → Fast-tracking genetic leads to reverse cellular aging
Additional case studies: Opening new paths in aging research | Uncovering repurposed medicines to fight liver fibrosis | Accelerating discovery of liver disease mechanisms | Finding the molecular switches behind new infectious diseases
🔍 The non-obvious point
The AI/Tech story here is the platform architecture, not the biology. DeepMind is demonstrating that a multi-agent hypothesis-generation system can produce outputs credible enough for peer-institution labs to validate experimentally — and it shipped five independent proof points at once to make that case at cohort scale.
- The "AI research partner" framing is deliberate positioning. DeepMind is not claiming autonomous discovery — every case study emphasizes researcher collaboration and lab validation. This is the same force-multiplier framing as Deep Think (W20), now extended to biology. For AI builders, the design pattern is clear: generate-then-validate workflows where the AI proposes and the domain expert gates.
- The absence of failure-rate data is the most important gap. No rejected-hypothesis statistics, no compute cost per case study, no model version details. Builders evaluating whether to integrate Co-Scientist into their own research workflows have no way to estimate the signal-to-noise ratio or the cost of running it. The >20 factors proposed and "multiple validated" framing implies some fraction failed — the size of that fraction determines whether this is a 10x productivity tool or a 2x one.
- The coordinated five-institution release is a go-to-market move, not a scientific event. Publishing five case studies simultaneously across aging, liver disease, and infectious disease is how you demonstrate breadth of applicability to pharma and biotech procurement — it reads as a platform pitch. Builders should treat this as Co-Scientist's product launch moment, not a peer-reviewed evidence milestone.
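The generate-then-validate pattern described above can be sketched in a few lines. This is a minimal illustration, not Co-Scientist's architecture: `generate_then_validate`, `Review`, `stub_propose`, and `stub_gate` are hypothetical names, and the stubs stand in for a model call and a human or lab decision. The one design choice worth copying is that rejections are recorded, so the failure rate the case studies omit becomes something a team can measure in its own workflow.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Review:
    """Outcome of one expert decision on one proposed hypothesis."""
    hypothesis: str
    approved: bool
    note: str = ""


def generate_then_validate(
    propose: Callable[[str], Iterable[str]],
    gate: Callable[[str], Review],
    topic: str,
) -> List[Review]:
    """Run one round: the model proposes, the expert gate decides.

    Every decision is kept, including rejections, so the rejection
    rate can be reported alongside the wins.
    """
    return [gate(h) for h in propose(topic)]


def rejection_rate(reviews: List[Review]) -> float:
    """Fraction of proposals the gate rejected (0.0 when there are none)."""
    if not reviews:
        return 0.0
    rejected = sum(1 for r in reviews if not r.approved)
    return rejected / len(reviews)


# Hypothetical stand-ins so the sketch runs without any model or lab.
def stub_propose(topic: str) -> List[str]:
    return [f"{topic}: candidate factor {i}" for i in range(1, 5)]


def stub_gate(hypothesis: str) -> Review:
    # Placeholder rule; a real gate is a human decision or a lab assay.
    ok = hypothesis.endswith(("1", "3"))
    return Review(hypothesis, approved=ok, note="stub decision")


reviews = generate_then_validate(stub_propose, stub_gate, "senescence")
print(rejection_rate(reviews))  # 0.5 with these stubs
```

The numbers here come from the stubs only; nothing in the DeepMind posts gives a real rejection rate.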
👀 What to watch
- First peer-reviewed publication from any of the five case studies — the journal acceptance converts the blog-level evidence into citable scientific validation.
- Co-Scientist pricing and API availability via Gemini for Science — the first public access terms determine whether this is a tool builders can integrate or a Google-internal capability.
- Failure-rate or rejected-hypothesis disclosure from any participating lab — the first quantitative negative data sets the real efficiency benchmark.
2. AWS Kiro builds an ADME-Tox app in 60 hours with 3 people
TL;DR: AWS published a detailed case study of Kiro — its agentic IDE — used by two developers and one scientist to build a working ADME-Tox drug discovery prediction app in ~60 combined hours across four weeks, with no team expansion. The system returns head-to-head compound comparisons in under one minute using Claude Sonnet 4.6 via Bedrock AgentCore, integrating 12+ open-source chemistry tools.
What happened
- Team composition: one Principal Solutions Architect (bioinformatics background), one full-stack developer with no prior ADME experience, and one career scientist — averaging 5 hours/week each.
- Build duration: ~4 weeks, ~60 combined hours — the constraint was deliberate: "no expanding the team when it got hard."
- ADME-Tox query latency: reported as under 1 minute in testing, though the one run described in detail (a full COX-2 inhibitor comparison, 24,916 tokens in and 4,204 tokens out) took 74 seconds end-to-end; 1-2 minutes in live demos at the AWS Life Sciences Symposium (April 2026).
- Underlying model: Claude Sonnet 4.6 via Amazon Bedrock AgentCore, us-west-2 region.
- The system integrates 12+ open-source data sources including ADMET_AI, RDKit, ChemProp, DeepChem, NCATS ADME Portal, ToxCast/Tox21, ChEMBL API, PubChem API, and DrugBank API.
- Authors tested on COX-2 inhibitors (system flagged Vioxx market removal from literature) and sulfonamide antibacterials (flagged nephrotoxicity risk).
- Separately, Amazon Nova Act is now HIPAA eligible — opening browser-automation agents to regulated healthcare workflows.
- Amazon Bio Discovery launched April 2026 — connects 40+ AI biology models via no-code interface for antibody therapeutic design.
- Authors report a sub-3-hour rebuild in a hackathon setting with prior knowledge of the architecture.
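For builders sizing a similar workload, the token counts and the 74-second figure from the COX-2 test run are enough for a back-of-envelope check. The sketch below is only arithmetic on those reported numbers; `QueryRun` is a hypothetical helper, and the per-million-token prices in the last call are placeholders, not Bedrock rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRun:
    """One agent query, described by its token counts and wall-clock time."""
    tokens_in: int
    tokens_out: int
    seconds: float

    def output_tokens_per_second(self) -> float:
        # End-to-end rate: wall-clock time includes tool calls, so this
        # understates the raw generation speed of the model itself.
        return self.tokens_out / self.seconds

    def cost_usd(self, usd_per_m_in: float, usd_per_m_out: float) -> float:
        # Prices are expressed per one million tokens.
        total = self.tokens_in * usd_per_m_in + self.tokens_out * usd_per_m_out
        return total / 1_000_000


# Figures reported for the full COX-2 inhibitor comparison.
cox2_run = QueryRun(tokens_in=24_916, tokens_out=4_204, seconds=74.0)
print(round(cox2_run.output_tokens_per_second(), 1))  # 56.8

# Placeholder prices (1.0 USD per million tokens each way), NOT real rates.
print(cox2_run.cost_usd(usd_per_m_in=1.0, usd_per_m_out=1.0))
```

Swapping in the current published rates for the model in use turns the second call into a per-query cost estimate.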
📊 Benchmarks (from AWS Industries blog)
Figure 1: Multi-agentic framework — Kiro powers and AgentCore primitives
| Metric | Value | Context |
|---|---|---|
| Team size | 2 developers + 1 scientist | No team expansion; 5 hrs/week each |
| Build duration | ~60 combined hours (4 weeks) | Parallel with other responsibilities |
| Query latency (tested) | Under 1 minute reported; 74 s for the full COX-2 comparison | 24,916 tokens in / 4,204 out via Claude Sonnet 4.6 |
| Query latency (live demo) | 1-2 minutes | AWS Life Sciences Symposium, April 2026 |
| Open-source tools integrated | 12+ | ADMET_AI, RDKit, ChemProp, DeepChem, NCATS, ChEMBL, PubChem, DrugBank |
| Hackathon rebuild time | Under 3 hours | With prior architecture knowledge |
| Industry: preclinical ADME-Tox attrition | ~90% of candidates eliminated | Sun et al baseline |
| Industry: Phase II-III safety failures | 30-45% | Due to poor ADME-Tox profiles |
Head-to-head toxicology comparison using ChEMBL, RDKit, PubChem via Claude Sonnet 4.6 / Bedrock AgentCore
🔗 Primary source → From code to chemistry: using Kiro to tackle ADME-Tox, a key drug discovery challenge
🔍 The non-obvious point
The AI/Tech story is Kiro as an agentic IDE that collapses the build cost of regulated-domain tooling. The ADME-Tox application is the proof point, but the platform capability — spec-driven development where autonomous agents translate scientific requirements to working code — is the durable signal.
- The "no prior ADME experience" detail is the load-bearing claim. If a full-stack developer with no domain expertise can ship a working drug discovery tool in 60 hours because Kiro's autonomous agent handles the domain-to-code translation, that changes the staffing model for every regulated-domain software project. The caveat is the evidence base: all test cases used known approved drugs with known outcomes, with no prospective validation and no comparison against established ADME-Tox tools such as Schrödinger's predictor.
- Kiro's spec-driven workflow is architecturally distinct from other agentic coding tools. The blog explicitly states: "Kiro's spec-driven development and use of autonomous agents allowed us to move from scientific requirements to working agents without losing the thread between what the science needed and what the code did." This is a requirements-first pattern, not a prompt-first pattern — the spec is the artifact that persists. Builders evaluating Kiro vs. Codex vs. Claude Code should attend to this workflow difference.
- The Nova Act HIPAA eligibility is a quiet but consequential infrastructure move. Browser-automation agents in regulated healthcare means AWS now has a compliance-cleared path for agentic workflows that touch patient-facing systems — a prerequisite for the regulated-industry AI stack that AWS is assembling across Bedrock, Kiro, HealthLake, and Bio Discovery.
👀 What to watch
- Kiro GA and standalone availability — the general availability date determines whether this is an AWS-internal capability showcase or a tool builders can adopt.
- First prospective validation of the ADME-Tox app against novel compounds — the first result on an unknown compound converts this from retrospective demonstration to production evidence.
- Nova Act HIPAA deployment case study — the first published healthcare workflow using browser-automation agents sets the compliance precedent.
3. Simon Willison launches Datasette Agent with 10 ecosystem releases
TL;DR: Simon Willison launched Datasette Agent (0.1a4) — an extensible AI assistant that lets users query any Datasette database in natural language, generates and executes SQL, and exposes a plugin architecture for visualization and tool extensions. Ten releases shipped across the Datasette ecosystem in a single week, including the underlying Datasette 1.0a30 moving toward stable.
What happened
- Datasette Agent launched at 0.1a3/0.1a4 — natural language to SQL with plugin hooks for charts (Observable Plot), sprites, image generation, and cost accounting.
- Live demo running at agent.datasette.io against example databases including WRI global-power-plants and Willison's blog backup.
- Demo model: Gemini 3.1 Flash-Lite — chosen for low cost, speed, and reliable SQLite query generation.
- Local model support confirmed: a single uvx command runs Datasette Agent against LM Studio with gemma-4-26b-a4b.
- Plugin architecture ships at launch: datasette-agent-charts (Observable Plot visualizations), datasette-agent-sprites, datasette-agent-openai-imagegen, datasette-llm-accountant (token cost tracking).
- Both Claude Code and OpenAI Codex are being used to build new plugins — Willison notes they write plugins by referencing the datasette-agent repo.
- Datasette Agent is driving a major LLM 0.32 refactor that will add agent abstractions to the broader LLM Python library.
- Planned rollout to Datasette Cloud users.
- Ten ecosystem releases in one week: datasette 1.0a30, datasette-agent 0.1a3/0.1a4, datasette-fixtures 0.1a0, datasette-agent-sprites 0.1a0, datasette-agent-charts 0.1a1/0.1a2, datasette-llm 0.1a8, datasette-llm-accountant 0.1a4, llm-gemini 0.32/0.32a0.
📊 Benchmarks (from Willison's blog post)
| Metric | Value | Context |
|---|---|---|
| Ecosystem releases (one week) | 10 | Core platform + agent + plugins + model provider |
| Agent version | 0.1a4 | Alpha — extensible, plugin-ready |
| Datasette core version | 1.0a30 | Moving toward 1.0 stable |
| Demo model | Gemini 3.1 Flash-Lite | Low cost, fast, reliable for SQL generation |
| Local model tested | gemma-4-26b-a4b via LM Studio | Single uvx command to run locally |
| Plugins at launch | 4 | Charts, sprites, image generation, cost accounting |
🔗 Primary source → Datasette Agent
Companion announcement: Datasette Agent on datasette.io
🔍 The non-obvious point
Datasette Agent is not a chatbot bolted onto a database — it is a natural-language-to-SQL runtime with a plugin system, and the plugin system is the part that matters for builders.
- The plugin architecture is the durable primitive. The agent itself does one thing well (translate natural language to SQL and execute it), but the plugin hooks — charts, sprites, cost accounting, image generation — mean the community can extend the agent's output modalities without forking the core. For teams sitting on structured research data (assay results, clinical trial registries, compound libraries), this is a query layer that skips building a custom RAG pipeline.
- The local model support changes the compliance calculus. Running Datasette Agent against LM Studio with a local model means no data leaves the machine. For regulated-domain teams that cannot send structured data to a cloud LLM, this is a natural-language query interface that stays inside the perimeter. Willison's note that "open weight models released in the past six months are increasingly able to handle" reliable SQL generation is the enabling claim.
- The LLM library refactor driven by Datasette Agent is a second-order signal. Willison's LLM Python library is widely used for model evaluation and prompt scripting. Adding agent abstractions to LLM 0.32 means the agent pattern — tool-calling, multi-step execution, cost tracking — gets standardized in a library that many builders already depend on.
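To make the natural-language-to-SQL loop concrete, here is a minimal sketch of the general pattern using only Python's standard library. It is not Datasette Agent's implementation or API: `ask`, `schema_of`, and `stub_model` are hypothetical names, the table is toy data, and the model call is replaced by a stub that returns fixed SQL. The two guards (SQLite's `query_only` pragma and a SELECT-only check) are one assumed way to keep generated SQL read-only.

```python
import sqlite3
from typing import Callable, List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create a toy in-memory database, then switch it to read-only queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE plants (name TEXT, country TEXT, capacity_mw REAL)")
    conn.executemany(
        "INSERT INTO plants VALUES (?, ?, ?)",
        [("Alpha", "US", 500.0), ("Beta", "US", 1200.0), ("Gamma", "FR", 900.0)],
    )
    conn.commit()
    # From here on, SQLite refuses statements that would modify the database.
    conn.execute("PRAGMA query_only = ON")
    return conn


def schema_of(conn: sqlite3.Connection) -> str:
    """Return the CREATE TABLE statements, which become the model's context."""
    rows = conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
    return "\n".join(row[0] for row in rows)


def stub_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; always returns the same query.
    return (
        "SELECT country, SUM(capacity_mw) AS total "
        "FROM plants GROUP BY country ORDER BY total DESC;"
    )


def ask(
    conn: sqlite3.Connection, question: str, model: Callable[[str], str]
) -> List[Tuple]:
    """Schema plus question go in, one SELECT comes out and is executed."""
    prompt = (
        f"Schema:\n{schema_of(conn)}\n\n"
        f"Question: {question}\nReply with one SQLite SELECT statement."
    )
    sql = model(prompt).strip().rstrip(";")
    if not sql.lower().startswith("select"):
        raise ValueError(f"refusing non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()


conn = build_demo_db()
print(ask(conn, "Total capacity by country?", stub_model))
# [('US', 1700.0), ('FR', 900.0)]
```

Replacing `stub_model` with a call to a hosted or local model is the only change needed to try the pattern against a real backend; SQL accuracy then depends entirely on that model.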
👀 What to watch
- Datasette Cloud rollout with Agent included — the hosted availability date determines whether this is a developer tool or an end-user product.
- SQL accuracy benchmarks across model backends — the first published comparison (Flash-Lite vs. local gemma vs. Claude) sets expectations for production reliability.
- First regulated-domain deployment (clinical data, assay databases) — the first builder using Datasette Agent on real research data converts the demo into a workflow.
4. Google I/O 2026 ships Gemini 3.5 Flash GA and Antigravity 2.0
TL;DR: Google I/O 2026 shipped Gemini 3.5 Flash to general availability, launched Gemini Omni (any-input-to-any-output multimodal), introduced Gemini Spark (persistent background agents), and debuted Antigravity 2.0 as an agent-first development platform. Gemini Ultra subscription dropped from $250 to $200/month; monthly active users reached 900M.
What happened
- Gemini 3.5 Flash GA: now generally available; multiple independent commentators (Simon Willison, Zvi Mowshowitz) describe it as the best model at its speed tier.
- Gemini Omni: any-input-to-any-output multimodal model; no context window or throughput specs disclosed at launch.
- Gemini Spark: persistent background agents running 24/7, positioned as the first product built on the 3.5 model family and Antigravity. Google's framing: "personalized AI agents you can set up to work in the background, 24/7, to find what you need at exactly the right moment."
- Antigravity 2.0: agent-first development platform powering Search, apps, and custom experiences. Replaces the prior IDE framing with an explicit "agent-native platform" positioning.
- Gemini Ultra subscription dropped from $250 to $200/month: a 20% price cut aimed at adoption.
- 900M monthly active users on Gemini — disclosed at I/O.
- Information agents in Search rolling out this summer (Google AI Pro and Ultra subscribers first), with generative UI capabilities (dynamic layouts and interactive visuals) free for all users.
- Eighth-generation TPU announced for infrastructure scaling.
📊 Benchmarks (from Google I/O keynote)
| Metric | Value | Context |
|---|---|---|
| Gemini MAU | 900M | Disclosed at I/O 2026 |
| Ultra subscription | $200/month | Down from $250 — 20% reduction |
| Gemini 3.5 Flash | GA | Best speed-tier model per Willison, Zvi |
| Gemini Omni | Any-input-to-any-output | No context window specs at launch |
| Gemini Spark | Persistent background agents | 24/7 agent product on 3.5 + Antigravity |
| Antigravity 2.0 | Agent-first dev platform | Replaces IDE framing |
| TPU generation | 8th gen | Infrastructure scaling |
🔗 Primary source → I/O 2026: Welcome to the agentic Gemini era
🔍 The non-obvious point
The model is the headline; the platform play is the strategic move. Antigravity 2.0 positions Google as a direct competitor to Replit, Vercel, and the emerging agentic IDE category — with 900M users as the distribution lever.
- Gemini 3.5 Flash GA changes the cost-capability tradeoff at the API layer. Simon Willison noted it is more expensive than Gemini 2.0 Flash but likely the best model at its speed tier, which makes it a real production option against GPT-4o mini. Zvi Mowshowitz separately reached the same conclusion. For builders choosing between frontier API providers for latency-sensitive workloads, Gemini 3.5 Flash is now the benchmark to beat. The gap: the primary source discloses no API pricing, so the cost comparison against GPT-4o mini has to wait for the developer pricing page.
- Antigravity 2.0 is the strategic repositioning. The Latent Space breakdown notes the internal codename NanoBanana for Gemini Omni video and frames Antigravity 2.0 as replacing the IDE framing entirely. Google is not building a coding assistant; it is building an agent-native platform that powers Search, apps, and custom developer experiences. This puts it in direct competition with Replit Agent, AWS Kiro, and Anthropic's Claude Code at the platform layer, not the model layer.
- The $200/month price point and 900M MAU are adoption-first signals. The 20% Ultra price cut is competitive with OpenAI's ChatGPT Pro ($200/month) and below Anthropic's Max plan pricing. Google is competing on distribution scale and price accessibility, not pure capability differentiation — which means the model API pricing (when disclosed) will likely be aggressive.
👀 What to watch
- Gemini 3.5 Flash API pricing — the developer pricing disclosure is the decision point for builders evaluating Flash vs. GPT-4o mini vs. Claude Haiku for production workloads.
- Antigravity 2.0 developer SDK or GA availability — the first external builder access determines whether this competes with Kiro and Replit Agent or stays inside Google's product surface.
- Gemini Spark external availability and access controls — the persistent background agent product is the highest-stakes launch; safety and access gating details will shape enterprise adoption.
📊 The pattern
Four releases this week shared a structural characteristic: AI agents entering domain-specific work with enough specificity to be evaluated on output quality, not capability demos. DeepMind published cohort-scale lab evidence of Co-Scientist generating validated genetic hypotheses. AWS documented a constrained build where an agentic IDE and a three-person team shipped a regulated-domain prediction tool in 60 hours. Willison launched a natural-language-to-SQL agent with a plugin architecture designed for community extension and local-model deployment. Google made its cheapest capable model generally available and repositioned its entire platform as agent-native. The thread connecting them: agents are now evaluated by what they ship — validated hypotheses, working ADME-Tox apps, SQL queries against real databases, persistent background tasks — not by what they promise.
👀 Watchlist
- Co-Scientist API pricing and external access: the first public access terms for Gemini for Science determine whether Co-Scientist is a tool builders can integrate or a Google-internal capability demonstrated through blog posts.
- Kiro general availability: AWS has showcased the agentic IDE through case studies; the GA date is the builder decision point.
- Gemini 3.5 Flash API pricing: the developer cost per token sets the real competitive benchmark against GPT-4o mini and Claude Haiku for latency-sensitive production workloads.
- Datasette Agent SQL accuracy benchmarks: the first published comparison across model backends (Flash-Lite vs. local models vs. Claude) quantifies production reliability.
- Antigravity 2.0 developer SDK: external builder access to Google's agent-native platform determines whether it competes with Kiro, Replit Agent, and Claude Code at the platform layer.
📎 Sources
Commentary we read
| Author / outlet | Title | URL | Date |
|---|---|---|---|
| Simon Willison | Gemini 3.5 Flash | https://simonwillison.net/2026/May/19/gemini-35-flash/ | 2026-05-19 |
| Zvi Mowshowitz | Gemini 3.5 Flash looks good for how fast it is | https://thezvi.substack.com/p/gemini-35-flash-looks-good-for-how | 2026-05-19 |
| Latent Space (Swyx) | Google I/O 2026: Gemini 3.5 Flash, Omni, Spark, and Antigravity 2.0 | https://www.latent.space/p/ainews-google-io-2026-gemini-35-flash | 2026-05-20 |
| Datasette.io | Datasette Agent announcement | https://datasette.io/blog/2026/datasette-agent/ | 2026-05-21 |
| AWS Industries | Highlights from the 2026 AWS Life Sciences Symposium | https://aws.amazon.com/blogs/industries/highlights-from-the-2026-aws-life-sciences-symposium-research-and-drug-discovery/ | 2026-05-18 |
| AWS Industries | FHIR-powered Care Continuum on AWS HealthLake | https://aws.amazon.com/blogs/industries/fhir-powered-care-continuum-on-aws-healthlake/ | 2026-05-18 |