
DuckDB 2.0 Is Coming: What DuckCon #7 Revealed


DuckDB 2.0 is landing this fall, and DuckCon #7 last week in Amsterdam confirmed the database is no longer just an embedded analytics toy. In the “State of the Duck” keynote, co-creators Hannes Mühleisen and Mark Raasveldt mapped out a roadmap that moves DuckDB from single-process workhorse to production-grade, multi-user analytics server. If you’re using DuckDB in any serious capacity, or wondering whether you should, here’s what actually matters.

Quack Goes GA: DuckDB Gets a Client-Server Protocol

The most significant change coming in DuckDB 2.0 is the graduation of the Quack protocol from beta to production. Quack, announced in May, lets DuckDB instances talk to each other over HTTP — finally solving the single-writer, single-process constraint that has defined DuckDB since day one.

The design is smart. Quack is HTTP-native, which means it works with every piece of infrastructure your team already uses: load balancers, API gateways, firewalls, intrusion detection systems. There’s no custom TCP protocol to punch through, no special port to expose. Underneath, Quack serializes DuckDB’s internal vector blocks directly — no transcoding, no format conversion, lower overhead than most database wire protocols.

The browser story is particularly interesting. DuckDB-Wasm already lets you run SQL in the browser. With Quack, a browser-side DuckDB instance can connect directly to a remote DuckDB server — analytics without a backend API layer, no Node.js or Python middleware translating between formats.

One important caveat: Quack is still beta and breaking changes are expected before DuckDB 2.0 goes GA. If you’re experimenting, that’s fine. If you’re building a production system on the current Quack API, wait for 2.0.

Multi-User DuckDB Is Now Actually Viable

Quack was always missing one thing for production use: real authentication. The original release shipped with stub auth — a default secret token that was fine for local development and nothing else.

That changed in June. The community-built quack-oauth extension from DataZooDE adds a proper OAuth 2.1/OIDC layer on top of Quack. It validates tokens for real (JWKS key caching, RFC 7662 introspection, Google and GitHub tokens), supports multiple grant flows, and offers claims-driven authorization that can gate ATTACH, SELECT, and COPY operations based on token claims.
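
To make “claims-driven authorization” concrete, here is a minimal sketch of the idea in plain Python. It is not quack-oauth’s API or configuration format: the `roles` claim name, the `POLICY` table, and the `is_allowed` helper are all hypothetical, and the claims are assumed to come from a token that has already been validated.

```python
# Hypothetical policy: which role values (taken from a validated token's
# claims) may run which operation. Names here are illustrative only.
POLICY = {
    "SELECT": {"analyst", "engineer", "admin"},
    "COPY": {"engineer", "admin"},
    "ATTACH": {"admin"},
}


def is_allowed(claims: dict, operation: str) -> bool:
    """Return True if any role in the token claims permits the operation."""
    allowed_roles = POLICY.get(operation.upper())
    if allowed_roles is None:
        # Deny anything the policy does not mention explicitly.
        return False
    token_roles = set(claims.get("roles", []))
    return bool(token_roles & allowed_roles)


analyst_claims = {"sub": "user-123", "roles": ["analyst"]}
can_select = is_allowed(analyst_claims, "SELECT")
can_attach = is_allowed(analyst_claims, "ATTACH")
```

The sketch denies by default: an operation missing from the policy, or a token without the expected claim, is rejected rather than waved through.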

Multi-user DuckDB deployments — where different teams query the same analytics database with different permission levels — are now viable without a proxy layer in front. DuckDB is becoming infrastructure.

The Small File Problem Is Solved

If you’ve tried to use DuckDB for streaming workloads, you’ve hit the small file problem. Insert data in small batches and you end up with thousands of tiny Parquet files. Querying them is slow; managing them is painful. That has pushed DuckDB into batch-only territory for most data lake workflows.

DuckLake’s data inlining, shipped in v1.0 this April, fixes this at the architecture level. Small batches get stored directly in the SQL catalog metadata instead of as Parquet files. When you query, DuckLake transparently merges catalog-resident rows with on-disk Parquet — no application changes needed. Deletes and updates create tombstone records rather than rewriting Parquet, keeping write costs low. The result: continuous streaming into DuckDB data lakes is now practical.
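
The mechanics are easier to see in a toy model. The sketch below is not DuckLake’s implementation or schema: it uses Python’s built-in sqlite3 as a stand-in for the SQL catalog, plain lists as stand-ins for Parquet files, and a made-up `inline_limit` threshold. It only illustrates the three behaviors described above: small batches land in the catalog, reads merge both sources, and deletes write tombstones.

```python
import sqlite3


class ToyLake:
    """Toy model of data inlining. Not DuckLake's real schema or API."""

    def __init__(self, inline_limit: int = 10):
        # SQLite stands in for the SQL catalog database.
        self.catalog = sqlite3.connect(":memory:")
        self.catalog.execute(
            "CREATE TABLE inlined_rows (row_id INTEGER PRIMARY KEY, value TEXT)"
        )
        self.catalog.execute("CREATE TABLE tombstones (row_id INTEGER PRIMARY KEY)")
        self.files = []  # each list of rows stands in for one Parquet file
        self.inline_limit = inline_limit  # hypothetical threshold
        self._next_id = 0

    def insert(self, values):
        rows = [(self._next_id + i, v) for i, v in enumerate(values)]
        self._next_id += len(rows)
        if len(rows) <= self.inline_limit:
            # Small batch: store the rows in the catalog, no new file.
            self.catalog.executemany("INSERT INTO inlined_rows VALUES (?, ?)", rows)
        else:
            # Large batch: write a "file" as usual.
            self.files.append(rows)
        return [row_id for row_id, _ in rows]

    def delete(self, row_id: int):
        # Record a tombstone instead of rewriting any file.
        self.catalog.execute(
            "INSERT OR IGNORE INTO tombstones VALUES (?)", (row_id,)
        )

    def scan(self):
        # Merge catalog-resident rows with file rows, skipping tombstoned ids.
        dead = {r[0] for r in self.catalog.execute("SELECT row_id FROM tombstones")}
        inlined = self.catalog.execute(
            "SELECT row_id, value FROM inlined_rows"
        ).fetchall()
        on_disk = [row for f in self.files for row in f]
        return sorted(row for row in on_disk + inlined if row[0] not in dead)


lake = ToyLake(inline_limit=3)
lake.insert(["a", "b"])            # small batch, inlined in the catalog
lake.insert(["c", "d", "e", "f"])  # large batch, becomes one "file"
lake.delete(1)                     # tombstone for row "b"
rows = lake.scan()
```

After these calls the lake holds a single “file” rather than two, and the scan returns every row except the tombstoned one.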

DuckDB Is Becoming the Data Layer for AI Agents

DuckCon #7 featured a Spotify talk that illustrated where this is all heading. Spotify built a SQL layer over user listening history specifically for agentic access — AI agents can query a user’s personalized dataset via standard SQL without a custom API or ad hoc data fetching logic.

MotherDuck formalized this pattern with Flights, launched June 10. Flights lets AI agents — Claude, ChatGPT, Gemini — build, deploy, and schedule Python data ingestion pipelines through the Model Context Protocol. Agents don’t just query data; they own the ingestion pipeline.

The DuckDB angle is practical: it’s fast, embeddable, runs anywhere, speaks standard SQL, and handles datasets that break pandas. For agents that need structured, queryable data without spinning up a Postgres cluster, DuckDB is the obvious default. That trajectory was already visible in the community before DuckCon; the conference made it official.

What to Do Before DuckDB 2.0 Ships

If you’re already using DuckDB in production, three things to act on now:

  • Pin your DuckDB version. DuckDB 2.0 will include breaking changes in the extension API and Quack protocol. If you’re running automated upgrades, pause them until you’ve tested 2.0, so it doesn’t land in production untested.
  • Evaluate DuckLake for streaming workloads. If you’ve been working around DuckDB’s small file limitation with batch jobs, DuckLake inlining changes the equation. Run a benchmark against your current setup.
  • Look at SQLFrame if you’re running PySpark. The migration is effectively two lines of code — swap your import. DuckDB runs locally, no cluster required, and the analytical query performance difference is significant.
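
For the first item, a version pin in your dependency file (for example the specifier `duckdb<2`) can be backed by a startup check. The sketch below is one way to do that; the helper names `major_version` and `check_pinned` are hypothetical, and the version string is passed in so the check itself has no dependencies.

```python
def major_version(version: str) -> int:
    """Extract the major component from a version string like '1.3.0'."""
    return int(version.lstrip("v").split(".", 1)[0])


def check_pinned(version: str, max_major: int = 1) -> None:
    """Raise if the installed major version is newer than the tested one."""
    if major_version(version) > max_major:
        raise RuntimeError(
            f"DuckDB {version} is newer than the tested major version "
            f"{max_major}; test the upgrade before deploying."
        )


check_pinned("1.3.0")  # passes silently
```

In an application you would call `check_pinned(duckdb.__version__)` once at startup, so an accidental upgrade fails loudly instead of surfacing later as a broken extension or protocol mismatch.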

DuckDB 2.0 is not a version bump. It’s a category change: from embedded tool to deployable analytics server. The core principles — simple setup, no external dependencies, SQL-first — stay intact. What changes is the ceiling on what you can build with it.

ByteBot