TL;DR: DuckLake stores the catalog and metadata in a single SQL database instead of object storage files. That one change drops metadata query latency from seconds to milliseconds and lets you ingest data tens of times per second. The session covers the architecture, a live setup demo, and the specific settings that make DuckLake fast in production.
## What makes DuckLake different
Most lakehouse formats follow the Iceberg pattern: a database-backed catalog points to metadata files in object storage, which point to manifest lists, which point to manifests, which point to data files. That's a chain of sequential round trips — catalog, metadata file, manifest list, manifest — before you read a single byte of actual data. At around 100 ms per object-storage request, you can burn a second or more just getting a query started.
DuckLake skips all of that. The catalog and metadata both live in a relational database — DuckDB, SQLite, Postgres, or MotherDuck. When a query runs, it hits the database once, gets back a precise list of files to read, and goes straight to the Parquet data. That database call takes single-digit milliseconds. Complex queries that used to take multiple seconds now finish in hundreds of milliseconds.
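To make that single round trip concrete, here is a hedged sketch of the kind of lookup the catalog answers. The `ducklake_data_file` and `ducklake_table` names come from the DuckLake metadata spec as I understand it, and a real lookup also scopes to a snapshot and prunes on column statistics, so treat this as a simplification:

```sql
-- Illustrative only: a simplified file-pruning lookup against the catalog
-- tables defined by the DuckLake spec. A real query also filters by
-- snapshot and applies per-file column statistics.
SELECT df.path
FROM ducklake_data_file AS df
JOIN ducklake_table AS t USING (table_id)
WHERE t.table_name = 'events';
```

One SQL round trip, answered in single-digit milliseconds, and the engine can fetch the returned Parquet files directly.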
This also fixes the small files problem. Ingests write to the database first rather than creating a new Parquet file for every insert, so you can ingest thirty times per second without drowning in tiny files.
## Getting started
Three commands get you running:
- `INSTALL ducklake` — adds the extension
- `ATTACH` — connects a catalog database
- Start using it with standard SQL
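In practice that might look like the following. The `ducklake:` ATTACH prefix and the `DATA_PATH` option follow my reading of the DuckDB DuckLake extension docs; `my_catalog.ducklake`, `my_lake`, the bucket path, and the `events` table are placeholders:

```sql
-- Install and load the extension
INSTALL ducklake;
LOAD ducklake;

-- Attach a catalog: the DuckDB file holds the metadata,
-- DATA_PATH is where the Parquet data files live
ATTACH 'ducklake:my_catalog.ducklake' AS my_lake (DATA_PATH 's3://my-bucket/lake/');
USE my_lake;

-- From here, it's standard SQL
CREATE TABLE events (id BIGINT, ts TIMESTAMP, payload VARCHAR);
INSERT INTO events VALUES (1, TIMESTAMP '2025-01-01 12:00:00', 'hello');
SELECT count(*) FROM events;
```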
The setup demo walks through a live example on MotherDuck, including attach options for managed and bring-your-own-bucket storage.
## Production tuning
The second half of the session covers six settings worth changing before you go to production (a configuration sketch follows the list):
- Parquet v2 — improved compression with broad ecosystem compatibility
- ZStandard compression — better than the default Snappy, still widely supported
- Row group size — target roughly 8MB per column chunk when data lives in cloud storage, so DuckDB can issue efficiently sized reads
- Data inlining threshold — batches small inserts into the catalog DB before flushing to Parquet; controls the write frequency vs. file count tradeoff
- Partitioning — hundreds to low thousands of partitions is the sweet spot; millions create too many small files
- Clustering and sorting — aim for about ten row groups per file; sorting on the right columns can give a 10x read speedup
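Here is a configuration sketch under stated assumptions: the `set_option` call, the option names, and the `SET PARTITIONED BY` syntax follow my reading of the DuckLake extension docs and may differ across versions, and `my_lake`, `events`, `staging_events`, and the values shown are placeholders for the knobs above:

```sql
-- Parquet v2 pages plus ZStandard for better compression
-- (option names assumed from the DuckLake docs; verify against your version)
CALL my_lake.set_option('parquet_version', '2');
CALL my_lake.set_option('parquet_compression', 'zstd');

-- Row group size in rows; tune it so each column chunk lands near ~8 MB
CALL my_lake.set_option('parquet_row_group_size', '122880');

-- Inline inserts below this row count into the catalog DB instead of
-- writing a tiny Parquet file per insert
CALL my_lake.set_option('data_inlining_row_limit', '1000');

-- Coarse partitioning: aim for hundreds to low thousands of partitions
ALTER TABLE events SET PARTITIONED BY (year(ts));

-- Sorting on commonly filtered columns improves min/max pruning;
-- pre-sorting the data you insert is one portable way to get it
INSERT INTO events SELECT * FROM staging_events ORDER BY ts;
```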
The session also places DuckLake in the broader open lakehouse stack, covering where it fits relative to Iceberg and Delta Lake and how ACID transactions across tables work in practice.
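The cross-table transaction piece needs no new syntax; a minimal sketch, with `orders` and `inventory` as hypothetical tables in the attached catalog:

```sql
-- Both changes become visible atomically, or neither does;
-- readers never observe a half-applied state
BEGIN TRANSACTION;
INSERT INTO my_lake.orders VALUES (42, 'widget', 3);
UPDATE my_lake.inventory SET stock = stock - 3 WHERE item = 'widget';
COMMIT;
```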