MotherDuck Research

Advancing the future of data + AI.

Results on Results: Building New Results from Cached Partial Results
Caching, Data Lineage, DuckDB, MotherDuck

An intelligent recovery framework for real-time SQL previews. When composing cached partial results produces too few rows, a data-lineage heuristic selects the cheapest upstream dependency to re-fetch, enabling fluid exploratory analysis in MotherDuck's hybrid architecture.

Query-Log-Informed Schema Descriptions and their Impact on Text-to-SQL
DuckDB, MotherDuck, Schema Documentation, Text-to-SQL

Automatically generating schema documentation from historical query logs to improve LLM-powered Text-to-SQL. Tested on both the BIRD benchmark and MotherDuck’s production data warehouse, query pattern descriptions boost SQL generation accuracy by up to 16% on real-world data.

Towards Efficient Data Wrangling with LLMs using Code Generation
Code Generation, Data Transformation, Data Wrangling, Entity Matching, LLM, MotherDuck

Instead of applying an LLM to every row, have it generate code once and run that code on millions of rows. The result is up to a 37-point F1 improvement on data transformations at a fraction of the cost.