TL;DR: Data agents can query your warehouse in plain English, but without context they return inconsistent answers. English-based context outperforms semantic models for accuracy and ease of maintenance. SQL snippets speed up specific query patterns. Start with a blank context file and build it up as you find gaps.
Why context matters for data agents
Data agents let you query a data warehouse in plain English, but the same question asked ten times can produce ten different answers. LLMs don't understand your business definitions, data semantics, or domain logic unless you spell them out. This session covers what good context looks like and how to manage it as your team grows.
Three formats compared: English, semantic models, and SQL
Using benchmark data from the Dabstep sample dataset, the session benchmarks three context formats head to head. English context — plain definitions and schema explanations — achieves full accuracy because LLMs handle natural language best. Semantic models like Cube, LookML, and Omni perform worst because LLMs weren't trained on those structured languages. SQL context, using question-answer pairs, matches English on accuracy and runs faster for queries that resemble the provided examples.
Write context in English, add SQL snippets for frequent or complex queries, and if you already have a semantic model, translate it to English rather than feeding it to an LLM raw.
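As a rough sketch of what this recommendation looks like in practice, the snippet below combines English business definitions with SQL question-answer examples into a single context block. Every name here (the definitions, columns, and the `build_context` helper) is illustrative, not part of any particular tool's API.

```python
# Illustrative: assemble agent context from English definitions plus
# SQL question-answer pairs. All names and schemas are hypothetical.

ENGLISH_CONTEXT = """
- "active customer": a customer with at least one order in the last 90 days.
- Revenue is net of refunds; use the net_revenue column, not gross_amount.
""".strip()

SQL_EXAMPLES = [
    {
        "question": "How many active customers do we have?",
        "sql": (
            "SELECT COUNT(DISTINCT customer_id) FROM orders "
            "WHERE order_date >= CURRENT_DATE - INTERVAL 90 DAY;"
        ),
    },
]

def build_context(english: str, examples: list[dict]) -> str:
    """Combine English definitions and SQL Q&A pairs into one prompt block."""
    parts = ["## Business definitions", english, "## Example queries"]
    for ex in examples:
        parts.append(f"-- Q: {ex['question']}\n{ex['sql']}")
    return "\n\n".join(parts)

print(build_context(ENGLISH_CONTEXT, SQL_EXAMPLES))
```

The English section carries definitions the model can't infer from column names; the SQL section anchors frequent query patterns so similar questions resolve faster.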
Writing effective context
Good context adds information the LLM can't guess from column names. Don't restate the obvious. Describing "company_name" as "the name of the company" wastes tokens and adds nothing. Define business-specific terms and clarify ambiguous metrics instead. Negative prompting ("don't use average") tends to backfire on current models, making them use the wrong function more often. Frame instructions positively. Database views work well as SQL context because LLMs can both reference and introspect them through the information schema.
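The view-as-context point can be sketched with Python's built-in sqlite3 module standing in for a warehouse: SQLite exposes view definitions through its `sqlite_master` catalog, playing the role a warehouse's information schema would. The table and view names are made up for the example.

```python
import sqlite3

# Sketch: a database view serves as SQL context the agent can both
# introspect (read its definition) and reference (query it like a table).
# SQLite's sqlite_master catalog stands in for information_schema here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, net_revenue REAL);
INSERT INTO orders VALUES (1, 120.0), (1, 80.0), (2, 50.0);

-- The view encodes a business definition once, in SQL.
CREATE VIEW revenue_per_customer AS
SELECT customer_id, SUM(net_revenue) AS revenue
FROM orders GROUP BY customer_id;
""")

# The agent can read the view's definition from the catalog...
definition = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'view'"
    " AND name = 'revenue_per_customer'"
).fetchone()[0]
print(definition)

# ...and query it like any other table.
rows = conn.execute(
    "SELECT customer_id, revenue FROM revenue_per_customer"
    " ORDER BY customer_id"
).fetchall()
print(rows)  # [(1, 200.0), (2, 50.0)]
```

Because the definition lives in the database rather than in a prompt file, it stays in sync with the schema and can't drift from what queries actually return.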
Scaling context across your organization
The session breaks adoption into three stages. Solo explorers can start with a blank context file and build it up as they find gaps. Teams need shared context through GitHub sync, MotherDuck's MCP server, or tools like Virgil, plus lightweight ownership over who updates what. Organizations supporting both explorers and pedestrians — non-technical users who expect reliable answers — need mandatory test suites, granular access control, and alerts when questions fall outside test coverage. For more on building analytics agents with MotherDuck, see the docs.
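A mandatory test suite at the organization stage might look like the sketch below: a set of golden questions with known answers, checked against the agent on every context change. `ask_agent` is a hypothetical stand-in for your data agent and is stubbed here so the example runs.

```python
# Sketch: a minimal golden-question suite gating agent answers.
# `ask_agent` is a hypothetical placeholder, stubbed with canned answers.

GOLDEN_QUESTIONS = {
    "How many active customers do we have?": 42,
    "What was net revenue last month?": 13500.0,
}

def ask_agent(question: str):
    # Stub: a real implementation would run the agent against the warehouse.
    canned = {
        "How many active customers do we have?": 42,
        "What was net revenue last month?": 13500.0,
    }
    return canned[question]

def run_suite() -> list[str]:
    """Return the questions whose answers drifted from the golden set."""
    failures = []
    for question, expected in GOLDEN_QUESTIONS.items():
        if ask_agent(question) != expected:
            failures.append(question)
    return failures

failures = run_suite()
print(f"{len(GOLDEN_QUESTIONS) - len(failures)}/{len(GOLDEN_QUESTIONS)} passed")
```

Questions outside the golden set are exactly the ones the session says should trigger alerts, since no test vouches for their answers.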