Chapter 2

Getting Started with DuckDB

INFO: This is a summary of a chapter from DuckDB in Action, published by Manning. Download the complete book for free to read the full chapter.

DuckDB has risen to prominence in the early 2020s as a powerful, embedded OLAP database. While the database can be used from within almost any programming environment, the quickest way to get familiar with its capabilities is through the Command-Line Interface (CLI). This chapter focuses on installing the tool, mastering its built-in commands, and building efficient data pipelines without leaving the terminal.

2.1 Supported Environments

DuckDB is designed to be highly portable and dependency-free. It is available for a wide range of programming languages and operating systems, supporting both Intel/AMD and ARM architectures.

DuckDB Language & Platform Support

| Category | Supported Options |
| --- | --- |
| Operating Systems | Linux, Windows, macOS (Intel & ARM) |
| Languages | Python, R, Java, JavaScript, Go, Rust, Node.js, Julia, C/C++, Swift |
| Database Interfaces | ODBC, JDBC, WASM |

While DuckDB is deeply integrated into these languages, this chapter focuses exclusively on the CLI, which lets you execute SQL interactively or as part of shell scripts.

2.2 Installing the DuckDB CLI

The DuckDB CLI is an "embedded database" in the truest sense: there is no server to install, no background daemon to configure, and no client-server connection to manage. Installation amounts to copying a single binary executable named duckdb onto your machine. You can find the full list of packages on the installation page.

DuckDB CLI vs Other Database CLIs

| Feature | DuckDB CLI | sqlite3 | psql (PostgreSQL) | mysql CLI |
| --- | --- | --- | --- | --- |
| Server Required | No | No | Yes | Yes |
| Installation | Single binary | Single binary | Requires server | Requires server |
| Query Remote Files | Yes (HTTP, S3, GCS) | No | No | No |
| In-Process Analytics | Yes | Limited | No | No |
| Direct Parquet/CSV | Yes | No | No | No |
| Memory Mode | Default | Optional | N/A | N/A |

2.2.1 macOS

For macOS users, the official recommendation is to use the Homebrew package manager, which handles updates automatically.

brew install duckdb

2.2.2 Linux and Windows

For Linux and Windows, the CLI is distributed as a zip file via GitHub Releases. The process involves:

  1. Visiting the DuckDB GitHub releases page.
  2. Downloading the package appropriate for your architecture (e.g., duckdb_cli-linux-amd64.zip).
  3. Unzipping the file.
  4. Running the binary directly.

Example for Linux (AMD64):

wget https://github.com/duckdb/duckdb/releases/download/v1.4.3/duckdb_cli-linux-amd64.zip
unzip duckdb_cli-linux-amd64.zip
./duckdb -version

2.3 Using the DuckDB CLI

Launching the DuckDB CLI is as simple as running the duckdb command. By default, the database operates in transient, in-memory mode: all data is held in memory and lost when you exit the CLI with .quit or .exit.

[Screenshot: the DuckDB CLI running SQL queries in a terminal]

For persistent storage, you can launch the tool with a filename (e.g., duckdb my_data.db) or use the .open command within the interface.
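
For example (the my_data.db filename is just illustrative):

duckdb              # start with a transient, in-memory database
duckdb my_data.db   # create or open a persistent database file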

2.3.1 SQL statements

The CLI efficiently handles standard SQL statements. To execute a command, you must end it with a semicolon (;) and a newline. The interface supports multi-line input, allowing you to paste complex queries comfortably.
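
For instance, a statement can span several lines and only runs once the terminating semicolon is entered (the table and values below are purely illustrative):

CREATE TABLE prices (item VARCHAR, amount DECIMAL(8,2));
INSERT INTO prices VALUES ('coffee', 3.50), ('tea', 2.75);
SELECT item, amount
FROM prices
WHERE amount > 3;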

2.3.2 Dot commands

In addition to SQL, the CLI supports "Dot Commands"—special internal instructions identified by a leading period (.). These do not require a semicolon.

Essential DuckDB Dot Commands

| Command | Description |
| --- | --- |
| .help | Display a list of all available commands |
| .open FILENAME | Close the current database and open a new file |
| .read FILENAME | Read and execute SQL from an external file |
| .tables | List all available tables and views |
| .timer on/off | Toggle display of query execution time |
| .mode MODE | Change output format (duckbox, line, json, csv) |
| .maxrows N | Set maximum rows to display (duckbox format) |
| .excel | Open the next query result in a spreadsheet |
| .quit or .exit | Exit the CLI (also Ctrl+D) |
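
As a quick illustration, this session fragment turns on timing and runs a throwaway query against DuckDB's built-in range() table function:

.timer on
SELECT count(*) AS n FROM range(1000000);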

Pro Tip: the .duckdbrc config file. You can configure the CLI to run specific commands every time it launches by creating a .duckdbrc file in your home directory. This is perfect for setting a custom prompt or enabling timers by default.
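
A minimal .duckdbrc might look like the following (contents are illustrative, not prescriptive):

.timer on
.maxrows 40
.prompt 'duck> '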

2.3.3 CLI arguments

The CLI is also a powerful tool for automation. You can pass flags to control behavior without entering interactive mode.

DuckDB CLI Arguments Reference

| Argument | Description |
| --- | --- |
| -readonly | Open the database in read-only mode |
| -json | Set output mode to JSON |
| -line | Set output mode to line-by-line |
| -unsigned | Allow loading of unsigned extensions |
| -s COMMAND or -c COMMAND | Run a command and exit immediately |

For example, to output a query directly to JSON:

duckdb --json -c 'SELECT * FROM my_table'

2.4 DuckDB's Extension System

To keep the core binary lightweight, DuckDB utilizes a modular extension system. Extensions are packages that add functionality—such as support for specific file formats (Parquet, Excel, JSON), geospatial data, or full-text search—only when you need them.

| Extension | Purpose | Pre-installed |
| --- | --- | --- |
| parquet | Read/write Parquet columnar files | Yes |
| json | JSON file support and functions | Yes |
| icu | International character handling | Yes |
| fts | Full-text search capabilities | Yes |
| httpfs | Query files from HTTP/S3/GCS | No (auto-loads) |
| postgres_scanner | Query PostgreSQL databases directly | No |
| sqlite_scanner | Query SQLite databases directly | No |
| spatial | Geospatial data types and functions | No |
| excel | Read/write Excel files | No |
| motherduck | Connect to MotherDuck cloud service | No |

You can inspect the current state of your extensions using the duckdb_extensions() function:

SELECT extension_name, loaded, installed FROM duckdb_extensions();

Installing and Loading

The lifecycle of an extension involves two steps:

  1. INSTALL extension_name; – Downloads the extension to your local environment.
  2. LOAD extension_name; – Activates the extension for the current session.
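
For example, to enable querying files over HTTP and S3 in the current session (on recent versions this pair is often unnecessary thanks to autoloading, as noted below):

INSTALL httpfs;
LOAD httpfs;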

Note: Since version 0.8, DuckDB often autoloads known extensions (like httpfs or parquet) if it detects they are needed by your query.

2.5 Analyzing a CSV File with the DuckDB CLI

DuckDB excels at "Data Engineering in a Box." It can query remote files directly from HTTP servers, S3, Google Cloud Storage, or HDFS without a manual download step. We can demonstrate this using a dataset containing the total population figures for several countries (https://mng.bz/KZKZ).

Common Use Cases for DuckDB CLI

| Use Case | Description | Example |
| --- | --- | --- |
| Data Exploration | Quickly analyze CSV/Parquet files without setup | SELECT * FROM 'data.csv' LIMIT 10; |
| ETL Pipelines | Transform data between formats in shell scripts | duckdb -s "COPY (...) TO 'out.parquet'" |
| File Format Conversion | Convert CSV to Parquet or vice versa | COPY (FROM 'input.csv') TO 'output.parquet' |
| Remote Data Analysis | Query files directly from S3/HTTP | FROM 's3://bucket/data.parquet' |
| Ad-hoc Reporting | Generate quick reports to CSV/JSON | duckdb -csv -s "SELECT ..." > report.csv |

Direct Remote Querying

Using the httpfs extension, you can query a remote CSV file simply by providing its URL. If the URL ends in a known extension like .csv, DuckDB infers the format automatically. For URLs without extensions (like shortened links), use read_csv_auto:

SELECT count(*) FROM read_csv_auto('https://bit.ly/3KoiZR0');
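
Conversely, when a URL does end in .csv, it can be queried directly without a reader function; the URL below is a hypothetical placeholder:

SELECT * FROM 'https://example.com/population.csv' LIMIT 5;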

2.5.1 Result Modes

When exploring datasets with many columns, the default table view (duckbox) might truncate data. DuckDB provides multiple output modes to suit different use cases.

DuckDB Output Modes Comparison

| Mode Type | Modes | Best For |
| --- | --- | --- |
| Table-based | duckbox, box, ascii, table, list, column, csv | Few columns, structured display |
| Line-based | line, json, jsonlines | Many columns, initial data exploration |
| Special | html, insert, trash (no output) | Export, debugging |

Switch modes using .mode:

.mode line
SELECT * FROM read_csv_auto('...') LIMIT 1;

The line mode displays each column on its own row—ideal for exploring datasets with 10+ columns that would otherwise be truncated.
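
Assuming the illustrative prices table from section 2.3.1, a line-mode result looks roughly like this (exact spacing may vary between versions):

.mode line
SELECT item, amount FROM prices LIMIT 1;

item = coffee
amount = 3.50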

Real-World Example: Western Europe Population Data

The book demonstrates querying a countries dataset and filtering for Western Europe. Here's the actual output from the example:

Sample Output: Western Europe Demographics

| Country | Population | Birthrate | Deathrate |
| --- | --- | --- | --- |
| Andorra | 71,201 | 8.71 | 6.25 |
| Austria | 8,192,880 | 8.74 | 9.76 |
| Belgium | 10,379,067 | 10.38 | 10.27 |
| Denmark | 5,450,661 | 11.13 | 10.36 |
| Faroe Islands | 47,246 | 14.05 | 8.70 |

This data was extracted with a single command—no tables created, no data imported:

duckdb -csv \
  -s "SELECT Country, Population, Birthrate, Deathrate FROM read_csv_auto('https://bit.ly/3KoiZR0') WHERE trim(region) = 'WESTERN EUROPE'" \
  > western_europe.csv

Building Data Pipelines

The CLI can function as a robust step in a data pipeline. You can read data from a remote source, apply SQL transformations, and write the result to a local file in a single command.

Writing to CSV:

duckdb -csv -s "SELECT * FROM remote_file WHERE region='Europe'" > western_europe.csv

Writing to Parquet: DuckDB can also export directly to the compressed Parquet format using the COPY ... TO syntax:

duckdb -s "COPY (SELECT * FROM remote_file) TO 'output.parquet' (FORMAT PARQUET)"
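
Putting these pieces together, a single invocation can read a remote CSV, filter it, and write Parquet; this sketch reuses the population dataset URL from above, while the output filename and threshold are illustrative:

duckdb -s "COPY (
  SELECT Country, Population
  FROM read_csv_auto('https://bit.ly/3KoiZR0')
  WHERE Population > 50000000
) TO 'large_countries.parquet' (FORMAT PARQUET)"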

2.6 Summary

  • Availability: DuckDB is available as a library for Python, R, Java, C++, and many others, but the CLI is a standalone, dependency-free entry point.
  • CLI Control: The CLI uses dot commands (.mode, .timer, .read) to control the environment and output formats.
  • Display Modes: You can visualize data in various formats, including duckbox (tables) and line (row-oriented).
  • Remote Data: The httpfs extension allows direct SQL querying of files hosted on HTTP servers or S3.
  • Pipelines: The CLI can act as a processing engine, querying external datasets and writing results to standard output or files (CSV, Parquet) without creating persistent tables.

Continue Reading: Chapter 1: Why DuckDB | Chapter 3: Executing SQL Queries

Related: Learn how MotherDuck extends DuckDB to the cloud with serverless analytics.


FAQs

How do I install the DuckDB CLI?

DuckDB CLI installation is simple—it's a single binary with no dependencies. On macOS, use Homebrew: brew install duckdb. On Linux/Windows, download the appropriate zip file from the GitHub releases page, unzip it, and run the duckdb executable directly. No installers or libraries are required.

What are DuckDB dot commands?

Dot commands are special CLI-only instructions that begin with a period (.) and don't require a semicolon. They control the CLI environment rather than executing SQL. Key commands include .help (list commands), .tables (show tables), .mode (change output format), .timer on/off (show query timing), .read (execute SQL from a file), and .open (switch database files). Type .help for the full list.

How do I query a remote CSV file with DuckDB?

DuckDB can query remote files directly using the httpfs extension. Simply use the URL in your query: SELECT * FROM 'https://example.com/data.csv';. If the URL doesn't end in .csv, use read_csv_auto(): SELECT * FROM read_csv_auto('https://bit.ly/shortlink');. The httpfs extension auto-loads when needed (DuckDB 0.8+) and supports HTTP, HTTPS, S3, and Google Cloud Storage.

What are DuckDB extensions and how do I use them?

Extensions add optional functionality to DuckDB while keeping the core lightweight. Some extensions (like parquet, json, icu) come pre-installed. Others (like httpfs, spatial, postgres_scanner) need to be installed with INSTALL extension_name; and loaded with LOAD extension_name;. Since DuckDB 0.8, many extensions auto-load when needed. Check installed extensions with SELECT * FROM duckdb_extensions();.

How do I change the output format in DuckDB CLI?

Use the .mode command to switch output formats. Table-based modes (good for few columns): duckbox (default), box, ascii, table, csv, list. Line-based modes (good for many columns): line, json, jsonlines. For example, .mode line displays each column on its own row—ideal for exploring wide datasets. You can also use CLI flags like duckdb --json for non-interactive use.

How do I use DuckDB in shell scripts and data pipelines?

DuckDB CLI works seamlessly in pipelines. Use the -s or -c flag to run a command and exit: duckdb -csv -s "SELECT * FROM data.csv WHERE region='Europe'" > output.csv. For Parquet output, use COPY: duckdb -s "COPY (SELECT * FROM data) TO 'output.parquet' (FORMAT PARQUET)". DuckDB reads from stdin and writes to stdout, making it composable with other Unix tools.

What is the .duckdbrc config file?

The .duckdbrc file (located at $HOME/.duckdbrc) runs automatically when DuckDB CLI starts. Use it to set default configurations like custom prompts, enable timers, or run initialization SQL. Example content:

.timer on
.mode duckbox
.prompt 'DuckDB> '

This saves you from repeatedly typing the same setup commands each session.

How do I export query results to Parquet format?

Use the COPY ... TO syntax with FORMAT PARQUET: COPY (SELECT * FROM my_table WHERE condition) TO 'output.parquet' (FORMAT PARQUET);. This works from the CLI or scripts. For CSV export, you can pipe output directly: duckdb -csv -s "SELECT * FROM data" > output.csv. Parquet files are compressed and columnar, making them ideal for analytical workloads.

Why does DuckDB say 'Table does not exist' when querying a URL?

This error occurs when DuckDB can't determine the file format from the URL. If your URL doesn't end in a recognized extension (like .csv or .parquet), DuckDB treats it as a table name. Solution: Use the explicit reader function: SELECT * FROM read_csv_auto('https://bit.ly/shortlink'); instead of SELECT * FROM 'https://bit.ly/shortlink';. This tells DuckDB to parse the URL as a CSV file regardless of its extension.

How does DuckDB CLI compare to sqlite3 and psql?

DuckDB CLI is unique among database CLIs: (1) No server required—unlike psql or mysql, DuckDB runs entirely in-process; (2) Direct file querying—query CSV, Parquet, and JSON files without importing them first; (3) Remote file support—query files directly from HTTP, S3, or GCS URLs; (4) Analytical focus—optimized for OLAP workloads with columnar storage and vectorized execution. It's closest to sqlite3 in simplicity but far more powerful for analytics.

What are common use cases for the DuckDB CLI?

The DuckDB CLI excels at: (1) Data exploration—quickly analyze CSV/Parquet files without setup; (2) ETL pipelines—transform data between formats in shell scripts; (3) File format conversion—convert CSV to Parquet with a single command; (4) Remote data analysis—query files on S3 or HTTP without downloading; (5) Ad-hoc reporting—generate quick reports to CSV or JSON. It's particularly valuable when you need SQL power without database infrastructure.
