Why REST and JDBC Are Killing Your Data Stack — Flight SQL to the Rescue

2025/06/13 - 8 min read

Why REST and JDBC Are Killing Your Data Stack — Flight SQL to the Rescue

2025/06/13 - 8 min read

Data pipelines today feel like an underground fight: you build them fast, but the real battle starts when you try to serve the results. Welcome to Flight Club.

The first rule of Flight Club? You do not talk to REST.

The second rule? You definitely do not talk to REST.

The third rule? If your pipeline goes limp, chokes on JSON, or taps out on throughput, the session is over.

DuckDB changed how we do local analytics — the lovechild of SQLite and a supercomputer, delivering screaming-fast OLAP without the servers, clusters, or life-ruining setup scripts.

But modern data teams don't just analyze. They integrate, connect, and serve. From BI dashboards to ML pipelines to that one stakeholder who still loves their pivot tables, the need to expose DuckDB cleanly over a network keeps surfacing.

Picture this: Your team has built a lightning-fast DuckDB analytics pipeline that crunches billions of records in seconds. But when it's time to serve those insights to your dashboards or ML models? You're forced to squeeze that beautiful columnar data through the rusty pipes of REST or JDBC. It's like putting a Ferrari engine in a horse-drawn carriage.

The Problem with REST and JDBC

The problem? REST is duct tape. JDBC is legacy glue. Both are leaky, brittle, and built for another era.

REST: Forces your columnar data into bloated JSON, then makes you parse it back. Up to 90% of your time? Spent on serialization, not computation.
JDBC: Still thinks in rows when the world has moved to columns. Like trying to stream Netflix through a dial-up modem.

That's where Apache Arrow Flight SQL comes in.

Not another framework to learn. Not a platform to buy into. A protocol — lean, typed, binary-native. Fire SQL queries and stream columnar data with zero-copy swagger.

It doesn't just work. It flies.

No more encoding rows into JSON just to decode them faster than you can say "technical debt." No more pretending analytics engines are web servers. Flight SQL treats data like it's 2025: fast, typed, and unapologetically direct.

Two open-source servers — Hatch and GizmoSQL — are already strapping rockets to DuckDB with Arrow Flight SQL. Different vibes, same mission: Give DuckDB wings. Let it serve, stream, and scale like the compute beast it is.

In this post, we'll break it down: Why Arrow + Flight SQL is stupidly fast (we're talking 20+ Gb/s per core), how Flight SQL powers real-time pipelines without breaking a sweat, what Hatch and GizmoSQL bring to the DuckDB party, and how local-first analytics just became a distributed superpower.

No REST. No bloat. Just protocol-native performance. Welcome to Flight Club.

Understanding Arrow Flight SQL

Arrow: A Data Format That Doesn't Suck

Apache Arrow is the Usain Bolt of data formats—columnar, in-memory, and built for speed. It's designed to shuttle structured data across tools and languages without breaking a sweat.

Column-first layout → SIMD-friendly (Single Instruction, Multiple Data), enabling parallel processing at the CPU level
Language-neutral → C++, Go, Python, Rust, Java, and probably Klingon soon
Shared format → Zero-copy data sharing between processes—point at data instead of copying it
Vector-ready → Perfect for batching, scanning, and ML inference

Arrow isn't just a format. It's a shared memory model that says, "Why copy data when you can just point at it?"

Flight: gRPC for Tables, No Bloat

Arrow Flight is the network protocol that makes Arrow feel like it's teleporting. Forget JSON blobs or binary spaghetti—Flight streams Arrow batches over gRPC like a data wizard slinging spells.

It's gRPC for tables, with:

Zero-copy Arrow IPC streaming → Data moves at ludicrous speed, no serialization tax
Schema-first descriptors → No guesswork, just precision
Built-in parallelism → Because waiting is for suckers
Cross-language clients → Pick your poison, it just works

Here's a real-world example:

Copy code
# Traditional REST/JDBC way:
# 1. Query database (1-2s)
# 2. Serialize to JSON/rows (0.5-1s)
# 3. Transfer over network (0.2-0.5s)
# 4. Deserialize back to usable format (0.5-1s)
# Total: 2.2-4.5s

# Flight SQL way:
# 1. Query database (1-2s)
# 2. Stream Arrow batches directly (0.1-0.2s)
# Total: 1.1-2.2s

No ORMs, JDBC or REST nonsense. Just fast, typed, structured streams that respect your time.

Flight SQL: SQL with Wings

Flight SQL takes Arrow Flight and slaps SQL semantics on it. Send a query, get an Arrow table back. No middleman, no drama.

SQL queries → Arrow tables, no detours
Standardized protobuf interfaces → Predictable, not a puzzle
Typed parameters, prepared statements, metadata reflection → It's like SQL grew up and got a job

This isn't your grandma's database driver. It's SQL for pipelines, built for machines, not GUIs.

Protocol	Median Round Trip	Payload Format	Peak Throughput
REST	75 ms	JSON (yawn)	1-2 Gb/s
JDBC	52 ms	Binary (meh)	5-10 Gb/s
Flight SQL	18 ms	Arrow IPC (wow)	20+ Gb/s

Flight SQL doesn't just win; it laps the competition while sipping coffee.

Meet the Flight Club Members

Two open-source projects are bringing Flight SQL to DuckDB, and they're as different as a duck and a goose. Both get the job done.

Hatch: The Purist's Choice

Hatch is Go-based, Arrow-native, and built for people who think "composable" is a personality trait. It's experimentable, open to the wild, and always looking for new recruits.

Single static binary → Deploy it anywhere, no fuss
OpenTelemetry tracing, config hot-reloading → Because observability is sexy
Fast Arrow record pooling and schema caching → Efficiency is the name of the game
Multiple auth modes → Secure without the headache

Run it locally, at the edge, or sneak it into a bigger system.

GizmoSQL: The Backend Whisperer

GizmoSQL is a full Arrow Flight SQL server with support for both DuckDB and SQLite as pluggable backends. Built in C++ and extended from Voltron Data's sqlflite, it's been battle-tested, hardened, and upgraded for real-world flexibility.

TLS, JWT, and init scripts → Secure and customizable by default
Docker-first deployment → Instant setup with production-grade defaults
JDBC, ADBC, CLI, Ibis, SQLAlchemy → Clients for nearly every stack

Whether you want to mount a local DB, run interactive pipelines, or integrate cleanly with BI tools, GizmoSQL is a solid, well-documented launchpad.

DuckDB deserves a clean, stable interface to the world.

Flight Club in Action

Ready to lift off? Here's how to get started with GizmoSQL:

Copy code
docker run -d \
  --name gizmosql \
  -p 31337:31337 \
  -e GIZMOSQL_USERNAME=gizmosql_username \
  -e GIZMOSQL_PASSWORD=gizmosql_password \
  gizmodata/gizmosql:latest

Give the server a few seconds to start.

Querying with Python

Here's how you talk to it:

Copy code
import os
from adbc_driver_flightsql import dbapi as gizmosql, DatabaseOptions

with gizmosql.connect(
    uri="grpc+tls://localhost:31337",
    db_kwargs={
        "username": os.getenv("GIZMOSQL_USERNAME", "gizmosql_username"),
        "password": os.getenv("GIZMOSQL_PASSWORD", "gizmosql_password"),
        DatabaseOptions.TLS_SKIP_VERIFY.value: "true",
    },
) as conn:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT n_nationkey, n_name FROM nation WHERE n_nationkey = ?",
            parameters=[24],
        )
        x = cur.fetch_arrow_table()
        print(x)

That's it. No REST endpoints to design. No JDBC drivers to wrestle. Just SQL in, Arrow out, running at memory speed.

Want to serve this to a dashboard? Point Superset or Metabase at your GizmoSQL server. Need real-time ML features? Stream them through Flight SQL. The protocol handles the heavy lifting while you focus on the analytics.

Remember: This is your data. And it's ending one transformation at a time.

Why This Changes Everything

Once you unshackle DuckDB with Flight SQL, the possibilities explode like a data piñata:

Dashboards → Superset, Metabase, Tableau now get data at memory speed, not HTTP speed
Streaming pipelines → Arrow in, Arrow out, no conversion tax. Perfect for real-time ML feature stores
ML workloads → Feed models at 20+ Gb/s per core. Because your GPU is hungry
Federated meshes → DuckDB as a compute shard in your data galaxy, speaking Arrow end-to-end

Flight SQL makes these real, not just PowerPoint dreams. Here's what it means in practice:

10x faster dashboard refreshes → From coffee-break wait times to blink-and-you-miss-it speed
95% less CPU overhead → Your machines can focus on compute, not conversion
Zero data format tax → Arrow all the way down means no more format ping-pong

The Future of Flight SQL

Flight SQL is the start, not the finish line. It's the foundation for wilder ideas:

UDFs over Flight → Stream WASM or native extensions like a boss
Column-level security → Only stream what's allowed, no leaks
Inline analytics plugins → Embed computation right in the protocol
Self-hosted analytic nodes → Distribute DuckDB like confetti, not containers

This isn't a platform pitch. It's a protocol revolution. Each innovation builds on Flight's core promise: moving data at the speed of memory, not the speed of serialization.

Stop Torturing Analytics

Flight SQL isn't here to replace everything. It's just the fastest, cleanest, most developer-friendly way to serve columnar data over the wire in 2025. If your team is evaluating architectures for low-latency analytics, removing the network bottleneck with Arrow Flight is half the battle.

DuckDB changed how we crunch data locally. Flight SQL lets it spread its wings and scale horizontally—not just in size, but in impact. It's about unlocking the full potential of your analytics:

Local development → Lightning-fast iteration on your laptop
Edge deployment → DuckDB at every CDN point of presence
Cloud scale → Distributed queries that feel local

No more REST duct tape. No more JDBC relics. Let's build data services that treat DuckDB like the rockstar it is.

Give DuckDB wings. Let it soar.

The last rule of Flight Club? Build fast. Serve smart. Never serialize again.

how-to

2025/06/06 - Simon Späti

DuckDB Ecosystem: June 2025

DuckDB news: DuckLake combines catalog and table format with ACID metadata in SQL. Radio extension adds WebSocket and Redis Pub/Sub. Top CSV benchmark results.

2025/06/09 - Mehdi Ouazza

Getting Started with DuckLake: A New Table Format for Your Lakehouse

Learn how DuckLake simplifies metadata and brings fast, database-like features to your data lakehouse — with a hands-on example using DuckDB and PostgreSQL

View all

Why REST and JDBC Are Killing Your Data Stack — Flight SQL to the Rescue

Why REST and JDBC Are Killing Your Data Stack — Flight SQL to the Rescue

The Problem with REST and JDBC

Understanding Arrow Flight SQL

Arrow: A Data Format That Doesn't Suck

Flight: gRPC for Tables, No Bloat

Flight SQL: SQL with Wings

Meet the Flight Club Members

Hatch: The Purist's Choice

GizmoSQL: The Backend Whisperer

Flight Club in Action

Querying with Python

Why This Changes Everything

The Future of Flight SQL

Stop Torturing Analytics

Subscribe to motherduck blog

PREVIOUS POSTS

DuckDB Ecosystem: June 2025

Getting Started with DuckLake: A New Table Format for Your Lakehouse