The fast in-process analytical database. SQL extensions, Parquet-native, zero external dependencies.
The most common DuckDB operations at a glance. DuckDB is an in-process SQL OLAP database management system designed for analytical workloads. Think of it as "SQLite for analytics."
duckdb
Open an in-memory database
duckdb my_data.duckdb
Persistent database on disk
SELECT * FROM 'data.parquet';
Query Parquet files directly
SELECT * FROM 'data.csv';
Auto-detect CSV schema
SELECT * FROM 'data.json';
Newline-delimited or array JSON
COPY tbl TO 'out.parquet';
Write query results to Parquet
DESCRIBE my_table;
Show column names and types
SHOW TABLES;
List all tables in current database
| Command | Description |
|---|---|
.help | Show all dot commands |
.tables | List all tables |
.schema | Show schema for all tables |
.mode | Set output mode (box, csv, json, markdown, table, line) |
.timer on | Show query execution time |
.maxrows N | Set maximum rows displayed |
.open file.db | Open a different database file |
.quit | Exit the CLI |
.changes on | Show number of changed rows after DML |
.columns | Column-oriented output mode |
DuckDB is designed for minimal friction. No server process, no external dependencies, no configuration files. A single binary or library is all you need.
brew install duckdb
Latest stable release
curl https://install.duckdb.org | sh
Official install script; or download a binary from GitHub releases
winget install DuckDB.cli
Also available via Chocolatey
pip install duckdb
Python 3.7+ supported
npm install duckdb
Native addon via node-gyp
cargo add duckdb
Via duckdb-rs crate
-- In-memory (data lost on exit)
duckdb
-- Persistent database file
duckdb analytics.duckdb
-- Read-only mode
duckdb -readonly analytics.duckdb
-- Attach additional databases
ATTACH 'other.duckdb' AS other_db;
SELECT * FROM other_db.main.my_table;
-- Show all settings
SELECT * FROM duckdb_settings();
-- Set memory limit
SET memory_limit = '4GB';
-- Set number of threads
SET threads = 4;
-- Enable progress bar for long queries
SET enable_progress_bar = true;
-- Set default sort order
SET default_order = 'DESC';
-- Preserve insertion order (default: true)
SET preserve_insertion_order = true;
DuckDB supports a rich extension ecosystem. Core extensions are auto-loaded on first use; community extensions require explicit installation.
-- Install an extension
INSTALL httpfs;
-- Load an extension
LOAD httpfs;
-- Extensions that are not auto-loadable need explicit INSTALL + LOAD
INSTALL spatial;
LOAD spatial;
-- List installed extensions
SELECT * FROM duckdb_extensions();
-- Update all extensions
UPDATE EXTENSIONS;
-- Install from community repository
INSTALL h3 FROM community;
| Extension | Purpose | Auto-load |
|---|---|---|
httpfs | Read files over HTTP/S3 | Yes |
json | JSON reading/writing | Yes |
parquet | Parquet support | Yes |
icu | Unicode collations & time zones | Yes
fts | Full-text search | No |
spatial | Geospatial operations | No |
excel | Read/write Excel files | No |
sqlite_scanner | Read from SQLite databases | No |
postgres_scanner | Read from PostgreSQL | No |
delta | Read Delta Lake tables | No |
DuckDB can directly query files without importing them first. Parquet, CSV, JSON, and Excel files can all be read with a simple SELECT statement, or loaded into tables with COPY or CREATE TABLE AS.
-- Query a single Parquet file
SELECT * FROM 'sales_2024.parquet';
-- Query multiple files with glob
SELECT * FROM 'sales_*.parquet';
-- Query a directory of Parquet files
SELECT * FROM read_parquet('data/**/*.parquet');
-- Hive-partitioned Parquet
SELECT * FROM read_parquet(
's3://bucket/data/**/*.parquet',
hive_partitioning = true
);
-- Get Parquet metadata without reading data
SELECT * FROM parquet_metadata('data.parquet');
SELECT * FROM parquet_schema('data.parquet');
-- Export to Parquet
COPY (
SELECT * FROM sales WHERE year = 2024
) TO 'sales_2024.parquet' (FORMAT PARQUET);
-- Export with compression
COPY tbl TO 'out.parquet' (
FORMAT PARQUET,
COMPRESSION ZSTD,
ROW_GROUP_SIZE 100000
);
-- Auto-detect everything (delimiter, types, headers)
SELECT * FROM 'data.csv';
-- Explicit options
SELECT * FROM read_csv(
'data.csv',
header = true,
delim = '|',
dateformat = '%m/%d/%Y',
columns = {
'id': 'INTEGER',
'name': 'VARCHAR',
'amount': 'DECIMAL(10,2)'
}
);
-- Read multiple CSVs with glob
SELECT * FROM read_csv('logs_*.csv', union_by_name = true);
-- Include a filename column for each row
SELECT * FROM read_csv('data/*.csv', filename = true);
-- Export to CSV
COPY tbl TO 'output.csv' (HEADER, DELIMITER ',');
-- Export with gzip compression
COPY tbl TO 'output.csv.gz' (HEADER);
-- Newline-delimited JSON
SELECT * FROM 'data.ndjson';
-- JSON array format
SELECT * FROM read_json('data.json', format = 'array');
-- Explicit JSON schema
SELECT * FROM read_json(
'events.json',
columns = {
'ts': 'TIMESTAMP',
'event': 'VARCHAR',
'payload': 'JSON'
}
);
-- Extract nested JSON fields
SELECT
data->'$.user.name' AS name,
data->'$.user.age' AS age
FROM read_json('users.json');
-- Export to JSON
COPY tbl TO 'output.json';
COPY tbl TO 'output.ndjson' (FORMAT JSON, ARRAY false);
INSTALL spatial; LOAD spatial;
SELECT * FROM st_read('data.xlsx');
-- or, with the excel extension:
INSTALL excel; LOAD excel;
SELECT * FROM read_xlsx('data.xlsx');
Requires the spatial or excel extension
INSTALL sqlite_scanner;
LOAD sqlite_scanner;
SELECT * FROM sqlite_scan('app.db', 'users');
Query SQLite files directly
INSTALL postgres_scanner;
LOAD postgres_scanner;
SELECT * FROM postgres_scan(
  'host=localhost dbname=prod', 'public', 'orders');
Query Postgres tables without ETL
CREATE TABLE sales AS
SELECT * FROM 'sales.parquet'
WHERE amount > 100;
Import file data into a DuckDB table
| Option | Values | Description |
|---|---|---|
FORMAT | CSV, PARQUET, JSON | Output format |
HEADER | true/false | Include column headers (CSV) |
DELIMITER | Character | Field separator (CSV) |
COMPRESSION | GZIP, ZSTD, SNAPPY, LZ4 | Compression algorithm |
PARTITION_BY | Column list | Hive-style partitioning |
PER_THREAD_OUTPUT | true/false | One file per thread |
ROW_GROUP_SIZE | Integer | Parquet row group size |
DuckDB extends standard SQL with numerous ergonomic features. These extensions reduce boilerplate, improve readability, and enable analytical patterns that are cumbersome in other databases.
-- EXCLUDE columns from SELECT *
SELECT * EXCLUDE (internal_id, created_at)
FROM users;
-- REPLACE columns in SELECT *
SELECT * REPLACE (upper(name) AS name)
FROM users;
-- COLUMNS expression: apply a function to columns matching a regex
SELECT min(COLUMNS('amount_.*'))
FROM invoices;
-- COLUMNS with lambda
SELECT COLUMNS(c -> c LIKE '%_id')
FROM orders;
-- GROUP BY ALL — auto-group by all non-aggregate columns
SELECT region, product, sum(amount)
FROM sales
GROUP BY ALL;
-- ORDER BY ALL — sort by every column
SELECT * FROM products
ORDER BY ALL;
-- FROM-first: more natural for exploration
FROM sales
SELECT region, sum(amount)
GROUP BY region;
-- Implicit SELECT * (just FROM)
FROM sales;
-- equivalent to: SELECT * FROM sales;
-- FROM-first with WHERE
FROM sales
WHERE year = 2024
SELECT region, count(*);
-- FROM-first with LIMIT
FROM 'huge_file.parquet'
LIMIT 10;
-- Create a list (array)
SELECT [1, 2, 3] AS nums;
-- Create a struct (named tuple)
SELECT {'x': 1, 'y': 2} AS point;
-- Create a MAP
SELECT map(['a', 'b'], [1, 2]) AS m;
-- Access struct fields
SELECT point.x, point.y
FROM (SELECT {'x': 10, 'y': 20} AS point);
-- Unnest lists into rows
SELECT unnest(['a', 'b', 'c']) AS letter;
-- List comprehension
SELECT [x * 2 FOR x IN [1, 2, 3]] AS doubled;
-- Filter within list
SELECT [x FOR x IN [1, 2, 3, 4, 5] IF x > 3];
-- String slicing (Python-style)
SELECT 'DuckDB'[1:4]; -- 'Duck'
-- String similarity
SELECT jaro_winkler_similarity('duck', 'dock');
-- Regular expressions
SELECT regexp_extract(
'Price: $42.50',
'(\d+\.\d+)', 1
); -- '42.50'
-- String aggregation
SELECT string_agg(name, ', ' ORDER BY name)
FROM users
GROUP BY department;
-- Format strings
SELECT format('{} has {} items', name, count)
FROM inventory;
-- QUALIFY — filter window function results (like HAVING for aggregates)
SELECT name, department, salary
FROM employees
QUALIFY row_number() OVER (
PARTITION BY department
ORDER BY salary DESC
) = 1;
-- SAMPLE — random sampling
SELECT * FROM big_table
USING SAMPLE 10 PERCENT;
SELECT * FROM big_table
USING SAMPLE 1000 ROWS;
-- PIVOT and UNPIVOT
PIVOT sales
ON quarter
USING sum(amount)
GROUP BY product;
UNPIVOT monthly_data
ON jan, feb, mar, apr
INTO NAME month VALUE revenue;
-- ASOF JOIN — join on nearest timestamp
SELECT t.*, q.price
FROM trades t
ASOF JOIN quotes q
ON t.ticker = q.ticker
AND t.ts >= q.ts;
-- Scalar macro
CREATE MACRO add(a, b) AS a + b;
SELECT add(3, 4); -- 7
-- Table macro
CREATE MACRO top_n(tbl, n) AS TABLE
SELECT * FROM tbl LIMIT n;
SELECT * FROM top_n(sales, 5);
-- Macro with default parameters
CREATE MACRO pct(val, total := 100) AS
round(val / total * 100, 2);
-- Recursive CTE with macro
CREATE MACRO fibonacci(n) AS TABLE
WITH RECURSIVE fib(a, b, i) AS (
VALUES (0, 1, 1)
UNION ALL
SELECT b, a + b, i + 1
FROM fib WHERE i < n
)
SELECT a AS value FROM fib;
-- Create an ENUM type
CREATE TYPE mood AS ENUM ('happy', 'sad', 'neutral');
CREATE TABLE journal (
entry_date DATE,
feeling mood,
notes VARCHAR
);
-- Enums use minimal storage (dictionary encoding)
INSERT INTO journal VALUES
('2024-01-15', 'happy', 'Great day!'),
('2024-01-16', 'neutral', 'Uneventful');
Window functions are essential for analytical queries. DuckDB supports the full SQL window function specification with optimized execution for large-scale aggregations.
-- ROW_NUMBER: unique sequential rank
SELECT
name, department, salary,
row_number() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS rank
FROM employees;
-- RANK: ties get same rank, gaps in sequence
SELECT
product, revenue,
rank() OVER (ORDER BY revenue DESC) AS rnk
FROM products;
-- DENSE_RANK: ties get same rank, no gaps
SELECT
student, score,
dense_rank() OVER (ORDER BY score DESC) AS drnk
FROM exam_results;
-- NTILE: divide into N equal groups
SELECT
customer, total_spend,
ntile(4) OVER (ORDER BY total_spend DESC) AS quartile
FROM customers;
-- Running total
SELECT
date, amount,
sum(amount) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM transactions;
-- Moving average (7-day)
SELECT
date, value,
avg(value) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM metrics;
-- Percentage of total
SELECT
category, revenue,
round(
revenue * 100.0 / sum(revenue) OVER (),
2
) AS pct_of_total
FROM sales;
-- Min/Max within partition
SELECT
employee, department, salary,
salary - min(salary) OVER (PARTITION BY department) AS above_dept_min,
max(salary) OVER (PARTITION BY department) - salary AS below_dept_max
FROM employees;
-- LAG / LEAD — access previous/next rows
SELECT
date, price,
lag(price, 1) OVER (ORDER BY date) AS prev_price,
price - lag(price, 1) OVER (ORDER BY date) AS daily_change,
lead(price, 1) OVER (ORDER BY date) AS next_price
FROM stock_prices;
-- FIRST_VALUE / LAST_VALUE
SELECT
ts, event,
first_value(event) OVER (
PARTITION BY user_id
ORDER BY ts
) AS first_event,
last_value(event) OVER (
PARTITION BY user_id
ORDER BY ts
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_event
FROM events;
-- NTH_VALUE
SELECT
nth_value(product, 3) OVER (
ORDER BY revenue DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS third_best
FROM products;
-- Top earner per department (no subquery needed)
SELECT name, department, salary
FROM employees
QUALIFY row_number() OVER (
PARTITION BY department ORDER BY salary DESC
) = 1;
-- De-duplicate: keep latest record per user
SELECT *
FROM user_events
QUALIFY row_number() OVER (
PARTITION BY user_id ORDER BY ts DESC
) = 1;
-- Products with above-average revenue in their category
SELECT product, category, revenue
FROM products
QUALIFY revenue > avg(revenue) OVER (PARTITION BY category);
-- Define a window once, reuse multiple times
SELECT
date, ticker, price,
avg(price) OVER w AS avg_price,
min(price) OVER w AS min_price,
max(price) OVER w AS max_price,
lag(price) OVER w AS prev_price
FROM stock_prices
WINDOW w AS (
PARTITION BY ticker
ORDER BY date
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
);
DuckDB's Python API is designed for seamless integration with the data science ecosystem. It can query Pandas DataFrames, Polars DataFrames, and Arrow tables directly without any data copying.
import duckdb
# In-memory connection (default)
con = duckdb.connect()
# Persistent database
con = duckdb.connect('analytics.duckdb')
# Execute SQL
result = con.sql("SELECT 42 AS answer")
print(result.fetchall()) # [(42,)]
# Use the module-level API (implicit connection)
duckdb.sql("SELECT * FROM 'data.parquet' LIMIT 5").show()
# Execute and fetch as different formats
result = duckdb.sql("SELECT * FROM 'sales.csv'")
result.fetchdf() # Pandas DataFrame
result.fetchnumpy() # Dict of NumPy arrays
result.arrow() # PyArrow Table
result.pl() # Polars DataFrame
import pandas as pd
import duckdb
# Create a Pandas DataFrame
df = pd.DataFrame({
'name': ['Alice', 'Bob', 'Charlie'],
'score': [95, 82, 91]
})
# Query the DataFrame by variable name
duckdb.sql("SELECT * FROM df WHERE score > 85").show()
# Join a DataFrame with a Parquet file
duckdb.sql("""
SELECT d.name, d.score, p.grade
FROM df d
JOIN 'grades.parquet' p ON d.name = p.name
""").show()
# Polars integration
import polars as pl
lf = pl.LazyFrame({'x': [1, 2, 3]})
duckdb.sql("SELECT sum(x) FROM lf").show()
import duckdb
# Chain operations without SQL strings
con = duckdb.connect()
result = (
con.read_parquet('sales.parquet')
.filter('amount > 100')
.aggregate('region, sum(amount) AS total')
.order('total DESC')
.limit(10)
.fetchdf()
)
# Mix relational API with SQL
rel = con.read_csv('data.csv')
con.sql("SELECT * FROM rel WHERE age > 30").show()
# Insert from DataFrame (fast)
con.sql("CREATE TABLE users AS SELECT * FROM df")
# Append to existing table
con.sql("INSERT INTO users SELECT * FROM new_df")
# Parameterized queries
con.execute(
"SELECT * FROM users WHERE age > ? AND city = ?",
[25, 'NYC']
).fetchdf()
# Transaction control
con.begin()
con.sql("INSERT INTO orders VALUES (1, 'widget', 42.0)")
con.sql("UPDATE inventory SET qty = qty - 1 WHERE item = 'widget'")
con.commit()
# Register a Python function as UDF
from duckdb.typing import VARCHAR
con.create_function(
'reverse_str',
lambda s: s[::-1],
[VARCHAR], VARCHAR
)
con.sql("SELECT reverse_str('DuckDB')").show()
DuckDB's Node.js client provides both callback-style and Promise-based APIs. The native addon runs DuckDB in-process for maximum performance.
// Install
// npm install duckdb
// or: npm install @duckdb/node-api (newer API)
const duckdb = require('duckdb');
// In-memory database
const db = new duckdb.Database(':memory:');
// Persistent database
const db = new duckdb.Database('analytics.duckdb');
// Read-only
const db = new duckdb.Database(
'analytics.duckdb',
{ access_mode: 'READ_ONLY' }
);
// Callback-based API
db.all("SELECT * FROM 'data.parquet' LIMIT 10", (err, rows) => {
if (err) throw err;
console.log(rows);
});
// Parameterized query
db.all(
"SELECT * FROM users WHERE age > ? AND city = ?",
25, 'NYC',
(err, rows) => { console.log(rows); }
);
// Streaming results (large datasets)
const con = db.connect();
con.each("SELECT * FROM big_table", (err, row) => {
  if (err) throw err;
  // Process one row at a time
  console.log(row);
});
// Prepared statements
const stmt = db.prepare("INSERT INTO events VALUES (?, ?, ?)");
stmt.run(1, 'click', new Date());
stmt.run(2, 'scroll', new Date());
stmt.finalize();
// Using @duckdb/node-api (recommended for new projects)
import { DuckDBInstance } from '@duckdb/node-api';
const instance = await DuckDBInstance.create('analytics.duckdb');
const con = await instance.connect();
// Run a query
const result = await con.run(
"SELECT * FROM 'sales.parquet' WHERE amount > 1000"
);
// Get rows as objects
const rows = result.getRows();
console.log(rows);
// Read the full result up front, then walk its chunks
const reader = await con.runAndReadAll(
  "SELECT * FROM big_table"
);
const chunks = reader.getChunks();
const express = require('express');
const duckdb = require('duckdb');
const app = express();
const db = new duckdb.Database('analytics.duckdb');
app.get('/api/sales', (req, res) => {
const { region, year } = req.query;
db.all(
`SELECT region, sum(amount) as total
FROM sales
WHERE region = ? AND year = ?
GROUP BY region`,
region, parseInt(year),
(err, rows) => {
if (err) return res.status(500).json({ error: err.message });
res.json(rows);
}
);
});
DuckDB-WASM brings the full power of DuckDB to the browser. Run analytical SQL queries entirely client-side, directly on Parquet files, CSVs, or in-memory data. No server required.
<!-- Load DuckDB-WASM as an ES module straight from a CDN -->
<script type="module">
import * as duckdb from
'https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm/+esm';
// Select a bundle (browser picks best for platform)
const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();
const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);
// Instantiate
const worker = new Worker(bundle.mainWorker);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
// Connect and query
const con = await db.connect();
const result = await con.query("SELECT 42 AS answer");
console.log(result.toArray());
</script>
// Register a file from URL
await db.registerFileURL(
'remote.parquet',
'https://example.com/data.parquet',
duckdb.DuckDBDataProtocol.HTTP,
false
);
const result = await con.query(
"SELECT * FROM 'remote.parquet'"
);
// Register a file from user upload (File/Blob)
const fileInput = document.querySelector('input[type=file]');
fileInput.addEventListener('change', async (e) => {
const file = e.target.files[0];
await db.registerFileHandle(
file.name, file,
duckdb.DuckDBDataProtocol.BROWSER_FILEREADER,
true
);
const result = await con.query(
`SELECT * FROM '${file.name}' LIMIT 100`
);
renderTable(result);
});
// Insert data from JavaScript arrays
await con.query(`
CREATE TABLE metrics AS
SELECT * FROM (VALUES
('cpu', 0.85, '2024-01-01'),
('mem', 0.62, '2024-01-01')
) AS t(name, value, date)
`);
Interactive dashboards, data exploration, offline-capable apps, SQL playgrounds
~2 GB practical limit in browsers (memory constraints)
Multi-threaded via SharedArrayBuffer (requires COOP/COEP headers)
Subset of extensions available (parquet, json, httpfs via fetch API)
// In Observable notebooks
db = DuckDBClient.of({
flights: FileAttachment("flights.parquet")
});
// Then use SQL cells directly:
// SELECT carrier, count(*) FROM flights GROUP BY carrier
// In Jupyter with JupySQL
%load_ext sql
%sql duckdb:///:memory:
%%sql
SELECT * FROM 'https://example.com/data.parquet'
LIMIT 10;
The httpfs extension enables DuckDB to read files directly from HTTP servers, S3-compatible storage, Google Cloud Storage, and Azure Blob Storage. No downloads required -- query remote data with SQL.
-- Install and load httpfs
INSTALL httpfs;
LOAD httpfs;
-- Query a Parquet file over HTTP
SELECT count(*)
FROM 'https://example.com/data/sales.parquet';
-- Query a CSV from GitHub
SELECT *
FROM read_csv(
'https://raw.githubusercontent.com/user/repo/main/data.csv'
)
LIMIT 10;
-- Remote Parquet with predicate pushdown
-- Only downloads the row groups that match
SELECT *
FROM 'https://example.com/huge.parquet'
WHERE year = 2024 AND region = 'US';
-- Configure S3 credentials
SET s3_region = 'us-east-1';
SET s3_access_key_id = 'AKIAIOSFODNN7EXAMPLE';
SET s3_secret_access_key = 'wJalrXUtnFEMI/K7MDENG/bPxRfiCY';
-- Or use credential chain (IAM, env vars, ~/.aws/credentials)
SET s3_use_credential_chain = true;
-- Query S3 Parquet files
SELECT *
FROM 's3://my-bucket/data/sales/*.parquet';
-- Hive-partitioned data on S3
SELECT *
FROM read_parquet(
's3://data-lake/events/**/*.parquet',
hive_partitioning = true
)
WHERE year = 2024 AND month = 3;
-- Write results to S3
COPY (
SELECT * FROM processed_data
) TO 's3://my-bucket/output/results.parquet';
-- Google Cloud Storage
SET s3_endpoint = 'storage.googleapis.com';
SET s3_access_key_id = 'your-key';
SET s3_secret_access_key = 'your-secret';
SELECT * FROM 's3://gcs-bucket/data.parquet';
-- Or use gs:// protocol directly
SELECT * FROM 'gs://my-bucket/data.parquet';
-- Azure Blob Storage
SET azure_storage_connection_string = 'DefaultEndpointsProtocol=...';
SELECT * FROM 'az://container/data.parquet';
-- Create a secret for S3
CREATE SECRET my_s3_secret (
TYPE S3,
KEY_ID 'AKIAIOSFODNN7EXAMPLE',
SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCY',
REGION 'us-east-1'
);
-- Create a secret using credential chain
CREATE SECRET (
TYPE S3,
PROVIDER CREDENTIAL_CHAIN
);
-- List secrets (credentials are masked)
SELECT * FROM duckdb_secrets();
-- Drop a secret
DROP SECRET my_s3_secret;
DuckDB and SQLite share the same philosophy (embeddable, serverless, zero-config) but are optimized for fundamentally different workloads. SQLite excels at OLTP; DuckDB excels at OLAP.
| Aspect | DuckDB | SQLite |
|---|---|---|
| Storage Model | Columnar | Row-oriented |
| Optimized For | OLAP (analytical queries) | OLTP (transactional queries) |
| Best At | Aggregations, scans, joins on large data | Single-row lookups, inserts, updates |
| Concurrency | Multiple readers, single writer (MVCC) | Multiple readers, single writer (WAL mode) |
| Type System | Strict types, rich type system | Flexible affinity system |
| Parallel Execution | Yes, intra-query parallelism | No, single-threaded execution |
| Vectorized Engine | Yes (processes data in batches) | No (tuple-at-a-time) |
| File Format Support | Parquet, CSV, JSON, Excel native | Own format only (via extensions) |
| Window Functions | Full support + QUALIFY | Basic support |
| Nested Types | Lists, structs, maps, unions | JSON text (via extension) |
| Database Size | Tested to 100s of GB | Tested to TB scale |
| Binary Size | ~40 MB | ~1 MB |
Ad-hoc queries on CSV/Parquet files without ETL pipelines
Transform and aggregate data between systems
SQL in Jupyter/Python as an alternative to pandas
OLAP engine inside applications (dashboards, BI)
Query JSON/CSV log files without infrastructure
Query S3/GCS Parquet files directly via httpfs
-- You can use both together!
INSTALL sqlite_scanner;
LOAD sqlite_scanner;
-- Attach a SQLite database
ATTACH 'app.sqlite' AS sqlite_db (TYPE SQLITE);
-- Run analytical queries on SQLite data
SELECT
region,
count(*) AS orders,
sum(total) AS revenue,
avg(total) AS avg_order
FROM sqlite_db.orders
GROUP BY region
ORDER BY revenue DESC;
-- Join SQLite data with Parquet files
SELECT o.*, p.category
FROM sqlite_db.orders o
JOIN 'products.parquet' p
ON o.product_id = p.id;