Lightweight, source-safe data extraction from PostgreSQL and MySQL to Parquet/CSV.
Rivet is a single-binary CLI that exports query results from relational databases to files — locally, to S3, to GCS, or to stdout. It is extract-only (no loading, no merging, no CDC) and designed to be gentle on production databases through tuning profiles, preflight health checks, and intelligent retry with backoff.
Names. The project and CLI are Rivet; the command is `rivet`. On crates.io the package is published as `rivet-cli` because the crate name `rivet` was already taken. Homebrew and release archives install the `rivet` binary.
Rivet does extraction end-to-end — query, batch, retry, validate, reconcile, checkpoint, plan/apply — from PostgreSQL 12–16 and MySQL 5.7 / 8.0 to Parquet (zstd / snappy / gzip / lz4 / none) or CSV. Destinations: local, Amazon S3, Google Cloud Storage, stdout. See docs/ for the full feature list and contracts.
Rivet is not a loader, a CDC platform, an ELT orchestrator, or a SaaS connector marketplace. It deliberately stops at "file on disk / in a bucket" — you bring the warehouse side yourself.
Documentation language: English-only. See CONTRIBUTING.md.
More walkthroughs: plan / apply · reconcile + repair. Source scripts in docs/gifs/.
brew install panchenkoai/rivet/rivet
rivet --version

Requires Rust 1.94+:
cargo install rivet-cli
rivet --version

The binary is named `rivet`. The crate is published as `rivet-cli` because the `rivet` name on crates.io is taken.
Download the latest release for your platform from GitHub Releases:
# macOS (Apple Silicon)
curl -L https://github.com/panchenkoai/rivet/releases/latest/download/rivet-aarch64-apple-darwin.tar.gz | tar xz
sudo mv rivet-*/rivet /usr/local/bin/
# macOS (Intel)
curl -L https://github.com/panchenkoai/rivet/releases/latest/download/rivet-x86_64-apple-darwin.tar.gz | tar xz
sudo mv rivet-*/rivet /usr/local/bin/
# Linux (x86_64)
curl -L https://github.com/panchenkoai/rivet/releases/latest/download/rivet-x86_64-unknown-linux-gnu.tar.gz | tar xz
sudo mv rivet-*/rivet /usr/local/bin/
# Linux (arm64)
curl -L https://github.com/panchenkoai/rivet/releases/latest/download/rivet-aarch64-unknown-linux-gnu.tar.gz | tar xz
sudo mv rivet-*/rivet /usr/local/bin/

rivet --version

docker run --rm ghcr.io/panchenkoai/rivet:latest --version
docker run --rm \
-e DATABASE_URL="postgresql://user:pass@host.docker.internal:5432/db" \
-v $(pwd)/examples/rivet.yaml:/config/rivet.yaml \
-v $(pwd)/output:/output \
ghcr.io/panchenkoai/rivet:latest \
run --config /config/rivet.yaml

From a container, `localhost` is not your machine. Use `host.docker.internal` (Docker Desktop) or `--add-host=host.docker.internal:host-gateway` on Linux. See Getting Started for details.
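
The four release-archive commands listed earlier differ only in the target triple inside the file name. As a convenience, that choice could be scripted. The sketch below is an assumption-laden helper, not something shipped with Rivet: the function names `rivet_target` and `rivet_archive_url` are hypothetical, and it only covers the four archives named in this README.

```shell
# Hypothetical helpers (not part of Rivet): pick the release archive
# matching the current platform, based on the four targets listed above.

# rivet_target OS ARCH
# Maps `uname -s` / `uname -m` style values to a release target triple.
rivet_target() {
  case "$1/$2" in
    Darwin/arm64)              echo "aarch64-apple-darwin" ;;
    Darwin/x86_64)             echo "x86_64-apple-darwin" ;;
    Linux/x86_64)              echo "x86_64-unknown-linux-gnu" ;;
    Linux/aarch64|Linux/arm64) echo "aarch64-unknown-linux-gnu" ;;
    *) echo "unsupported platform: $1/$2" >&2; return 1 ;;
  esac
}

# rivet_archive_url OS ARCH
# Prints the "latest release" download URL for that platform.
rivet_archive_url() {
  target=$(rivet_target "$1" "$2") || return 1
  echo "https://github.com/panchenkoai/rivet/releases/latest/download/rivet-${target}.tar.gz"
}

# Example use (downloads and unpacks the archive):
#   curl -L "$(rivet_archive_url "$(uname -s)" "$(uname -m)")" | tar xz
```

Passing the OS and architecture as arguments, rather than calling `uname` inside the function, keeps the mapping easy to check on any machine.
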
Requires Rust 1.94+:
git clone https://github.com/panchenkoai/rivet.git
cd rivet
cargo build --release
# binary is at target/release/rivet

| Topic | Link |
|---|---|
| All docs (index) | docs/README.md |
| First run — install, connect, export | docs/getting-started.md |
| Export modes (`full`, `incremental`, `chunked`, `time_window`) | docs/modes/ |
| Destinations (local, S3, GCS, stdout) | docs/destinations/ |
| Config YAML reference | docs/reference/config.md |
| CLI commands and flags | docs/reference/cli.md |
| Tuning profiles | docs/reference/tuning.md |
| Scaffold config from a live DB (`rivet init`) | docs/reference/init.md |
| Pipeline, traits, memory model, source layout | docs/architecture.md |
| Pilot guide (ordered instructions) | docs/pilot/README.md |
| Quickstart: PostgreSQL | docs/pilot/quickstart-postgres.md |
| Quickstart: MySQL | docs/pilot/quickstart-mysql.md |
| Demo on a pre-seeded 14-table fixture (~10 min) | docs/pilot/demo-quickstart.md |
| Pilot walkthrough — discovery → reconcile → repair | docs/pilot/pilot-walkthrough.md |
| Production checklist | docs/pilot/production-checklist.md |
| Architecture decision records | docs/adr/ |
| Contributing, tests, CI | CONTRIBUTING.md |
Rivet v0.5.x adds explicit controls over how much RAM, CPU, and disk a run is allowed to consume. These are production-safety primitives, not performance knobs.
| Setting | What it controls |
|---|---|
| `tuning.max_batch_memory_mb` | Hard cap on a single Arrow batch. When exceeded, the `on_batch_memory_exceeded` policy fires. |
| `tuning.on_batch_memory_exceeded` | `warn` (log + continue) · `fail` (abort) · `auto_shrink` (split batch recursively, then continue) |
| `tuning.memory_threshold_mb` | Process-level RSS guard — pauses fetching when RSS exceeds the threshold |
| `tuning.batch_size_memory_mb` | Adaptive batch sizing: Rivet samples the first batch to estimate row width, then adjusts subsequent batch sizes automatically |
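
The arithmetic behind adaptive batch sizing and `auto_shrink` can be illustrated with a small sketch. This is not Rivet's implementation: the helper names `rows_for_budget` and `halvings_needed` are hypothetical, and splitting a batch in half at each step is an assumption (the description above only says the batch is split recursively).

```shell
# Illustration only; these helpers are hypothetical and not part of Rivet.

# rows_for_budget BUDGET_MB SAMPLE_BYTES SAMPLE_ROWS
# Estimates the average row width from a sampled batch, then prints how
# many rows of that width fit into the memory budget.
rows_for_budget() {
  rfb_budget_mb=$1
  rfb_sample_bytes=$2
  rfb_sample_rows=$3
  # Average bytes per row, rounded up so the estimate stays conservative.
  rfb_row_bytes=$(( (rfb_sample_bytes + rfb_sample_rows - 1) / rfb_sample_rows ))
  echo $(( rfb_budget_mb * 1024 * 1024 / rfb_row_bytes ))
}

# halvings_needed BATCH_MB CAP_MB
# Prints how many times a batch must be split in half (assumed strategy)
# before each piece is at or under the cap.
halvings_needed() {
  hn_size=$1
  hn_cap=$2
  hn_count=0
  while [ "$hn_size" -gt "$hn_cap" ]; do
    hn_size=$(( (hn_size + 1) / 2 ))
    hn_count=$(( hn_count + 1 ))
  done
  echo "$hn_count"
}

# Example: a 1000-row sample occupying 2,048,000 bytes, with a 64 MB budget.
#   rows_for_budget 64 2048000 1000
# Example: a 500 MB batch against a 128 MB cap.
#   halvings_needed 500 128
```

With these inputs the first call estimates 2048 bytes per row and the second needs two splits (500 → 250 → 125 MB), which shows why wide rows call for a smaller row-count batch under the same memory cap.
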
| Setting | What it controls |
|---|---|
| `compression_profile` | `none` / `fast` (Snappy) / `balanced` (Zstd-3) / `compact` (Zstd-9) |
| `parquet.row_group_strategy` | `auto` (schema-based estimate) / `fixed_rows` / `fixed_memory` |
| `parquet.target_row_group_mb` | Target row group size; lower values reduce peak RSS during Parquet writes |
| Setting | What it controls |
|---|---|
| `quality.row_count_min` / `row_count_max` | Fail the export if row count is outside this range — fires even when the source returns 0 rows |
| `quality.null_ratio_max` | Fail the export if the null ratio in a column exceeds the threshold |
| `quality.unique_columns` | Track column uniqueness via typed xxHash3-64 hashing |
| `quality.unique_max_entries` | Cap the uniqueness hash set to prevent unbounded memory growth on high-cardinality columns |
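
The zero-row behaviour of the row count gate is easy to overlook, so here is a minimal sketch of the range check. It assumes both bounds are inclusive, which the description above does not state explicitly, and `row_count_ok` is a hypothetical helper rather than Rivet code.

```shell
# Illustration only; row_count_ok is hypothetical and not part of Rivet.

# row_count_ok COUNT MIN MAX
# Succeeds (exit 0) when MIN <= COUNT <= MAX, fails otherwise.
# Bounds are assumed to be inclusive.
row_count_ok() {
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Example: an empty result set still fails a gate whose minimum is 1.
#   row_count_ok 0 1 1000000 || echo "row count outside expected range"
```

The point of the example is that an empty extract is treated like any other out-of-range count, so a silently empty source table is caught instead of producing an empty file.
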
| Environment | Recommended starting point |
|---|---|
| Production database (shared) | profile: safe, max_batch_memory_mb: 128, on_batch_memory_exceeded: warn |
| CI / strict pipeline | max_batch_memory_mb: 128, on_batch_memory_exceeded: fail |
| Low-memory host (1–2 GB) | profile: safe, max_batch_memory_mb: 64, on_batch_memory_exceeded: auto_shrink |
| Read replica / fast backfill | profile: fast, compression_profile: fast |
See the Best Practices guides for detailed explanations, trade-off analysis, and worked examples:
- Resource-aware extraction — memory budgets, policies, RSS formula
- Parquet tuning — row group strategies, targets, downstream read implications
- Compression profiles — profile-to-codec mapping, CPU/size trade-offs
- Quality checks — row count gates, null ratio, uniqueness cap
- Low-memory runners — settings for 512 MB–4 GB hosts
- Recovery and resume — `--resume` semantics, crash recovery
- Latest release and version history: CHANGELOG.md.
- Strategy, pains, and execution tracker: rivet_roadmap.md — the single source of truth for what is shipped and what is open.
