Rivet

Lightweight, source-safe data extraction from PostgreSQL and MySQL to Parquet/CSV.

Rivet is a single-binary CLI that exports query results from relational databases to files — locally, to S3, to GCS, or to stdout. It is extract-only (no loading, no merging, no CDC) and designed to be gentle on production databases through tuning profiles, preflight health checks, and intelligent retry with backoff.

Names. The project and CLI are Rivet; the command is rivet. On crates.io the package is published as rivet-cli because the crate name rivet was already taken. Homebrew and release archives install the rivet binary.

What Rivet is (and is not)

Rivet does extraction end-to-end — query, batch, retry, validate, reconcile, checkpoint, plan/apply — from PostgreSQL 12–16 and MySQL 5.7 / 8.0 to Parquet (zstd / snappy / gzip / lz4 / none) or CSV. Destinations: local, Amazon S3, Google Cloud Storage, stdout. See docs/ for the full feature list and contracts.

Rivet is not a loader, a CDC platform, an ELT orchestrator, or a SaaS connector marketplace. It deliberately stops at "file on disk / in a bucket" — you bring the warehouse side yourself.

Documentation language: English-only. See CONTRIBUTING.md.

See it run

[Demo recording: Rivet basic workflow (init, doctor, check, run, state)]

More walkthroughs: plan / apply · reconcile + repair. Source scripts in docs/gifs/.


Installation

Homebrew (macOS / Linux) — recommended

brew install panchenkoai/rivet/rivet
rivet --version

cargo install (crates.io)

Requires Rust 1.94+:

cargo install rivet-cli
rivet --version

The binary is named rivet. The crate is published as rivet-cli because the rivet name on crates.io is taken.

Pre-built binaries

Download the latest release for your platform from GitHub Releases:

# macOS (Apple Silicon)
curl -L https://github.com/panchenkoai/rivet/releases/latest/download/rivet-aarch64-apple-darwin.tar.gz | tar xz
sudo mv rivet-*/rivet /usr/local/bin/

# macOS (Intel)
curl -L https://github.com/panchenkoai/rivet/releases/latest/download/rivet-x86_64-apple-darwin.tar.gz | tar xz
sudo mv rivet-*/rivet /usr/local/bin/

# Linux (x86_64)
curl -L https://github.com/panchenkoai/rivet/releases/latest/download/rivet-x86_64-unknown-linux-gnu.tar.gz | tar xz
sudo mv rivet-*/rivet /usr/local/bin/

# Linux (arm64)
curl -L https://github.com/panchenkoai/rivet/releases/latest/download/rivet-aarch64-unknown-linux-gnu.tar.gz | tar xz
sudo mv rivet-*/rivet /usr/local/bin/
rivet --version
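If you script the download, the archive name can be derived from `uname` instead of being picked by hand. The sketch below is one way to do that. `rivet_target` is a hypothetical helper, not part of Rivet, and the mapping from `uname` output to archive names is an assumption based on the four archives listed above.

```shell
#!/bin/sh
# Hypothetical helper: map `uname -s` / `uname -m` values to the
# release archive suffixes listed above. Not part of Rivet itself.
rivet_target() {
  case "$1/$2" in
    Darwin/arm64)               echo "aarch64-apple-darwin" ;;
    Darwin/x86_64)              echo "x86_64-apple-darwin" ;;
    Linux/x86_64)               echo "x86_64-unknown-linux-gnu" ;;
    Linux/aarch64|Linux/arm64)  echo "aarch64-unknown-linux-gnu" ;;
    *) echo "unsupported platform: $1/$2" >&2; return 1 ;;
  esac
}

# Print the download URL for the current machine (nothing is downloaded here).
if target="$(rivet_target "$(uname -s)" "$(uname -m)")"; then
  echo "https://github.com/panchenkoai/rivet/releases/latest/download/rivet-${target}.tar.gz"
fi
```

The printed URL can then be passed to the same `curl -L ... | tar xz` pipeline shown above.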

Docker

docker run --rm ghcr.io/panchenkoai/rivet:latest --version

docker run --rm \
  -e DATABASE_URL="postgresql://user:pass@host.docker.internal:5432/db" \
  -v "$(pwd)/examples/rivet.yaml:/config/rivet.yaml" \
  -v "$(pwd)/output:/output" \
  ghcr.io/panchenkoai/rivet:latest \
  run --config /config/rivet.yaml

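Inside a container, localhost refers to the container itself, not your machine. To reach a database running on the host, connect to host.docker.internal: Docker Desktop resolves it automatically, while on Linux you must also pass --add-host=host.docker.internal:host-gateway to docker run. See Getting Started for details.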
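On Linux, the same run with the host gateway mapped might look like the sketch below. `rivet_docker` is a hypothetical wrapper function, not something Rivet ships. The image name, mounts, and `run --config` arguments are taken from the example above, and the database URL in the usage comment is a placeholder.

```shell
# Hypothetical wrapper (not part of Rivet): run the Rivet image on Linux
# with host.docker.internal mapped to the host gateway.
rivet_docker() {
  docker run --rm \
    --add-host=host.docker.internal:host-gateway \
    -e DATABASE_URL="${DATABASE_URL}" \
    -v "$(pwd)/examples/rivet.yaml:/config/rivet.yaml" \
    -v "$(pwd)/output:/output" \
    ghcr.io/panchenkoai/rivet:latest "$@"
}

# Usage (requires Docker and a database listening on the host):
#   export DATABASE_URL="postgresql://user:pass@host.docker.internal:5432/db"
#   rivet_docker run --config /config/rivet.yaml
```

The `host-gateway` value requires a reasonably recent Docker Engine (20.10 or later).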

Build from source

Requires Rust 1.94+:

git clone https://github.com/panchenkoai/rivet.git
cd rivet
cargo build --release
# binary is at target/release/rivet

Documentation

| Topic | Link |
| --- | --- |
| All docs (index) | docs/README.md |
| First run: install, connect, export | docs/getting-started.md |
| Export modes (full, incremental, chunked, time_window) | docs/modes/ |
| Destinations (local, S3, GCS, stdout) | docs/destinations/ |
| Config YAML reference | docs/reference/config.md |
| CLI commands and flags | docs/reference/cli.md |
| Tuning profiles | docs/reference/tuning.md |
| Scaffold config from a live DB (rivet init) | docs/reference/init.md |
| Pipeline, traits, memory model, source layout | docs/architecture.md |
| Pilot guide (ordered instructions) | docs/pilot/README.md |
| Quickstart: PostgreSQL | docs/pilot/quickstart-postgres.md |
| Quickstart: MySQL | docs/pilot/quickstart-mysql.md |
| Demo on a pre-seeded 14-table fixture (~10 min) | docs/pilot/demo-quickstart.md |
| Pilot walkthrough: discovery → reconcile → repair | docs/pilot/pilot-walkthrough.md |
| Production checklist | docs/pilot/production-checklist.md |
| Architecture decision records | docs/adr/ |
| Contributing, tests, CI | CONTRIBUTING.md |

Resource-aware extraction

Rivet v0.5.x adds explicit controls over how much RAM, CPU, and disk a run is allowed to consume. These are production-safety primitives, not performance knobs.

Memory controls

| Setting | What it controls |
| --- | --- |
| tuning.max_batch_memory_mb | Hard cap on a single Arrow batch. When exceeded, the on_batch_memory_exceeded policy fires. |
| tuning.on_batch_memory_exceeded | warn (log + continue) · fail (abort) · auto_shrink (split batch recursively, then continue) |
| tuning.memory_threshold_mb | Process-level RSS guard: pauses fetching when RSS exceeds the threshold. |
| tuning.batch_size_memory_mb | Adaptive batch sizing: Rivet samples the first batch to estimate row width, then adjusts subsequent batch sizes automatically. |
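To get a feel for what adaptive batch sizing implies, the arithmetic can be sketched as: estimate the average row width from a sample batch, then divide the memory budget by it. This is only an illustration of the idea. `rows_for_budget` is a hypothetical helper, and the formula is an assumption, not Rivet's actual sizing logic.

```shell
#!/bin/sh
# Illustrative arithmetic only; Rivet's real sizing logic may differ.
# Usage: rows_for_budget BUDGET_MB SAMPLE_ROWS SAMPLE_BYTES
rows_for_budget() {
  # A sample with no rows gives no width estimate.
  [ "$2" -gt 0 ] || return 1
  budget_bytes=$(( $1 * 1024 * 1024 ))
  # Ceiling division, so the row width is never underestimated.
  avg_row_bytes=$(( ($3 + $2 - 1) / $2 ))
  [ "$avg_row_bytes" -gt 0 ] || avg_row_bytes=1
  echo $(( budget_bytes / avg_row_bytes ))
}

# A 10,000-row sample that occupied 5,000,000 bytes averages 500 bytes per row,
# so a 64 MB budget holds 67108864 / 500 = 134217 rows (integer division).
rows_for_budget 64 10000 5000000
```

Wide rows (large text or JSON columns) shrink the resulting batch size quickly, which is why a fixed row count is a poor proxy for memory use.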

Output controls

| Setting | What it controls |
| --- | --- |
| compression_profile | none / fast (Snappy) / balanced (Zstd-3) / compact (Zstd-9) |
| parquet.row_group_strategy | auto (schema-based estimate) / fixed_rows / fixed_memory |
| parquet.target_row_group_mb | Target row group size; lower values reduce peak RSS during Parquet writes |

Quality gates

| Setting | What it controls |
| --- | --- |
| quality.row_count_min / row_count_max | Fail the export if row count is outside this range; fires even when the source returns 0 rows |
| quality.null_ratio_max | Fail the export if the null ratio in a column exceeds the threshold |
| quality.unique_columns | Track column uniqueness via typed xxHash3-64 hashing |
| quality.unique_max_entries | Cap the uniqueness hash set to prevent unbounded memory growth on high-cardinality columns |

Choosing settings for your environment

| Environment | Recommended starting point |
| --- | --- |
| Production database (shared) | profile: safe, max_batch_memory_mb: 128, on_batch_memory_exceeded: warn |
| CI / strict pipeline | max_batch_memory_mb: 128, on_batch_memory_exceeded: fail |
| Low-memory host (1–2 GB) | profile: safe, max_batch_memory_mb: 64, on_batch_memory_exceeded: auto_shrink |
| Read replica / fast backfill | profile: fast, compression_profile: fast |

See the Best Practices guides for detailed explanations, trade-off analysis, and worked examples.

Releases and roadmap

  • Latest release and version history: CHANGELOG.md.
  • Strategy, pains, and execution tracker: rivet_roadmap.md — the single source of truth for what is shipped and what is open.
