The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
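To illustrate what "managing workflows declaratively in code" means in practice, here is a minimal, hypothetical sketch (not any specific platform's API): the workflow is declared as plain data — task names, dependencies, and actions — and a tiny scheduler derives the execution order from the dependency graph.

```python
# Hypothetical sketch of declarative workflow orchestration, using only
# the Python standard library. Task names and actions are made up for
# illustration; real platforms add retries, scheduling, and distribution.
from graphlib import TopologicalSorter

# The workflow is data: task name -> (set of dependencies, action to run).
workflow = {
    "extract":   (set(),         lambda: "raw rows"),
    "transform": ({"extract"},   lambda: "clean rows"),
    "load":      ({"transform"}, lambda: "loaded"),
}

def run(workflow):
    """Execute every task after all of its dependencies."""
    order = TopologicalSorter({name: deps for name, (deps, _) in workflow.items()})
    results = {}
    for name in order.static_order():
        results[name] = workflow[name][1]()
    return results

print(run(workflow))  # tasks run in dependency order: extract, transform, load
```

Because the graph is data rather than imperative control flow, the same declaration can be validated, visualized, or handed to a distributed executor without changing the task code.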
Privacy- and security-focused Segment alternative, built with Go and React
Memphis.dev is a highly scalable and effortless data streaming platform
A list of useful resources to learn Data Engineering from scratch
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
Task management & automation tool
A lightweight stream processing library for Go
Open-source data observability for analytics engineers.
BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of records every day.
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Example end-to-end data engineering project.
ingestr is a CLI tool that seamlessly copies data between any databases with a single command.
Smarter data pipelines for audio.
Pythonic tool for running machine-learning/high performance/quantum-computing workflows in heterogeneous environments.
A list of resources about Apache Kafka
Code review for data in dbt
Streaming reactive and dataflow graphs in Python
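A streaming dataflow graph can be sketched in plain Python to show the idea behind such libraries (this is a generic generator-based illustration, not that project's actual API): each node consumes an upstream stream of events and yields transformed events downstream, so composition of nodes forms the graph.

```python
# Generic dataflow sketch using Python generators; node names are
# illustrative, not a real library's API.
def source(events):
    """Root node: emit raw events into the graph."""
    for e in events:
        yield e

def map_node(stream, fn):
    """Transform node: apply fn to every event."""
    for e in stream:
        yield fn(e)

def filter_node(stream, pred):
    """Filter node: pass through only events matching pred."""
    for e in stream:
        if pred(e):
            yield e

# Wire the nodes into a small graph and pull results through it lazily.
pipeline = filter_node(map_node(source(range(10)), lambda x: x * x),
                       lambda x: x % 2 == 0)
print(list(pipeline))  # even squares of 0..9: [0, 4, 16, 36, 64]
```

Because each node is lazy, events flow through the whole graph one at a time; real dataflow libraries add the reactive part — pushing new events through the same graph as they arrive.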
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).