The Wayback Machine - https://web.archive.org/web/20220909010705/https://github.com/topics/data-pipeline
Here are 381 public repositories matching this topic...
The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP
Updated Sep 8, 2022
Scala
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Kestra is an infinitely scalable orchestration and scheduling platform, creating, running, scheduling, and monitoring millions of complex pipelines.
A list of useful resources to learn Data Engineering from scratch
The open standard for data logging
Updated Sep 8, 2022
Jupyter Notebook
task management & automation tool
Updated Sep 8, 2022
Python
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Updated May 24, 2022
Jupyter Notebook
A lightweight stream processing library for Go
Smarter data pipelines for audio.
Updated Mar 23, 2022
Python
Open-source data observability for analytics engineers
Example end-to-end data engineering project.
Updated Jul 6, 2022
Python
A list about Apache Kafka
Streaming reactive and dataflow graphs in Python
Updated Jul 5, 2022
Python
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Updated Jun 9, 2021
Python
Use SQL to build ELT pipelines on a data lakehouse.
Updated May 25, 2022
JavaScript
A powerful messaging platform for modern developers
Data Integration for Production Data Stores.
🐳 Tool to automate data quality checks on data pipelines
Fluent data pipelines for Python and your shell
Updated Feb 18, 2022
Python
Content for architecting a data science platform for products using Luigi, Spark & Flask.
Updated Jan 27, 2020
Jupyter Notebook