The Wayback Machine - https://web.archive.org/web/20200911181709/https://github.com/topics/etl-pipeline
Here are 271 public repositories matching this topic:
A stream processor for mundane tasks written in Go
An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
Updated
Mar 9, 2020
Python
Example project implementing best practices for PySpark ETL jobs and applications.
Updated
Jul 9, 2020
Python
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
Updated
Mar 5, 2020
Python
An extensible Java framework for building XML and non-XML (CSV, EDI, Java, etc.) streaming applications
Updated
Sep 11, 2020
Java
A lightweight ETL (extract, transform, load) library and data integration toolbox for .NET.
Clojure dataframe library and pipeline for data processing and machine learning
Updated
Aug 14, 2020
Clojure
Download DIG to run on your laptop or server.
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
A simple Spark-powered ETL framework that just works 🍺
Updated
Sep 9, 2020
Scala
SEO dashboard built from Google Search Console data using the Google Search API, a MySQL database, a Node.js REST API (Express.js), and a React dashboard
Updated
Jul 7, 2020
JavaScript
Azure Data Factory hands-on lab: a comprehensive, step-by-step tutorial for Azure Data Factory and Mapping Data Flow
🚹 💾 Script to import issues from a JIRA instance into a database.
Updated
Sep 10, 2020
Python
Updated
Sep 10, 2020
Scala
A Kafka Connect source connector that generates data for tests
Updated
Jun 26, 2019
Java
Updated
Apr 9, 2018
Python
Running an ETL pipeline with COBOL on Kubernetes
Updated
Sep 10, 2020
Shell
Blog post on ETL pipelines with Airflow
Updated
Jun 7, 2020
Jupyter Notebook
Ethereum Analytical Database: an Ethereum data access solution for analytics and application development. The solution runs on ClickHouse, a fast database.
Updated
Oct 30, 2019
HTML
A tutorial to set up and deploy a simple serverless Python workflow with REST API endpoints in AWS Lambda.
Updated
Apr 22, 2020
Python
Examples of developing Waterdrop plugins.
Updated
Jun 11, 2020
Scala
ETL pipeline combined with supervised learning and grid search to classify text messages sent during a disaster event
Updated
Feb 24, 2019
Python
ETL pipeline for the Ethereum blockchain
Updated
Feb 13, 2019
JavaScript
Build end-to-end Machine Learning pipeline to predict accessibility of playgrounds in NYC
Updated
Jul 9, 2020
Jupyter Notebook
Parallel Streaming Transformation Loader
Updated
Apr 23, 2019
Java
Currently, Metorikku uses a simple YAML config file as input.
We need to be able to override this configuration using CLI params.
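The Metorikku request above asks for CLI parameters that override values loaded from the YAML config. As a language-neutral sketch (Metorikku itself is written in Scala, and this is not its code or API), the Python below assumes the YAML has already been parsed into a nested dict and accepts repeated `--set dotted.key=value` flags. The `--set` flag, the `apply_overrides` and `parse_value` helpers, and the example keys are all hypothetical.

```python
import argparse
import copy
from typing import Any, Dict, List

# Hypothetical sketch: the --set flag, helper names, and config keys are
# illustrative assumptions, not Metorikku's actual CLI or config schema.


def parse_value(raw: str) -> Any:
    """Interpret a CLI string as a bool, int, or float; otherwise keep the string."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Return a deep copy of `config` with each "dotted.key=value" override applied in order."""
    result = copy.deepcopy(config)
    for item in overrides:
        path, sep, raw = item.partition("=")
        if not sep or not path:
            raise ValueError(f"override must look like key.path=value: {item!r}")
        keys = path.split(".")
        node = result
        # Walk the nested mappings, creating any missing intermediate levels.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
            if not isinstance(node, dict):
                raise TypeError(f"cannot descend into non-mapping key {key!r} in {path!r}")
        node[keys[-1]] = parse_value(raw)
    return result


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run a job from a YAML config file")
    parser.add_argument("-c", "--config", required=True, help="path to the YAML config file")
    parser.add_argument(
        "--set",
        dest="overrides",
        action="append",
        default=[],
        metavar="KEY=VALUE",
        help="override a config value; may be repeated",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["-c", "job.yaml", "--set", "output.dir=/tmp/out", "--set", "preview_lines=10"]
    )
    # Stand-in for the dict a YAML loader would return for job.yaml.
    base = {"output": {"dir": "/data/out"}, "preview_lines": 0}
    print(apply_overrides(base, args.overrides))
```

Applying the overrides to a deep copy keeps the file-based config as the baseline, and because the flags are applied in order, a later `--set` for the same key wins over an earlier one.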