Open source platform for the machine learning lifecycle
Updated Dec 10, 2022 · Python
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Simple and Distributed Machine Learning
酷玩 Spark: Spark source code analysis, Spark libraries, and more
Interactive and Reactive Data Science using Scala and Spark.
lakeFS - Git-like capabilities for your object storage
Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache Spark docker image
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning
A curated list of awesome Apache Spark packages and resources.
Feathr – An Enterprise-Grade, High Performance Feature Store
PySpark + Scikit-learn = Sparkit-learn
(Deprecated) Scikit-learn integration package for Apache Spark
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
C# and F# language binding and extensions to Apache Spark
R interface for Apache Spark
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, using Apache Avro as the data serialization format.
Created by Matei Zaharia
Released May 26, 2014