Open source platform for the machine learning lifecycle
Updated Dec 10, 2022 · Python
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Simple and Distributed Machine Learning
酷玩 Spark: Spark source code analysis, Spark libraries, and more
Interactive and Reactive Data Science using Scala and Spark.
lakeFS - Git-like capabilities for your object storage
Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache Spark docker image
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning
A curated list of awesome Apache Spark packages and resources.
Feathr – An Enterprise-Grade, High Performance Feature Store
PySpark + Scikit-learn = Sparkit-learn
(Deprecated) Scikit-learn integration package for Apache Spark
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
C# and F# language binding and extensions to Apache Spark
R interface for Apache Spark
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, using Apache Avro as the data serialization format.
Created by Matei Zaharia
Released May 26, 2014