The Wayback Machine - https://web.archive.org/web/20200812182907/https://github.com/topics/apache-spark
Here are 916 public repositories matching this topic...
Open source platform for the machine learning lifecycle
Updated Aug 12, 2020 · Python
Coolplay Spark: Spark source code analysis, Spark libraries, and more
Updated May 26, 2019 · Scala
Interactive and Reactive Data Science using Scala and Spark.
Updated Jun 2, 2020 · JavaScript
Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
Updated Aug 12, 2020 · Jupyter Notebook
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
Updated Mar 17, 2020 · Java
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Apache Spark docker image
Updated Jun 26, 2020 · Dockerfile
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
(Deprecated) Scikit-learn integration package for Apache Spark
Updated Dec 3, 2019 · Python
PySpark + Scikit-learn = Sparkit-learn
Updated Oct 24, 2017 · Python
A curated list of awesome Apache Spark packages and resources.
The Internals of Apache Spark
C# and F# language binding and extensions to Apache Spark
🚚 Agile Data Science Workflows made easy with Pyspark
Updated Aug 12, 2020 · Jupyter Notebook
R interface for Apache Spark
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Updated Jan 24, 2017 · Scala
A Cluster Computing System for Processing Large-Scale Spatial Data
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Updated Mar 9, 2020 · Python
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Updated Jul 25, 2018 · Python
Apache Spark enhanced with native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end is now on https://github.com/apache/spark/
Updated Jan 8, 2020 · Scala
A command-line tool for launching Apache Spark clusters.
Updated Aug 3, 2020 · Python
REST web service for the true real-time scoring (<1 ms) of R, Scikit-Learn and Apache Spark models
Updated Jun 10, 2019 · Haskell
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Updated Jul 29, 2020 · Jupyter Notebook
A list about Apache Kafka
Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Updated Sep 14, 2015 · Shell
The Internals of Spark Structured Streaming
This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Updated Jun 17, 2020 · Scala
Serverless proxy for Spark cluster
Updated Oct 7, 2019 · Scala