apache-spark

Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.

scala kafka spark apache-spark storm integration avro apache-storm apache-kafka apache-avro

Updated Jan 24, 2017
Scala

DataSystemsLab / GeoSpark

A Cluster Computing System for Processing Large-Scale Spatial Data

apache-spark geospatial spatial-analysis spatial-index spatial-queries cluster-computing spatial-join spatial-sql

Updated Aug 9, 2020
Java

san089 / goodreads_etl_pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Updated Mar 9, 2020
Python

cerndb / dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

data-science machine-learning spark apache-spark deep-learning hadoop tensorflow keras keras-models optimization-algorithms data-parallelism distributed-optimizers

Updated Jul 25, 2018
Python

apache-spark-on-k8s / spark

Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end is now on https://github.com/apache/spark/

kubernetes apache-spark kubernetes-cluster

Updated Jan 8, 2020
Scala

nchammas / flintrock

A command-line tool for launching Apache Spark clusters.

apache-spark ec2 orchestration apache-spark-cluster spark-ec2

Updated Aug 3, 2020
Python

openscoring / openscoring

REST web service for the true real-time scoring (<1 ms) of R, Scikit-Learn and Apache Spark models

api real-time r apache-spark scikit-learn xgboost lightgbm pmml

Updated Aug 5, 2020
Java

lw-lin / streaming-readings

Papers and reading material on streaming systems

streaming apache-spark storm stream-processing spark-streaming dataflow flink heron drizzle millwheel s4 streaming-engine spe stream-processing-engine

Updated Mar 31, 2018

tweag / sparkle

Haskell on Apache Spark.

haskell spark apache-spark analytics

Updated Jun 10, 2019
Haskell

rjurney / Agile_Data_Code_2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

Updated Jul 29, 2020
Jupyter Notebook

infoslack / awesome-kafka

A list about Apache Kafka

infrastructure kafka apache-spark stream-processing apache-kafka kafka-streams data-processing data-pipeline streaming-data

Updated Dec 22, 2019

miguno / wirbelsturm

Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.

vagrant puppet kafka spark apache-spark storm apache-storm apache-kafka

Updated Sep 14, 2015
Shell

jaceklaskowski / spark-structured-streaming-book

The Internals of Spark Structured Streaming

spark apache-spark gitbook internals structured-streaming

Updated Nov 16, 2019

LucaCanali / sparkMeasure

This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.

spark apache-spark performance-metrics performance-troubleshooting spark-troubleshooting

Updated Jun 17, 2020
Scala

Hydrospheredata / mist

Serverless proxy for Spark cluster

api big-data apache-spark serverless

Updated Oct 7, 2019
Scala
