apache / spark
Apache Spark - A unified analytics engine for large-scale data processing
See what the GitHub community is most excited about today.
Chisel 3: A Modern Hardware Design Language
A simple client for Android
Simple and Distributed Machine Learning
Feathr – An Enterprise-Grade, High Performance Feature Store
StreamPark: make stream processing easier. An easy-to-use streaming application development framework and operation platform.
A Scala library for writing HTTP apps.
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Modern Load Testing as Code
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
The Scala 3 compiler, also known as Dotty.
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
ZIO — A type-safe, composable library for async and concurrent programming in Scala
State of the Art Natural Language Processing
A Scala API for Apache Beam and Google Cloud Dataflow.
A Spark plugin for reading and writing Excel files
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Monitor Kafka Consumer Group Latency with Kafka Lag Exporter
Open, Modular, Deep Learning Accelerator
Spark: The Definitive Guide's Code Repository
DataStax Spark Cassandra Connector
Apache Spark Connector for SQL Server and Azure SQL
Scientific workflow engine designed for simplicity and scalability. Trivially transition from one-off use cases to massive-scale production environments
Redshift data source for Apache Spark