spotify / scio
A Scala API for Apache Beam and Google Cloud Dataflow.

See what the GitHub community is most excited about today.
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Scala 2 compiler and standard library. For bugs, see scala/bug.
Linkis makes it easy to connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.
Apache Spark - A unified analytics engine for large-scale data processing
Apache OpenWhisk is an open source serverless cloud platform
CMAK is a tool for managing Apache Kafka clusters
Chisel 3: A Modern Hardware Design Language
Microsoft Machine Learning for Apache Spark
Rocket Chip Generator
Async Scala-Akka-Netty based Load Test Tool
A service mesh for Kubernetes and beyond. Main repo for Linkerd 1.x.
Approximate Nearest Neighbors in Spark