microsoft / SynapseML
Simple and Distributed Machine Learning
See what the GitHub community is most excited about today.
Apache Spark - A unified analytics engine for large-scale data processing
Spark: The Definitive Guide's Code Repository
A fault tolerant, protocol-agnostic RPC system
Feathr – A scalable, unified data and AI engineering platform for enterprise
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Rocket Chip Generator
FireSim: Fast and Effortless FPGA-accelerated Hardware Simulation with On-Prem and Cloud Flexibility
The pure asynchronous runtime for Scala
FEEL parser and interpreter written in Scala
Rawls service for DSDE
Pure Scala Artifact Fetching
State of the Art Natural Language Processing
The Scala 3 compiler, also known as Dotty.
Useful utilities for BAR projects
CMAK is a tool for managing Apache Kafka clusters
A Lift web app wrapping Dumbster, the fake SMTP server
Java, Scala and Spring SDKs for Kalix
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
A Spark plugin for reading and writing Excel files
Declarative, type-safe web endpoints library
Workbench identity and access management
Notebook service
Snowplow Enrichment jobs and library