Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Upserts, Deletes And Incremental Processing on Big Data.
lakeFS - Data version control for your data lake | Git for data
Dinky is an out-of-the-box, one-stop real-time computing platform dedicated to the construction and practice of Unified Streaming & Batch and Unified Data Lake & Data Warehouse. Based on Apache Flink, Dinky can connect to many big data frameworks, including OLAP and Data Lake.
The LeoFS Storage System
LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
A collection of Apache Hudi resources
A free-to-use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool and registered trademark of dbt Labs)
Use SQL to build ELT pipelines on a data lakehouse.
A Data Platform built for AWS, powered by Kubernetes.
Streaming application development and management system based on Linkis and DSS, with planned support for workflow-like graphical drag-and-drop development.
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
This repository teaches Databricks concepts through examples. It covers the important topics that come up in real-life work as a data engineer, using PySpark and Spark SQL for development. The course ends with a few case studies.
Apache Spark Course Material
A library to accelerate ML and ETL pipelines by connecting all data sources