
Significantly More Cost-Efficient Than Flink
Both RisingWave and Apache Flink are designed for building real-time stream processing applications.
Flink specializes in stream processing, but it relies on other datastores for serving real-time data, and those datastores may not be optimized for that purpose. In contrast, RisingWave is built on a unified batch and stream processing architecture with a built-in serving layer. This is crucial because it enables data processing and serving within a single framework, reducing complexity and cost.
Why Is RisingWave the Clear Choice for Cost Efficiency?
RisingWave was created during the cloud era and adopts a modern decoupled compute and storage architecture.
- RisingWave achieves better elasticity and cost efficiency. In particular, RisingWave persists its data in S3 or other compatible cloud storage services.
- RisingWave can handle complex streaming joins over large time windows and recover from failures in seconds, not minutes or hours.
- The new architecture also allows each component to be optimized separately, reducing resource waste and avoiding task overload.
Flink is a computing framework born during the Hadoop-dominant big-data era, and its architecture was heavily influenced by the MapReduce paradigm. Its coupled compute and storage architecture enables Flink to achieve high parallelism and scalability.
However, this very architecture can raise concerns about execution costs. Because Flink keeps its state in local storage such as RocksDB, a Flink cluster has to be scaled up until its local storage can hold the state of large streaming joins and other stateful stream processing tasks.
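
The effect of local state on cluster size can be illustrated with a back-of-envelope estimate. The sketch below is a simplification under stated assumptions, not a Flink sizing formula: it assumes a windowed join retains every event from each input stream until the event leaves the window, and it ignores RocksDB overhead, compaction, and checkpoints. The function names and all workload numbers are hypothetical.

```python
import math


def join_state_bytes(events_per_sec, window_sec, bytes_per_event, streams=2):
    """Approximate state retained by a windowed join.

    Assumption: each event from every input stream is kept until it
    falls out of the window.
    """
    return events_per_sec * window_sec * bytes_per_event * streams


def nodes_for_local_state(state_bytes, usable_disk_per_node_bytes):
    """Nodes required when the whole state must fit on local disks."""
    return math.ceil(state_bytes / usable_disk_per_node_bytes)


# Hypothetical workload: 50,000 events/s per stream, 24-hour window,
# 200 bytes per event, 500 GB of usable local disk per node.
state = join_state_bytes(50_000, 24 * 3600, 200)
nodes = nodes_for_local_state(state, 500 * 10**9)
print(f"state ~ {state / 1e12:.2f} TB, nodes >= {nodes}")
```

In this model the node count is driven by the size of the state rather than by the CPU the query needs, which is the cost concern described above. When state lives in object storage instead, compute can in principle be sized for the processing load alone.
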
Feature | Apache Flink | RisingWave |
---|---|---|
License | Apache License 2.0 | Apache License 2.0 |
System category | Stream processors | Streaming database |
Architecture | MapReduce-style, coupled compute-storage | Cloud-native, decoupled compute-storage |
Programming API | Java, Scala, Python, SQL | SQL + UDF (Python, Java, and more) |
Client libraries | - | Java, Python, Node.js, and more |
State management | RocksDB on the local machine; periodically checkpointed to S3 | Native storage persisted in S3 or equivalent storage |
Query serving | Supports batch mode execution | Supports concurrent ad-hoc SQL query serving |
Correctness | Supports exactly-once semantics and out-of-order processing | Supports exactly-once semantics, out-of-order processing, snapshot reads, and consistent reads |
Integrations and tooling | Big-data ecosystem | Big-data ecosystem, cloud ecosystem, and PostgreSQL ecosystem |
Learning curve | Steep (Flink-specific interface) | Extremely shallow (PostgreSQL-like experience) |
Failure recovery | Minutes to hours (depending on specific system configuration) | Seconds |
Dynamic scaling | Stop the world | Transparent and instant |
Performance cost | High | Low (especially when handling complex queries like joins) |
Typical use cases | Streaming ETL, streaming analytics | Streaming ETL, streaming analytics, online serving |
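
To make the "Programming API" and "Query serving" rows more concrete, here is a minimal sketch of how a streaming join could be defined as a materialized view and then queried ad hoc from Python through a standard DB-API connection. The view `paid_orders`, the tables `orders` and `payments`, their columns, and the helper `define_and_query` are all hypothetical; the connection parameters in the comments assume a local RisingWave instance with default settings and the `psycopg2` driver installed.

```python
# Hypothetical schema: `orders(order_id, amount, created_at)` and
# `payments(order_id, paid_at)` are assumed to exist already.
CREATE_VIEW = """
CREATE MATERIALIZED VIEW paid_orders AS
SELECT o.order_id, o.amount, p.paid_at
FROM orders AS o
JOIN payments AS p
  ON o.order_id = p.order_id
 AND p.paid_at BETWEEN o.created_at AND o.created_at + INTERVAL '1 hour'
"""

# Ad-hoc serving query against the continuously maintained view.
SERVE_QUERY = "SELECT order_id, amount FROM paid_orders WHERE amount > %s"


def define_and_query(conn, min_amount):
    """Create the view, then run one ad-hoc query against it.

    `conn` is any DB-API 2.0 connection, for example one opened with
    psycopg2 (assumed defaults for a local instance):
        conn = psycopg2.connect(host="localhost", port=4566,
                                user="root", dbname="dev")
        conn.autocommit = True
    """
    cur = conn.cursor()
    cur.execute(CREATE_VIEW)
    cur.execute(SERVE_QUERY, (min_amount,))
    rows = cur.fetchall()
    cur.close()
    return rows
```

The point of the sketch is that the join definition and the serving query go through the same SQL interface, so no separate serving datastore appears in the client code.
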