Database Scaling Strategies Masterclass
Exploring Scaling Approaches
Guilherme Nogueira (Gui)
Vertical vs. Horizontal
Horizontal Scaling
Pros:
■ Leveraging commodity hardware
■ The sky's the limit (practically unlimited headroom)
Cons:
■ Distributed Systems Tax
● Network overhead
● Harder-to-control long-tail latency
● Consistency challenges
● "N+1" failure modes
■ More machines involved = higher chance of failure
…
Vertical Scaling
Pros:
■ No architecture changes
■ Low operational complexity
■ Lower noisy neighbour risk
Cons:
■ Hard ceiling in compute
■ Exponential cost curve
■ Diminishing returns when going HUGE
■ Only if your DB is not prepared for it 😉
Both worlds combined
Is there a right way to do this?
Leverage the best of both worlds 😉
Anonymized use case: North American EV manufacturer
■ Migrated from AWS to on-premises
■ 256 vCPUs / 2 TB RAM per node
■ 180 TB NVMe storage per node
■ 30 nodes (and growing FAST)
■ ~4 PB dataset (after replication and compression)
…
Sharding and Distribution
Implicit vs. Explicit sharding
Explicit: app-level or plugin-driven
■ Plugins might jeopardize long-term plans
■ Cumbersome to maintain
■ Error-prone
■ Slow to react to change
Implicit: DB-native
■ Built into the technology
● Core DB, coordinators, drivers, etc.
■ Perfected over time along with the DB
■ Maintenance offloaded to the DB vendor
Explicit examples
Databases that commonly use explicit app-level or plugin sharding.
Implicit examples
Native implementations such as ScyllaDB, Cassandra, and MongoDB.
Sharding types
Per-node sharding
■ Data split at the node level
■ Sometimes virtualized
● Multiple virtual nodes (vnodes) on a single physical node
■ Rebalancing moves a whole node's worth of data
Per-core sharding
■ Highly specialized
■ Each vCPU owns a piece of the dataset
■ Maximizes performance
■ Each vCPU is equally busy
■ Minimizes contention
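To make the two sharding levels concrete, here is a minimal Python sketch, not any database's actual algorithm: keys are hashed to a 64-bit token with MD5 (an assumption), a ring of virtual-node tokens picks the node, and the token modulo a hypothetical `shards_per_node` picks the vCPU shard that owns it. `TwoLevelPlacement` and its parameters are illustrative names; real databases such as ScyllaDB use their own hash and shard-assignment functions.

```python
import bisect
import hashlib


def token_of(key: str) -> int:
    """Hash a partition key to a 64-bit token (MD5 here, purely for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TwoLevelPlacement:
    """Hypothetical placement: token -> node via a vnode ring, then token -> vCPU shard."""

    def __init__(self, nodes, vnodes_per_node=8, shards_per_node=4):
        self.shards_per_node = shards_per_node
        # Per-node level: every node owns several virtual-node points on the ring.
        self.points = sorted(
            (token_of(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self.tokens = [t for t, _ in self.points]

    def owner(self, key: str) -> tuple[str, int]:
        t = token_of(key)
        # The first ring point clockwise from the token (wrapping around) picks the node.
        idx = bisect.bisect_right(self.tokens, t) % len(self.points)
        node = self.points[idx][1]
        # Per-core level: inside that node, exactly one vCPU shard owns this token.
        shard = t % self.shards_per_node
        return node, shard


placement = TwoLevelPlacement(["node-a", "node-b", "node-c"])
print(placement.owner("user:42"))
```

Because each token maps to exactly one shard, requests for a key can be routed straight to the owning vCPU, which is what lets per-core designs avoid cross-core locking.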
Rebalance time
A.K.A. what happens when you add a node?
■ Cost of data movement (streaming)
● CPU maxed out
● Disks busy
● Network spikes
● Tablets solve this!
■ DB behavior during resource shortage
● Cassandra requires streaming/compaction throttling in its configuration
● ScyllaDB has Workload Prioritization
■ Client-side things to watch out for
● Application latency
● Streaming resource overhead
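Why does adding a node move so much data? The sketch below (Python, hypothetical helper names, MD5-based tokens as an assumption) counts how many of 20,000 synthetic keys change owner when a fifth node joins a four-node cluster, under naive `hash mod N` placement versus a consistent-hashing ring with virtual nodes. Expect roughly 80% of keys to move with modulo placement and roughly 20% (about one node's worth) with the ring; with the ring, every moved key lands on the new node.

```python
import bisect
import hashlib


def token(s: str) -> int:
    """64-bit token from MD5 (an assumption; databases pick their own hash)."""
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:8], "big")


def modulo_placement(nodes):
    """Naive placement: owner = nodes[hash(key) mod N]."""
    return lambda key: nodes[token(key) % len(nodes)]


def ring_placement(nodes, vnodes=64):
    """Consistent hashing: each node owns `vnodes` points on a token ring."""
    points = sorted((token(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
    tokens = [t for t, _ in points]

    def owner(key):
        # The first ring point clockwise from the key's token owns the key.
        idx = bisect.bisect_right(tokens, token(key)) % len(points)
        return points[idx][1]

    return owner


def moved(keys, before, after):
    """Return the keys whose owner changes between two placements."""
    return [k for k in keys if before(k) != after(k)]


keys = [f"key-{i}" for i in range(20_000)]
old_nodes = [f"node-{i}" for i in range(4)]
new_nodes = old_nodes + ["node-4"]

naive_moved = moved(keys, modulo_placement(old_nodes), modulo_placement(new_nodes))
ring_moved = moved(keys, ring_placement(old_nodes), ring_placement(new_nodes))
print(f"modulo placement moved {len(naive_moved) / len(keys):.0%} of keys")
print(f"ring placement moved {len(ring_moved) / len(keys):.0%} of keys")
```

Even in the good case, about a node's volume of data still has to stream over the network, which is the CPU, disk, and network cost listed above.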
Rebalancing mishaps
Scaling is finished!
Why are the dashboards red?
What to monitor for?
■ Client connection health
■ Hot nodes/shards
■ Data imbalance
■ Post-scaling operations
● Cassandra (compactions)
● Aerospike (migrations)
● MongoDB (compactions)
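Data imbalance and hot nodes can be flagged with a simple ratio check. This minimal sketch (hypothetical function name and threshold) compares each node's load, for example on-disk data size or request rate, against the cluster mean. With the illustrative, made-up sizes below, the mean is 2.5 TB, the max-to-mean ratio is 1.6, and only `node-d` is flagged.

```python
from statistics import mean


def imbalance_report(load_by_node: dict[str, float], threshold: float = 1.25):
    """Return (max-to-mean load ratio, sorted nodes whose load exceeds threshold x mean).

    `load_by_node` can hold any per-node metric, e.g. on-disk bytes or requests/s.
    The 1.25 default threshold is an assumed value, not a recommendation.
    """
    avg = mean(load_by_node.values())
    ratio = max(load_by_node.values()) / avg
    hot = sorted(node for node, load in load_by_node.items() if load > threshold * avg)
    return ratio, hot


# Illustrative, made-up per-node data sizes in TB after a scale-out.
sizes_tb = {"node-a": 1.9, "node-b": 2.0, "node-c": 2.1, "node-d": 4.0}
ratio, hot = imbalance_report(sizes_tb)
print(f"max/mean = {ratio:.2f}, hot nodes: {hot}")
```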
Scaling Tradeoffs
Complexity beyond simply adding nodes
■ Cost
● Scale too soon = waste $ on idle resources
● Scale too late = risk SLA breaches, lose $
■ Complexity
● Operational overhead
■ Latency
● Busy systems
● Higher latency
■ Decision Making
● When to trigger scaling?
● Cost / Risk tradeoffs
Why Scaling in NoSQL is different
■ Built into the technology
● Lower application overhead
■ Aimed at scale
● Massive datasets
● High throughput
● Lower latency
■ Frequent scaling
● Day/Night cycles
● Business/Weekend days
Scaling Incoherency
Doubling cluster size != Doubling DB capacity
■ Neo4j
● Graph computation is mathematically hard to scale
■ MongoDB
● Especially for write capacity
● Requires additional mongos routers and config servers
■ Cassandra
● Though improved in recent versions
■ DynamoDB
● Not a cluster per se
● Scaling takes minutes to trigger and take effect
■ HBase
● Heavy cluster-wide rebalancing tasks often hurt performance
Auto and Elastic Scaling
Auto vs. Elastic Scaling
Auto-scaling
■ Reactive to metrics
● CPU, RAM, Disk I/O, Request latency
■ DBs are not stateless
■ Data has weight
● Moving it around costs CPU, disk, and network I/O
● Even when restoring from a backup
Elastic scaling
■ Leverages cloud infrastructure
■ Scales in incremental steps
■ Can mix instances of different types
● Works in situations where other approaches would fail
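Reactive auto-scaling and the "data has weight" point can be sketched together. The policy below is hypothetical (threshold, streak length, and cooldown are assumed values, not any product's defaults): it scales out only after several consecutive high-CPU samples, then ignores metrics for a while as data streams in. `streaming_hours` is a back-of-the-envelope lower bound that assumes the new node receives an even 1/N share of the on-disk data at a sustained streaming rate; real rebalances add compaction, throttling, and protocol overhead.

```python
from dataclasses import dataclass, field


@dataclass
class ScaleOutPolicy:
    """Hypothetical reactive policy: scale out after `sustain` consecutive
    CPU samples above `cpu_high`, then skip `cooldown` samples while the
    new node streams data in."""

    cpu_high: float = 0.75
    sustain: int = 3
    cooldown: int = 5
    _streak: int = field(default=0, init=False, repr=False)
    _cooldown_left: int = field(default=0, init=False, repr=False)

    def observe(self, cpu: float) -> bool:
        """Feed one CPU utilization sample (0.0-1.0); return True to scale out."""
        if self._cooldown_left > 0:
            # Still absorbing the previous scale-out; metrics are distorted by streaming.
            self._cooldown_left -= 1
            return False
        self._streak = self._streak + 1 if cpu > self.cpu_high else 0
        if self._streak >= self.sustain:
            self._streak = 0
            self._cooldown_left = self.cooldown
            return True
        return False


def streaming_hours(dataset_tb: float, nodes_after: int, stream_mb_s: float) -> float:
    """Lower bound on hours to stream an even 1/N share of `dataset_tb`
    (on-disk size, replicas included) at `stream_mb_s` (decimal MB/s)."""
    share_bytes = dataset_tb * 1e12 / nodes_after
    return share_bytes / (stream_mb_s * 1e6) / 3600


policy = ScaleOutPolicy()
print([policy.observe(c) for c in [0.5, 0.8, 0.9, 0.85, 0.9]])
print(streaming_hours(36, 4, 500))
```

For example, 36 TB spread over 4 nodes at 500 MB/s works out to 9 TB for the new node and at least 5 hours of streaming before it carries its full share.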
Cost of Elasticity
■ To provision or not to provision?
● Over-provisioning vs. under-provisioning
■ Consequences of an overloaded system?
● Elevated latency
● Drop in throughput
● Client-side DB timeouts
● App calls time out
● …
● Users see the effect
Tactical Responses
External caches are not a solution
■ Caches always improve performance
● Right!?
■ Often:
● Application Complexity
● Additional Latency
● Additional Cost
● Lowers Service Availability
● Ruins the DB's own cache
● Fails to leverage the DB's potential
● Reduces Security
Check out: https://www.scylladb.com/2024/05/29/eliminating-external-database-caches/
Configuration Tuning
Tune the problem away?
■ Cassandra has tons of throttling configurations
● Streaming
● Compaction
● Helps during the event, but requires babysitting
■ ScyllaDB allows tuning it too, but…
● You don't usually need them
● Workload Prioritization applies to internal groups as well
■ Streaming is one of them 😉
● Tablet-based streaming is much more efficient
■ Requires zero tuning
Client-side tactics
Best practices for steady state AND scaling events
■ Up-to-date drivers
■ Load-balancing client drivers
● If ScyllaDB: shard-aware and tablet-aware drivers
■ Retry policies + speculative retries
■ Use DNS endpoints
● Improve discoverability
● Avoid client-side changes when nodes are replaced
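Speculative retries trade a little extra load for lower tail latency: if a replica is slow (or fails), the client also asks another one and keeps whichever answers first. The asyncio sketch below illustrates the idea with a hypothetical `speculative_call` helper and a fake `request` coroutine; it is not any driver's API. Production drivers ship their own retry and speculative-execution policies, which should be preferred, and speculation is only safe for idempotent requests.

```python
import asyncio


async def speculative_call(request, replicas, speculate_after, timeout):
    """Hypothetical helper: query replicas[0]; if it has not answered within
    `speculate_after` seconds (or it failed), also query the next replica.
    The first successful answer wins and the other attempts are cancelled."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    pending = {asyncio.create_task(request(replicas[0]))}
    next_replica = 1
    last_error = None
    try:
        while pending:
            wait_for = deadline - loop.time()
            if next_replica < len(replicas):
                # Wake up early in case a speculative attempt is needed.
                wait_for = min(wait_for, speculate_after)
            if wait_for <= 0:
                break
            done, pending = await asyncio.wait(
                pending, timeout=wait_for, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()
                last_error = task.exception()  # failed attempt; keep going
            if next_replica < len(replicas):
                # Slow or failed so far: fire a speculative retry at the next replica.
                pending.add(asyncio.create_task(request(replicas[next_replica])))
                next_replica += 1
        raise TimeoutError("no replica answered successfully in time") from last_error
    finally:
        for task in pending:
            task.cancel()


async def fake_request(replica):
    """Stand-in for a driver call: replica-1 is slow, replica-2 is fast."""
    await asyncio.sleep({"replica-1": 0.5, "replica-2": 0.01}[replica])
    return f"answer from {replica}"


print(asyncio.run(speculative_call(
    fake_request, ["replica-1", "replica-2"], speculate_after=0.05, timeout=1.0)))
```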
Guilherme Nogueira (Gui)
Technical Director
ScyllaDB
guilherme.nogueira@scylladb.com
Keep in touch!