How to Crater Your Database, Part Four - Consistency

#serverless #aws #dynamodb #architecture

Intro

At first, life was simple. You had a web server backed by a SQL database, and your prototype was successful. Every action was immediately and fully consistent because you wrapped it in a SQL transaction. You went live. Your early customers loved your product. You grew. You needed to scale. You may have even hired someone like me to help. One of the first things out of my mouth would be, "You do everything in a transaction. You integrate at the data layer. You can't scale like this." "But," you may reply, "we need consistency!" You do, but not like this and not to this extent. Consistency matters, but you don't need complete consistency; context is king.

Where is consistency essential? Where is it not? What are the costs of consistency? I want to address these questions to show you how I approach the issue. But first, let's define our terms.

What is consistency?

Consistency is the 'C' in ACID, which outlines properties of certain database transactions. These transactions aim to ensure data validity across four dimensions: atomicity, consistency, isolation, and durability.¹ Here, I use the term "transaction" more broadly than the SQL begin/commit/rollback transaction commands with which some of you may be familiar.

As part of ACID, consistency means that your change does not violate any database constraints or invariants, and that future interactions will reflect your change. This helps to prevent data corruption.

Consistency is also the 'C' in CAP, the theorem that describes the trade-offs among three properties of a distributed data store: consistency, availability, and partition tolerance. As used in CAP, consistency means that "Every read receives the most recent write or an error."² Note that this differs from the ACID definition.

Consistency is not inherently bad. Further, it is easy in the beginning. Most databases offer efficient consistency at a row or record level. You can insert a record and read it immediately. This is sometimes referred to as "read-after-write consistency."

Problems arise as developers strive for business-level consistency. For example, suppose I have an e-commerce site and want my Order service to complete an order only if there is inventory available, with no possibility of an out-of-stock sale. To guarantee that, I will need to tightly bind (couple) my Inventory service to my Order service. Many engineers would implement this with a single database transaction across the order and inventory tables. Now, the team that runs the Order service and the team that runs the Inventory service must share the same database server. Furthermore, because they are integrating at the data layer (please do not integrate at the data layer), they cannot scale independently of each other.
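To make the coupling concrete, here is a minimal sketch of the single-transaction approach using Python's built-in sqlite3 module. The schema, the `make_db` and `place_order` helpers, and the `widget` SKU are all hypothetical and exist only for illustration. Notice that both teams' tables have to live behind the same connection for this to work.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create one in-memory database holding BOTH teams' tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory ("
        "sku TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0))"
    )
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, sku TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO inventory (sku, qty) VALUES ('widget', 1)")
    conn.commit()
    return conn


def place_order(conn: sqlite3.Connection, sku: str) -> bool:
    """Record the order and decrement stock in one transaction.

    If the stock would go negative, the CHECK constraint fails and the
    order row is rolled back along with the inventory update.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO orders (sku) VALUES (?)", (sku,))
            cur = conn.execute(
                "UPDATE inventory SET qty = qty - 1 WHERE sku = ?", (sku,)
            )
            if cur.rowcount == 0:
                raise LookupError(f"unknown sku: {sku}")
        return True
    except (sqlite3.IntegrityError, LookupError):
        return False
```

The second attempt to buy the last `widget` returns `False` and leaves no order behind. That is airtight, and it is also exactly the data-layer integration that prevents the two services from scaling separately.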

If you cling tightly to consistency, you will lose the ability to scale. The converse also holds: those who have successfully scaled have carefully constrained how they use consistency.

You don't need complete consistency

Let's revisit the example I used above about orders and inventory. While we would prefer only to allow a sale when we have an in-stock item in the warehouse, this isn't easy to enforce. I argue that it isn't necessary.

Imagine having a loosely coupled, scalable system where it's possible to occasionally oversell an item. What of it? If you find that you cannot fulfill an order, issue an apology to the customer and inform her that her order will be fulfilled as soon as the item is back in stock. You could also include a discount coupon for a future purchase. This "apology workflow" is significantly less expensive than engineering a fully consistent order and inventory system and attempting to scale it. Use it.
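Here is a rough sketch of what an apology workflow might look like in code. The `Order` and `Fulfillment` classes, the status names, and the message text are my own hypothetical choices, not a prescribed design. The point is that the stock check happens during fulfillment, after the sale, rather than blocking the sale itself.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: int
    sku: str
    status: str = "accepted"  # accepted -> shipped | backordered


@dataclass
class Fulfillment:
    """Hypothetical fulfillment step that runs after the sale, not during it."""

    stock: dict[str, int]
    apologies: list[str] = field(default_factory=list)

    def process(self, order: Order) -> Order:
        if self.stock.get(order.sku, 0) > 0:
            self.stock[order.sku] -= 1
            order.status = "shipped"
        else:
            # Oversold: apologize and backorder instead of blocking the sale up front.
            order.status = "backordered"
            self.apologies.append(
                f"Order {order.order_id}: sorry, {order.sku} is out of stock. "
                "We will ship it as soon as it is back; here is a coupon for next time."
            )
        return order
```

With one `widget` in stock, the first order ships and the second is backordered with an apology queued for the customer. The Order service never had to ask the Inventory service for permission.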

Airlines and hotels often do this. They don't need airtight consistency when selling seats or rooms. They only need a way to remedy a problem when it does arise.

Real world example: Starbucks

As Gregor Hohpe points out in his 2020 book, The Software Architect Elevator, there are ample sources of design guidance in the real world that we should consider. One of these came from his observations about how Starbucks handles orders to maximize throughput.

First, the Starbucks cashier takes your order (and your money). Then, your order is completed by the barista, and you finalize the transaction by picking up your drink. The two actions are decoupled and independently scalable. If you were to wait at the cashier for your drink before you handed over your money, you would have an airtight transaction, but this would ruin throughput and scale poorly. Starbucks figured this out and jettisoned "consistency" to achieve high-volume output. I daresay the approach has been a smashing success.
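The cashier and barista arrangement maps naturally onto a queue with independent workers. The sketch below is only an analogy in code: `run_shop`, the customer names, and the number of baristas are hypothetical. The cashier loop never waits for a drink to be made; it only enqueues the order and moves on.

```python
import queue
import threading


def run_shop(customers: list[str], baristas: int = 2) -> list[str]:
    """Take every order first; let a pool of baristas complete them independently."""
    orders: queue.Queue = queue.Queue()
    served: list[str] = []
    lock = threading.Lock()

    def barista() -> None:
        while True:
            name = orders.get()
            if name is None:  # sentinel: the shop is closing
                break
            with lock:
                served.append(name)  # drink handed over; the order is now complete

    workers = [threading.Thread(target=barista) for _ in range(baristas)]
    for worker in workers:
        worker.start()

    # Cashier: take the order (and the money), then serve the next customer.
    for name in customers:
        orders.put(name)

    # One sentinel per barista so every worker thread exits.
    for _ in workers:
        orders.put(None)
    for worker in workers:
        worker.join()
    return served
```

Every customer is eventually served, though not necessarily in the order they paid. Adding baristas changes throughput without touching the cashier code, which is the independent scaling the Starbucks example is about.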

Gregor first published this example in an essay,³ which is freely accessible. I highly recommend reading and re-reading it until it sinks in. You don't need complete consistency.

Real world example: Car Dealership

If you've read Martin Kleppmann's Designing Data-Intensive Applications, you may recall an example of his in the "Transactions" chapter about car sales (p. 369, Kindle Edition). He asserts that once a car sells, it should no longer be listed. So far, so good. However, he proposes that this is a canonical example of why you should use a DB transaction that covers both the car's title and its advertising. This is a mistake.

Kleppmann's goal is to prevent the car from being sold twice. This is crucial to the business; having two people pay for the same vehicle would be a disaster. However, the importance wanes when we get to the advertisement about the car. Is it so terrible that the ad lives an extra minute or two before it is removed? If someone calls about the vehicle, a salesman can look up the title and tell the caller, "I'm sorry. That vehicle has sold." (apology workflow)

The cost of wrapping the advertisement and the title in the same transaction (and therefore keeping both in the same database server) is enormous. If you follow Kleppmann's advice, you cannot scale marketing independently of titling. In effect, you are saying, "The advertisement's consistency is as important as the title's consistency," and that is flat-out false.
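A minimal sketch of the alternative, under my own assumptions: the `Dealership` class, its in-memory dictionaries, and the event queue are hypothetical stand-ins for a titling store, a marketing store, and whatever messaging sits between them. Only the title change is guarded; the ad is retired later. In a real database, the check-and-set inside `sell` would be a conditional write or a narrowly scoped transaction.

```python
from collections import deque


class Dealership:
    def __init__(self, vins: list[str]) -> None:
        self.titles: dict[str, str | None] = {vin: None for vin in vins}  # None = unsold
        self.ads: set[str] = set(vins)       # VINs currently advertised
        self.events: deque[str] = deque()    # pending "sold" notifications

    def sell(self, vin: str, buyer: str) -> bool:
        """Strongly consistent step: a car can be sold only once."""
        if self.titles[vin] is not None:
            return False  # already sold; this is where the salesman apologizes
        self.titles[vin] = buyer
        self.events.append(vin)  # note that the ad is NOT touched here
        return True

    def retire_ads(self) -> None:
        """Eventually consistent step, run later and owned by marketing."""
        while self.events:
            self.ads.discard(self.events.popleft())
```

Between `sell` and `retire_ads`, the ad is stale but the title is correct, so a second sale is refused. That window is the "extra minute or two" the ad lives on.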

Eventual consistency

Let's review the three examples I've included: e-commerce inventory, Starbucks fulfillment, and a car dealership. In each, there are essential operations that ought to be consistent. In each case, this centers on payment. We want to ensure that the customer has paid for her item exactly once and that no other customer has paid for the same item.

Here, the ACID properties are helpful. We want this payment to be atomic, consistent, isolated, and durable. However, adjusting inventory, handing off your drink, or retiring an auto listing are all handled later. And, once complete, they make the overarching transaction consistent. This part of the interaction is made consistent "eventually."

Eventual consistency is all around us. If I buy tacos for lunch, the charge will eventually show up on my monthly statement. If I hire a roofer to replace my roof, my house will eventually get a new roof. If I borrow money to pay for the new roof, the lender will eventually be repaid the principal plus interest. In all these cases, the entire transaction becomes consistent over time.

Why DynamoDB is different

AWS recognized that if it wanted DynamoDB to scale, not all of its operations could be fully consistent. It designed DynamoDB to be consistent where it counts, and less so where it doesn't, similar to the examples I presented above. In other words, AWS constrained where consistency is applied.

For example, all of DynamoDB's Global Secondary Indexes (GSIs) and event streams are eventually consistent. This is done to support predictable write times while keeping each write fully ACID compliant.

Since 2018, DynamoDB has supported transactions. However, the intent was not to lock your inventory to your orders or your titles to your advertisements, but to support features such as unique properties, idempotency, and authorization checks. For detailed examples (with code) of these use cases, see Alex DeBrie's 2020 article, "DynamoDB Transactions: Use Cases and Examples" (link in References, below).
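As a small illustration of the unique-property use case, the sketch below builds a request for DynamoDB's `TransactWriteItems` operation that creates a user only if neither the username nor the email is already taken. The table layout (a single `PK` partition key), the `USER#` and `EMAIL#` key prefixes, and the `build_signup_transaction` helper are my assumptions for this example; see DeBrie's article for complete, tested versions.

```python
def build_signup_transaction(table: str, username: str, email: str) -> dict:
    """Build a TransactWriteItems request enforcing a unique username AND email.

    Assumes a hypothetical table whose only key attribute is a string
    partition key named PK.
    """
    return {
        "TransactItems": [
            {
                "Put": {
                    "TableName": table,
                    "Item": {"PK": {"S": f"USER#{username}"}, "Email": {"S": email}},
                    # Fail if an item with this key already exists.
                    "ConditionExpression": "attribute_not_exists(PK)",
                }
            },
            {
                "Put": {
                    "TableName": table,
                    # Marker item whose only job is to reserve the email address.
                    "Item": {"PK": {"S": f"EMAIL#{email}"}, "Username": {"S": username}},
                    "ConditionExpression": "attribute_not_exists(PK)",
                }
            },
        ]
    }
```

With boto3 installed and AWS credentials configured, the request could be sent with `boto3.client("dynamodb").transact_write_items(**request)`. If either condition fails, both writes are rejected and the client raises `TransactionCanceledException`. Note how narrow the transaction is: two items that together represent one fact, not two business domains locked together.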

Summary

You don't need consistency everywhere and over everything. Identify where consistency matters and ensure it at that point. Let techniques like the "apology workflow" and eventual consistency help smooth out your distributed and now scalable systems. Be cautious when reaching for database transactions; use them judiciously.

Happy building!