3

Lets say i have an application where user can register, and the username has to be unqiue value.

Now lets say i have N partitions and for each partition i have M replicas with multiple leaders.

Now i have questions regarding these scenarios:

First:

  1. User 1 attempts to register with username user1 - the write request gets routed to partition1 and to leader1
  2. User 2 attempts to register with username user1 - the write request gets routed to the same partition1 and also to the leader1,

In this scenario the behavior is same as we had just one database. First transaction occures and the second one fails since the user1 value is already here and we are operating on the same replika

Second:

  1. User 1 attempts to register with username user1 - the write request gets routed to partition1 and to leader1
  2. User 2 attempts to register with username user1 - the write request gets routed to the same partition1 and to leader2,

In this case we have concurrent write. How does this determine what registration fails and what not? We can look at this as no partition and multiple leader and as far as i researched in this case the typical solution is to either 1) prevent this by doing first scenario or 2) merge the values which is not acceptable in this case. Or solve conflicts on application level that is also not acceptable. How do DB's deal with this ?

Third:

  1. User 1 attempts to register with username user1 - the write request gets routed to partition1 and to leader1
  2. User 2 attempts to register with username user1 - the write request gets routed to the same partition2 and to leader3,

In this case all writes go to different partitions ( what makes sense to me that this will probably not happen in real life since they have same value and thus should be routed to one partition ). How would the DB resolve what registration would succeed and which one would fail? How would it lock stuff or check if the value exists and so on?

The more i read about distributed DB's and how it works (even on high level ) im more and more confused.

Thanks for answers!

1 Answer 1

0

How do distributed databases follow unique constraints?

This is going to vary from database system to database system.

In this case we have concurrent write...as far as i researched in this case the typical solution is to either 1) prevent this by doing first scenario or 2) merge the values which is not acceptable in this case. Or solve conflicts on application level that is also not acceptable.

Correct, those are pretty common solutions different database systems use. And in a multitude of commonplace scenarios they would be unacceptable or subpar solutions.

Many distributed database systems follow CAP Theorem as apposed to the traditional ACID Principles of most modern relational database systems. With CAP Theorem, consistency may be traded-off in favor of availability and partition-tolerance, as opposed to being guaranteed with the ACID Principles.

How do DB's deal with this ?

Again, depends on the specific database system.

Microsoft SQL Server has a feature that offers distributed databases called Availability Groups. In Availability Groups, there are secondary replicas that are consistent copies of the primary replica database. This is similar to the partitions you referred to. The difference with this feature to sharding in non-consistent distributed databases, is that there can only be one primary replica that can be written to. Changes are written to that primary replica and then synchronized to all secondary replicas.

The secondary replicas are used to distribute the read workload, which generally I've found to be the heavier side of querying for most use cases. The distribution of the read workload also helps reduce blocking contention for writes on the primary replica.

This design prevents concurrency problems since only a single primary replica can be written to, concurrently. And it maximizes consistency, since the primary will be the single source of truth to distribute the changes to the secondaries. Availability Groups can be configured in a mode called Synchronous, which helps ensure any change to the primary has to be written to the secondary (barring abnormalities at the system level).

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.