Powerful Queries in Microservice Architectures
Motivation
As platforms and applications grow, it is common for the following query functionalities to become required by clients:
In addition, as platform engineers we want to make sure that clients can only perform actions and can only receive the data they are actually authorized for.
Satisfying these requirements can become challenging when working in a microservice landscape, especially one that follows the database-per-service pattern, in which each microservice owns its own data store. How can this data be efficiently joined?
Solution
Today, I want to present one way to implement powerful queries in such a setup: the command query responsibility segregation (CQRS) pattern.
In this pattern, we separate the responsibilities of processing incoming read and write requests into separate services. Have a look at this reference implementation built on top of AWS:
Explanation
Each microservice owns its entities, also known as "domain objects", in the form of domain/business logic and persistent storage. However, it only processes incoming write requests (create/update/delete). The service decides on its own authorization logic, that is, who can see and manipulate its entities. It stores the resulting authorization relationships between users, roles, and entities in a separate, internal authorization service, which is really just a thin API wrapper around a database with almost no business logic.
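To make the write path more concrete, here is a minimal sketch of it in Python. Everything in it is an assumption for illustration: the `OrderService` and `AuthorizationService` classes, the `order.created` event type, and the in-memory dictionaries standing in for real databases are hypothetical and not part of the reference implementation.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AuthorizationService:
    """Hypothetical thin wrapper around a store of visibility grants."""
    _grants: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, user_id: str, entity_id: str) -> None:
        # Record that the user may see the entity. No business logic here.
        self._grants.setdefault(user_id, set()).add(entity_id)

    def visible_ids(self, user_id: str) -> set[str]:
        # Return a copy so callers cannot mutate the stored grants.
        return set(self._grants.get(user_id, set()))


@dataclass
class OrderService:
    """Hypothetical write-side service that owns the 'order' entity."""
    auth: AuthorizationService
    publish: Callable[[dict], None]  # stand-in for the pub-sub client
    _orders: dict[str, dict] = field(default_factory=dict)

    def create_order(self, user_id: str, customer_id: str, total: float) -> dict:
        # Domain logic lives in the owning service.
        if total < 0:
            raise ValueError("total must be non-negative")
        order = {"id": str(uuid.uuid4()), "customer_id": customer_id, "total": total}
        # 1. Persist in the service's own data store.
        self._orders[order["id"]] = order
        # 2. The service decides who may see the entity; the authorization
        #    service only stores that decision.
        self.auth.grant(user_id, order["id"])
        # 3. Publish a DTO (pure state) for the query service to index.
        self.publish({"type": "order.created", "dto": dict(order)})
        return order
```

Passing `publish` in as a plain callable keeps the sketch independent of any particular broker: in a test it can be `list.append`, while in production it would wrap whatever pub-sub client you choose.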
Read requests are processed by a special query service which contains a central data store, in this case based on an OpenSearch/Elasticsearch domain, although a SQL database could work as well. A pub-sub mechanism fills this store with "data transfer objects" (DTOs) representing pure state. Although the query service is not aware of any domain logic pertaining to an individual entity, it is aware of relationships between entities. For example, it may be necessary to create a filter for entity A.1 based on an attribute of B.1 (imagine filtering orders by the country in which customers live). The query service is also not aware of authorization logic. To ensure that only visible data is returned to the requesting client, it asks the authorization service for a list of all entity IDs visible to the client and includes this list as a filter condition in its query against the data store.
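As a rough sketch of that last step, the function below builds an OpenSearch/Elasticsearch query body that combines the visible-ID filter with the orders-by-customer-country example. The function name, the `customer.country` field (assumed to be denormalized into the order DTO and mapped as a keyword), and the default page size are assumptions for illustration.

```python
from typing import Iterable, Optional


def build_order_query(
    visible_ids: Iterable[str],
    customer_country: Optional[str] = None,
    size: int = 20,
) -> dict:
    """Build a query body restricted to the entities the client may see."""
    ids = sorted(set(visible_ids))
    if not ids:
        # Fail closed: a client without any grants gets no results.
        return {"size": size, "query": {"match_none": {}}}

    # Authorization filter: only documents whose _id is in the visible set.
    filters: list[dict] = [{"ids": {"values": ids}}]
    if customer_country is not None:
        # Cross-entity filter on an attribute of the related customer,
        # assumed to be stored alongside the order in the index.
        filters.append({"term": {"customer.country": customer_country}})

    # Filter context skips relevance scoring, which suits yes/no conditions.
    return {"size": size, "query": {"bool": {"filter": filters}}}
```

The returned dictionary can be passed as the request body of a search call. One design caveat: for clients who can see a very large number of entities, the ID list itself becomes large, so you may need a different strategy in that case, such as indexing the grants alongside the documents.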
Final remarks
As in all microservice architectures, it is paramount to keep coupling between services to a minimum. Use domain-driven design to decide on service boundaries and never allow circular, synchronous dependencies between services.
Use asynchronous communication between services where possible to reduce coupling. The downside of this approach is that the system is only eventually consistent, but that is acceptable in most scenarios. For example, service A still returns the result of a CREATE operation synchronously to the client, but a READ immediately following that CREATE could result in a NOT FOUND error because the pub-sub system may need a few hundred milliseconds to update the query service.
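One way a client could cope with that short window is to retry the read with a growing delay. The helper below is only a sketch of that idea: the name `read_with_retry`, the convention that `fetch` returns `None` for NOT FOUND, and the default attempt count and delays are all assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def read_with_retry(
    fetch: Callable[[], Optional[T]],
    attempts: int = 5,
    initial_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fetch until it returns a value or the attempts are used up."""
    delay = initial_delay
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        # Do not sleep after the final attempt; there is nothing to wait for.
        if attempt < attempts - 1:
            sleep(delay)
            delay *= 2  # exponential backoff between attempts
    return None
```

Because the client already received the created entity in the synchronous CREATE response, an alternative is to display that response directly and skip the immediate READ altogether.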
Think about how you want to set up your pub-sub system. There are simpler, managed solutions using AWS SNS/SQS or EventBridge but depending on what you choose you may have to compromise on features such as message persistence (replay) or FIFO. If you need all the fireworks, go for Apache Kafka (Amazon MSK) or Amazon Kinesis. The point is: if you're bothered by the "re-index" arrows in the diagram, there are ways to get rid of them.
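To illustrate why ordering and replay matter to the query service, here is a small sketch of a consumer that tolerates duplicate and out-of-order messages, and that can be rebuilt from scratch by replaying a persisted log. The event shape (`id`, `version`, `data`, `deleted`) and the `Projection` class are hypothetical; it assumes each owning service attaches a monotonically increasing version to every event it publishes for an entity.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Projection:
    """Hypothetical in-memory stand-in for the query service's index."""
    docs: dict[str, dict] = field(default_factory=dict)

    def apply(self, event: dict) -> bool:
        """Apply an event; return False if it was stale or a duplicate."""
        current = self.docs.get(event["id"])
        if current is not None and current["version"] >= event["version"]:
            # An equal or newer version is already indexed, so ignore this one.
            return False
        # Deletes are kept as tombstones so that a late, older update
        # cannot resurrect an entity that was already removed.
        self.docs[event["id"]] = {
            "version": event["version"],
            "deleted": event.get("deleted", False),
            "data": event.get("data", {}),
        }
        return True

    def visible_docs(self) -> dict[str, dict]:
        # Only non-deleted entities are exposed to queries.
        return {k: v["data"] for k, v in self.docs.items() if not v["deleted"]}


def rebuild(log: Iterable[dict]) -> Projection:
    """Rebuild the projection by replaying a persisted event log."""
    projection = Projection()
    for event in log:
        projection.apply(event)
    return projection
```

With version checks like these, the consumer does not rely on FIFO delivery, and with a persisted log, `rebuild` can repopulate the index without asking each service to resend its data.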
Hide your implementation details. In the diagram, notice that there is a gateway in front of all microservices, including the query service. The client cannot see that you've implemented CQRS, and neither should they: make it as easy as possible for the client to get the data they need. That is why I'm proposing you use federated GraphQL as a self-documenting, scalable, single entry point for communicating with the outside world. AWS supports a managed type of federation through AppSync Merged APIs, but feel free to implement federation yourself. You could run gateway + services on a shared Kubernetes cluster, for example by using NestJS + Apollo Federation, or you could consider running only the gateway on Fargate (the gateway is stateful) and sticking to Lambda for the individual services. But that's a topic for another post.
I hope you enjoyed the read!
Further reading
Newman, S. (n.d.). Building Microservices. O’Reilly Media.
Bellemare, A. (2020). Building Event-Driven Microservices. O’Reilly Media.
Skelton, M., & Pais, M. (2019). Team Topologies: Organizing Business and Technology Teams for Fast Flow. IT Revolution Press.