Data Access between Micro/Services

Nawab Iqbal
Feb 8, 2022

As a product moves to a microservices (or multiple services, big or small) model, one key challenge is deciding which service owns each piece of data and how other services will access it. Here are four approaches:

  1. Shared database: Directly accessing a shared database is not a good option. https://martinfowler.com/bliki/IntegrationDatabase.html
  2. Synchronous call: Call the API of the data-owning service. This is the most straightforward approach, and in some cases it is necessary, e.g. when the data is needed in the critical path of a user request. However, it introduces "availability coupling" (i.e. a failure in one service has a domino effect on the services that depend on it), so steps must be taken to ensure that, in the absence of a timely response (e.g. an API timeout), the requesting service can fail gracefully or fall back on less accurate (e.g. older) cached data. Retrying a few times is also an option for transient errors.
    Another concern: if the response is expensive to compute (costly queries and joins) or the network latency is prohibitive, the client should consider caching the computed data. In such scenarios, consider CQRS using event notification (see #4). Finally, an API must protect its service from intentional or unintentional DoS attacks by its clients, so you will need to implement request limiting, client quotas (governor limits), and traffic lights/circuit breakers.
  3. Asynchronous command & response: Send an asynchronous request for data, e.g. using a message bus or event stream, and receive the data on a response topic or event stream. Data may also be returned via a webhook on the requesting service, but that has the same availability-coupling issues as a synchronous API call (#2). Otherwise, asynchronicity removes the "availability coupling" of an API call. Since each response is generated on request, this mechanism doesn't create reusable topics or event streams. Also, each response corresponds to a particular request from a specific requesting service (e.g. when multiple services call the same data-owning service), so the requesting services need some mechanism (e.g. a "Reply-To" attribute in the payload) to find the responses to their requests. Without a Reply-To mechanism, each requesting service may end up creating separate topics of its own, which is not ideal either.
  4. Event notification: The data-owning service propagates changes as events (e.g. using change data capture, CDC). This is possibly the best option for decoupling services while also ensuring high availability. Different services may need varying levels of detail about an object, so the data-owning service should apply the Pareto Principle: put enough data in the event to meet most consumers' needs, and serve any service that needs more separately (e.g. through a special API, an enhanced topic schema, or a separate topic). After receiving an event notification, each service can organize the data the way it needs: save it as is (a replica of the original data set), or update a search index or aggregates so the data can be served readily when needed. https://martinfowler.com/bliki/CQRS.html
    Similarly, the events can be fed into a data lake for analytical needs. Martin Fowler distinguishes between a pure event notification (which carries minimal detail about what happened, e.g. just the object id) and a message carrying more of the event's detail (Event-Carried State Transfer).
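For the synchronous call (#2), the "retry on transient errors, then fall back to older cached data" behavior can be sketched as below. This is a minimal illustration, not a production client: `FallbackClient`, `TransientError`, and the `fetch` callable are hypothetical names, and `fetch` is assumed to enforce its own timeout (e.g. an HTTP client timeout) and raise `TransientError` when the owner does not answer in time.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TransientError(Exception):
    """Hypothetical error raised by the fetch callable on a timeout or other retryable failure."""


class FallbackClient:
    """Calls a data-owning service with retries, falling back to stale cached data."""

    def __init__(self, fetch: Callable[[str], dict], max_retries: int = 2,
                 backoff_s: float = 0.1) -> None:
        # `fetch` stands in for the owner's API call (assumption); it is expected
        # to enforce its own timeout and raise TransientError when the call fails.
        self.fetch = fetch
        self.max_retries = max_retries
        self.backoff_s = backoff_s
        self.cache: Dict[str, dict] = {}  # last good response per key

    def get(self, key: str) -> Tuple[Optional[dict], str]:
        """Return (value, source) where source is "live", "cache", or "none"."""
        for attempt in range(self.max_retries + 1):
            try:
                value = self.fetch(key)
                self.cache[key] = value
                return value, "live"
            except TransientError:
                if attempt < self.max_retries:
                    # Exponential backoff between retries of a transient failure.
                    time.sleep(self.backoff_s * (2 ** attempt))
        # Every attempt failed: degrade gracefully to older data if we have any.
        if key in self.cache:
            return self.cache[key], "cache"
        return None, "none"
```

Returning the `source` alongside the value lets the caller decide whether stale data is acceptable for a given request, which is the "less accurate but available" trade-off described above.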
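The request limiting and client quotas mentioned for the synchronous call (#2) are often implemented with a token bucket per calling service. The sketch below assumes that technique (the post does not name one); `TokenBucket` and `ClientQuotas` are hypothetical names, and the injectable `clock` exists only to make the behavior easy to test.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ClientQuotas:
    """One bucket per calling service, so a noisy client can't starve the others."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        if client_id not in self.buckets:
            self.buckets[client_id] = TokenBucket(self.rate, self.capacity, self.clock)
        return self.buckets[client_id].allow()
```

A rejected request would typically be answered with an HTTP 429 so the client can back off; keeping buckets per client is what turns a global rate limit into a quota.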
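A circuit breaker, the other safeguard listed under #2, keeps a caller from piling requests onto an owner service that is already failing, which limits the domino effect of availability coupling. The sketch below is a simplified, single-threaded breaker; `CircuitBreaker`, `CircuitOpenError`, `threshold`, and `reset_s` are hypothetical names, and a real implementation would also need thread safety and limits on trial calls in the half-open state.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling the owner while the circuit is open."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; allows a trial call after `reset_s`."""

    def __init__(self, threshold: int = 3, reset_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.reset_s = reset_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_s:
            # Fail fast rather than adding load to a struggling service.
            raise CircuitOpenError("circuit open")
        # Closed, or half-open after the reset window: let the call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed half-open trial, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

When `CircuitOpenError` is raised, the caller can go straight to the cached-data fallback instead of waiting for another timeout.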
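The Reply-To mechanism from the asynchronous command & response approach (#3) can be sketched with an in-memory stand-in for a message bus. Everything here is hypothetical: `InMemoryBus`, the topic names (`customer.requests`, `<name>.replies`), and the `reply_to` and `correlation_id` payload fields. The point is that all requesters share one request topic, the owner replies wherever `reply_to` says, and each requester matches replies to its own requests by correlation id.

```python
import uuid
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Dict, List


class InMemoryBus:
    """Stand-in for a message bus: named topics backed by FIFO queues."""

    def __init__(self) -> None:
        self.topics: DefaultDict[str, Deque[dict]] = defaultdict(deque)

    def publish(self, topic: str, message: dict) -> None:
        self.topics[topic].append(message)

    def poll(self, topic: str) -> List[dict]:
        messages = list(self.topics[topic])
        self.topics[topic].clear()
        return messages


def serve_requests(bus: InMemoryBus, store: Dict[str, dict]) -> None:
    """Data-owner side: answer each request on the topic named in its reply_to."""
    for req in bus.poll("customer.requests"):
        bus.publish(req["reply_to"], {
            "correlation_id": req["correlation_id"],
            "data": store.get(req["customer_id"]),
        })


class Requester:
    """Requesting side: sends commands and matches replies by correlation id."""

    def __init__(self, bus: InMemoryBus, name: str) -> None:
        self.bus = bus
        self.reply_topic = f"{name}.replies"
        self.pending: Dict[str, str] = {}   # correlation_id -> customer_id
        self.results: Dict[str, object] = {}

    def request(self, customer_id: str) -> str:
        correlation_id = str(uuid.uuid4())
        self.pending[correlation_id] = customer_id
        self.bus.publish("customer.requests", {
            "customer_id": customer_id,
            "reply_to": self.reply_topic,
            "correlation_id": correlation_id,
        })
        return correlation_id

    def collect(self) -> None:
        for reply in self.bus.poll(self.reply_topic):
            customer_id = self.pending.pop(reply["correlation_id"], None)
            if customer_id is not None:  # ignore replies we no longer expect
                self.results[customer_id] = reply["data"]
```

With a real broker the owner never needs to know which services call it; adding a new requester only means choosing a new reply topic.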
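On the receiving side of event notification (#4), a consumer can build its own read model (the CQRS idea) from the owner's events. The sketch below handles both of the forms Fowler distinguishes: an event that carries state is applied directly (Event-Carried State Transfer), while a pure notification with only an id triggers a call back to the owner's API. `CustomerEvent`, `CustomerReplica`, `fetch`, and the per-object `version` used to drop duplicate or out-of-order events are all assumptions, not part of the original post.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class CustomerEvent:
    """Hypothetical event shape published by the data-owning service."""
    customer_id: str
    version: int                    # assumed to increase with every change
    deleted: bool = False
    state: Optional[dict] = None    # None means a pure notification (id only)


@dataclass
class CustomerReplica:
    """Consumer-side read model built from the owner's events."""
    fetch: Callable[[str], dict]    # hypothetical call to the owner's API
    rows: Dict[str, dict] = field(default_factory=dict)
    versions: Dict[str, int] = field(default_factory=dict)

    def apply(self, event: CustomerEvent) -> None:
        # Skip duplicate or out-of-order deliveries.
        if event.version <= self.versions.get(event.customer_id, -1):
            return
        self.versions[event.customer_id] = event.version
        if event.deleted:
            self.rows.pop(event.customer_id, None)
        elif event.state is not None:
            # Event-carried state transfer: the payload is enough.
            self.rows[event.customer_id] = dict(event.state)
        else:
            # Pure notification: only the id arrived, so ask the owner.
            self.rows[event.customer_id] = self.fetch(event.customer_id)
```

The same `apply` loop could update a search index or an aggregate instead of a plain replica; only the body of the branches changes.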
