How Can You Build Resilient Micro-services

The micro-service architecture is the state of the art in the current decade of software development. Coding an application, building it via a pipeline, packaging it into a Docker container and deploying it on a container platform are daily work for developers and their DevOps colleagues. They enjoy the independence in development and deployment very much. If a software project manager says he wants to build a monolith, he will very probably find no one to support him.

Micro-service is not nano-service

But as I have observed, software architects and developers often forget the other important aspect of micro-services besides “independence”: resilience, which requires that the services can run independently. They decouple the previous monolith vertically and horizontally into “nano-services” which communicate with each other synchronously at run time. Each shared module is built as an independent service. The services now simply call each other via REST instead of via direct function calls.

The consequences of this architecture are:

  1. Failure avalanche: If a nano-service that is called synchronously by one or more other services crashes, the calling services can no longer work either. This causes a so-called avalanche effect: more and more services stop working until most of your business branches are down, even if they have nothing to do with each other from the business point of view.
  2. Error hideout: In the failure case mentioned above, you may spend hours finding out which service actually caused the problem first. The reason is not only technical, such as where the logs are located, but also organizational: the services are normally owned by different teams! You have to contact all related teams one by one, if you happen to know them, and ask what is going on.
  3. Performance impact: Each REST call means client-side serialization, network transfer and server-side deserialization, and the same again for the response. The longer the synchronous call chain is, the more delay the callers have to expect. Besides, a common solution is that the end user’s identity is sent from the front end to the back end and some kind of token is generated for further calls. The more services are called, the more often tokens are generated and validated.
  4. Solution compatibility: The nano-services are bound together so tightly that in fact no branch-independent solution exists. Each change you build into a shared service has to be designed, implemented and tested carefully to check whether it has any impact on any of the branch callers. Each new feature and bug fix costs more and more time and budget to cover the needs of all branches. Unfortunately, in real complex projects you would not even get a complete list of all callers - worse than in the monolith, where you could at least get the call stack easily. The fatal impact on the business is that, e.g., if branch A needs a new feature, it may have to wait until an implementation that is compatible with branch B comes out. That is an absolute waste of time!
  5. Vague responsibility: With a horizontal, technical separation of micro-services, no single team is really responsible for a business process from head to toe. A function call from the UI to the database may depend on several different teams with different domain knowledge.
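
To put rough numbers on the first and third consequence, here is a minimal sketch. It assumes that every service in a synchronous chain fails independently with the same availability and adds the same delay per hop. Both are simplifications of mine, and the class name, method names and figures (99.9 % availability, 20 ms per hop) are invented for illustration only.

```java
// Back-of-the-envelope model of a synchronous call chain.
// Assumption: services fail independently and behave identically.
class ChainAvailability {

    // A request succeeds only if every service in the chain is up,
    // so the individual availabilities multiply.
    static double chainAvailability(double perServiceAvailability, int chainLength) {
        return Math.pow(perServiceAvailability, chainLength);
    }

    // Each synchronous hop adds its own delay (serialization, network,
    // deserialization), so the delays add up along the chain.
    static long chainLatencyMillis(long perHopMillis, int chainLength) {
        return perHopMillis * chainLength;
    }

    public static void main(String[] args) {
        for (int hops : new int[] {1, 3, 5, 10}) {
            System.out.printf("%2d hops: availability %.4f, latency %d ms%n",
                    hops, chainAvailability(0.999, hops), chainLatencyMillis(20, hops));
        }
    }
}
```

Under these assumptions, a chain of ten services with 99.9 % availability each is available only about 99.0 % of the time, and every additional hop makes both numbers worse.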

Resilient Micro-services

In his presentation “Why micro-service fail” at Software Gathering 2019 in Munich, Eberhard Wolff mentioned three cures for this extreme decoupling problem:

  1. Do not use shared libraries among the micro-services, because if the libraries change, you have to redeploy all services again (see also my blog https://medium.com/@siweheee/do-we-still-need-team-shared-basis-libraries-in-age-of-micro-services-8641bccf1251).
  2. Do not share model services among the micro-services. Instead, build each micro-service according to its bounded business context. This means each service should have its own independent models. The shared services and models may seem similar at first glance, but they live in their own business contexts and have their own specialties and reasons to change. As shown in the diagram below, such vertically split micro-services can be developed, deployed and run independently and resiliently.

  3. A service should call another service asynchronously, first of all via events (exceptions are, e.g., calls to an authentication or authorization service). If a service has to call another one synchronously, you should reconsider whether you have split the services properly.
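
The third cure can be sketched in a few lines of Java. This is only an illustration under my own assumptions: the `EventBus` below is a hypothetical in-process stand-in for a real message broker, and `OrderService`, the event name and the failing "billing" consumer are invented for the example. The point is that the publishing service returns without waiting for its consumers, and a broken consumer does not take the publisher or the other consumers down with it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class EventDemo {
    public static void main(String[] args) {
        EventBus bus = new EventBus();
        List<String> shippingInbox = new CopyOnWriteArrayList<>();
        // One consumer is broken, the other one works.
        bus.subscribe(event -> { throw new IllegalStateException("billing is down"); });
        bus.subscribe(shippingInbox::add);

        OrderService orders = new OrderService(bus);
        System.out.println(orders.placeOrder("42")); // not affected by the broken consumer
        bus.shutdownAndWait();
        System.out.println("shipping received: " + shippingInbox);
    }
}

// Hypothetical in-process stand-in for a message broker.
class EventBus {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    void subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    // Returns immediately; delivery happens later on the worker thread.
    void publish(String event) {
        for (Consumer<String> subscriber : subscribers) {
            worker.submit(() -> {
                try {
                    subscriber.accept(event);
                } catch (RuntimeException e) {
                    // A failing consumer affects only its own delivery.
                    System.out.println("delivery failed: " + e.getMessage());
                }
            });
        }
    }

    // Stops accepting events and waits for the pending deliveries.
    boolean shutdownAndWait() {
        worker.shutdown();
        try {
            return worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

class OrderService {
    private final EventBus bus;

    OrderService(EventBus bus) {
        this.bus = bus;
    }

    // Accepts the order and announces it; it never waits for a consumer.
    String placeOrder(String orderId) {
        bus.publish("OrderPlaced:" + orderId);
        return "accepted:" + orderId;
    }
}
```

In a real system the broker would also have to store events durably and redeliver them after a failure, which this sketch deliberately leaves out.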

You may ask: Are we going back to the monolith? No. The problem of the monolith was that it mixed business contexts together. Micro-services, in contrast, are intended to separate the business contexts.

From the technical point of view, micro-services built this way will contain a certain amount of duplicated code. Yes, that is true, and for a good reason: the micro-service design focuses on independence and resilience. The duplicated parts are actually independent in their business contexts.
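
As a small illustration of such harmless duplication, imagine two services that both know a "customer". The following sketch is entirely hypothetical: the billing and shipping contexts, the class names and the fields are my own invention. Each service keeps its own model, and the only thing the two models share is the customer identifier.

```java
class ContextDemo {
    public static void main(String[] args) {
        // The same real-world customer, seen from two business contexts.
        BillingCustomer payer = new BillingCustomer("c-1", "DE89370400440532013000");
        ShippingCustomer recipient = new ShippingCustomer("c-1", "Main Street 1", "Munich");

        System.out.println(payer.customerId + " pays from " + payer.maskedIban());
        System.out.println(recipient.customerId + " receives at " + recipient.addressLabel());
    }
}

// Owned by the hypothetical billing service: a customer is someone who pays.
class BillingCustomer {
    final String customerId;
    final String iban;

    BillingCustomer(String customerId, String iban) {
        this.customerId = customerId;
        this.iban = iban;
    }

    // Shows only the last four characters of the account number.
    String maskedIban() {
        int start = Math.max(0, iban.length() - 4);
        return "****" + iban.substring(start);
    }
}

// Owned by the hypothetical shipping service: a customer is a delivery destination.
class ShippingCustomer {
    final String customerId;
    final String street;
    final String city;

    ShippingCustomer(String customerId, String street, String city) {
        this.customerId = customerId;
        this.street = street;
        this.city = city;
    }

    String addressLabel() {
        return street + ", " + city;
    }
}
```

If the billing team adds a payment method tomorrow, the shipping service neither changes nor redeploys, although both classes look like duplicates of a shared "Customer" at first glance.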

Certainly I do not mean that the teams have to reinvent the wheel again and again. They can still talk to each other and make their solutions similar if the business requirements are similar. What I do mean is that teams should not be technically forced to change and redeploy their applications when the changes do not suit their business context and development schedule.

Conclusion

Please do not overcompensate for your fear of the monolith by splitting it into extremely small nano-services. Just cut the monolith vertically into micro-services according to the business domains, and accept a certain amount of code duplication.

Hint: You can design your system with the help of the software architecture patterns, which include, but are not limited to, micro-services.

10 years software architect who likes designing and programming with Java and Angular, who can lead, participate and follow, who is always listening and open