Fighting Service Latency in Microservices With Kubernetes
CPU and network speeds have increased significantly in the last decade, as have memory and disk sizes. Still, one possible side effect of moving from a monolithic architecture to microservices is an increase in service latency. Here are a few quick ideas on how to fight it using Kubernetes.
It Is Not the Network
In recent years, networks have transitioned to more efficient protocols and moved from 1 Gbit/s to 10 Gbit/s and even 25 Gbit/s links. Applications send much smaller payloads in less verbose data formats. With all that in mind, chances are the bottleneck in a distributed application is not in the network interactions but somewhere else, such as the database. If that is the case, we can safely ignore the rest of this article and go back to tuning the storage system.
Kubernetes Scheduler and Service Affinity
Co-located pods in the same region/zone/rack.
If two services (deployed as pods in the Kubernetes world) are going to interact a lot, the first approach to reducing network latency is to politely ask the scheduler to place the pods as close together as possible using the node affinity and inter-pod affinity features. How close depends on our high-availability requirements (covered by anti-affinity), but the pods can be co-located in the same region, availability zone, or rack, or even on the same host.
Run Services in the Same Pod
Containers/services co-located in the same pod.
The deployment unit in Kubernetes (the pod) allows a service to be independently updated, deployed, and scaled. But if performance is a higher priority, we could put two services in the same pod, as long as that is a deliberate decision. Both services would still be independently developed, tested, and released as containers, but they would share the same runtime lifecycle in the same deployment unit. That would allow the services to talk to each other over localhost rather than through the service layer, to exchange data through the file system (a shared volume), or to use some other high-performance IPC mechanism on the shared host, such as shared memory.
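Containers in the same pod share a network namespace, so one container can reach a port opened by its neighbor at localhost. The following is a minimal sketch of that interaction, not a production setup: both sides run in a single JVM purely for demonstration, and the names LoopbackDemo, startPriceService, and callPriceService, as well as the one-line text protocol, are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: two "services" talking over the loopback interface,
// the way two containers in the same pod could.
class LoopbackDemo {

    // Stands in for the first container: listens on the loopback address only
    // and answers a single one-line request in a background thread.
    static ServerSocket startPriceService() {
        try {
            // Port 0 lets the OS pick a free port; real containers would
            // agree on a fixed port instead (assumption for this demo).
            ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
            Thread worker = new Thread(() -> {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(
                             new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true)) {
                    out.println("price-of:" + in.readLine());
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            worker.setDaemon(true);
            worker.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stands in for the second container: calls its neighbor via localhost
    // instead of resolving a Kubernetes Service name.
    static String callPriceService(int port, String item) {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(item);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = startPriceService()) {
            // prints "price-of:book"
            System.out.println(callPriceService(server.getLocalPort(), "book"));
        }
    }
}
```

Binding to the loopback address keeps the endpoint invisible outside the pod, which is one reason this pattern suits tightly coupled helper services.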
Run Services in the Same Process
Co-located services in the same process.
If co-locating two services on the same host is not good enough, we could have a hybrid between microservices and a monolith by sharing the same process for multiple services. That means we are back to a monolith, but we could still use some of the principles of microservices, allow development-time independence, and make a compromise in favor of performance on rare occasions.
We could develop and release two services independently by two different teams, but place them in the same container and share the runtime.
For example, in the Java world that would be placing two .jar files in the same Tomcat, WildFly, or Karaf server. At runtime, the services can find each other and interact using a public static field that is accessible from any application in the same JVM. This same approach is used in the Apache Camel direct component, which allows synchronous in-memory interaction of Camel routes from different .jar files by sharing the same JVM.
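The static-field idea can be sketched as follows. This is an illustration under stated assumptions rather than how any particular server or Camel implements it: InVmRegistry, GreetingService, and GreetingClient are hypothetical names, and the registry class is assumed to be loaded by a class loader shared by both applications (for example, from the server's shared library location), because classes loaded separately per application would each see their own copy of the static field.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical shared class. Assumption: it is loaded once, by a class loader
// that is visible to every application deployed in the same JVM.
final class InVmRegistry {
    // The public static field through which services find each other.
    public static final Map<String, Function<String, String>> ENDPOINTS =
            new ConcurrentHashMap<>();

    private InVmRegistry() {
    }
}

// Would be packaged in the first .jar by the first team.
class GreetingService {
    static void register() {
        InVmRegistry.ENDPOINTS.put("greeting", name -> "Hello, " + name);
    }
}

// Would be packaged in the second .jar by the second team.
class GreetingClient {
    static String greet(String name) {
        Function<String, String> endpoint = InVmRegistry.ENDPOINTS.get("greeting");
        if (endpoint == null) {
            throw new IllegalStateException("greeting service is not deployed");
        }
        // A plain synchronous method call: no serialization, no network hop.
        return endpoint.apply(name);
    }
}

class InVmDemo {
    public static void main(String[] args) {
        GreetingService.register();
        // prints "Hello, Kubernetes"
        System.out.println(GreetingClient.greet("Kubernetes"));
    }
}
```

The lookup by name keeps the two .jar files free of compile-time dependencies on each other, at the price of a runtime failure when the other service is not deployed.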
Other Areas to Explore
If none of the above approaches seems like a good idea, maybe you are exploring in the wrong direction. It might be better to explore whether alternative approaches such as caching, data compression, HTTP/2, or something else would improve overall application performance.