Handling Service Timeouts Using Istio
The Timeout design pattern states that you should not wait indefinitely for a service response. Instead, the caller should give up after a set time limit, for example by throwing an exception. This ensures that the caller is not stuck in limbo, holding on to application resources while it waits.
In my earlier blog post, I discussed the fallacies of Distributed Computing and how service calls made over the network are not reliable and might fail.
There can be congestion in the network or a power failure impacting your systems. The request might reach the destination service, but the response might never make it back to the calling service. The data might get corrupted or lost during transmission over the wire. While architecting distributed cloud applications, you should assume that these types of network failures will happen and design your applications for resiliency. Implementing a timeout strategy helps keep your microservices resilient and available.
When a service call is slow and you are not sure what the root cause is, the preferred approach is to stop waiting after a bounded amount of time rather than hold on for a response that may never arrive. Implementing a timeout strategy for service-to-service communication over the network is critical, and Istio makes it fairly simple to do this within your service mesh.
Setting Request Timeouts
In Istio, you can set a request timeout either with a route rule, as explained below, or on a per-request basis by adding the x-envoy-upstream-rq-timeout-ms header to outbound requests. The header value is the timeout in milliseconds.
By default, the timeout for HTTP requests is disabled, but you can set one with a routing rule such as the one below.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: serviceB
spec:
  hosts:
  - serviceB
  http:
  - route:
    - destination:
        host: serviceB
    timeout: 10s
To test the timeout functionality, you can inject a delay of, say, 10 seconds between your services. When you then invoke the service endpoint, you should see that the response takes about 10 seconds to come back.
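One way to inject such a delay is Istio's fault injection. The following is a minimal sketch, not a definitive setup: it assumes a hypothetical downstream service named serviceC that serviceB calls, and the resource name is made up for illustration. The delay is placed on a separate service, much as Istio's own request-timeout task does, because a delay and a timeout declared on the same route may not interact the way you expect.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: service-c-delay      # hypothetical resource name
spec:
  hosts:
  - serviceC                 # hypothetical service called by serviceB
  http:
  - fault:
      delay:
        percentage:
          value: 100.0       # delay every request
        fixedDelay: 10s      # the 10-second delay used in the test
    route:
    - destination:
        host: serviceC
```

With a rule like this applied, every call to serviceC is held for 10 seconds before being forwarded, so any caller that depends on it sees the added latency.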
The next step is to configure the timeout rule with a timeout of, say, three seconds. Now you will see that the service call times out after three seconds instead of waiting the full 10 seconds for a response.
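For reference, a three-second timeout could look like the sketch below. It reuses the serviceB host from the routing rule shown earlier and changes only the timeout value; treat it as an illustration under that assumption rather than a drop-in configuration.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: serviceB
spec:
  hosts:
  - serviceB
  http:
  - route:
    - destination:
        host: serviceB
    timeout: 3s              # give up on calls to serviceB after three seconds
```

The timeout sits at the same level as route, so it applies to every request matched by that HTTP route.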
If you want to try out this capability quickly without downloading and installing the required tooling, I highly recommend the interactive Katacoda platform, which lets you experiment right from your browser.