Developing and Testing Services Among a Sea of Microservices

Microservices are great. They allow you to create large, complex applications using a large development team, without each team member needing to understand the complexities of the entire application. Each developer must understand only the service(s) for which they are ultimately responsible.

However, for all the advantages of building microservice-based production services, very little attention is paid to the complexities of testing a microservice. While there are many options, one option stands out for most situations.

Consider the situation: You own microservice #37 in a system that contains 500 microservices. Your service takes inputs from one set of microservices, performs some operations and, in the process, makes use of a few other microservices. Your service sits right in the middle of a complex, tangled web of microservices, such as the one shown in Figure 1.

Figure 1. A complex microservice application

To test your service, you need the “input” services that call it. Also, what about the services you call? You depend on those “output” services to allow your service to operate. But how do you get those services working? Oh, and by the way, those services require a set of other services to operate as well. How do you get those working?

Do you really have to get all 500 microservices working on your laptop in order to verify that your one microservice actually works? Is it even possible for you to do this? Will they all fit on a single laptop? Should they fit on your laptop? Also, what about external services provided by another company, such as a SaaS service? You certainly can’t put another company’s SaaS service on your laptop!

Well, you have a few options. Which option is right for you depends on the size and complexity of your service, and the size and complexity of your application as a whole, along with the tools and services you have at your disposal. Here are the options.

Option #1: Put All Services on Your Laptop

The first option is to take all of the services that make up the entire application and put them on your laptop. This may work well for a smaller application, but if your application is large or has a large number of services, this solution won’t work very well. Imagine having to install, update, and manage 500, 1,000, or 5,000 services in the development environment on your laptop. When a change is made to one of those services, how do you get it updated? How often should you update? With hundreds or thousands of services, you might have to apply hundreds or thousands of updates just to make sure that, this time, you are running the production version of each service.

And this says nothing about the resources required. How much memory do you need? What are the CPU requirements? A large microservice application may be designed to run across hundreds of servers. Will it even fit on a single laptop? Or will you run out of resources, like the poor user shown in Figure 2?

Figure 2. Testing entire application on your laptop

Option #2: Developer-Specific Cloud Instances

The second option solves some of these issues. Imagine having the ability to click a button and deploy a private version of the application in a cloud-based sandbox accessible only to you. This sandbox is designed to look exactly like your production environment. Ideally, it even uses the same Terraform configurations to create the infrastructure and get it all connected, but with smaller and fewer cloud instances, so it won’t cost as much to run. Then, you can link your service running on your laptop to this developer-specific cloud setup and make it look like it’s running in a production environment. Each developer has their own instance of the application that they can use for their own testing. This is shown in Figure 3.

Figure 3. Testing in a developer-specific cloud instance

With proper automation and tooling, this model can avoid much of the complexity and versioning trouble associated with setting up thousands of services. Assuming your operations team uses automated, Terraform-like infrastructure configuration to bring up your production environments (it does do this, doesn’t it?), you can use that configuration as the basis for launching smaller development clones of the production environment for your development teams. Due to the magic of the Cloud, these production-like developer environments can be launched when needed and destroyed when they are no longer needed. You can run as many or as few of them as your developers’ current development and testing work requires. This model provides a good, fairly inexpensive way to create an environment for testing your service.
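As a rough illustration of deriving a smaller developer environment from the production definition, the sketch below scales down a list of service specifications. It is only a sketch under stated assumptions: `ServiceSpec`, `developer_variant`, the size labels, and the service names are all hypothetical, and a real setup would more likely express the same idea through variables in the infrastructure configuration itself.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ServiceSpec:
    """Hypothetical description of how one service is deployed."""
    name: str
    instance_size: str   # made-up size label, e.g. "large"
    instance_count: int


def developer_variant(production, size="small", count=1):
    """Return a scaled-down copy of every production service spec.

    The production list itself is left untouched, so both environments
    are derived from the same source definition.
    """
    return [replace(spec, instance_size=size, instance_count=count)
            for spec in production]


# Hypothetical production definition.
production = [
    ServiceSpec("orders", "large", 12),
    ServiceSpec("billing", "xlarge", 6),
]

# Same services, same names, but smaller and fewer instances.
sandbox = developer_variant(production)
```

Because the sandbox is computed from the production list, a service added to production appears in the next sandbox automatically, which is the property that keeps the clone from drifting.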

Yet, it’s still not perfect. After all, while you can automate the setup of the other services, there will undoubtedly be maintenance you’ll have to perform to keep those other services running. You will likely still be spending more time making sure this huge test harness is operating, rather than spending your time actually testing your service.

This model does have many advantages, but it doesn’t do anything to reduce the complexity of testing in a large microservice application.

Option #3: Connect to a Pre-Production Environment

This is a very similar model to the previous option. You test in a production-like environment that is set up in the Cloud. The difference with this model is that a single pre-production environment is used by all developers to test all of their services: it’s a shared environment. This is illustrated in Figure 4.

Figure 4. Using a pre-production environment

This model reduces the complexity of testing, because the developer has to understand only how their service works. They don’t have to know how to keep all the other services operating.

However, there is a high risk of testing collisions when using this model. Imagine you are testing a service while someone else is testing a neighboring service. You could do things in your tests that impair the ability of the neighboring service to operate correctly. Your testing could break their testing. Coordinating the use of this pre-production environment becomes critical, complex, and potentially problematic.

It’s for this reason that pre-production environments are mostly used for final integration testing rather than individual developer testing. Having individual developers use them for their own testing works only if you restrict their activities to ones that don’t affect other users of the environment. When this isn’t possible, you have to use a different option.

Additionally, pre-production environments suffer from a data shortage problem. Keeping a pre-production environment filled with data and simulated customer interactions so that it actually looks and acts like production can be daunting. For these reasons, more often than not, a pre-production environment does not provide a good testbed for developer testing.

Option #4: Connect to Production

This leads us to the next option: connecting your service to the production environment to see how it works. This seemingly has all the advantages of the pre-production option without having the problem with simulated traffic. You are using real, live customer traffic to test your service in a real, working, production environment.

Production testing is extremely important to a healthy application, so there is nothing wrong with testing in production per se.

However, it can be very difficult and very dangerous to connect a developer’s untested service running on their laptop to the production application. Even if they are careful and make sure they don’t perform tasks that would be too dangerous to perform in a production environment, the risk to production is still very high. Additionally, the security openings needed to connect your laptop to production are unacceptable vulnerabilities for almost all modern production environments. Creating them certainly violates many security best practices.

No, connecting your laptop to production is not a good option.

Option #5: Use Service Mocks

The last option is to use service mocks. You test your service on your laptop, but rather than connecting it to real copies of its dependencies, you connect it to mocked versions of those services.

For services that call you, you create mock services that simulate the same calls the real service will make to you. For services that you call, you create mock services that respond in the same way those services will respond. This is illustrated in Figure 5.

Figure 5. Use service mocks

Notice that you need to mock only the services that connect directly to your service. No other service needs to be included in the environment. This means that when you are developing and testing on your laptop, you don’t need connections to any other external service. All you need are the service mocks that simulate the connections to those services. This requires fewer resources and, hence, can easily be run in the development environment on your laptop.
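A mock of a service you call can be very small. The following is a minimal sketch using only the Python standard library; the "inventory" service, its `/stock/<sku>` path, its payload, and the `can_fulfil` function standing in for the service under test are all hypothetical. It covers only the downstream direction (a service you call), not a mock of a service that calls you.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class MockInventoryHandler(BaseHTTPRequestHandler):
    """Hypothetical mock of a downstream 'inventory' service."""

    def do_GET(self):
        # Answer every GET with the same canned payload.
        body = json.dumps({"sku": "abc-123", "in_stock": 7}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def start_mock(handler_cls):
    """Start the mock on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), handler_cls)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def can_fulfil(host, port, sku, quantity):
    """Stand-in for the service under test: it calls its dependency."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", f"/stock/{sku}")
        stock = json.load(conn.getresponse())
    finally:
        conn.close()
    return stock["in_stock"] >= quantity


server = start_mock(MockInventoryHandler)
host, port = server.server_address

# The service under test only needs the mock's address, not the real service.
small_order_ok = can_fulfil(host, port, "abc-123", 5)
large_order_ok = can_fulfil(host, port, "abc-123", 50)

server.shutdown()
server.server_close()
```

The service under test receives the dependency's address as a parameter, so the same code can point at the mock during development and at the real service in production.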

This option has the added advantage of working for third-party services, such as SaaS services. You can just as easily simulate how Stripe responds to a credit card transaction using a mocked version of the Stripe service as you can by calling the Stripe service directly.

This also makes it easier to test failure conditions. It’s much easier to teach a service mock to send a bad response than it is to teach a real service to generate a bad response.
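As an illustration, the sketch below teaches a mock to fail a fixed number of times before succeeding, then checks how a retry loop in the service under test reacts. For brevity the mock is an in-process object rather than a networked one, and every name in it (`MockPaymentService`, `charge_with_retry`, the response fields) is hypothetical rather than any real payment provider's API.

```python
class MockPaymentService:
    """Hypothetical mock that fails a set number of times, then succeeds."""

    def __init__(self, failures_before_success):
        self.failures_left = failures_before_success
        self.calls = 0

    def charge(self, amount_cents):
        self.calls += 1
        if self.failures_left > 0:
            self.failures_left -= 1
            raise ConnectionError("simulated outage")
        return {"status": "succeeded", "amount": amount_cents}


def charge_with_retry(payments, amount_cents, max_attempts=3):
    """Stand-in for logic in the service under test."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return payments.charge(amount_cents)
        except ConnectionError as err:
            last_error = err  # remember the failure and try again
    raise RuntimeError("payment service unavailable") from last_error


# Case 1: the dependency recovers on the third attempt.
recovering = MockPaymentService(failures_before_success=2)
receipt = charge_with_retry(recovering, 1999)

# Case 2: the dependency stays down longer than we are willing to retry.
down = MockPaymentService(failures_before_success=5)
try:
    charge_with_retry(down, 1999)
    gave_up = False
except RuntimeError:
    gave_up = True
```

Reproducing either case against a real payment service would mean arranging an actual outage; with the mock it is one constructor argument.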

With service mocks, you can even simulate load testing, allowing you to do some limited load and scaling testing, and even some simulated availability testing in a development environment. This means you can do a better job of validating the operation of your service in general.
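A limited version of this can be sketched with a mock that adds artificial latency and counts how many calls are in flight at once. The names below (`SlowMock`, `handle_request`) and the numbers (10 ms of latency, 8 workers, 50 requests) are assumptions chosen for illustration, and the result says nothing about how the real dependency behaves under load.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SlowMock:
    """Hypothetical mock dependency with configurable latency."""

    def __init__(self, latency_seconds):
        self.latency_seconds = latency_seconds
        self.lock = threading.Lock()
        self.in_flight = 0
        self.peak_in_flight = 0
        self.total_calls = 0

    def lookup(self, key):
        with self.lock:
            self.in_flight += 1
            self.total_calls += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
        time.sleep(self.latency_seconds)  # simulated dependency latency
        with self.lock:
            self.in_flight -= 1
        return f"value-for-{key}"


def handle_request(dependency, key):
    """Stand-in for the service under test."""
    return dependency.lookup(key).upper()


mock = SlowMock(latency_seconds=0.01)

# Fire 50 requests at the service using at most 8 concurrent workers.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda k: handle_request(mock, k), range(50)))
```

Raising `latency_seconds` or the worker count gives a cheap way to see how the service copes with a slow dependency before trying the same thing in a shared environment.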

The problem, of course, is creating and maintaining the mocks. Services can have complex interactions between them, and writing simulated service transactions can sometimes be just as complex as writing the service you are trying to mock. Also, when the other services update and change their APIs, you have to keep updating your mocks to match. Finally, testing against a mock tests your service against the way you expect the dependency to respond. The real service may or may not respond that way. This can create invalid test scenarios.

One solution is to use a tool that can create the mocks for you. One common model is to examine the traffic flowing between services in a real production environment and then, using that traffic as a template, create a mock that simulates the creation and/or consumption of the same data. This allows you to run your service sandwiched between these automated mocks and validate that it works as expected. If a service you depend on changes its API, you can simply re-examine the production traffic and create a new template, updating the mock to match. This method also increases the likelihood that a service mock will perform in the same manner as the production service, because the mock was created by watching the production service.
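The general idea of building a mock from recorded traffic can be sketched as a lookup table keyed by request. This is an illustration only: the capture format, the `ReplayMock` class, and the 501 fallback for unrecorded requests are all assumptions, and the sketch does not describe how any particular tool works.

```python
import json

# Hypothetical capture of request/response pairs observed in production.
CAPTURED_TRAFFIC = """
[
  {"request": {"method": "GET", "path": "/users/42"},
   "response": {"status": 200, "body": {"id": 42, "name": "Ada"}}},
  {"request": {"method": "GET", "path": "/users/99"},
   "response": {"status": 404, "body": {"error": "not found"}}}
]
"""


class ReplayMock:
    """Answer requests with whatever the real service was seen to return."""

    def __init__(self, captured_json):
        self.table = {}
        for exchange in json.loads(captured_json):
            request = exchange["request"]
            key = (request["method"], request["path"])
            self.table[key] = exchange["response"]

    def handle(self, method, path):
        # Requests that were never recorded get an explicit marker
        # response instead of a guess.
        fallback = {"status": 501, "body": {"error": "no recording"}}
        return self.table.get((method, path), fallback)


mock = ReplayMock(CAPTURED_TRAFFIC)
known_user = mock.handle("GET", "/users/42")
missing_user = mock.handle("GET", "/users/99")
unrecorded = mock.handle("POST", "/users")
```

Updating the mock after an API change then amounts to replacing the capture, not rewriting hand-written responses.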

This model does require tooling to create these mocks. While there are public domain tools that provide partial solutions to mock creation, tools such as Speedscale can capture the production traffic and turn it into useful mocks for testing purposes.

No One Right Answer

Unfortunately, there is no one right answer to how you test your services in your microservice-based application. For small applications, it’s easier to put the entire application on your development laptop. However, as the application grows in size and complexity, this quickly becomes impractical. Depending on your application, any one of these options may work well, or may not work at all.

Using service mocks seemingly has the best capabilities with the fewest disadvantages. If you have access to the tooling—and it’s getting easier and easier to get access to useful tooling—start with that option, as it will give you the best likelihood of success. It’s certainly a much better option than testing against your pre-production or, worse yet, your production environment.
