AWS Lambda Performance and Cold Starts

This article was written by Clay Smith, a Developer Advocate at New Relic. The original article is located here.

Among the most discussed components of serverless compute architecture are Function-as-a-Service (FaaS) products like Amazon Web Services (AWS) Lambda. AWS Lambda and competitors like Google Cloud Functions and Microsoft Azure Functions are designed to let developers write scalable code without having to think about the details of the container, operating system, or infrastructure that actually runs the program.

While this offers a less complex (and potentially much less expensive) way to build systems, it presents a new challenge to operators and developers: How do you build fast and resilient functions when many traditional system and application metrics are either unavailable or no longer relevant?

To help answer that question and make more informed performance decisions, let’s look at metrics from AWS Lambda functions that respond to external API requests. We’ll analyze function invocation time and HTTP request timing data for Lambda functions behind an API Gateway to understand the latencies of different components.

Observing Cold Start Time in the Real World

Cold start time refers to the increased invocation time that can occur when a Lambda function is invoked after sitting idle for an extended period. Observing cold start time is relatively straightforward. In this example, using data from New Relic Infrastructure’s AWS Lambda integration, we can see how long four example functions created for this post took to execute:

lambda1

In the chart above, the function in green executes faster than the others. It finishes in less than 100 milliseconds in most cases. (This threshold is important since, as of early 2017, AWS bills Lambda functions in 100-millisecond intervals.)
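To make the billing threshold concrete, here is a minimal sketch of the rounding, assuming the 100-millisecond billing increment described above. The function name and the sample durations are illustrative and do not come from the charts in this post.

```python
import math

# Billing increment cited in this post (as of early 2017).
BILLING_INCREMENT_MS = 100


def billed_duration_ms(duration_ms: float) -> int:
    """Round an execution time up to the next billing increment.

    Assumption: an invocation is billed for at least one increment.
    """
    increments = max(1, math.ceil(duration_ms / BILLING_INCREMENT_MS))
    return increments * BILLING_INCREMENT_MS


# Illustrative durations only, not measurements from this post.
for sample in (42.0, 99.9, 100.1, 350.0):
    print(f"{sample:6.1f} ms executed -> {billed_duration_ms(sample)} ms billed")
```

Under that rounding, a function that finishes just under the threshold is billed for one increment, while one that finishes just over it is billed for two.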

Functions will vary in their execution time, yet all of these functions are identical and run in the same region. The only difference is how often they are triggered from an external API request. Using New Relic Synthetics monitors, the green function (memInfo0) is invoked once a minute, while the other functions are invoked every 10, 30, and 60 minutes, respectively. We can also count the number of invocations (to make sure we stay safely within the AWS Lambda free tier):

lambda2
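As a rough cross-check on those invocation counts, the schedule can be turned into a monthly estimate. This is a back-of-the-envelope sketch that assumes a 30-day month and a free-tier allowance of 1 million requests per month (check current AWS pricing before relying on that figure). Only memInfo0 is named in this post; the other labels are hypothetical.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month

# Assumption: free-tier request allowance per month; verify against AWS pricing.
FREE_TIER_REQUESTS = 1_000_000

# Invocation intervals in minutes, as described in this post.
# Only "memInfo0" is a real name; the other keys are hypothetical labels.
INTERVALS_MINUTES = {
    "memInfo0": 1,
    "every_10_min": 10,
    "every_30_min": 30,
    "every_60_min": 60,
}


def monthly_invocations(interval_minutes: int) -> int:
    """Estimate invocations per month for a fixed invocation interval."""
    return MINUTES_PER_MONTH // interval_minutes


estimates = {name: monthly_invocations(m) for name, m in INTERVALS_MINUTES.items()}
total = sum(estimates.values())

for name, count in estimates.items():
    print(f"{name}: {count} invocations/month")
print(f"total: {total} ({total / FREE_TIER_REQUESTS:.1%} of the assumed allowance)")
```

Even the function invoked once a minute accounts for only a small fraction of the assumed request allowance. Compute time is metered separately and is not modeled here.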

In theory, because the green function is kept warm by Lambda’s internal scheduler so that it can more quickly respond to frequent requests, it should execute faster in response to its event trigger. While it’s nice to have a Lambda function execute slightly faster, does it actually matter in the real world when responding to an external API request?

End-to-End Request Visibility Using Synthetic Checks

To better understand how Lambda functions can respond to real-world requests (including network latency, TLS negotiation, and connection time), we’ve set up and collected data from Virginia-based synthetic check monitors invoking identical Lambda functions behind an API Gateway running in a West Coast AWS region.

lambda3

This data provides a more detailed view of requests powered by Lambda functions than function invocation time alone. Surprisingly, the performance benefit we received from warming the green function had no significant impact on how quickly the external request completed, and the warmed Lambda function sometimes responded more slowly than the cold Lambda functions did.

The data also reveals that the time needed to establish a secure TLS connection to the API Gateway from across North America (more than 200 milliseconds in some cases) is more significant than the Lambda execution time itself.

lambda4

Because these functions rarely operate in isolation, understanding the latencies of different components (including clients, network latency, gateways, and other service dependencies) is important to understanding what (and what not) to optimize.
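For readers who want to reproduce this kind of breakdown without a synthetic monitoring product, the connection phases can be approximated with the Python standard library. This is a rough sketch, not how the monitors in this post collect their data. The function names are hypothetical, and the sample numbers at the end are made up purely to show the arithmetic.

```python
import socket
import ssl
import time


def time_connection_phases(host: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Time DNS lookup, TCP connect, and TLS handshake, in milliseconds."""
    t0 = time.perf_counter()
    sockaddr = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4]
    t1 = time.perf_counter()
    sock = socket.create_connection(sockaddr[:2], timeout=timeout)
    t2 = time.perf_counter()
    try:
        context = ssl.create_default_context()
        # wrap_socket performs the TLS handshake before returning.
        tls_sock = context.wrap_socket(sock, server_hostname=host)
        t3 = time.perf_counter()
        tls_sock.close()
    except Exception:
        sock.close()
        raise
    return {
        "dns_ms": (t1 - t0) * 1000,
        "tcp_connect_ms": (t2 - t1) * 1000,
        "tls_handshake_ms": (t3 - t2) * 1000,
    }


def share_of_total(phases_ms: dict) -> dict:
    """Express each phase as a percentage of the summed time."""
    total = sum(phases_ms.values())
    return {name: round(100 * ms / total, 1) for name, ms in phases_ms.items()}


# Made-up numbers for illustration only, not measurements from this post.
example = {"dns_ms": 50, "tcp_connect_ms": 50, "tls_handshake_ms": 200, "function_ms": 100}
print(share_of_total(example))
```

Calling `time_connection_phases` with the hostname of your own API Gateway endpoint, from a client in a different region, gives a first approximation of how much of each request is spent before the function runs at all.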

Lambda Optimization: Good Questions to Ask

In our example, the benefit of keeping a simple Lambda function warm was overwhelmed by other considerations. However, this is not necessarily true for all Lambda functions. Here’s a checklist of questions to ask when analyzing performance:

Don’t Think Servers; Still Think Performance

As many others have said: Serverless does not mean no servers. Instead, as noted in the AWS console, serverless is more a question of not having to think about servers. While a serverless approach can simplify some computing tasks, the need to collect, understand, and analyze performance data remains as important as ever.

Function-as-a-Service products are still new, and as developers acquire better tools and get more familiar with designing, operating, and troubleshooting FaaS systems, we should get better at building more impressive serverless architectures and apps. Until then, looking at the data and asking hard questions is a good place to start.
