Upgrading to Node v8 Has Significantly Reduced Our Operating Costs

We’ve bet on well-supported open source projects like Google’s V8. Following an upgrade from Node.js v6 to v8, that bet has paid off: our latencies are more consistent and our global infrastructure server costs have gone down by almost 40%.

At Ably Realtime, a distributed pub/sub messaging platform, we use a wide array of technologies in our real-time stack, including Elixir, Go, and Node.js. Whilst Node.js is not often our first choice for new services, our core routing and front-end layers are built with it, and in spite of some well-known shortcomings it continues to serve us very well.

Whilst we are continuously optimizing our infrastructure running costs, there is always a trade-off between allocating engineering resources to new revenue (features) and to reducing costs (optimizing performance).

In most cases, as a high-growth tech business, our focus lies on revenue growth. Fortunately, because we have bet on a number of underlying technologies that are incredibly well supported by their communities, we continue to get material cost reductions without much engineering effort.

A case in point is a recent Node.js upgrade from v6 to v8, which brought two significant improvements:

Under Load, Performance Is Less Spiky and More Predictable

In the graph below, you’ll see that in our clusters of hundreds of nodes, during the busiest times, individual nodes spiked to nearly 100% CPU for brief periods, even though mean CPU utilization sat at around 50%.

Node.js v6: A busy cluster shows intermittent spikes across all nodes.

Once we completed the upgrade to Node.js v8, with a comparable load on the cluster, we saw far more predictable performance without the spikes:

Node.js v8: A busy cluster demonstrates consistent, even performance.

We can only speculate about which changes in the underlying V8 engine are responsible for this. In reality, V8 is improving on many fronts, notably the compiler, runtime, and garbage collector, and all of these collectively play a part.
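A cluster mean of around 50% can coexist with brief per-node spikes to nearly 100%, and that pattern is easy to miss if a dashboard only shows averages. The minimal sketch below summarizes two hypothetical nodes that have the same mean CPU but very different peaks. The sample values and the `summarize` helper are invented for illustration and are not Ably's data or tooling.

```typescript
// Hypothetical per-node CPU utilization samples (percent), e.g. one per minute.
// These values are invented for illustration only.
const spikyNode: number[] = [38, 40, 97, 36, 38, 99, 37, 39, 38, 38];
const smoothNode: number[] = [49, 51, 50, 52, 48, 50, 51, 49, 50, 50];

interface CpuSummary {
  mean: number;            // average utilization across samples
  max: number;             // worst single sample
  fractionAbove90: number; // share of samples above 90% CPU
}

// Hypothetical helper: reduce a series of samples to mean, peak and spike ratio.
function summarize(samples: number[]): CpuSummary {
  if (samples.length === 0) {
    throw new Error("no samples");
  }
  const sum = samples.reduce((acc, s) => acc + s, 0);
  const spikes = samples.filter((s) => s > 90).length;
  return {
    mean: sum / samples.length,
    max: Math.max(...samples),
    fractionAbove90: spikes / samples.length,
  };
}

console.log(summarize(spikyNode));  // { mean: 50, max: 99, fractionAbove90: 0.2 }
console.log(summarize(smoothNode)); // { mean: 50, max: 52, fractionAbove90: 0 }
```

Both nodes report the same mean, so the peak and the share of samples near saturation are what separate the spiky profile from the smooth one.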

Bang for Our Buck Has Vastly Improved: A Circa 40% Real-World Saving

When we load-tested Node.js v6 against Node.js v8 in our lab, we saw a performance increase in the region of 10%. This is not all that surprising, as the performance improvements in Node.js v8 are well known. However, once we tested v8 in one of our isolated clusters serving real-world production traffic, the benefits were far more significant. Google V8’s TurboFan compiler and Ignition interpreter let us increase the rate of operations on the same underlying hardware. On top of that, the improvements described above, which made performance on each node more predictable and smooth, gave us more confidence in the true spare capacity of each cluster. As a result, we were able to run with fewer nodes under most conditions.
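Two effects compound here: a per-node throughput gain (the roughly 10% seen in the lab) and a smaller headroom reserve once per-node CPU becomes predictable. A back-of-the-envelope sketch shows why the node count can fall further than the throughput gain alone would suggest. Every figure in it is a hypothetical assumption, not one of Ably's numbers. That includes the peak request rate, the per-node capacity and the target CPU levels. The `nodesNeeded` helper is also invented for illustration.

```typescript
// All figures below are hypothetical, chosen only to illustrate the arithmetic.
interface Scenario {
  name: string;
  opsPerNodeAtFullCpu: number; // sustainable ops/s per node at 100% CPU
  targetCpuPct: number;        // mean CPU we are willing to run at, leaving headroom for spikes
}

// Smallest node count that keeps mean CPU at or below the target for a given peak load.
function nodesNeeded(peakOpsPerSec: number, s: Scenario): number {
  const usableOpsPerNode = (s.opsPerNodeAtFullCpu * s.targetCpuPct) / 100;
  return Math.ceil(peakOpsPerSec / usableOpsPerNode);
}

const peakOpsPerSec = 1_000_000;

// Spiky nodes: keep a large reserve, so target only 50% mean CPU.
const v6Baseline: Scenario = { name: "v6 baseline", opsPerNodeAtFullCpu: 10_000, targetCpuPct: 50 };

const scenarios: Scenario[] = [
  v6Baseline,
  // ~10% more throughput per node, same conservative headroom.
  { name: "v8, +10% throughput only", opsPerNodeAtFullCpu: 11_000, targetCpuPct: 50 },
  // ~10% more throughput and a smaller reserve because spikes are gone.
  { name: "v8, +10% throughput, less headroom", opsPerNodeAtFullCpu: 11_000, targetCpuPct: 70 },
];

const baseline = nodesNeeded(peakOpsPerSec, v6Baseline);
for (const s of scenarios) {
  const n = nodesNeeded(peakOpsPerSec, s);
  const savingPct = ((baseline - n) * 100) / baseline;
  console.log(`${s.name}: ${n} nodes (${savingPct}% fewer than baseline)`);
}
```

With these invented inputs, the throughput gain on its own cuts the cluster from 200 to 182 nodes, a 9% reduction. Raising the safe target utilization from 50% to 70% as well brings it down to 130 nodes, a 35% reduction. The sketch is only meant to show the shape of the calculation, not to reproduce the figures reported here.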

As you can see below, in one of our busier clusters running Node.js v8, we were able to reduce our raw server costs by circa 40–50%:

Server usage hours in a single cluster before & after Node.js v8.

If Performance Matters, Bet on the Technology That Has the Engineering Muscle and Drive to Continuously Optimize, So You Don’t Have To

Whilst the benefits we experienced from this upgrade could be considered a lucky win, we don’t see it that way. Building an Internet-scale system without Google-scale resources requires a strategic approach to technology choices. If you focus on projects with a community of engineers dedicated to improving performance, you’re bound to have some luck along the way.

If we started again from scratch today, we probably wouldn’t use Node.js as extensively as we do now. However, when we started Ably Realtime in 2013, we chose Node.js for our core routing layer because it was well suited to high-I/O workloads. It also encouraged rapid development cycles, and at the time it performed well compared with similar technologies.

One bet we took when choosing Node.js was that it would continue to get faster over time, and it has, significantly so. That’s no surprise, given that Google’s V8 engine also powers its Chrome browser. Since 2015, an average of around 20,000 lines of code have changed each week in the V8 engine: a mammoth amount of effort from a highly skilled engineering team.

The graph above shows lines of code added/deleted from the V8 engine repository each week.

We’ve made similar bets with other technologies we’ve chosen, which also have a large group of contributors focused not just on features, but also on continuous performance optimizations such as:

Some additional notes:
