Accelerate Cloud-Native Applications With NVMe

Cloud-native applications have different storage needs from legacy applications or traditional software hosted in the cloud. To work effectively, they require high-performance, low-latency storage. In practice, the recommendation is to use local NVMe® flash, regardless of the orchestration platform.

A cloud-native application is software designed to run on a private or public cloud, built to leverage the innate capabilities of the cloud computing software delivery model. Often these applications are deployed as virtual machines (VMs) managed by OpenStack or VMware vSphere®, or as containers managed by Kubernetes.

From an infrastructure architect’s perspective, the more performance the storage delivers, the further cloud-native applications can scale without any change to the storage system. This is especially true if the storage is disaggregated (not attached directly to servers). In that case, architects can add servers to run more applications without having to scale storage in tandem. In this context, network-attached NVMe flash storage is ideal for meeting high-performance and low-latency requirements.

Comparing NVMe-oF Protocols for Cloud-Native Applications

NVMe over Fabrics (NVMe-oF) provides the best storage characteristics for cloud-native applications, extending the low latency and high performance of NVMe over a network to remote devices. Multiple transport options exist for NVMe-oF; the most commonly used are NVMe-oF over Remote Direct Memory Access (RDMA), NVMe-oF over Fibre Channel, and NVMe-oF over Transmission Control Protocol/Internet Protocol (TCP/IP). All three enable the creation of an end-to-end NVMe storage solution with high performance and low latency.

| Criteria | NVMe-oF RDMA | NVMe-oF Fibre Channel | NVMe-oF using TCP (NVMe/TCP) |
| --- | --- | --- | --- |
| Overhead cost | Medium | High | Low |
| Infrastructure considerations and complexity, including interoperability and ease of use | Complex: scalability limitations; requires switches with RDMA capabilities | Complex: requires a dedicated network, FC switches, and HBAs | Simple: leverages a standard TCP/IP network; a scalable approach requiring no special switches |
| Accessibility | Limited | Limited | Anywhere |

Comparisons between NVMe-oF over RDMA, NVMe-oF over FC, and NVMe-oF over TCP/IP looking at cost, infrastructure considerations, and accessibility for cloud-native applications.

The table above compares NVMe-oF over RDMA, NVMe-oF over FC, and NVMe-oF over TCP/IP as storage transport protocols for cloud-native applications. In terms of overhead cost, NVMe-oF over FC emerges as the most expensive of the three. Fibre Channel requires a dedicated network along with FC host bus adapters (HBAs) and FC switches. These items push the cost of Fibre Channel above that of RDMA or TCP.

Of the three, NVMe-oF over RDMA comes with overhead costs in between those of Fibre Channel and TCP. RDMA does not require a dedicated network, but it does need switches with RDMA capabilities. In contrast, NVMe-oF over TCP/IP needs no special switches, adapters, or network. It is, therefore, the lowest-cost option in most cases.

RDMA and Fibre Channel rate worse than TCP in terms of infrastructure complexity, ease of use, and scalability. RDMA faces scalability limits because of its need for RDMA-capable switches. Fibre Channel is similarly complex because it requires FC switches, HBAs, and a dedicated network. Scaling beyond a single rack of RDMA or FC storage served by a single switch is a complex process, and routing limitations present themselves in this scenario as well.

NVMe over TCP is comparatively simple. It runs on a standard Ethernet TCP/IP network, with no need for special network adapters or switches. It is also more scalable and routable, spanning multiple routes and different networks with ease. This difference also relates to the accessibility of storage: RDMA and FC storage is relatively less accessible to cloud-native applications than TCP storage. NVMe-oF over TCP/IP is accessible to cloud-native applications anywhere.
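As an illustration of this simplicity, attaching a host to an NVMe/TCP target typically takes only a couple of nvme-cli commands over ordinary Ethernet; the IP address, port, and NQN below are placeholder values, and the commands require root privileges and a reachable target:

```shell
# Discover the NVMe subsystems exported by a remote NVMe/TCP target
# (192.168.10.50 and port 4420 are placeholders; 4420 is the default NVMe/TCP port).
nvme discover -t tcp -a 192.168.10.50 -s 4420

# Connect to a discovered subsystem by its NVMe Qualified Name
# (the NQN below is a placeholder).
nvme connect -t tcp -a 192.168.10.50 -s 4420 \
  -n nqn.2014-08.org.example:storage.subsystem1

# The remote namespace now appears as a local block device
# (for example /dev/nvme1n1) and can be used like local flash.
nvme list
```

No special adapters are involved; the same commands work on any server with a standard Ethernet NIC and the NVMe/TCP host driver loaded.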

NVMe-oF and Kubernetes for Cloud-Native Applications 

Kubernetes is an open-source container orchestration system that automates software deployment, scaling, and management. A server hosting cloud-native applications might easily have Kubernetes running dozens or hundreds of microservices, which clearly affects the way servers interact with storage resources.

Many cloud-native applications use a microservices architecture, which efficiently allocates resources to smaller (micro) services, each designed for a particular task. This makes the application flexible and ideally suited to cloud software architecture.

Some cloud-native applications are input/output (I/O) intensive: they require high bandwidth and low latency, and they need a storage solution and protocol that meet those requirements. Other times, the containers collectively working as applications may not require many I/O operations per second (IOPS) or especially low latency; instead, the issue is volume. In the cloud, there might be hundreds or even thousands of applications running in parallel as containers on many physical servers. Taken together, they create a requirement for high performance from storage, coupled with high bandwidth and low latency.

Cloud-native applications orchestrated by Kubernetes have distinctive storage needs that can be met by NVMe-oF. Most Kubernetes apps are data-intensive, so they need the kind of high-performing storage that NVMe-oF provides. This is particularly true for stateful applications like databases. These need low latency from storage in order to drive acceptable levels of application performance. Storage for such apps must also be able to scale easily if the app itself is to scale. 

Kubernetes apps also do well when they have access to the right level of resources. Whether it’s compute, network, or storage, the resources supporting a Kubernetes app should ideally be available in the right proportions. This is not possible if the app relies on Direct-Attached Storage (DAS) connected to the server running the Kubernetes app. It is virtually guaranteed that the server’s DAS will be either under- or over-utilized. Neither is desirable from a performance or economic perspective.

A disaggregated approach is preferable. If storage is not directly connected to servers running Kubernetes apps, it is possible to achieve dynamic, independent scaling of computing and storage. With this approach, storage will always be available to the Kubernetes app in the right proportion and with the right performance characteristics. 
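As a sketch of how an app consumes such disaggregated storage in Kubernetes, a CSI driver for NVMe/TCP-backed storage is typically exposed through a StorageClass, and applications claim volumes from it. The driver name below is a hypothetical placeholder for whatever CSI driver a given storage vendor provides:

```shell
# Define a StorageClass backed by a (hypothetical) NVMe/TCP CSI driver,
# then claim a volume from it. A pod that mounts the claim gets remote
# NVMe storage provisioned independently of the server it runs on.
kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nvme-tcp-fast
provisioner: nvme-tcp.csi.example.com   # hypothetical CSI driver name
allowVolumeExpansion: true              # volumes can grow as the app scales
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: nvme-tcp-fast
  resources:
    requests:
      storage: 100Gi
EOF
```

Because provisioning goes through the StorageClass rather than a disk in a specific server, compute and storage scale independently, as described above.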

Disaggregated storage also lets a Kubernetes app access its data from anywhere; the app and storage become portable, in effect. This is beneficial because Kubernetes apps may run on, and move across, different servers, and data has to be accessible in all of these places. Some architects solve this problem by placing storage in a cluster.

Benefits of NVMe-oF Over TCP/IP for Cloud-Native Applications

NVMe-oF over TCP/IP offers several benefits for cloud-native applications beyond great accessibility, lower overhead costs, and reduced complexity. It provides virtualized and centralized storage pools that can act as local flash storage. With this approach, NVMe-oF over TCP/IP means accelerated application performance, coupled with easy and efficient scaling.
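As one concrete (and simplified) sketch of such a centralized pool, the Linux kernel’s NVMe target (nvmet) can export a local NVMe namespace over TCP through configfs so that remote hosts see it as local flash; the device path, NQN, and IP address below are placeholders, and the commands assume root privileges on a Linux storage node:

```shell
# Load the NVMe target core and its TCP transport.
modprobe nvmet
modprobe nvmet-tcp
cd /sys/kernel/config/nvmet

# Create a subsystem and allow any host to connect (placeholder NQN).
mkdir subsystems/nqn.2014-08.org.example:storage.subsystem1
echo 1 > subsystems/nqn.2014-08.org.example:storage.subsystem1/attr_allow_any_host

# Back the subsystem with a local NVMe device (placeholder path).
mkdir subsystems/nqn.2014-08.org.example:storage.subsystem1/namespaces/1
echo /dev/nvme0n1 > subsystems/nqn.2014-08.org.example:storage.subsystem1/namespaces/1/device_path
echo 1 > subsystems/nqn.2014-08.org.example:storage.subsystem1/namespaces/1/enable

# Expose the subsystem on a TCP port (placeholder address).
mkdir ports/1
echo tcp  > ports/1/addr_trtype
echo ipv4 > ports/1/addr_adrfam
echo 192.168.10.50 > ports/1/addr_traddr
echo 4420 > ports/1/addr_trsvcid
ln -s /sys/kernel/config/nvmet/subsystems/nqn.2014-08.org.example:storage.subsystem1 \
      ports/1/subsystems/
```

Production deployments would normally rely on a vendor’s storage software rather than raw configfs, but the sketch shows how little is needed on top of a standard TCP/IP network.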

For Kubernetes apps, the right NVMe-oF over TCP/IP implementation can deliver the performance of DAS but through a clustered storage solution with the convenience and ubiquity of Ethernet networking. Ethernet is already present in every IT environment, so it’s easy to deploy a portable storage cluster.

Conclusion

NVMe emerges as the best storage medium for cloud-native applications. It serves the performance needs of Kubernetes apps and comparable cloud-native software. Of the NVMe transport options, NVMe-oF over TCP/IP presents the best mix of qualities for cloud-native workloads, enabling high-performing storage at lower cost and complexity than NVMe-oF over RDMA or Fibre Channel.
