Kubernetes On-Premises: Why and How
Kubernetes has achieved an unprecedented adoption rate, mainly because it substantially simplifies the deployment and management of microservices. Almost equally important, it allows users who are unable to utilize the public cloud to operate in a “cloud-like” environment. It does this by decoupling dependencies and abstracting infrastructure away from your application stack, giving you the portability and scalability associated with cloud-native applications.
Challenges With Kubernetes On-Premises
There is a downside, however: Kubernetes is known for its steep learning curve and operational complexity. Unlike Kubernetes on AWS or Azure – where the public cloud provider essentially hides the complexity from you – running Kubernetes on-prem means you’re on your own for managing it all, including etcd, load balancing, availability, auto-scaling, networking, rollback of faulty deployments, persistent storage, and more.
In addition to building services to handle the aforementioned complexities that public clouds generally solve for you, deploying Kubernetes on-prem in a DIY fashion involves a considerable amount of host- and OS-level configuration, including the following (a combined sketch appears after this list):
- Changing hostnames by editing /etc/hostname, because the default first interface is often connected to a non-routable, host-only network.
- Modifying the /etc/hosts file so hosts can communicate with each other by hostname.
- Verifying that each host has a unique MAC address and product_uuid.
- Configuring OS-level settings, such as enabling the br_netfilter kernel module and disabling swap.
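Here is a minimal sketch of those host-preparation steps on a Debian/Ubuntu-style node – all hostnames and IP addresses are placeholders, so adjust them to your environment:

```bash
#!/usr/bin/env bash
# Host preparation sketch -- run on each node; names and IPs are placeholders.
set -euo pipefail

# Give the node a unique, routable hostname.
sudo hostnamectl set-hostname k8s-worker-1

# Let nodes resolve each other by hostname (one line per cluster member).
echo "10.0.0.11 k8s-master-1" | sudo tee -a /etc/hosts
echo "10.0.0.21 k8s-worker-1" | sudo tee -a /etc/hosts

# Verify this host has a unique MAC address and product_uuid.
ip link show
sudo cat /sys/class/dmi/id/product_uuid

# Enable the br_netfilter module so iptables can see bridged traffic.
sudo modprobe br_netfilter
echo "br_netfilter" | sudo tee /etc/modules-load.d/k8s.conf
cat <<'EOF' | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system

# Disable swap now and on future boots (the kubelet refuses swap by default).
sudo swapoff -a
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab
```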
All of these concerns apply to bare-metal deployments managed by hand. If your on-prem Kubernetes runs on OpenStack or vSphere with software-defined networking – where IPs are managed by the platform – you can rely on that private cloud layer to provision and manage the worker VMs instead.
But even in a bare-metal cluster, the worker nodes can be configured to obtain an IP address that lives for their lifetime (typically via DHCP reservations, with matching DNS records), and kubeadm provides a single command to add or remove a node from the cluster, given its IP and SSH access.
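For example, once a control plane is up, adding a worker is a two-step affair, and removing one is not much longer (the token, hash, and node names below are placeholders):

```bash
# On the control-plane node: print a ready-made join command with a fresh token.
kubeadm token create --print-join-command

# On the new worker (over SSH): run the command it printed, e.g.:
sudo kubeadm join 10.0.0.11:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>

# To remove a node later: drain it from a kubectl workstation, then delete it.
kubectl drain k8s-worker-1 --ignore-daemonsets --delete-emptydir-data
kubectl delete node k8s-worker-1
```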
What’s more, keep in mind that you’ll need to upgrade your clusters roughly every three months when a new upstream version of Kubernetes is released (you’d typically use kubeadm for that), as well as address the complexities of managing storage and monitoring the health of your clusters in your on-prem Kubernetes environment.
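A typical kubeadm-driven upgrade cycle looks roughly like this – the target version is a placeholder, and you’d repeat the per-node steps for every node in the cluster:

```bash
# On the first control-plane node: check which versions are available.
sudo kubeadm upgrade plan

# Apply the chosen version (placeholder -- keep kubelet/kubectl packages in step).
sudo kubeadm upgrade apply v1.30.2

# For every other node: drain it, upgrade it, and bring it back into rotation.
kubectl drain k8s-worker-1 --ignore-daemonsets
sudo kubeadm upgrade node
# (upgrade the kubelet and kubectl packages here via your package manager)
sudo systemctl restart kubelet
kubectl uncordon k8s-worker-1
```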
Opportunities and Benefits for Kubernetes On-Prem
Why, then, do organizations choose the path of running Kubernetes in their own data centers, compared to the relative cakewalk offered by public cloud providers?
Some organizations simply can’t use the public cloud, as they are bound by stringent regulations related to compliance and data privacy. More importantly, enterprises want to take advantage of Kubernetes in their existing data centers to transform their business and modernize their applications for cloud-native – while improving infrastructure utilization and saving costs.
Considerations for Running DIY Kubernetes On-prem
Kubernetes can run on on-premises infrastructure, but not as straightforwardly as you might hope. You can “repurpose” your environment to integrate with Kubernetes – using VMs and hypervisors, or creating your own cluster from scratch on bare metal – but there’s no escaping the need for a deep understanding of the underlying servers, storage systems, and networking infrastructure.
Since Kubernetes is still relatively new and expertise can be hard to find, the CNCF has introduced certifications – Certified Kubernetes Administrator (CKA) and Certified Kubernetes Application Developer (CKAD) – earned by passing an exam. While employing CNCF-certified administrators is a great option, not everyone can hire new staff.
Also, you should plan for the fact that DIY projects in the enterprise often balloon into months-long (or even years-long) efforts to tame and effectively manage the open-source components at scale – accumulating costs and delaying time to market.
Keep in mind that when running Kubernetes on your own in your own on-prem data centers, you’ll need to manage all the storage integrations, load balancing, DNS, security management, container registry, and monitoring infrastructure yourself.
In addition, each one of these components – from storage to networking – needs its own monitoring and alerting system, and you’ll need to set up your internal processes to monitor, troubleshoot and fix any common issues that might arise in these related services to ensure the health of your environments.
Infrastructure Requirements and Best Practices for On-Prem DIY Kubernetes Implementation
- Kubernetes can technically run on one server, but ideally it needs at least three: one for the control plane (the master components – kube-apiserver, etcd, kube-scheduler, and kube-controller-manager), and two worker nodes that run the kubelet.
- While the master components can run on any machine, best practice dictates using a separate server for the master and not running any user containers on it.
- One key feature of Kubernetes is its ability to recover from failures without losing data. It does this with a ‘political’ system of leaders, elections, and terms – etcd’s Raft consensus, which requires a quorum (a majority of members) – and it takes “good” hardware to properly fulfill this capability. To be both available and recoverable, best practice dictates allotting three nodes with 2GB RAM and an 8GB SSD each to this task, with three being the bare minimum and seven the maximum.
- An SSD is recommended here since etcd writes to disk, and even the smallest delay adversely affects performance. Lastly, always run an odd number of cluster members so a majority can be reached.
- The kubelet is an agent that, along with kube-proxy, runs on each worker node; the kubelet makes sure all containers are running and healthy, while kube-proxy maintains the network rules that route traffic to Services.
- You also want to run kubeadm on the master. kubeadm is an installation tool that provides kubeadm init and kubeadm join as the best-practice way to bootstrap clusters.
- For production environments, you would need a dedicated HAProxy load balancer node in front of the API servers, as well as a client machine to run automation (a sketch of this setup follows the list).
- It’s also a good idea to provision significantly more capacity than Kubernetes’ minimum requirements call for. Modern Kubernetes servers typically feature two CPUs with 32 cores each, 2TB of error-correcting RAM, at least four SSDs plus eight SATA SSDs, and a couple of 10G network cards.
- It is best practice to run your production clusters in a multi-master fashion, to ensure high availability and resiliency of the master components themselves. This means you’ll need at least three master nodes (an odd number, to preserve quorum). You’ll further need to monitor the masters and fix any issues if one of the replicas goes down.
- HA etcd: etcd is an open-source distributed key-value store, and the persistent storage for Kubernetes. Kubernetes uses etcd to store all cluster-related data – everything that exists about your pods, nodes, and cluster. Protecting this store is mission-critical, to say the least, since it’s the last line of defense in case of cluster failure. Managing highly available, secured etcd clusters for large-scale production deployments is one of the key operational complexities you need to handle when managing Kubernetes on your own infrastructure. For production use, where availability and redundancy are important factors, running etcd as a cluster is critical. Bringing up a secure etcd cluster – particularly on-premises – is difficult: it involves downloading the right binaries, writing the initial cluster configuration on each etcd node, and configuring and bringing up etcd, in addition to setting up the certificate authority and certificates for secure connections (a manual bootstrap sketch follows this list). For an easier way to run an etcd cluster on-prem, check out the open-source etcdadm tool.
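To give a feel for what that hand-rolled etcd bootstrap involves, here is a sketch of starting one member of a three-node secure cluster – the names, IPs, and certificate paths are all placeholders, and the certificates are assumed to exist already:

```bash
# Run on etcd-1; repeat on etcd-2 and etcd-3 with their own names and IPs.
etcd \
  --name etcd-1 \
  --data-dir /var/lib/etcd \
  --listen-peer-urls https://10.0.0.101:2380 \
  --initial-advertise-peer-urls https://10.0.0.101:2380 \
  --listen-client-urls https://10.0.0.101:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://10.0.0.101:2379 \
  --initial-cluster etcd-1=https://10.0.0.101:2380,etcd-2=https://10.0.0.102:2380,etcd-3=https://10.0.0.103:2380 \
  --initial-cluster-state new \
  --initial-cluster-token k8s-etcd \
  --cert-file /etc/etcd/pki/server.crt \
  --key-file /etc/etcd/pki/server.key \
  --trusted-ca-file /etc/etcd/pki/ca.crt \
  --client-cert-auth \
  --peer-cert-file /etc/etcd/pki/peer.crt \
  --peer-key-file /etc/etcd/pki/peer.key \
  --peer-trusted-ca-file /etc/etcd/pki/ca.crt \
  --peer-client-cert-auth

# Once all three members are up, verify quorum health from any node.
ETCDCTL_API=3 etcdctl \
  --endpoints https://10.0.0.101:2379 \
  --cacert /etc/etcd/pki/ca.crt \
  --cert /etc/etcd/pki/client.crt \
  --key /etc/etcd/pki/client.key \
  endpoint health --cluster
```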
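And to make the multi-master and load-balancer points concrete, here is a hedged sketch of fronting three control-plane nodes with HAProxy and initializing the first one with kubeadm – all names and addresses are placeholders:

```bash
# On the load balancer node: forward TCP 6443 to all three API servers.
cat <<'EOF' | sudo tee -a /etc/haproxy/haproxy.cfg
frontend kube-apiserver
    bind *:6443
    mode tcp
    default_backend kube-masters

backend kube-masters
    mode tcp
    balance roundrobin
    option tcp-check
    server master-1 10.0.0.11:6443 check
    server master-2 10.0.0.12:6443 check
    server master-3 10.0.0.13:6443 check
EOF
sudo systemctl restart haproxy

# On the first master: initialize the cluster behind the load balancer's address.
sudo kubeadm init \
  --control-plane-endpoint "lb.example.internal:6443" \
  --upload-certs

# kubeadm prints a 'kubeadm join ... --control-plane' command;
# run it on master-2 and master-3 to complete the HA control plane.
```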
Kubernetes in Production Needs More Than Just Infrastructure
Once you’re done accounting for the Kubernetes-specific dependencies – Docker, kubeadm, kubelet, and kubectl (the CLI tool for communicating with the cluster) – it’s time to look at the additional services used to deploy applications.
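On Debian/Ubuntu hosts, pulling in those dependencies from the upstream package repository looks something like this (the v1.30 in the URLs is a placeholder for whichever minor version you’re targeting):

```bash
# Add the upstream Kubernetes apt repository (v1.30 is a placeholder version).
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key |
  sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' |
  sudo tee /etc/apt/sources.list.d/kubernetes.list

# Install the core tools and pin them so a routine apt upgrade can't break the cluster.
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
```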
If you’re deploying offline or in an air-gapped environment, you’ll need your own repositories in place for Docker, Kubernetes, and any other open-source tools you may be using. This includes Helm chart repositories for Kubernetes manifests, as well as binary repositories.
You also definitely want to install the Kubernetes dashboard, one of the most useful and popular add-ons. Logging in requires a bearer token: you get the list of dashboard secrets, then describe the relevant one to retrieve the token that grants access.
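Deploying the dashboard and retrieving a login token might look like the following – the manifest version is a placeholder, and which token path applies depends on your Kubernetes version:

```bash
# Deploy the dashboard (pin a manifest version appropriate for your cluster).
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml

# On Kubernetes >= 1.24: mint a short-lived token for the dashboard service account.
kubectl -n kubernetes-dashboard create token kubernetes-dashboard

# On older clusters: list the dashboard secrets and describe one to read its token.
kubectl -n kubernetes-dashboard get secrets
kubectl -n kubernetes-dashboard describe secret <secret-name>

# Expose the UI locally and log in with the token.
kubectl -n kubernetes-dashboard port-forward svc/kubernetes-dashboard 8443:443
```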
Best practices include always checking the logs when something goes wrong, starting with your syslog files.
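In practice, that means checking both node-level and cluster-level logs, for example:

```bash
# Node-level: kubelet logs via systemd's journal, plus the syslog files themselves.
journalctl -u kubelet --since "1 hour ago"
sudo tail -f /var/log/syslog

# Cluster-level: recent events, then the logs of a misbehaving pod.
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl logs <pod-name> -n <namespace> --previous
```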
Additional Services
This stage can be a lot of fun, since you get to experiment with all the tools in the industry – or a major pain, depending on the complexity of your infrastructure and processes.
Weave Net and Flannel are both great networking (CNI) options, while Istio and Linkerd are popular service meshes. Grafana and Prometheus help with monitoring, and there are a number of tools to automate CI/CD, like Jenkins, Bamboo, and Jenkins X.
Security is a major concern. Every open source component needs to be scanned for threats and vulnerabilities. Additionally, keeping track of version updates and patches and then managing their introduction can be labor-intensive, especially if you have a lot of additional services running.
In conclusion, Kubernetes lets on-premises data centers benefit from cloud-native applications and infrastructure, irrespective of hosting or public cloud providers. Whether the underlying infrastructure is OpenStack, KVM, VMware vSphere, or even bare metal, it can still reap the cloud-native benefits that come from integrating with Kubernetes.
Note that bare-bones Kubernetes is never enough for real-world production applications. A complete Kubernetes infrastructure on-prem needs proper DNS, load balancing, Ingress, and Kubernetes role-based access control (RBAC), alongside a slew of additional components – which makes the deployment process quite daunting for IT.
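As a small taste of the RBAC piece, here is what a namespaced read-only role and its binding look like (the namespace, role, and user names are purely illustrative):

```bash
# A read-only Role and RoleBinding scoped to one namespace.
kubectl apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: staging
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: staging
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
EOF
```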
Once Kubernetes is deployed comes the addition of monitoring, tracing, logging, and all the associated operations for troubleshooting – such as handling capacity exhaustion, ensuring HA, backups, and more. For further reading, check out this article on the seven key services you need around bare-bones Kubernetes to enable mission-critical production use.