Container Fails
To understand the current and future state of containers, we gathered insights from 33 IT executives who are actively using containers. We asked, "What are the most common failures you see with containers?"
Here's what they told us:
Skills
- Expectations versus reality. Expectations are inflated by the hyperbole, especially around security. The reality is that containers can deliver, but it doesn’t always pan out; you only get to a happy place after paying down the infrastructure cost. The other thing people underestimate is the learning curve. There is a sizeable code base you are going to be running on top of that you will not understand. The research, growth, and learning around containers take time, and you can have slip-ups with default behavior. Set realistic expectations and anticipate not getting everything right the first time around.
- Lack of skills and container knowledge results in common problems with scaling vertically and horizontally and with forgetting the core tenet of staying lightweight. The more "lift and shift" you attempt to execute, the more you are doomed to failure. You will have difficulty scaling services: the web service may scale while the billing service doesn’t.
- Most problems come from a lack of deep-skilled talent and of production-level knowledge of running containers. There is an inherent lack of understanding of application factors and architectural dependencies. Enterprises and developers must evolve the processes they use for container development and focus more on the security aspects; security breaches track with application and system vulnerabilities, so you need to fully understand the vulnerabilities of the platform and influence those decisions. It's also easy to over-architect: people have a tendency to build too many microservices, which adds complexity, and monitoring the interaction patterns among them is very difficult, especially for building, troubleshooting, and the DevOps side.
- One challenge is folks not understanding how containers and container engines work. For example, not knowing what defaults they are providing, or how specific runtimes or languages behave within a container. There is also a failure to thoroughly understand the differences between networking within a container versus outside of it (see the sketch after this list).
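To make the point about defaults concrete, here is a minimal sketch, assuming the Docker SDK for Python (the `docker` package) and a local Docker daemon; the image and command are just examples. It starts a container with no explicit settings and prints the defaults the engine actually applied.

```python
import docker  # Docker SDK for Python: pip install docker

client = docker.from_env()

# Start a container without specifying limits, networking, or a restart
# policy, then inspect what defaults the engine applied for us.
container = client.containers.run("alpine:3.19", "sleep 60", detach=True)
host_config = container.attrs["HostConfig"]

print("Network mode:  ", host_config["NetworkMode"])     # typically the default bridge network
print("Memory limit:  ", host_config["Memory"])          # 0 means no limit at all
print("CPU shares:    ", host_config["CpuShares"])       # 0 means engine default
print("Restart policy:", host_config["RestartPolicy"])   # typically no automatic restart

container.stop()
container.remove()
```

Knowing that, out of the box, a container gets unlimited memory and the default bridge network is exactly the kind of detail that separates lab experiments from production surprises.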
Security
- The biggest fail we’ve seen is folks treating containers like a cool toy, getting lazy, and letting things slip. You need to put together a thorough and thoughtful framework to deploy and secure apps. If processes and procedures are not followed, you will have unexpected and undesired fallout. We need to educate everyone on the need to always take all of the steps and not take shortcuts.
- 1) Once you move into containers, you see that security configuration for workloads is not hardened. Running containers with too many read/write permissions makes them vulnerable; containers can and should be deployed and run effectively with minimized permissions (see the hardened-launch sketch after this list). 2) Another failure is skipping network isolation tools when using containers. This can leave huge security gaps that may lead to the exfiltration of data due to misconfiguration, application vulnerabilities, or even an open-source library pulled from a compromised source.
- Lately, the most common failure is someone in the organization setting up a small Kubernetes (K8s) cluster and not understanding how to operationalize it within their IT organization. Container platforms help DevOps teams move more quickly; however, it is important to couch decisions in an enterprise context so you wind up with a supportable solution. For example, an organization might have a strategy for logging and monitoring, but as they implement a container orchestration platform, they don’t verify whether that strategy fits in the new world. They either march toward production without a logging and monitoring strategy for the container environment, or they choose a solution that is completely separate and wind up managing twice as many solutions as they need. Most people see the value containers provide, which can lead some folks to jump in without fully understanding the common pitfalls that create insecurity in their infrastructure. For example, many people assume it’s safe to store unencrypted data within containers because they run in isolated environments. Containers are a wonderful technology, but you still have to be mindful of security best practices, such as encrypting sensitive data, even within a container (see the encryption sketch after this list).
- Introducing the appropriate solutions to secure container environments is a complex task that many enterprises struggle with. However, it’s a crucial one, especially as container environments increasingly enter production. At the same time, deep expertise in K8s, DevOps, and creating the tools needed to implement application workflows in a containerized environment can be hard to come by. As a result, enterprises often find it challenging to build production container environments that feature all the necessary security measures.
- One of the reasons for failure is not acknowledging the process as a problem and just assuming technology will be your savior. Opening up Docker for developers without discipline results in a black box with no configuration management and a lack of automation and security. If you don’t roll out with discipline, you create a giant mess that you have to go back to, reverse-engineer, and clean up. Include security and operations from the beginning, and don’t let the development team run ahead of security and operations.
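On the point above about running containers with minimized permissions, here is a minimal sketch, again assuming the Docker SDK for Python; the image name, user ID, and limits are placeholders, and the right set of options depends on the workload.

```python
import docker  # Docker SDK for Python: pip install docker

client = docker.from_env()

# A hardened launch: read-only root filesystem, every Linux capability
# dropped, an unprivileged user, and an explicit memory ceiling.
# "myapp:1.4.2" and the UID/GID are placeholders.
container = client.containers.run(
    "myapp:1.4.2",
    detach=True,
    read_only=True,                      # no writes to the container filesystem
    cap_drop=["ALL"],                    # drop everything, add back only what is needed
    user="10001:10001",                  # do not run as root
    mem_limit="256m",                    # fail fast instead of starving the host
    security_opt=["no-new-privileges"],  # block privilege escalation
    tmpfs={"/tmp": "size=64m"},          # writable scratch space without touching the image
)
print(container.status)
```

Equivalent options exist on `docker run` and in Kubernetes security contexts; the point is that the hardened settings are declared explicitly rather than assumed.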
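And on encrypting sensitive data even inside a container, here is a small sketch assuming the `cryptography` package; the environment variable name and file path are hypothetical, and in practice the key would be injected at runtime as an orchestrator-managed secret rather than baked into the image.

```python
import os
from cryptography.fernet import Fernet  # pip install cryptography

# DATA_ENCRYPTION_KEY is a hypothetical variable holding a key produced by
# Fernet.generate_key(); it is injected at runtime, never baked into the image.
cipher = Fernet(os.environ["DATA_ENCRYPTION_KEY"])

record = b"card=4111111111111111;cvv=123"
ciphertext = cipher.encrypt(record)

# Even if the container filesystem or an attached volume is read by an
# attacker, only ciphertext is at rest. "/data" is a placeholder mount.
with open("/data/records.enc", "wb") as fh:
    fh.write(ciphertext)

# Decrypt only at the moment the data is actually needed.
plaintext = cipher.decrypt(ciphertext)
```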
Other
- 1) Crashes – Containers tend to crash when running for a long period of time because of the garbage that accumulates; efforts have been made in recent versions to address this issue. 2) Storage – When running containers, you use more storage space, as each container adds some overhead on top of your packaged application. Although storage is less of a concern nowadays, it is still one of the common reasons for crashes.
- People get excited about technology, building and using it quickly without realizing how quickly technology changes.
- Most orchestrators are still not completely ready for hybrid multi-cloud environments with multiple data centers. There are still challenges in deploying distributed applications.
- Underestimating the complexity of containers leads to failure. Some people configure containers without realizing they still need to plan for upgrades of the platform and the application; due to scale, you need to build automation to handle upgrades. There's a balance between people who want to move fast and those who don’t want to do anything. Containers and K8s tend to be ephemeral: when containers die, K8s will try to restart them, and unless you look for specific indicators or restart counts, you may not be able to see that an application is not performing well (see the restart-count sketch after this list). Help customers identify all of the APIs that are available for monitoring events, and set up monitoring and logging infrastructure that captures logs and feeds them into a central location where they can be searched and used as a basis for alerting. With Prometheus, application response can be plotted over time so you know when performance metrics get out of line.
- The biggest failure is not letting it sink in whether your application architecture will actually benefit from containers. You can put anything in a container, but application architecture matters if you are to realize the full benefits. When you start to use microservices, you really have to know how you are going to manage, deploy, and monitor them. If you're not good at managing, deploying, and monitoring your monolithic app, you will not be good with microservices and containers. Once you learn to deploy into containers and use autoscaling, you need to think about how to monitor and how to get the log files out. Think about where you are persisting data and where data will live permanently. This forces you into good system comprehension.
- Another fail is taking an immature setup or process and trying to containerize it. A good example is people installing software in the container without understanding all the components and dependencies it’s pulling in (e.g. via Ruby Gems). There is then a huge complicated dependency graph in the container the creators don’t understand. So, when they go to update some components the whole thing breaks. It’s the new version of “DLL Hell.”
- The most common failure I see with containers is people not fully understanding (or forgetting) that containers are immutable components. The moment you version a container and treat it as immutable, you have absolute knowledge about what that thing is. A lot of the tooling around containers is built with the assumption that a container image at a specific version is always exactly the same. The moment you start touching it, you lose the advantages of using containers, like portability. Even if the container breaks, you can’t reuse its version number; you have to create a completely new one (see the image-versioning sketch after this list). If you don’t, you lose the appropriate management controls over not one but two containers, the modified and the unmodified one. This tends to be a bigger problem in the lower environments, and it usually happens when companies are in the process of changing their applications over to containers. Another failure I see is when companies set out to turn their existing applications into containers, they don’t realize that onboarding those containers requires rearchitecting their applications into twelve-factor applications. They can’t just take their exact application and throw it into a container. If they do that, they won’t be able to elastically scale, they’ll lose portability, and they might become locked into a certain cloud platform.
- A common container failure scenario is when the container infrastructure hardware fails to satisfy the container startup policies. Since K8s provides a declarative way to deploy applications and those policies are strictly enforced, it's critical that the declared and desired container states can be met by the allocated infrastructure; otherwise, a container will fail to start (see the resource-requests sketch after this list). Other areas where failures can occur, or that cause concern, are how to deploy persistent storage, how to properly monitor and alert on failure events, and how to deploy applications across multiple K8s clusters. While orchestrated container environments hold great promise, there are still several areas that require careful attention to reduce the potential for failures and issues when deploying these systems.
- Common causes fit into two buckets – opaqueness and complexity. Containers as black boxes running software in an isolated way make it hard to understand what is happening. When things work, it’s great. When things don’t work, getting visibility gets much harder. Complexity – little things talk to each other in a distributed system. There can be latency with every call. Dealing with distributed systems is complex.
- Failures take place around configuration. You have to apply hundreds of best practices and configurations to build guardrails around the environment. A lack of proper configurability results in a lack of visibility. Shine a light on things and know the scope of the environment.
- Two of the areas we see giving users the most difficulty are configuration and troubleshooting. Configuring networking and storage in containerized environments, especially when using frameworks like K8s, requires thinking differently and carefully about how resources are connected and managed, and these areas are frequently the cause of headaches and failures during implementation. Troubleshooting also requires thinking differently: access to services and logs typically requires extra steps because of how containers are separated and isolated from outside access (see the log-retrieval sketch after this list).
- The most common failures we see when talking to developers using containers are 1) User misconfiguration of containers or of the applications in those containers. 2) Over-commitment of resources because of a lack of capacity management and planning around containers, whose needs are hard to predict. 3) Lack of consideration for day-2 operations with containers, which can lead to problems such as the abuse of privileged containers. 4) Attempting to port workloads to containers that are not suitable for containers or a microservice architecture.
- You should not try to containerize monolithic legacy applications without a plan to decompose the application into smaller, stateless services. Using a container like a virtual machine also isn’t a good idea.
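On the earlier point that restarts are easy to miss, here is a minimal sketch assuming the official Kubernetes Python client (the `kubernetes` package) and a reachable cluster; it lists every container with a non-zero restart count, the signal a "Running" pod can hide.

```python
from kubernetes import client, config  # pip install kubernetes

# Load credentials from the local kubeconfig; inside a cluster you would
# use config.load_incluster_config() instead.
config.load_kube_config()
v1 = client.CoreV1Api()

# A pod reported as "Running" may still be crash-looping; the restart
# count is the indicator that is easy to overlook.
for pod in v1.list_pod_for_all_namespaces().items:
    for status in pod.status.container_statuses or []:
        if status.restart_count > 0:
            print(f"{pod.metadata.namespace}/{pod.metadata.name} "
                  f"container={status.name} restarts={status.restart_count}")
```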
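On treating container images as immutable, versioned artifacts, here is a sketch assuming the Docker SDK for Python; the registry, image name, and version tag are placeholders. The idea is that a given tag always refers to exactly one build, and a fix produces a new tag rather than modifying a running container.

```python
import docker  # Docker SDK for Python: pip install docker

client = docker.from_env()

# Tag every build with a unique, immutable version (a git SHA here) so a
# given tag always refers to exactly one artifact. Registry and SHA are
# placeholders.
repository = "registry.example.com/myapp"
version = "3f9c2d1"

image, build_logs = client.images.build(path=".", tag=f"{repository}:{version}")
client.images.push(repository, tag=version)

# A fix never touches a running container or reuses this tag; it is a new
# build with a new version, and the old image stays exactly as it was.
```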
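On declared container states that the allocated infrastructure must be able to satisfy, here is a sketch assuming the Kubernetes Python client; the pod name, image, and resource figures are placeholders. If no node can meet the declared requests, the pod sits in Pending instead of starting, which is the failure mode described above.

```python
from kubernetes import client, config  # pip install kubernetes

config.load_kube_config()

# Declare what the container needs up front. The scheduler will only place
# the pod on a node that can satisfy the requests; if none can, the pod
# stays Pending rather than starting.
container = client.V1Container(
    name="api",
    image="myapp:1.4.2",  # placeholder image
    resources=client.V1ResourceRequirements(
        requests={"cpu": "500m", "memory": "512Mi"},
        limits={"cpu": "1", "memory": "1Gi"},
    ),
)

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="api-test"),
    spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```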
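And on troubleshooting requiring extra steps, here is a sketch assuming the Kubernetes Python client; pod and namespace names are placeholders. Logs are retrieved through the API server rather than tailed from a file on the host, which is exactly the kind of indirection that trips people up.

```python
from kubernetes import client, config  # pip install kubernetes

config.load_kube_config()
v1 = client.CoreV1Api()

# Logs live behind the API server, not in a file you can tail on the host.
# The pod and namespace names are placeholders.
log_text = v1.read_namespaced_pod_log(
    name="api-test",
    namespace="default",
    tail_lines=100,
)
print(log_text)
```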
Here’s who we spoke to:
- Tim Curless, Solutions Principal, AHEAD
- Gadi Naor, CTO and Co-founder, Alcide
- Carmine Rimi, Product Manager, Canonical
- Sanjay Challa, Director of Product Management, Datical
- OJ Ngo, CTO, DH2i
- Shiv Ramji, V.P. Product, DigitalOcean
- Antony Edwards, COO, Eggplant
- Anders Wallgren, CTO, Electric Cloud
- Armon Dadgar, Founder and CTO, HashiCorp
- Gaurav Yadav, Founding Engineer Product Manager, Hedvig
- Ben Bromhead, Chief Technology Officer, Instaclustr
- Jim Scott, Director, Enterprise Architecture, MapR
- Vesna Soraic, Senior Product Marketing Manager, ITOM, Micro Focus
- Fei Huang, CEO, NeuVector
- Ryan Duguid, Chief Evangelist, Nintex
- Ariff Kassam, VP of Products and Joe Leslie, Senior Product Manager, NuoDB
- Bich Le, Chief Architect, Platform9
- Anand Shah, Software Development Manager, Provenir
- Sheng Liang, Co-founder and CEO, and Shannon Williams, Co-founder, Rancher Labs
- Scott McCarty, Principal Product Manager - Containers, Red Hat
- Dave Blakey, CEO, Snapt
- Keith Kuchler, V.P. Engineering, SolarWinds
- Edmond Cullen, Practice Principal Architect, SPR
- Ali Golshan, CTO, StackRox
- Karthik Ramasamy, Co-Founder, Streamlio
- Loris Degioanni, CTO, Sysdig
- Todd Morneau, Director of Product Management, Threat Stack
- Rob Lalonde, VP and GM of Cloud, Univa
- Vincent Lussenburg, Director of DevOps Strategy; Andreas Prins, Vice President of Product Development; and Vincent Partington, Vice President Cloud Native Technology, XebiaLabs