Platform Engineering Trends in Cloud-Native: Q&A With Ville Aikas

The rise of Kubernetes, cloud-native, and microservices spawned major changes in architectures and abstractions that developers use to create modern applications. In this multi-part series, I talk with some of the leading experts across various layers of the stack — from networking infrastructure to application infrastructure and middleware to telemetry data and modern observability concerns — to understand emergent platform engineering patterns that are affecting developer workflow around cloud-native. The next participant in our series is Ville Aikas, Founder of Chainguard and formerly a staff engineer at VMware and software engineer at Google, where he was one of the earliest members of the Kubernetes project.

Q: We are nearly a decade into containers and Kubernetes (K8s was first released in September 2014). How would you characterize how things look different today than they did ten years ago, especially in terms of the old world of systems engineers and network administrators, with a big dividing line between those operations concerns and the developers on the other side of the wall? And what has that meant for developers' responsibilities around security concerns?

A: In the early days of pre-cloud infrastructure, it was a good first step for engineers and network administrators to take a virtualization approach for everything, but there was also a downside: they tried to match everything to the physical world, which at the time was what was familiar. I also think there were some missed opportunities in this transition, where everyone was focused on a 1:1 mapping rather than thinking about alternatives and improvements along the way. On the other hand, the nice thing nowadays with the evolution of Kubernetes and cloud-native technologies is that you don't necessarily have to decide which cloud you're running on before starting to build things. You can delay that decision and rejigger things to work on that cloud or multiple clouds. This approach gave developers an opportunity to move beyond the physical barriers of network-based perimeters, but in turn it changed the security landscape, because things were no longer all behind the perimeter. In my humble opinion, these abstractions are still a bit leaky today, but with the introduction of new technologies and frameworks like service meshes and zero trust, developers are no longer confined to asking the network team to get things done or to building core security infrastructure like authentication and authorization themselves. With the introduction of Kubernetes, gRPC, and mTLS, the network is less important to a developer's day-to-day, giving them more time to focus on who can talk to what at a higher level and with more granularity. Of course, somebody still has to care about network-level security, if for nothing else than DDoS and so forth.
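To make that "who can talk to what" idea concrete, here is a minimal sketch using the Kubernetes NetworkPolicy Go types; the app labels and the port are hypothetical stand-ins for whatever your own workloads use:

```go
package main

import (
	"fmt"

	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	port := intstr.FromInt(8443) // hypothetical service port

	// Allow ingress to pods labeled app=payments only from pods
	// labeled app=frontend, on one port.
	policy := &networkingv1.NetworkPolicy{
		ObjectMeta: metav1.ObjectMeta{Name: "payments-allow-frontend"},
		Spec: networkingv1.NetworkPolicySpec{
			PodSelector: metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "payments"},
			},
			PolicyTypes: []networkingv1.PolicyType{networkingv1.PolicyTypeIngress},
			Ingress: []networkingv1.NetworkPolicyIngressRule{{
				From: []networkingv1.NetworkPolicyPeer{{
					PodSelector: &metav1.LabelSelector{
						MatchLabels: map[string]string{"app": "frontend"},
					},
				}},
				Ports: []networkingv1.NetworkPolicyPort{{Port: &port}},
			}},
		},
	}

	fmt.Printf("%s allows ingress from %d peer rule(s)\n",
		policy.Name, len(policy.Spec.Ingress))
}
```

Because the policy declares Ingress in PolicyTypes, any traffic to the selected pods that no rule matches is denied, which is exactly the identity-based, deny-by-default posture described above: the rule talks about workload labels, not IP ranges.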

Q: The popularity of the major cloud service platforms and all of the thrust behind the last ten years of SaaS applications and cloud-native created a ton of new abstractions for how developers interact with underlying cloud and network infrastructure and with the security of the application infrastructure. How has this trend of raising the abstraction for interacting with infrastructure affected developers from a security perspective?

A: On one hand, the creation of cloud-native services has made it easier for developers to create and launch applications. Watching developers' experience, and having experienced it firsthand as a developer myself, being able to deploy containers was a game changer, but there is still a long way to go. Today it's tricky to always know what's running in those containers and what dependencies they have, which introduces security risks that affect the entire supply chain of those applications and services. The same can be said for the applications themselves, since they could easily access anything, so speed (which is great) also means that in some cases not enough thought was given to all the things that could go wrong from a security standpoint. This raises a question for developers and their security teams: just because you can run or do something more easily now with a container or cloud-native application, should you?
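One way to start answering "what is actually running in this container" is to inspect the image directly. Here is a minimal sketch using the go-containerregistry crane package; the image reference is hypothetical, and listing layer digests is only a first step, not a full dependency inventory:

```go
package main

import (
	"fmt"
	"log"

	"github.com/google/go-containerregistry/pkg/crane"
)

func main() {
	// Hypothetical image reference; substitute one you actually run.
	img, err := crane.Pull("registry.example.com/app:latest")
	if err != nil {
		log.Fatal(err)
	}

	manifest, err := img.Manifest()
	if err != nil {
		log.Fatal(err)
	}

	// Each layer digest is one piece of "what is in this container";
	// a complete answer also needs an SBOM of the packages inside.
	for _, layer := range manifest.Layers {
		fmt.Printf("layer %s (%d bytes)\n", layer.Digest, layer.Size)
	}
}
```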

Q: What are the areas where it makes sense for developers to have to really think about the security of underlying application infrastructure versus the ones where having a high degree of instrumentation or customization ("shift left") is going to be very important?

A: One area is being able to lock down the communication patterns of your applications. This results not only in more secure systems but in better-designed systems, too. Developers should also be thinking about how to lock down applications if and when something bad happens, for example, to stop data from being exfiltrated. If you lock down the communication patterns at the start, that's one more way to have a belt and suspenders in place. Instrumentation is also very important, again not only for security but for being able to understand how systems behave, for debugging, and so forth. By instrumenting, you can also spot emergent behaviors that are not really how the system was designed but, well, that's how it turned out. I've seen this a lot where a bunch of glue (think functions) is super easy to write, but reasoning about the system as a whole becomes very difficult.
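On the instrumentation point, here is a minimal sketch with the OpenTelemetry Go API; the tracer name, span name, and attribute are hypothetical, and a real setup would also wire up an SDK exporter:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

func handleTransfer(ctx context.Context, account string) {
	// Open a span around the operation so its callers, duration,
	// and attributes become visible in traces later.
	tracer := otel.Tracer("payments") // hypothetical component name
	ctx, span := tracer.Start(ctx, "handleTransfer")
	defer span.End()

	span.SetAttributes(attribute.String("account.id", account))

	// ... actual transfer logic would go here, passing ctx onward
	// so downstream calls join the same trace.
	_ = ctx
}

func main() {
	// Without a configured exporter this falls back to a no-op
	// tracer provider; exporter setup is out of scope for this sketch.
	handleTransfer(context.Background(), "acct-123")
}
```

Traces like this are what let you notice the emergent behaviors mentioned above: the call graph you observe in production can be compared against the communication patterns the system was designed to have.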

Q: Despite the obvious immense popularity of cloud-native, much of the world's infrastructure (especially in highly regulated industries) is still running in on-prem data centers. What does the future hold for all this legacy infrastructure and the millions of servers humming along in data centers? What are the implications for managing the security of these mixed cloud-native infrastructures together with legacy data centers over time?

A: I would think that higher levels of abstraction will keep making their way into on-prem systems, because developer talent is more familiar and productive with them. If you squint hard enough, the kinds of services these organizations provide will look like just a special kind of cloud, and all the same rules will apply there. For example, private clouds will start behaving more like public clouds, with some exceptions for on-prem requirements like “air-gapping.”

Q: What do you think are some of the modern checklist items that developers care most about in terms of their workflow, and how does platform engineering make their lives more productive? Broadly speaking, what are the conditions that are most desirable versus least desirable in terms of the build environment and toolchains that modern developers care about?

A: The ability to get things done, I think, is really the most important thing, still today. Developers always have wanted, and always will want, to do things and do them faster. What helps is reducing friction by having security built into applications and tools, rather than bolting something on after the fact that requires extra developer work. I am hoping that as we mature as a field, core developer tooling will keep improving so that instead of security being a source of friction, it reduces the number of things developers have to devote their time to. Shipping new features sooner is the desirable state for developers, and when security is already built into that process, addressing security issues and fixing things sits on the same level as shipping the features.

Q: Where are we in the journey to making the software supply chain secure? What progress has been made since major moments like SolarWinds and Log4j, and what’s next?

A: I’d say we are still in the early stages. We have done a good job as an industry of raising awareness of these issues and of providing tooling for developers to start tackling software supply chain security, but there really is no “secure-my-shit” flag that developers can set to fix the problem. It’s a LONG chain, where each link needs to be addressed, and every company is at a different stage of the journey. There are certainly tools and solutions that address each of the different links, but it will take organizations a lot of time and effort to tackle it by using different tools for each step.

For example, at Chainguard, we believe the way to solve this is to start at the very beginning, when software gets built, so that security is part of the foundation by default in developer tooling and then permeates every part of the software supply chain. We have started some of this work with our Wolfi OS distribution and now our Chainguard Images, which are hardened, minimal container images that aim for few or zero vulnerabilities.

Q: What does software supply chain security look like today, in terms of build systems and common toolchains that developers use? What are the practical things they should be doing? And what are the security responsibilities that fall more to the platform engineering teams? Who should own what in the ideal scenario? 

A: It’s easier than ever to start baking security into the software being produced. For example, Sigstore’s technology lets you sign commits, you can create SBOMs (which should be attested), and you can verify the origin of the software you run. There are plenty of automated tools you can use to keep your dependencies up to date (hello, Dependabot, for example). One note, though: make sure you do not just blindly apply everything you can; vigilance is still necessary even with the advances in software supply chain security tooling. As for ownership, I think platform engineering can ensure that only properly vetted software that comes from trusted sources and carries proper provenance is allowed to run in production, and can continuously verify that these things stay up to date. Just because the milk was good two weeks ago doesn’t mean it still is, and software, unfortunately, doesn’t come with an expiration date.
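To make "only vetted software from trusted sources" concrete, here is a minimal sketch that shells out to Sigstore's cosign CLI for keyless signature verification; the image reference, signer identity, and issuer are hypothetical placeholders for whatever your platform team actually trusts:

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

func main() {
	// Hypothetical values; a real policy pins the identities your
	// organization trusts (e.g., a specific release workflow).
	image := "registry.example.com/app:latest"
	identity := "https://github.com/example-org/app/.github/workflows/release.yaml@refs/heads/main"
	issuer := "https://token.actions.githubusercontent.com"

	// Keyless verification: cosign checks the signature against the
	// signing certificate's identity and its OIDC issuer.
	cmd := exec.Command("cosign", "verify",
		"--certificate-identity", identity,
		"--certificate-oidc-issuer", issuer,
		image)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		log.Fatalf("signature verification failed: %v", err)
	}
	log.Println("image signature verified")
}
```

A check like this would typically run in an admission controller or CI gate rather than as a one-off program, so that unvetted images never reach production in the first place.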

Q: How are images and operating system distributions evolving for software supply chain security concerns? 

A: Today’s OS distros, in particular new distros like Wolfi, are built to include more provenance and attestations, meaning that one can better reason about what is actually in the container being run or built. New container image options include things like high-quality SBOM data, which is extremely important. When you can understand how your container images are built, you get higher-quality images and a more secure supply chain in the long term. Instead of the past model of just cutting and pasting to get things up and running (again, the speed argument; I have been guilty of it myself), Chainguard Images and Wolfi are emerging in this space and really putting in the time and effort to build something secure by default.
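As a small illustration of putting that SBOM data to work, here is a minimal sketch that reads an SPDX JSON SBOM from a file and lists the packages it declares; the file path is hypothetical, and only the two SPDX fields used here are modeled:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
)

// Model only the SPDX fields this sketch needs.
type spdxDoc struct {
	Packages []struct {
		Name        string `json:"name"`
		VersionInfo string `json:"versionInfo"`
	} `json:"packages"`
}

func main() {
	// Hypothetical path; SBOMs are often attached to images as
	// attestations and fetched with tooling such as cosign.
	data, err := os.ReadFile("sbom.spdx.json")
	if err != nil {
		log.Fatal(err)
	}

	var doc spdxDoc
	if err := json.Unmarshal(data, &doc); err != nil {
		log.Fatal(err)
	}

	// Knowing exactly which packages (and versions) are inside an
	// image is what makes vulnerability matching tractable.
	for _, p := range doc.Packages {
		fmt.Printf("%s %s\n", p.Name, p.VersionInfo)
	}
}
```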
