Why You Should Be Deploying Postgres on Kubernetes

Running a Postgres installation is trivial; setting up a production environment, however, is a whole different matter. The main reason is that Postgres is not, by itself, production-ready software: it requires a set of companion tools to complement its functionality, such as connection pooling, monitoring, backup tools, and high availability management.

In this talk, Álvaro Hernández, founder and CEO at OnGres, discusses why Kubernetes is the best platform to make Postgres production-ready, as well as the extra advantages it provides.


Bart Farrell:

Our next speaker is no stranger to this community. He’s one of the first people who started talking to me when I joined as community leader last year. His name is Álvaro Hernández. He is the CEO and founder of OnGres, and if you ever have any question related to Postgres — whether it’s unrelated to Kubernetes or, for our community’s sake, inside Kubernetes — Álvaro has been an amazing presenter. I’m looking forward to having him give a talk in Spanish at some point, too; possibly the best-attended meetup we’ve had was given by Álvaro a few months ago. So anyway, wonderful to have you back with us today.

Álvaro Hernández:

Thank you so much for having me here today. Let’s get into it: I have a quick presentation about why you should be deploying Postgres on Kubernetes.

So first of all, let’s start by talking about plain Postgres without Kubernetes: what it takes to get Postgres up and running the old-school way, as we were all doing a few years ago. It’s actually pretty simple. If you’re using Ubuntu or a Debian-based distribution, or something like that, you can install Postgres with a single package-manager command — something like apt-get install postgresql — and you’re done. The operation runs in about 40 seconds, depending on your internet connection, and you get a Postgres instance up and running. So I guess this presentation is pretty much over, because at this point we could just say: don’t run Postgres on Kubernetes. It’s so simple — why even worry, why even deal with anything else? Well, there are caveats. You may not get the latest version of Postgres by issuing this command, so you may want to add other repositories with newer versions. And if you’re using Red Hat or another RPM-based distribution, you’ll run different commands and need to type a few more of them. But beyond these technicalities, what you get is not a Postgres that is ready for production use — and that’s the key here. If you’re just testing on your laptop, you’re probably fine with this. But if you’re going to deploy to production, you need to do some tweaking.

The first thing is going to be configuration. We really need to tune the database to run Postgres in production, because the Postgres configuration, as it comes by default, is super conservative. It’s great for running on a Raspberry Pi — and I heard someone around here has a Raspberry Pi cluster, so maybe the defaults are fine for that. But in reality, when you’re going to run clusters on 96-core machines with 600 gigabytes of RAM and several terabytes of SSD storage, you want to tune at least some basic Postgres parameters. I’m not even saying let’s go super deep on this; but a basic tuning, at least, you really need to apply to Postgres. To give you a hint: Postgres by default comes with 128 megabytes of shared_buffers, the parameter that controls how much RAM Postgres uses for caching. So if you run Postgres with the default configuration, it’s going to use just 128 megabytes of RAM for its cache.

As I said, that’s probably good for a Raspberry Pi, but not for your production cluster; you probably want to use much more RAM for caching. I mean, Postgres will run fine without tuning — it’s just like having a Ferrari and driving it only in first gear: you’ll get quite a good speed, but not the one you’re expecting. This is the same case. And if you look at the default configuration, it will not even be open to the world: your Postgres will only be listening on localhost. So you definitely need to tune Postgres, and this is no easy task. Postgres contains almost 300 tunable parameters, and some of them require a significant amount of Postgres expertise to tune. Even if we’re just aiming at basic tuning — setting some basic parameters, opening Postgres to the outside world, sizing memory decently, adjusting some of the parameters that affect query performance, like the number of parallel processes you want to run, plus some basic tuning for logging, which is later critical for troubleshooting performance problems — you need to invest a fair amount of time. Let’s say a Postgres DBA with some experience may require 2 to 8 hours to perform this job, and that’s an optimistic estimate. If you want to do deep tuning, you’ll require much more time, because you’ll also need to run benchmarks to verify the performance assumptions you’re making.
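To make “basic tuning” concrete, here is a minimal sketch of the handful of parameters such a first pass usually touches, written as YAML (the shape in which Patroni and most Kubernetes operators accept Postgres parameters). The values are illustrative assumptions for a large machine, not recommendations:

```yaml
# postgresql.conf parameters in the YAML shape Patroni/operators consume.
# Values are illustrative assumptions for a big machine, not recommendations.
parameters:
  shared_buffers: 16GB                 # default is 128MB; commonly sized around 25% of RAM
  effective_cache_size: 48GB           # planner hint about how much the OS will cache
  work_mem: 64MB                       # per-sort/per-hash memory; multiplied by concurrency
  max_parallel_workers_per_gather: 4   # allow parallel query execution
  listen_addresses: '*'                # the default listens on localhost only
  log_min_duration_statement: 1000     # log statements slower than 1s, for troubleshooting
```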

If you’re wondering whether there’s a little help available, there is an online tool — disclaimer: I helped build it — called https://postgresqlco.nf/. The name is just like the postgresql.conf file, except that you move the dot two positions to the right: postgresqlco.nf. It allows you to upload a Postgres configuration and interact with it via an API; it’s like a GitHub repo for your postgresql.conf, and it comes with a tuning guide, so it may help you in this process.

So, we installed Postgres — that was easy. Then we tuned Postgres, which requires a bit more expertise; we need a Postgres DBA for that. What else? Are we ready to go live? Well, not really. Because in reality, Postgres almost always requires — let’s be precise: requires — connection pooling. If you just expose Postgres to the outside world without connection pooling, you’re very likely going to run into trouble.

Which kind of trouble? Here’s a small benchmark I ran comparing Postgres on a small instance, where I simply increased the number of concurrent connections hitting Postgres — up to 500 in this case, and it’s not unusual to have 500 concurrent users in production. The throughput of the database, in transactions per second, stays okay; but when you look at the latency — the time taken to serve the queries — it’s terrible. It dramatically skyrockets.

So, in order to run Postgres and avoid the performance problems you may experience with a high number of concurrent connections, you practically always need connection pooling. The pooling that fronts the database can be set up as a middleware layer, on the client side, or as a combination of those elements. And it takes time to understand where you want this connection pooling to be.

Where it sits in the architecture, how to tune it, whether you’re going to run it in session mode or transaction mode — there are a few tunables there. So, by my estimation, you’re going to require some time to set this up. It’s not a huge amount of time, but you’ll need help from, let’s say, a DevOps profile and also a Postgres DBA profile, because some Postgres expertise is needed: somewhere between 4 and 16 hours, with 4 being the optimistic case and 16 the more realistic one.
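As a sketch of the decisions involved, these are the core PgBouncer settings (real parameter names; the values are illustrative assumptions). PgBouncer itself reads them as key = value lines in pgbouncer.ini; they are rendered here as YAML for consistency with the other examples:

```yaml
# Core pgbouncer.ini settings rendered as YAML; values are illustrative.
pgbouncer:
  listen_port: 6432        # clients connect here instead of directly to Postgres on 5432
  pool_mode: transaction   # 'session' is the safest; 'transaction' multiplexes many more clients
  max_client_conn: 1000    # client connections the pooler will accept
  default_pool_size: 50    # actual server connections kept open to Postgres per db/user pair
```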

Okay, so we need to do tuning, and we need connection pooling. Is there anything else needed to deploy Postgres in production? Well, yes — because who doesn’t want high availability these days? Do you want to wake up at 3 am? I don’t. I want my Postgres nodes to fail over automatically; I want clients to reconnect automatically; I probably even want a failed node to be reprovisioned automatically. And I’m not even getting into that last part, because while it’s quite trivial on Kubernetes, outside of Kubernetes it can also be done somehow — with some auto-scaling mechanism in cloud environments and so forth — but it’s hard to do on-prem.

But even without that, setting up high availability for Postgres is pretty tough. You first need to select the software you’re going to use — for example Patroni, one of the most recommended tools for Postgres high availability, or pg_auto_failover. Then you need to solve all the pain points related to high availability: how to distribute the Postgres configuration so that it’s uniform across all the servers in the fleet, and the entry-point problem — how am I going to locate the primary? Is it going to be DNS, a virtual IP, or some other discovery mechanism? In reality, setting up something like this will probably take you 8 to 24 hours of mixed profiles again: a DevOps person, plus some Postgres expertise that you’ll need anyway.
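For a flavor of what the Patroni route involves, here is a minimal per-node configuration sketch (the cluster name, addresses, and the etcd endpoint are placeholder assumptions):

```yaml
# Minimal Patroni node config; names, addresses, and etcd endpoint are placeholders.
scope: demo-cluster                  # cluster name, identical on every node
name: node1                          # unique per node
restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.1:8008
etcd:
  host: 10.0.0.10:2379               # the DCS that arbitrates leader election
bootstrap:
  dcs:
    ttl: 30                          # leader lease; when it expires, automatic failover kicks in
    loop_wait: 10
    retry_timeout: 10
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.1:5432
  data_dir: /var/lib/postgresql/data
```

Note that even with this in place, the entry-point problem — how clients find the current primary — still has to be solved separately.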

And of course, who doesn’t want backups? We need backups to deploy clusters in production, and the same story applies here. To solve the backup problem, we need to decide which tool to use, where to store the backups, and how to do lifecycle management for them — how to recycle and rotate them — and obviously we need to talk about restoration and testing of the backups. In this case, not much Postgres expertise is required: with a tool like WAL-G or pgBackRest, someone with a DevOps profile and little Postgres expertise can probably set it up in something like 4 to 16 hours.
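As an illustration of the decisions involved, this is roughly what the WAL-G route looks like (bucket, region, and retention are placeholder assumptions):

```yaml
# WAL-G is configured via environment variables; bucket/region are placeholders.
env:
  WALG_S3_PREFIX: s3://example-backups/pg   # where base backups and WAL segments are stored
  AWS_REGION: us-east-1
# On the Postgres side, continuous WAL archiving (postgresql.conf):
#   archive_mode = on
#   archive_command = 'wal-g wal-push %p'
# Plus scheduled jobs for lifecycle management, for example:
#   wal-g backup-push "$PGDATA"            # periodic base backup
#   wal-g delete retain FULL 7 --confirm   # rotation: keep the last 7 full backups
```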

But of course, we want monitoring, don’t we? You’ll want to use something like Grafana for monitoring Postgres. And this is also quite involved, because Postgres exposes a lot of metrics through its internal statistics views, and you need to write queries to collect them. Some tools come with pre-configured queries, but you’ll probably want to customize them. So you’re going to need, again, a combination of DevOps and Postgres DBA profiles to sort this out and export the metrics you want to Prometheus. Let’s say this is going to take anywhere between 8 and 24 hours, or more.

And what about logs? Just imagine a 12-node Postgres fleet: do you want to leave the logs of each server on each node? Probably not. You want to send them to a central location where you can process them together, which helps with troubleshooting. This takes something like switching to CSV logging and using an agent collector like Fluentbit with an aggregator like Fluentd — or just sending the logs to a SaaS service you pay for. In the SaaS case, I’d say something like 4 hours for reading up and setting it up; otherwise, doing the Fluentbit setup with CSV logging yourself is going to take you maybe 4 days of work. It’s DevOps work, but still work.

And you may want to run some more advanced setups. For example, you can now use Envoy for proxying the Postgres traffic, which may bring significant advantages. One is that you can export additional metrics from the network layer to Prometheus. The other is that, starting with Envoy 1.18, you can offload SSL termination, relieving Postgres of managing SSL, rotating certificates, and so forth. There’s an interesting blog post linked here about how to use Envoy with Postgres, and this is a recommended setup. It would also take something like 8 to 16 hours of DevOps work: Envoy is great software, but its configuration requires a bit of expertise.
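For reference, the Envoy side of that setup is roughly the following listener fragment — a sketch based on Envoy’s postgres_proxy network filter, which at the time of this talk lives in a v3alpha API; the cluster name is a placeholder:

```yaml
# Sketch of an Envoy listener fronting Postgres; names are placeholders.
filter_chains:
- filters:
  - name: envoy.filters.network.postgres_proxy
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.network.postgres_proxy.v3alpha.PostgresProxy
      stat_prefix: postgres        # SQL-level stats exported to Prometheus under this prefix
      terminate_ssl: true          # Envoy >= 1.18: offload SSL from Postgres
      # (SSL termination additionally requires a starttls transport socket on the listener)
  - name: envoy.filters.network.tcp_proxy
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
      stat_prefix: postgres_tcp
      cluster: postgres_backend    # upstream cluster pointing at the Postgres nodes
```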

And what about managing all the clusters? Here there are a lot of question marks, because there’s basically no software for managing Postgres clusters — there is software for managing Postgres databases, but not clusters. So it will take an indefinite amount of time to build, or set up, a tool that helps you manage several clusters rather than just one.

And last but not least, you probably want to automate all of this, so you want to run it all via IaC, infrastructure as code. This will take a lot of time — just bear in mind all the tools and components of the stack we have mentioned that you need to set up for an enterprise-grade Postgres deployment. So anywhere between 48 and 96 hours of DevOps work is probably a pretty conservative estimate. In summary, these are all the components you need to run a Postgres stack in production.

Let’s look now at how we would deploy this on Kubernetes. This simple YAML will give you a Postgres cluster on Kubernetes with connection pooling, a tuned configuration by default, and a few other bells and whistles. So: just one hour of a Kubernetes administrator’s time. In reality, you probably need a little bit more.

By the way, the example software I’m using to illustrate these principles is called StackGres; if you want more information about it, just go to StackGres.io. It’s a Postgres operator for Kubernetes that implements all the features I’m showing here, which is why I’m using it as the example. This simple demo YAML file will create a cluster. If you want a few more bells and whistles — automatic backups, the Envoy proxy, integrated monitoring — you’ll need a few more lines of YAML, but no more than that: it’s going to take anywhere between 4 and 16 hours for a Kubernetes administrator. And note that not all the DevOps knowledge, and especially not the Postgres knowledge, is needed: it comes with a tuned default configuration and all the components I mentioned before. You may spend some additional hours creating an Ingress, if needed, to expose the web console through a load balancer, or integrating with GitOps. So, in total, you can do a really good job by investing anywhere between 4 and 16 hours of a Kubernetes administrator’s time, with no Postgres expertise required.
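For reference, a minimal SGCluster along those lines looks roughly like this (a sketch following the StackGres documentation; the exact schema depends on the StackGres version, and the name and sizes are placeholders):

```yaml
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: demo-cluster
spec:
  instances: 2              # a primary plus one replica; HA, pooling, and tuning come by default
  postgres:
    version: 'latest'
  pods:
    persistentVolume:
      size: '10Gi'
```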

So, in summary, if we compare what this takes in total, we’re looking at a significant difference: up to around 300 hours of work without Kubernetes versus just 16 hours of work in the worst case with Kubernetes.

What about day-to-day operations? Once I have deployed the cluster, I may want to run database administration tasks. Several of those are automated in Postgres operators like StackGres, the one I mentioned: repack, vacuum, minor version upgrades, controlled restarts, even benchmarks — they’re all automated with a few lines of YAML. How much time would all of those take you without Kubernetes?
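As an illustration, a day-2 operation in StackGres is itself just a small manifest — here is a sketch of an SGDbOps resource (field names follow the StackGres docs; the target cluster name is a placeholder):

```yaml
apiVersion: stackgres.io/v1
kind: SGDbOps
metadata:
  name: repack-demo
spec:
  sgCluster: demo-cluster   # the cluster the operation runs against
  op: repack                # other ops include vacuum, restart, benchmark, minorVersionUpgrade
```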

So those are all the reasons to deploy Postgres on Kubernetes. Let me briefly mention the reasons not to.

Option number 1: if you just want a fully managed service, something like RDS, and you’re ready to pay for it — because the price difference is significant — then you should go that route. The other option is if you need specific OS tuning or specific capabilities like ZFS, which are sometimes not available in particular Kubernetes environments; then you probably want to stay where you are.

The last reason I can really think of is this: if you think the Earth is flat, vaccines don’t work, and Kubernetes is not for data storage or data platforms — if you don’t want to get out of your comfort zone — then there’s not much we can do. So that’s pretty much what I wanted to share with you today. If we have some questions, please let me know!

Bart Farrell:

Yes, we do. That was a fantastic presentation in a very short period of time. Let’s take a quick look at some of the questions we got. So, correct me if I’m wrong, but Postgres is a single-node relational database. To make it work well with Kubernetes, wouldn’t that need some fundamental changes?

Álvaro Hernández:

Actually, yes. And that’s part of the effort we’re making in the StackGres project. The previous talk I gave at the Data on Kubernetes meetup, which I’d suggest you have a look at — it’s recorded on video — is called “Deconstructing Postgres as a cloud native platform.“ Precisely the goal there is to strip off parts of Postgres that can be offloaded to other components of the cloud native ecosystem: Envoy, as I mentioned, Fluentd, and obviously Grafana for monitoring. As time passes and we keep innovating here, we’re taking pieces of Postgres functionality and offloading them to these components. So it feels a bit more natural to Kubernetes administrators, and we’re simplifying the Postgres core component itself.

Bart Farrell:

Very good. I really liked the reference to vaccines. There was a follow-up on that, noting that Postgres has many high availability options, which you would definitely want on Kubernetes.

But when folks are looking at some of the alternatives out there, in terms of which database they should be considering — and you have mentioned this before, I’m sure more than a couple of times — looking across the Kubernetes spectrum, what is it that Postgres provides that we just cannot find in another database?

Álvaro Hernández:

Well, there are really, really many things. Postgres is an extremely featureful database, so the first thing I would say is the feature set. Most people use databases in a limited fashion, and that’s okay. But you really want to dive deeper into what Postgres can cover, like its SQL compatibility layer. I’ll give you one quick example. A few years ago, almost nobody knew about recursive queries. Yet that feature by itself helped Disqus, the comment platform, switch some very heavy page rendering to a version that took 20 times less CPU. Comments were loading 20 times faster on the node, and that’s because comments form a hierarchy, a tree-like structure: instead of throwing many queries at the database, you throw just one that recurses. So if you really look at all the features Postgres comes with, and you really exploit them, you get a really, really powerful database. Not many other databases have this same level of features.

The other thing I would mention is extensions. Postgres has an extension mechanism, like plugins, and you can really extend it at will by creating or using already-available extensions, which can transform the database. Think of Firefox or Chrome plugins: it’s the same idea, except for your data.

Bart Farrell:

Alright, we got a couple of follow-up questions on that too. What about sharding, as provided by CitusDB?

Álvaro Hernández:

Postgres is designed essentially as a single-cluster database — one primary and several replicas — even though you can do cascading replication and create some complex topologies; it also supports logical replication. Sharding is not built in, and Citus is a great example of one of those extensions I was mentioning: it enables Postgres to become a sharded database. It’s a great solution once you need to shard and your data supports a good distribution — you can distribute it according to a sharding criterion that is optimal for your workload — and it works great. As for Citus on Kubernetes, I believe there’s still a way to go. Microsoft, who acquired CitusDB some time ago, is developing solutions for running Citus in the cloud, on Azure, and I believe there’s also a beta preview of a solution for running Citus on Kubernetes. But anyway, it’s open source, AGPL-licensed, so we can expect to see it in other open source solutions soon.

Bart Farrell:

Good! Any opinions about Supabase?

Álvaro Hernández:  

I like the idea and the project. It’s a way to expose Postgres in a different way, with a different kind of API, more application-oriented. I have only nice words for the project; I think it’s a cool project.

Bart Farrell:

Okay, good. Someone said they were also tinkering with that. It’s basically hosted Postgres with a subscription feature to make it work like Firebase; they haven’t tried self-hosting yet.

Great stuff, Álvaro. Álvaro has a lot of resources out there: he gives a lot of talks, is very active in the Postgres community, and also started a Postgres foundation. That being said, Álvaro, thank you very much!
