Advanced Postgres Connection Pooling Using PgBouncer
Scaling Django/Postgres With PgBouncer on Heroku
Do you have an app on Heroku using Postgres? Are you running out of database connections? Maybe you've already tried setting CONN_MAX_AGE, but that only helped for a while. It’s time for PgBouncer, the de facto standard for Postgres connection pooling.
In this article, we'll look at how to use PgBouncer to scale your app on Heroku. We'll use Django, a popular Python web framework, as an example for the settings. The installation steps for PgBouncer and the concepts, however, should work for any app on Heroku.
What You Will Learn
Although setting up PgBouncer is straight forward, finding the right settings for your particular use case can be a challenge. There is no one-size-fits-all solution for a PgBouncer configuration.
This article will show you:
- how to select a PgBouncer pool mode and deployment type
- how to install PgBouncer on Heroku by adding the pgbouncer buildpack to your existing Django app
- how to monitor PgBouncer runtime statistics to help fine-tune your settings
- common errors and things to watch out for
You can find a sample project on GitHub at https://github.com/steuke/heroku-pgbouncer-django-demo.
Note: To use this guide, your app must use the DATABASE_URL environment variable to configure its database-connection. This is important. If you configure your database manually, you will first need to switch to using DATABASE_URL. For Django, have a look at dj-database-url and the sample project, which makes this very easy. You can check out the sample project to better understand how this can be done.
What to Consider Before You Start
Choosing the Right PgBouncer Deployment Type
PgBouncer can be deployed in different ways. Heroku supports both server-side and client-side deployments. The server-side implementation on Heroku is still in beta, is somewhat restricted on configuration options, and only supports transaction-mode (see next section). Therefore, this guide uses the client-side approach, which deploys PgBouncer on the same dyno as your Django app and gives you more control over each setting.
If you are unsure which deployment type to use, the official Heroku documentation will help you decide if the client-side deployment is right for you.
Choosing the Right PgBouncer Pool Mode and Django Settings
PgBouncer has three types of connection pool modes, listed here from most polite to most aggressive connection sharing: session, transaction, and statement mode.
Here is what you need to know when considering different pool modes for use with your Django app:
- Session-mode (the connection is returned to the pool only when the user session is closed): This mode is the most conservative. If you use this mode, you do not need to change anything in your Django app. If the other modes do not work with your app, this mode might fix the issues. This mode supports all Postgres features.
- Transaction-mode (the connection is returned to the pool as soon as the transaction is complete): This is the default mode of the Heroku PgBouncer buildpack. This mode requires you to disable the use of server-side cursors. In Django, disable them by setting DISABLE_SERVER_SIDE_CURSORS = True. This mode should work with the majority of apps, but note that some important Postgres features, such as prepared statements, listen/notify, and others, are not supported. If you use any of those features (or you rely on a library that uses these features), you'll risk breaking your app or straining the performance of some queries.
- Statement-mode (the connection is returned to the pool as soon as the statement is executed): This is even more aggressive than transaction mode. In this mode, you cannot use multi-statement transactions. Multi-state transactions will result in an error ("server closed the connection unexpectedly"). If you need this mode for certain use cases, consider setting up multiple PgBouncer pools with different settings.
As a best practice, start by using transaction-mode and verify that your app still works. After that, feel free to experiment with the other modes and their impact on performance and resource usage. For example, if you experience errors because your app uses features that are not supported, try switching to session-mode. If you don't experience errors, and you want to push your app's performance further, you could try to use statement-mode. More details can be found in the Heroku documentation.
Installing PgBouncer on Heroku Using a Buildpack
A word of advice: Use a staging environment to follow this guide. Roll this change out to production only after testing your app in a staging environment. Incorrect steps might prevent your app from working until you fix the problem. (You have been warned!)
OK, let's get started. Follow these five steps to add PgBouncer to your app (in this article, we'll use the Heroku CLI for commands):
1. Disable the use of server-side cursors
In a Django app, simply add this line to your settings.py:
DISABLE_SERVER_SIDE_CURSORS = True # required when using pgbouncer's pool_mode=transaction
2. Add the PgBouncer buildpack
xxxxxxxxxx
$ heroku buildpacks:add heroku/pgbouncer
3. Add the python buildpack
In some cases, you have to manually add the python buildpack (or the buildpack for the language you are using). Since there is no harm in doing this, let's do it:
xxxxxxxxxx
$ heroku buildpacks:add heroku/python
4. Change your Procfile so that PgBouncer is launched at startup
If you are using python/Django, then before this change, your Procfile will look something like this:
xxxxxxxxxx
web: gunicorn your_django_project.wsgi
Change it to something like this:
xxxxxxxxxx
web: bin/start-pgbouncer gunicorn your_django_project.wsgi
This change will pass your original startup command to the bin/start-pgbouncer script. The script is part of the PgBouncer buildpack. Its main purpose is to change the DATABASE_URL environment variable to point to the local PgBouncer connection pool instead of the original Postgres database.
This is the reason your Django app must use the DATABASE_URL environment variable to configure the database connection. If you do not use DATABASE_URL, your app will still work, but it won’t be connecting via PgBouncer to your Postgres.
5. Commit the changes and deploy them to Heroku
Use your favorite git client to commit the changes, and push them to Heroku. Then start looking at the logs with
xxxxxxxxxx
heroku logs -t --source app | grep pgbouncer
while running tests on your app. You should see a series of log-entries similar to these:
xxxxxxxxxx
2020-04-30T18:59:48.918478+00:00 app[web.1]: 2020-04-30 18:59:48.918 53 LOG C-0x56211ba01900: pgbouncer/pgbouncer (63):6000 login attempt: db=pgbouncer user=pgbouncer tls=no
xxxxxxxxxx
2020-04-30T18:59:48.919044+00:00 app[web.1]: 2020-04-30 18:59:48.919 53 LOG C-0x56211ba01900: pgbouncer/pgbouncer (63):6000 closing because: client close request (age=0)
Congratulations! This indicates that your app is successfully configured and using PgBouncer to connect to Postgres.
What Did You Just Achieve?
Your web dyno is now running a PgBouncer instance, which is providing a connection pool to your Django app (running on the same dyno). Your Django app is connecting to this local PgBouncer connection pool instead of directly to the Postgres database.
In the next section, you'll learn how to monitor PgBouncer.
Monitoring PgBouncer
There are several ways to monitor PgBouncer. One common way is to connect to the PgBouncer admin-console using psql. This section demonstrates two other ways: using Heroku logs and querying the stats from within your Django app.
Using Heroku Logs
The PgBouncer buildpack is configured to log the PgBouncer stats in one-minute intervals. Running the following command shows the most recent stats:
xxxxxxxxxx
$ heroku logs --source app | grep "LOG Stats"
The output will look similar to this:
xxxxxxxxxx
2020-04-30T19:11:25.720304+00:00 app[web.1]: 2020-04-30 19:11:25.720 54 LOG Stats: 5 xacts/s, 46 queries/s, in 5401 B/s, out 2756 B/s, xact 45640 us, query 992 us wait time 1396 us
Querying the PgBouncer Database From Your Django App Using SHOW STATS
PgBouncer provides a virtual database containing its stats. This is also referred to as the Admin Console. The PgBouncer buildpack provides access to this virtual database via a Unix socket named /tmp, using the username "PgBouncer" and no password.
Using these credentials, you can retrieve the PgBouncer stats from within your Django app by using the low-level psycopg2 database API. Here is an example function pgbouncer_stats() that retrieves the stats using that API:
xxxxxxxxxx
<code>
def pgbouncer_stats() -> List[Dict]:
try:
connection = psycopg2.connect(host="/tmp/", port=6000, dbname="pgbouncer", user="pgbouncer",
cursor_factory=extras.DictCursor)
connection.autocommit = True # must enable autocommit when querying pgbouncer stats
cursor = connection.cursor()
cursor.execute('SHOW STATS')
pgbouncer_stats = [dict(row) for row in cursor]
return pgbouncer_stats
except Exception as e:
return [{'error': True, 'description': 'Could not query pgbouncer stats.', 'reason': f'{e}'}]
</code>
The return value of pgbouncer_stats() is a list of dictionaries, with one dictionary for each database that PgBouncer is pooling:
xxxxxxxxxx
<code>
"pgbouncer_stats":[
{
"database":"db1",
"total_xact_count":220,
"total_query_count":2750,
"total_received":330110,
"total_sent":168517,
"total_xact_time":15062895,
"total_query_time":2837299,
"total_wait_time":84837,
"avg_xact_count":2,
"avg_query_count":29,
"avg_recv":3519,
"avg_sent":1796,
"avg_xact_time":68467,
"avg_query_time":1031,
"avg_wait_time":904
},
{
"database":"pgbouncer",
"total_xact_count":110,
"total_query_count":110,
"total_received":0,
"total_sent":0,
"total_xact_time":0,
"total_query_time":0,
"total_wait_time":0,
"avg_xact_count":1,
"avg_query_count":1,
"avg_recv":0,
"avg_sent":0,
"avg_xact_time":0,
"avg_query_time":0,
"avg_wait_time":0
}
]
</code>
You can see that the “db1” database has a total_xact_count of 220. This indicates that a total of 220 transactions have been processed. The PgBouncer documentation has the full list of stats and their meaning.
Common Errors and Things to Watch out For
How to fix some common problems when starting to use PgBouncer:
- "FATAL: database 'pgbouncer' does not exist:" The server you are connecting to does not seem to be a PgBouncer-server. Did you add the Heroku buildpack for PgBouncer? Are you using DATABASE_URL to configure your connection? Did you change the Procfile to start PgBouncer?
- "ERROR: no such user: username." This shouldn't happen if you are using the Heroku buildpack. Make sure you are connecting via the Unix socket as user "pgbouncer" without a password. See the pgbouncer-documentation on authentication for more details.
- You need to disable SSL in Django's database options, i.e. remove any 'sslmode': 'require'. Since you are using the client-side installation, traffic does not leave your machine and should be safe. Alternatively, see the PgBouncer docs for enabling TLS-support.
- If you connect to multiple Postgres databases, you need to set the PGBOUNCER_URLS environment variable to let PgBouncer know which environment variables it should use for connection pooling. For example, this command tells PgBouncer to create connection pools for DATABASE_URL and EVENT_DATABASE_URL: heroku config:add PGBOUNCER_URLS="DATABASE_URL EVENT_DATABASE_URL"
- If you have requests that need to access large tables with many rows, your app's performance can suffer when you have to disable server-side-cursors. One solution is to use multiple connection pools. You could have one pool in session mode for these large-table requests while using the second pool in transaction mode for others.
Next Steps: Monitor, Benchmark, and Fine-Tune PgBouncer
At this point, PgBouncer is up and running on your dyno, providing connection pooling to your Django app.
What should you do next?
- Make sure that your app performance is solid. Monitor the PgBouncer stats, keeping an eye on user-facing stats like avg_wait_time, which indicates the time that your app is waiting for connections.
- Perform benchmarks and experiment with the PgBouncer settings, as well as pool-types. One way to do this is by using a load-testing tool that puts your app under increasing load. While the load is increasing, watch PgBouncer's maxwait-stat. If that number is increasing dramatically, then the pool is not handling client requests quickly enough. You could try increasing PGBOUNCER_DEFAULT_POOL_SIZE, reducing PGBOUNCER_RESERVE_POOL_TIMEOUT, or try using a more aggressive pool mode. If none of this helps, maybe the server is running out of resources, which could mean you need to upgrade the Postgres server.
- Familiarize yourself with PgBouncer settings like max_client_conn, default_pool_size, and max_db_connections to tweak your connection pooling.
Conclusion
PgBouncer is the right tool to help scale your Postgres app when the number of database connections becomes an issue. Keep in mind that while adding PgBouncer to your setup will result in additional complexity for you and your team, taking the time to carefully setup, monitor, and tune your configuration should improve your app's performance.