Kubernetes Evolution: Transitioning from etcd to Distributed SQL
I recently stumbled upon an article explaining how to replace etcd with PostgreSQL. This transition was seamless with the Kine project, which serves as an external etcd endpoint, translating Kubernetes etcd requests into SQL queries for an underlying relational database.
Inspired by this approach, I decided to explore Kine's potential further by switching from etcd to YugabyteDB, a distributed SQL database built on PostgreSQL.
What’s the Problem With etcd?
Etcd is a key-value store used by Kubernetes to house all cluster data.
It doesn't typically demand your attention until you encounter scalability or high availability (HA) issues with your Kubernetes cluster. Managing etcd in a scalable and HA way is particularly challenging for large Kubernetes deployments.
Also, there's been mounting concern within the Kubernetes community regarding the future of the etcd project. Its community is dwindling, and just a few maintainers are left with the interest (and capability) to support and advance this project.
These concerns gave rise to Kine, an etcd API to SQL translation layer. Kine officially supports SQLite, PostgreSQL, and MySQL—systems that continue to grow in usage and boast robust communities.
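To make the idea of an "etcd API to SQL translation layer" more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not Kine's actual code: the table is a simplified subset of the columns that show up in Kine's queries, and the open_store, put, and get helpers are hypothetical names I introduced for illustration. It shows that each write is appended as a new row whose auto-incremented id acts as the revision, while a read returns the newest row for a key.

```python
import sqlite3


def open_store():
    """Create an in-memory table loosely modeled on an append-only key log."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE kine (
            id INTEGER PRIMARY KEY AUTOINCREMENT,  -- doubles as the revision
            name TEXT NOT NULL,                    -- the etcd-style key
            deleted INTEGER NOT NULL DEFAULT 0,
            prev_revision INTEGER NOT NULL DEFAULT 0,
            value BLOB
        )
        """
    )
    return conn


def get(conn, name):
    """Return (revision, value) for the newest live row of a key, or None."""
    row = conn.execute(
        "SELECT id, value, deleted FROM kine WHERE name = ? "
        "ORDER BY id DESC LIMIT 1",
        (name,),
    ).fetchone()
    if row is None or row[2]:
        return None
    return row[0], row[1]


def put(conn, name, value):
    """Append a new row instead of updating in place; return the new revision."""
    current = get(conn, name)
    prev_revision = current[0] if current else 0
    cursor = conn.execute(
        "INSERT INTO kine (name, prev_revision, value) VALUES (?, ?, ?)",
        (name, prev_revision, value),
    )
    conn.commit()
    return cursor.lastrowid


store = open_store()
put(store, "/registry/pods/default/web", b"v1")
put(store, "/registry/pods/default/web", b"v2")
print(get(store, "/registry/pods/default/web"))
```

Because rows are only ever appended, the history of a key stays in the table, which is what lets a relational database stand in for etcd's revisioned key-value model.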
Why Choose Distributed SQL Databases?
Although PostgreSQL, SQLite, and MySQL are excellent options for Kubernetes, they are designed and optimized for single-server deployments. This means that they can present some challenges, particularly for large Kubernetes deployments that have more stringent scalability and availability requirements.
If your Kubernetes cluster requires an RPO (Recovery Point Objective) of zero and an RTO (Recovery Time Objective) measured in single-digit seconds, the architecture and maintenance of MySQL or PostgreSQL deployments will be a challenge. If you’re interested in delving deeper into this topic, you can explore PostgreSQL high availability options here.
Distributed SQL databases function as a cluster of interconnected nodes that can be deployed across multiple racks, availability zones, or regions. By design, they are highly available and scalable and, thus, can improve the same characteristics for Kubernetes.
Starting Kine on YugabyteDB
My decision to use YugabyteDB as the distributed SQL database for Kubernetes was influenced by PostgreSQL. YugabyteDB is built on the PostgreSQL source code, reusing the upper half of PostgreSQL (the query engine) while providing its own distributed storage implementation.
The close ties between YugabyteDB and PostgreSQL allow us to repurpose the PostgreSQL implementation of Kine for YugabyteDB. However, stay tuned, this won't be a simple lift-and-shift story!
Now, let's translate these ideas into action and start Kine on YugabyteDB. For this, I'm utilizing an Ubuntu 22.04 virtual machine equipped with 8 CPUs and 32GB of RAM.
First, we launch a three-node YugabyteDB cluster on the machine. It's acceptable to experiment with a distributed SQL database on a single server before going distributed. There are multiple ways to kick off YugabyteDB locally, but my preferred method is via Docker:
mkdir ~/yb_docker_data
docker network create custom-network
docker run -d --name yugabytedb_node1 --net custom-network \
-p 15433:15433 -p 7001:7000 -p 9000:9000 -p 5433:5433 \
-v ~/yb_docker_data/node1:/home/yugabyte/yb_data --restart unless-stopped \
yugabytedb/yugabyte:latest \
bin/yugabyted start --tserver_flags="ysql_sequence_cache_minval=1" \
--base_dir=/home/yugabyte/yb_data --daemon=false
docker run -d --name yugabytedb_node2 --net custom-network \
-p 15434:15433 -p 7002:7000 -p 9002:9000 -p 5434:5433 \
-v ~/yb_docker_data/node2:/home/yugabyte/yb_data --restart unless-stopped \
yugabytedb/yugabyte:latest \
bin/yugabyted start --join=yugabytedb_node1 --tserver_flags="ysql_sequence_cache_minval=1" \
--base_dir=/home/yugabyte/yb_data --daemon=false
docker run -d --name yugabytedb_node3 --net custom-network \
-p 15435:15433 -p 7003:7000 -p 9003:9000 -p 5435:5433 \
-v ~/yb_docker_data/node3:/home/yugabyte/yb_data --restart unless-stopped \
yugabytedb/yugabyte:latest \
bin/yugabyted start --join=yugabytedb_node1 --tserver_flags="ysql_sequence_cache_minval=1" \
--base_dir=/home/yugabyte/yb_data --daemon=false
Note: I'm starting YugabyteDB nodes with the ysql_sequence_cache_minval=1 setting to ensure that database sequences can be incremented sequentially by 1. Without this option, a single Kine connection to YugabyteDB will cache the next 100 IDs of a sequence. This could lead to "version mismatch" errors during a Kubernetes cluster bootstrap because one Kine connection could be inserting records with IDs ranging from 1 to 100 while another could be inserting records with IDs from 101 to 200.
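The effect of sequence caching is easier to see in a toy model. The sketch below is plain Python, not YugabyteDB code, and the Sequence and Connection classes are hypothetical stand-ins: each connection reserves a block of sequence values, and two connections take turns inserting rows.

```python
class Sequence:
    """A shared counter standing in for a database sequence."""

    def __init__(self):
        self.next_value = 1

    def reserve(self, count):
        """Hand out a contiguous block of `count` values."""
        start = self.next_value
        self.next_value += count
        return iter(range(start, start + count))


class Connection:
    """A session that pre-fetches `cache_size` sequence values at a time."""

    def __init__(self, sequence, cache_size):
        self.sequence = sequence
        self.cache_size = cache_size
        self.cached = iter(())

    def next_id(self):
        value = next(self.cached, None)
        if value is None:
            # Local cache is exhausted: reserve a fresh block from the sequence.
            self.cached = self.sequence.reserve(self.cache_size)
            value = next(self.cached)
        return value


def interleaved_ids(cache_size, inserts=6):
    """Two connections alternate inserts; return the IDs in insertion order."""
    sequence = Sequence()
    first = Connection(sequence, cache_size)
    second = Connection(sequence, cache_size)
    return [
        (first if i % 2 == 0 else second).next_id() for i in range(inserts)
    ]


print(interleaved_ids(cache_size=100))  # [1, 101, 2, 102, 3, 103]
print(interleaved_ids(cache_size=1))    # [1, 2, 3, 4, 5, 6]
```

With a cache of 100, a row inserted later can receive a smaller ID than one inserted earlier, which breaks the assumption that the id column grows in insertion order. With a cache of 1, the IDs follow insertion order.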
Next, start a Kine instance connecting to YugabyteDB using the PostgreSQL implementation:
1. Clone the Kine repo:
git clone https://github.com/k3s-io/kine.git && cd kine
2. Start a Kine instance connecting to the local YugabyteDB cluster:
go run . --endpoint postgres://yugabyte:yugabyte@127.0.0.1:5433/yugabyte
3. Connect to YugabyteDB and confirm the Kine schema is ready:
psql -h 127.0.0.1 -p 5433 -U yugabyte
yugabyte=# \d
List of relations
Schema | Name | Type | Owner
--------+-------------+----------+----------
public | kine | table | yugabyte
public | kine_id_seq | sequence | yugabyte
(2 rows)
Good, the first test has been a success. Kine treats YugabyteDB as PostgreSQL and starts without any issues. Now we progress to the next phase: launching Kubernetes on top of Kine with YugabyteDB.
Starting Kubernetes on Kine With YugabyteDB
Kine can be used by various Kubernetes engines, including standard K8s deployments, Rancher Kubernetes Engine (RKE), or K3s (a lightweight Kubernetes engine). For simplicity's sake, I'll use the latter.
A K3s cluster can be started with a single command:
1. Stop the Kine instance started in the previous section.
2. Start K3s connecting to the same local YugabyteDB cluster (Kine is built into the K3s executable):
curl -sfL https://get.k3s.io | sh -s - server --write-kubeconfig-mode=644 \
--token=sample_secret_token \
--datastore-endpoint="postgres://yugabyte:yugabyte@127.0.0.1:5433/yugabyte"
3. Kubernetes should start with no issues and we can confirm that by running the following command:
k3s kubectl get nodes
NAME STATUS ROLES AGE VERSION
ubuntu-vm Ready control-plane,master 7m13s v1.27.3+k3s1
Excellent, Kubernetes functions seamlessly on YugabyteDB. This is possible thanks to YugabyteDB's high feature and runtime compatibility with PostgreSQL. This means that we can reuse most libraries, drivers, and frameworks created for PostgreSQL.
This could have marked the end of our journey, but as a diligent engineer, I decided to review the K3s logs. Occasionally, during a Kubernetes bootstrap, the logs may report slow queries, like the one below:
INFO[0015] Slow SQL(total time: 3s) :
SELECT
*
FROM (
SELECT
(
SELECT
MAX(rkv.id) AS id
FROM
kine AS rkv),
(
SELECT
MAX(crkv.prev_revision) AS prev_revision
FROM
kine AS crkv
WHERE
crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value
FROM
kine AS kv
JOIN (
SELECT
MAX(mkv.id) AS id
FROM
kine AS mkv
WHERE
mkv.name LIKE $1
GROUP BY
mkv.name) AS maxkv ON maxkv.id = kv.id
WHERE
kv.deleted = 0
OR $2) AS lkv
ORDER BY
lkv.theid ASC
LIMIT 10001
This may not be a significant concern when running YugabyteDB on a single machine, but once we switch to a distributed setting, queries like this can become hotspots and create bottlenecks.
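To see what the core of this query does, here is a reduced version that runs against an in-memory SQLite database. This is a sketch under simplifying assumptions: the table keeps only four columns, the sample rows are made up, and the list_latest helper is a hypothetical name. It keeps the join from the logged query, which finds the highest id per key under a prefix and then fetches those rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kine (id INTEGER PRIMARY KEY, name TEXT, deleted INTEGER, value TEXT)"
)
conn.executemany(
    "INSERT INTO kine (id, name, deleted, value) VALUES (?, ?, ?, ?)",
    [
        (1, "/registry/pods/a", 0, "a-v1"),
        (2, "/registry/pods/b", 0, "b-v1"),
        (3, "/registry/pods/a", 0, "a-v2"),  # newer revision of key a
        (4, "/registry/pods/b", 1, ""),      # key b was deleted
        (5, "/registry/nodes/n1", 0, "n1-v1"),
    ],
)

# Same shape as the inner part of the logged query: the subquery picks the
# latest id per key, and the outer query joins back to fetch those rows.
LIST_LATEST = """
SELECT kv.id, kv.name, kv.value
FROM kine AS kv
JOIN (
    SELECT MAX(mkv.id) AS id
    FROM kine AS mkv
    WHERE mkv.name LIKE ?
    GROUP BY mkv.name
) AS maxkv ON maxkv.id = kv.id
WHERE kv.deleted = 0 OR ?
ORDER BY kv.id ASC
"""


def list_latest(prefix, include_deleted=False):
    """Return the newest row of every key that starts with `prefix`."""
    return conn.execute(LIST_LATEST, (prefix + "%", include_deleted)).fetchall()


print(list_latest("/registry/pods/"))
print(list_latest("/registry/pods/", include_deleted=True))
```

The subquery has to visit every row whose name matches the prefix before the join can start, so its cost grows with the full revision history rather than with the number of keys returned.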
As a result, I cloned the Kine source code and began to explore the PostgreSQL implementation for potential optimization opportunities.
Optimizing Kine for YugabyteDB
Here, I had the opportunity to collaborate with Franck Pachot, a database guru well-versed in optimizing the SQL layer with no or minimal changes in the application logic.
After inspecting the database schema generated by Kine and utilizing EXPLAIN ANALYZE for certain queries, Franck suggested essential optimizations that would be beneficial for any distributed SQL database.
Fortunately, the optimizations did not necessitate any changes to the Kine application logic. All I had to do was introduce a few SQL-level enhancements. Consequently, a Kine fork with direct support for YugabyteDB was created.
The YugabyteDB implementation differs from the PostgreSQL one in three optimizations:
- The primary key of the kine table has been changed from PRIMARY KEY (id) to PRIMARY KEY (id asc). By default, YugabyteDB utilizes hash sharding to distribute records evenly across the cluster. However, Kubernetes runs many range queries over the id column, which makes it reasonable to switch to range sharding.
- The kine_name_prev_revision_uindex index has been updated to be a covering index by including the id column in the index definition:
CREATE UNIQUE INDEX IF NOT EXISTS kine_name_prev_revision_uindex ON kine (name asc, prev_revision asc) INCLUDE(id);
YugabyteDB distributes indexes similarly to table records. Therefore, it might be that an index entry references an id stored on a different YugabyteDB node. To avoid an extra network round trip between the nodes, we can include the id in the secondary index.
- Kine performs many joins while fulfilling Kubernetes requests. If the query planner/optimizer decides to use a nested loop join, then by default, the YugabyteDB query layer will be reading and joining one record at a time. To expedite the process, we can enable batched nested loop joins. The Kine implementation for YugabyteDB does this by executing the following statement at startup:
ALTER DATABASE " + dbName + " set yb_bnl_batch_size=1024;
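Two of these optimizations can be illustrated with a back-of-the-envelope model. The sketch below is plain Python and makes several assumptions of my own: three tablets, two arbitrary split points for range sharding, SHA-256 as a stand-in for the database's real hash function, and one network request per batch of outer rows in a nested loop join. None of it is YugabyteDB code.

```python
import hashlib
import math

NUM_TABLETS = 3  # assumed tablet count, chosen only for the illustration


def hash_tablet(row_id):
    """Hash sharding: a stable hash of the key picks the tablet."""
    digest = hashlib.sha256(str(row_id).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_TABLETS


def range_tablet(row_id, split_points=(1000, 2000)):
    """Range sharding: each tablet owns a contiguous range of ids."""
    return sum(row_id >= point for point in split_points)


def tablets_touched(locate, first_id, last_id):
    """Set of tablets a scan over ids first_id..last_id has to visit."""
    return {locate(row_id) for row_id in range(first_id, last_id + 1)}


def inner_requests(outer_rows, batch_size=1):
    """Requests sent to the inner table when outer rows are sent in batches."""
    return math.ceil(outer_rows / batch_size)


print(len(tablets_touched(hash_tablet, 100, 200)), "tablets under hash sharding")
print(len(tablets_touched(range_tablet, 100, 200)), "tablet under range sharding")
print(inner_requests(10_000), "requests without batching")
print(inner_requests(10_000, batch_size=1024), "requests with batches of 1024")
```

In this model, a scan over ids 100 to 200 stays on one tablet with range sharding but fans out across the cluster with hash sharding, and batching 1024 outer rows at a time turns 10,000 inner lookups into 10.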
Let’s give this optimized YugabyteDB implementation a try.
First, stop the previous K3s service and drop the Kine schema from the YugabyteDB cluster:
1. Stop and delete the K3s service:
sudo /usr/local/bin/k3s-uninstall.sh
sudo rm -r /etc/rancher
2. Drop the schema:
psql -h 127.0.0.1 -p 5433 -U yugabyte
drop table kine cascade;
Next, start a Kine instance that provides the optimized version for YugabyteDB:
1. Clone the fork:
git clone https://github.com/dmagda/kine-yugabytedb.git && cd kine-yugabytedb
2. Start Kine:
go run . --endpoint "yugabytedb://yugabyte:yugabyte@127.0.0.1:5433/yugabyte"
Kine starts without any issues. The only difference now is that instead of specifying 'postgres' in the connection string, we indicate 'yugabytedb' to enable the optimized YugabyteDB implementation. Regarding the actual communication between Kine and YugabyteDB, Kine continues to use the standard PostgreSQL driver for Go.
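The idea of picking an implementation by URL scheme while still talking to the database through a PostgreSQL driver can be sketched as follows. This is an illustration in Python rather than the fork's Go code. The BACKENDS mapping, the backend labels, and the parse_endpoint helper are hypothetical, and the real registration logic may look different.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from URL scheme to a backend label.
BACKENDS = {"postgres": "postgresql", "yugabytedb": "yugabytedb-optimized"}


def parse_endpoint(endpoint):
    """Pick a backend from the scheme and build the URL passed to the driver."""
    parts = urlsplit(endpoint)
    if parts.scheme not in BACKENDS:
        raise ValueError(f"unsupported datastore scheme: {parts.scheme!r}")
    return {
        "backend": BACKENDS[parts.scheme],
        # Both backends are reached through a PostgreSQL driver, so the
        # driver is always handed a postgres:// URL.
        "driver_url": parts._replace(scheme="postgres").geturl(),
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


config = parse_endpoint("yugabytedb://yugabyte:yugabyte@127.0.0.1:5433/yugabyte")
print(config["backend"], config["driver_url"])
```

Only the scheme changes which SQL statements and schema the backend would use; the host, port, credentials, and wire protocol stay the same.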
Building Kubernetes on an Optimized Version of Kine
Finally, let’s start K3s on this optimized version of Kine.
To do that, we first need to build K3s from sources:
1. Stop the Kine instance started in the section above.
2. Clone the K3s repository:
git clone --depth 1 https://github.com/k3s-io/k3s.git && cd k3s
3. Open the go.mod file and add the following line to the end of the replace (..) section:
github.com/k3s-io/kine => github.com/dmagda/kine-yugabytedb v0.2.0
This instruction tells Go to use the latest release of the Kine fork with the YugabyteDB implementation.
4. Enable support for private repositories and modules:
go env -w GOPRIVATE=github.com/dmagda/kine-yugabytedb
5. Make sure the changes take effect:
go mod tidy
6. Prepare to build a full version of K3s:
mkdir -p build/data && make download && make generate
7. Build the full version:
SKIP_VALIDATE=true make
It should take around five minutes to finish the build.
Note: once you stop experimenting with this custom K3s build you can uninstall it following this instruction.
Running the Sample Workload on an Optimized Kubernetes Version
Once the build is ready, we can start K3s with the optimized version of Kine.
1. Navigate to the directory with the build artifacts:
cd dist/artifacts/
2. Start K3s by connecting to the local YugabyteDB cluster:
sudo ./k3s server \
--token=sample_secret_token \
--datastore-endpoint="yugabytedb://yugabyte:yugabyte@127.0.0.1:5433/yugabyte"
3. Confirm Kubernetes started successfully:
sudo ./k3s kubectl get nodes
NAME STATUS ROLES AGE VERSION
ubuntu-vm Ready control-plane,master 4m33s v1.27.4+k3s-36645e73
Now, let's deploy a sample application to ensure that the Kubernetes cluster is capable of more than just bootstrapping itself:
1. Clone a repository with Kubernetes examples:
git clone https://github.com/digitalocean/kubernetes-sample-apps.git
2. Deploy the Emojivoto application:
sudo ./k3s kubectl apply -k ./kubernetes-sample-apps/emojivoto-example/kustomize
3. Make sure all deployments and services start successfully:
sudo ./k3s kubectl get all -n emojivoto
NAME READY STATUS RESTARTS AGE
pod/vote-bot-565bd6bcd8-rnb6x 1/1 Running 0 25s
pod/web-75b9df87d6-wrznp 1/1 Running 0 24s
pod/voting-f5ddc8ff6-69z6v 1/1 Running 0 25s
pod/emoji-66658f4b4c-wl4pt 1/1 Running 0 25s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/emoji-svc ClusterIP 10.43.106.87 <none> 8080/TCP,8801/TCP 27s
service/voting-svc ClusterIP 10.43.14.118 <none> 8080/TCP,8801/TCP 27s
service/web-svc ClusterIP 10.43.110.237 <none> 80/TCP 27s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/vote-bot 1/1 1 1 26s
deployment.apps/web 1/1 1 1 25s
deployment.apps/voting 1/1 1 1 26s
deployment.apps/emoji 1/1 1 1 26s
NAME DESIRED CURRENT READY AGE
replicaset.apps/vote-bot-565bd6bcd8 1 1 1 26s
replicaset.apps/web-75b9df87d6 1 1 1 25s
replicaset.apps/voting-f5ddc8ff6 1 1 1 26s
replicaset.apps/emoji-66658f4b4c 1 1 1 26s
4. Make a call to the service/web-svc using the CLUSTER_IP:80 to trigger the application logic:
curl 10.43.110.237:80
The app will respond with the following HTML:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Emoji Vote</title>
<link rel="icon" href="/img/favicon.ico">
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-60040560-4"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-60040560-4');
</script>
</head>
<body>
<div id="main" class="main"></div>
</body>
<script type="text/javascript" src="/js" async></script>
</html>
In Summary
Job done! Kubernetes can now use YugabyteDB as a distributed and highly available SQL database for all its data.
This allows us to proceed to the next phase: deploying Kubernetes and YugabyteDB in a genuine cloud environment across multiple availability zones and regions, and testing how the solution handles various outages. This warrants a separate blog post, so stay tuned!