Kubernetes Evolution: Transitioning from etcd to Distributed SQL

I recently stumbled upon an article explaining how to replace etcd with PostgreSQL. The transition is seamless thanks to the Kine project, which serves as an external etcd endpoint, translating Kubernetes' etcd API requests into SQL queries for an underlying relational database.

Inspired by this approach, I decided to explore Kine's potential further by switching from etcd to YugabyteDB, a distributed SQL database built on PostgreSQL.

What’s the Problem With etcd?

etcd is a distributed key-value store that Kubernetes uses to house all cluster data.

It doesn't typically demand your attention until you encounter scalability or high availability (HA) issues with your Kubernetes cluster. Managing etcd in a scalable and HA way is particularly challenging for large Kubernetes deployments.

Also, there's been mounting concern within the Kubernetes community regarding the future of the etcd project. Its community is dwindling, and just a few maintainers are left with the interest (and capability) to support and advance this project.

These concerns gave rise to Kine, an etcd API to SQL translation layer. Kine officially supports SQLite, PostgreSQL, and MySQL—systems that continue to grow in usage and boast robust communities.

Why Choose Distributed SQL Databases?

Although PostgreSQL, SQLite, and MySQL are excellent options for Kubernetes, they are designed and optimized for single-server deployments. This means that they can present some challenges, particularly for large Kubernetes deployments that have more stringent scalability and availability requirements.

If your Kubernetes cluster requires an RPO (Recovery Point Objective) of zero and an RTO (Recovery Time Objective) measured in single-digit seconds, the architecture and maintenance of MySQL or PostgreSQL deployments will be a challenge. If you’re interested in delving deeper into this topic, you can explore PostgreSQL high availability options here.

Distributed SQL databases function as a cluster of interconnected nodes that can be deployed across multiple racks, availability zones, or regions. By design, they are highly available and scalable and, thus, can improve the same characteristics for Kubernetes.

Starting Kine on YugabyteDB

My decision to use YugabyteDB as the distributed SQL database for Kubernetes was influenced by PostgreSQL. YugabyteDB is built on the PostgreSQL source code, reusing the upper half of PostgreSQL (the query engine) while providing its own distributed storage implementation.

The close ties between YugabyteDB and PostgreSQL allow us to repurpose the PostgreSQL implementation of Kine for YugabyteDB. However, stay tuned: this won't be a simple lift-and-shift story!

Now, let's translate these ideas into action and start Kine on YugabyteDB. For this, I'm utilizing an Ubuntu 22.04 virtual machine equipped with 8 CPUs and 32GB of RAM.

First, we launch a three-node YugabyteDB cluster on the machine. It's acceptable to experiment with a distributed SQL database on a single server before going distributed. There are multiple ways to kick off YugabyteDB locally, but my preferred method is via Docker:

Shell
 
mkdir ~/yb_docker_data

docker network create custom-network

docker run -d --name yugabytedb_node1 --net custom-network \
  -p 15433:15433 -p 7001:7000 -p 9000:9000 -p 5433:5433 \
  -v ~/yb_docker_data/node1:/home/yugabyte/yb_data --restart unless-stopped \
  yugabytedb/yugabyte:latest \
  bin/yugabyted start --tserver_flags="ysql_sequence_cache_minval=1" \
  --base_dir=/home/yugabyte/yb_data --daemon=false

docker run -d --name yugabytedb_node2 --net custom-network \
  -p 15434:15433 -p 7002:7000 -p 9002:9000 -p 5434:5433 \
  -v ~/yb_docker_data/node2:/home/yugabyte/yb_data --restart unless-stopped \
  yugabytedb/yugabyte:latest \
  bin/yugabyted start --join=yugabytedb_node1 --tserver_flags="ysql_sequence_cache_minval=1" \
  --base_dir=/home/yugabyte/yb_data --daemon=false
      
docker run -d --name yugabytedb_node3 --net custom-network \
  -p 15435:15433 -p 7003:7000 -p 9003:9000 -p 5435:5433 \
  -v ~/yb_docker_data/node3:/home/yugabyte/yb_data --restart unless-stopped \
  yugabytedb/yugabyte:latest \
  bin/yugabyted start --join=yugabytedb_node1 --tserver_flags="ysql_sequence_cache_minval=1" \
  --base_dir=/home/yugabyte/yb_data --daemon=false


Note: I'm starting YugabyteDB nodes with the ysql_sequence_cache_minval=1 setting to ensure that database sequences can be incremented sequentially by 1. Without this option, a single Kine connection to YugabyteDB will cache the next 100 IDs of a sequence. This could lead to "version mismatch" errors during a Kubernetes cluster bootstrap because one Kine connection could be inserting records with IDs ranging from 1 to 100 while another could be inserting records with IDs from 101 to 200.
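The interleaving problem described in the note can be illustrated with a small standalone Go sketch. The `seq` and `conn` types below are toy models of a database sequence and of per-connection value caching — they are not Kine or YugabyteDB code, just an illustration of why cached ID blocks break the correspondence between ID order and write order:

```go
package main

import "fmt"

// seq models a database sequence; each connection grabs `cache` values at a time.
type seq struct {
	next  int64
	cache int64
}

// conn models one client connection holding a locally cached block of sequence values.
type conn struct {
	cur, max int64
}

// nextval returns the next ID for this connection, fetching a fresh
// block of `cache` values from the sequence when the local block runs out.
func (c *conn) nextval(s *seq) int64 {
	if c.cur >= c.max {
		c.cur = s.next
		c.max = s.next + s.cache
		s.next += s.cache
	}
	v := c.cur
	c.cur++
	return v
}

func main() {
	// With the default cache of 100, each connection reserves a whole block,
	// so IDs no longer reflect the real global order of the writes.
	s := &seq{next: 1, cache: 100}
	a, b := &conn{}, &conn{}
	fmt.Println(a.nextval(s), b.nextval(s)) // 1 101
	fmt.Println(b.nextval(s), a.nextval(s)) // 102 2

	// With cache = 1 (ysql_sequence_cache_minval=1), every nextval() call
	// goes to the sequence, so IDs are handed out in global write order.
	s2 := &seq{next: 1, cache: 1}
	c, d := &conn{}, &conn{}
	fmt.Println(c.nextval(s2), d.nextval(s2), c.nextval(s2)) // 1 2 3
}
```

With caching, the second connection's rows all carry IDs above the first connection's entire block even when the writes interleave in time — the source of the "version mismatch" errors during bootstrap.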

Next, start a Kine instance connecting to YugabyteDB using the PostgreSQL implementation:

1. Clone the Kine repo:

Shell
 
git clone https://github.com/k3s-io/kine.git && cd kine


2. Start a Kine instance connecting to the local YugabyteDB cluster:

Shell
 
go run . --endpoint postgres://yugabyte:yugabyte@127.0.0.1:5433/yugabyte


3. Connect to YugabyteDB and confirm the Kine schema is ready:

SQL
 
psql -h 127.0.0.1 -p 5433 -U yugabyte

yugabyte=# \d
           List of relations
 Schema |    Name     |   Type   |  Owner
--------+-------------+----------+----------
 public | kine        | table    | yugabyte
 public | kine_id_seq | sequence | yugabyte
(2 rows)


Good, the first test has been a success. Kine treats YugabyteDB as PostgreSQL and starts without any issues. Now we progress to the next phase: launching Kubernetes on top of Kine with YugabyteDB.

Starting Kubernetes on Kine With YugabyteDB

Kine can be used by various Kubernetes engines, including standard K8s deployments, Rancher Kubernetes Engine (RKE), and K3s (a lightweight Kubernetes engine). For simplicity's sake, I'll use the latter.

A K3s cluster can be started with a single command:

  1. Stop the Kine instance started in the previous section.
  2. Start K3s connecting to the same local YugabyteDB cluster (Kine is embedded in the K3s executable):
Shell
 
curl -sfL https://get.k3s.io | sh -s - server --write-kubeconfig-mode=644 \
--token=sample_secret_token \
--datastore-endpoint="postgres://yugabyte:yugabyte@127.0.0.1:5433/yugabyte"


3. Kubernetes should start with no issues, which we can confirm by running the following command:

Shell
 
k3s kubectl get nodes
NAME        STATUS   ROLES                  AGE     VERSION
ubuntu-vm   Ready    control-plane,master   7m13s   v1.27.3+k3s1


Excellent, Kubernetes functions seamlessly on YugabyteDB. This is possible thanks to YugabyteDB's high feature and runtime compatibility with PostgreSQL. This means that we can reuse most libraries, drivers, and frameworks created for PostgreSQL.

This could have marked the end of our journey, but as a diligent engineer, I decided to review the K3s logs. Occasionally, during a Kubernetes bootstrap, the logs may report slow queries, like the one below:

SQL
 
INFO[0015] Slow SQL(total time: 3s) :
SELECT *
FROM (
    SELECT
        (SELECT MAX(rkv.id) AS id
         FROM kine AS rkv),
        (SELECT MAX(crkv.prev_revision) AS prev_revision
         FROM kine AS crkv
         WHERE crkv.name = 'compact_rev_key'),
        kv.id AS theid, kv.name, kv.created, kv.deleted,
        kv.create_revision, kv.prev_revision, kv.lease,
        kv.value, kv.old_value
    FROM kine AS kv
    JOIN (
        SELECT MAX(mkv.id) AS id
        FROM kine AS mkv
        WHERE mkv.name LIKE $1
        GROUP BY mkv.name
    ) AS maxkv ON maxkv.id = kv.id
    WHERE kv.deleted = 0 OR $2
) AS lkv
ORDER BY lkv.theid ASC
LIMIT 10001


This may not be a significant concern when running YugabyteDB on a single machine, but once we switch to a distributed setting, queries like this can become hotspots and create bottlenecks.

As a result, I cloned the Kine source code and began to explore the PostgreSQL implementation for potential optimization opportunities.

Optimizing Kine for YugabyteDB

Here, I had the opportunity to collaborate with Franck Pachot, a database guru well-versed in optimizing the SQL layer with no or minimal changes to the application logic.

After inspecting the database schema generated by Kine and utilizing EXPLAIN ANALYZE for certain queries, Franck suggested essential optimizations that would be beneficial for any distributed SQL database.

Fortunately, the optimizations did not necessitate any changes to the Kine application logic. All I had to do was introduce a few SQL-level enhancements. Consequently, a Kine fork with direct support for YugabyteDB was created.

In short, there are three optimizations in the YugabyteDB implementation compared to the PostgreSQL one:

  1. The primary key of the kine table has been changed from PRIMARY KEY (id) to PRIMARY KEY (id ASC). By default, YugabyteDB uses hash sharding to distribute records evenly across the cluster. However, Kubernetes runs many range queries over the id column, which makes it reasonable to switch to range sharding.
  2. The kine_name_prev_revision_uindex index has been updated to be a covering index by including the id column in the index definition:

    CREATE UNIQUE INDEX IF NOT EXISTS kine_name_prev_revision_uindex ON kine (name asc, prev_revision asc) INCLUDE(id);

    YugabyteDB distributes indexes similarly to table records. Therefore, it might be that an index entry references an id stored on a different YugabyteDB node. To avoid an extra network round trip between the nodes, we can include the id in the secondary index.
  3. Kine performs many joins while fulfilling Kubernetes requests. If the query planner/optimizer decides to use a nested loop join, then by default, the YugabyteDB query layer will be reading and joining one record at a time. To expedite the process, we can enable batched nested loop joins. The Kine implementation for YugabyteDB does this by executing the following statement at startup:

    ALTER DATABASE <db_name> SET yb_bnl_batch_size = 1024;
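The reasoning behind the first optimization can be sketched with a toy Go model of the two placement strategies. The hash function, tablet count, and range boundaries below are illustrative only — they are not YugabyteDB's actual sharding implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const tablets = 3 // toy cluster with three tablets

// hashTablet mimics hash sharding: the tablet is derived from a hash of the
// key, so consecutive IDs scatter across the cluster. (FNV-1a stands in for
// the real hash function here.)
func hashTablet(id int) int {
	h := fnv.New32a()
	fmt.Fprintf(h, "%d", id)
	return int(h.Sum32()) % tablets
}

// rangeTablet mimics range sharding: contiguous key ranges live together,
// e.g. ids 0..999 on tablet 0, 1000..1999 on tablet 1, and so on.
func rangeTablet(id int) int {
	return id / 1000 % tablets
}

// tabletsTouched counts how many distinct tablets a scan over [lo, hi] visits.
func tabletsTouched(lo, hi int, place func(int) int) int {
	seen := map[int]bool{}
	for id := lo; id <= hi; id++ {
		seen[place(id)] = true
	}
	return len(seen)
}

func main() {
	// A Kine-style range query such as "WHERE id > $1 ORDER BY id" over ids 1..100:
	fmt.Println("hash sharding touches", tabletsTouched(1, 100, hashTablet), "tablets")
	fmt.Println("range sharding touches", tabletsTouched(1, 100, rangeTablet), "tablet(s)")
}
```

Under hash sharding the scan fans out to effectively every tablet, while under range sharding the same scan stays on a single tablet — which is why PRIMARY KEY (id ASC) suits Kubernetes' range-heavy access pattern.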

Let’s give this optimized YugabyteDB implementation a try.

First, stop the previous K3s service and drop the Kine schema from the YugabyteDB cluster:

1. Stop and delete the K3s service:

Shell
 
sudo /usr/local/bin/k3s-uninstall.sh
sudo rm -r /etc/rancher


2. Drop the schema:

SQL
 
psql -h 127.0.0.1 -p 5433 -U yugabyte

DROP TABLE kine CASCADE;


Next, start a Kine instance that provides the optimized version for YugabyteDB:

1. Clone the fork:

Shell
 
git clone https://github.com/dmagda/kine-yugabytedb.git && cd kine-yugabytedb


2. Start Kine:

Shell
 
go run . --endpoint "yugabytedb://yugabyte:yugabyte@127.0.0.1:5433/yugabyte"


Kine starts without any issues. The only difference is that instead of specifying 'postgres' in the connection string, we specify 'yugabytedb' to enable the optimized YugabyteDB implementation. For the actual communication with YugabyteDB, Kine continues to use the standard PostgreSQL driver for Go.

Building Kubernetes on an Optimized Version of Kine

Finally, let’s start K3s on this optimized version of Kine. 

To do that, we first need to build K3s from sources:

1. Stop the Kine instance started in the section above.

2. Clone the K3s repository:

Shell
 
git clone --depth 1 https://github.com/k3s-io/k3s.git && cd k3s


3. Open the go.mod file and add the following line to the end of the replace (..) section:

Go
 
github.com/k3s-io/kine => github.com/dmagda/kine-yugabytedb v0.2.0


This instruction tells Go to use the latest release of the Kine fork with the YugabyteDB implementation.

4. Enable support for private repositories and modules:

Shell
 
go env -w GOPRIVATE=github.com/dmagda/kine-yugabytedb


5. Make sure the changes take effect:

Shell
 
go mod tidy


6. Prepare to build a full version of K3s:

Shell
 
mkdir -p build/data && make download && make generate


7. Build the full version:

Shell
 
SKIP_VALIDATE=true make


It should take around five minutes to finish the build. 

Note: once you stop experimenting with this custom K3s build, you can uninstall it by following this instruction.

Running the Sample Workload on an Optimized Kubernetes Version

Once the build is ready, we can start K3s with the optimized version of Kine.

1. Navigate to the directory with the build artifacts:

Shell
 
cd dist/artifacts/


2. Start K3s by connecting to the local YugabyteDB cluster:

Shell
 
sudo ./k3s server \
  --token=sample_secret_token \
  --datastore-endpoint="yugabytedb://yugabyte:yugabyte@127.0.0.1:5433/yugabyte"


3. Confirm Kubernetes started successfully:

Shell
 
sudo ./k3s kubectl get nodes

NAME        STATUS   ROLES                  AGE     VERSION
ubuntu-vm   Ready    control-plane,master   4m33s   v1.27.4+k3s-36645e73


Now, let's deploy a sample application to ensure that the Kubernetes cluster is capable of more than just bootstrapping itself:

1. Clone a repository with Kubernetes examples:

Shell
 
git clone https://github.com/digitalocean/kubernetes-sample-apps.git


2. Deploy the Emojivoto application:

Shell
 
sudo ./k3s kubectl apply -k ./kubernetes-sample-apps/emojivoto-example/kustomize


3. Make sure all deployments and services start successfully:

Shell
 
sudo ./k3s kubectl get all -n emojivoto

NAME                            READY   STATUS    RESTARTS   AGE
pod/vote-bot-565bd6bcd8-rnb6x   1/1     Running   0          25s
pod/web-75b9df87d6-wrznp        1/1     Running   0          24s
pod/voting-f5ddc8ff6-69z6v      1/1     Running   0          25s
pod/emoji-66658f4b4c-wl4pt      1/1     Running   0          25s

NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/emoji-svc    ClusterIP   10.43.106.87    <none>        8080/TCP,8801/TCP   27s
service/voting-svc   ClusterIP   10.43.14.118    <none>        8080/TCP,8801/TCP   27s
service/web-svc      ClusterIP   10.43.110.237   <none>        80/TCP              27s

NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/vote-bot   1/1     1            1           26s
deployment.apps/web        1/1     1            1           25s
deployment.apps/voting     1/1     1            1           26s
deployment.apps/emoji      1/1     1            1           26s

NAME                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/vote-bot-565bd6bcd8   1         1         1       26s
replicaset.apps/web-75b9df87d6        1         1         1       25s
replicaset.apps/voting-f5ddc8ff6      1         1         1       26s
replicaset.apps/emoji-66658f4b4c      1         1         1       26s


4. Make a call to service/web-svc at its CLUSTER-IP on port 80 to trigger the application logic:

Shell
 
curl 10.43.110.237:80


The app will respond with the following HTML:

HTML
 
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>Emoji Vote</title>
    <link rel="icon" href="/img/favicon.ico">

    <script async src="https://www.googletagmanager.com/gtag/js?id=UA-60040560-4"></script>
    <script>
      window.dataLayer = window.dataLayer || [];
      function gtag(){dataLayer.push(arguments);}
      gtag('js', new Date());
      gtag('config', 'UA-60040560-4');
    </script>
  </head>
  <body>
    <div id="main" class="main"></div>
  </body>

  <script type="text/javascript" src="/js" async></script>

</html>


In Summary

Job done! Kubernetes can now use YugabyteDB as a distributed and highly available SQL database for all its data. 

This allows us to proceed to the next phase: deploying Kubernetes and YugabyteDB in a genuine cloud environment across multiple availability zones and regions, and testing how the solution handles various outages. This warrants a separate blog post, so stay tuned!
