Using CockroachDB CDC With Azure Event Hubs

Previous Articles on CockroachDB CDC

Motivation

Azure Event Hubs is a critical part of the Azure ecosystem. We're in the early stages of adopting Azure, and while we work on an official integration, I'd like to provide a workaround in the meantime.

This tutorial uses enterprise changefeeds. To activate enterprise features like CDC to Kafka, you will need an enterprise license, access to a CockroachDB Dedicated cluster, or billing enabled on your CockroachDB Serverless cluster.

High-Level Steps

Step-By-Step Instructions

Deploy Azure Event Hubs

You will need an Azure account; you can sign up for a free one.

Once you're done, follow the steps outlined in this quickstart to create an instance of Azure Event Hubs.

High-level steps:

Once complete, create the deployment:

Create Namespace: Review + Create

Create Event Hub: Name

Creating an event hub is equivalent to creating a topic in Kafka. For example, with the Confluent CLI:

confluent kafka topic create stock --partitions 6
confluent kafka topic create history --partitions 6


I ended up with an Event Hubs namespace called artemeventhubs and Event Hubs named stock and history.

artemeventhubs Screenshot

Once they are created, you need a Shared Access Signature (SAS) policy to access your Event Hubs:

Add SAS Policy to access Event Hubs

Capture the SAS policy details, as we will need them in the next step. Click the SAS policy to open its details dialog. As of this writing, each Event Hub appears to have its own associated SAS policy. If I find a way to use the same policy for more than one topic, I will highlight it.

SAS Policy: saspolicystock

In the SAS Policy Details, capture the "Connection string-primary key."

Endpoint=sb://artemeventhubs.servicebus.windows.net/;SharedAccessKeyName=saspolicytpcc;SharedAccessKey=<REDACTED>;EntityPath=history
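A connection string is a semicolon-separated list of Key=Value pairs: Endpoint holds the namespace host that the Kafka endpoint is built from, and EntityPath names the event hub, which is also the Kafka topic. As a minimal sketch assuming Python 3.9+ (the helper names are hypothetical, not part of any Azure SDK, and port 9093 is the Event Hubs Kafka port), the fields can be pulled apart like this:

```python
def parse_connection_string(conn_str: str) -> dict[str, str]:
    """Split an Event Hubs connection string into its Key=Value parts."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment:
            continue  # tolerate a trailing ";" or empty segments
        # Split on the first "=" only: a base64 key can end in "=" padding.
        key, _, value = segment.partition("=")
        parts[key] = value
    return parts


def kafka_bootstrap(conn_str: str) -> str:
    """Derive the Kafka endpoint (host:9093) from the Endpoint field."""
    endpoint = parse_connection_string(conn_str)["Endpoint"]  # sb://<ns>.servicebus.windows.net/
    host = endpoint.removeprefix("sb://").rstrip("/")
    return f"{host}:9093"
```

For the connection string above, kafka_bootstrap returns artemeventhubs.servicebus.windows.net:9093, and the EntityPath value, history, matches the topic the changefeed will emit to.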


Deploy a CockroachDB Cluster With Enterprise Changefeeds

You can spin up a dedicated cluster using the following directions. My cluster is a 3-node cluster in AWS us-east-1 with availability zone (AZ) failure tolerance.

To enable CDC, execute the following command:

SET CLUSTER SETTING kv.rangefeed.enabled = true;


Event Hubs supports the Kafka protocol on port 9093. We can take the namespace host from the SAS policy connection string and use the kafka:// scheme: the Event Hubs equivalent of kafka://<confluent cloud kafka endpoint url>:9092 is kafka://artemeventhubs.servicebus.windows.net:9093.

The sasl_user and sasl_password parameters are where it gets tricky. I owe a huge thanks to the article "How to Use Kafka Client with Azure Event Hubs" for explaining what goes into these fields. sasl_user is set to the literal string $ConnectionString, just as the article says. sasl_password took more work: CockroachDB expects a URL-encoded secret, and it took me several tries to get it right. The trick is to URL-encode the entire connection string from the SAS policy. I consistently rely on the following URL-encoder service to encode these values.
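The sink URI can also be assembled locally instead of through a web encoder. This is a minimal sketch assuming Python 3 and the standard urllib.parse.quote function; event_hubs_sink_uri is a hypothetical helper, and its query parameters simply mirror the changefeed statement used in this tutorial. Passing safe="" makes quote percent-encode "/" (which it leaves alone by default) along with ":", ";" and "=", and it always encodes the "+" characters that can appear in base64 keys.

```python
from urllib.parse import quote


def event_hubs_sink_uri(namespace: str, conn_str: str) -> str:
    """Build a kafka:// changefeed sink URI for an Event Hubs namespace.

    The whole SAS connection string becomes sasl_password, fully
    percent-encoded so its ":", "/", ";", "=" and "+" survive URL parsing.
    """
    password = quote(conn_str, safe="")
    return (
        f"kafka://{namespace}.servicebus.windows.net:9093"
        "?tls_enabled=true&sasl_enabled=true"
        "&sasl_user=$ConnectionString"  # literal value, per the Event Hubs Kafka docs
        f"&sasl_password={password}"
        "&sasl_mechanism=PLAIN"
    )
```

Calling event_hubs_sink_uri("artemeventhubs", conn_str) with the connection string from the SAS policy produces a URI of the same shape as the one in the changefeed statement, ready to paste after INTO.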

Create a Changefeed With the Event Hubs Information

CREATE CHANGEFEED FOR TABLE history INTO "kafka://artemeventhubs.servicebus.windows.net:9093?tls_enabled=true&sasl_enabled=true&sasl_user=$ConnectionString&sasl_password=Endpoint%3Dsb%3A%2F%2Fartemeventhubs.servicebus.windows.net%2F%3BSharedAccessKeyName%3Dsaspolicytpcc%3BSharedAccessKey%3D<REDACTED>%3BEntityPath%3Dhistory&sasl_mechanism=PLAIN" WITH updated, format = json;
        job_id
----------------------
  801162005835612162
(1 row)

NOTICE: changefeed will emit to topic history

Time: 235ms total (execution 215ms / network 20ms)
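Because the changefeed was created WITH updated, format = json, each message value should be a JSON object in CockroachDB's default wrapped envelope: the new row under "after" (null for a delete) and the change's HLC timestamp under "updated", a string of the form "<wall-clock nanoseconds>.<logical counter>". Here is a minimal Python 3.9+ sketch of decoding one message under that assumption; decode_changefeed_message is a hypothetical helper, and the sample row is made up rather than real output from this cluster.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def decode_changefeed_message(value: bytes) -> tuple[Optional[dict], datetime]:
    """Decode one JSON changefeed message in the wrapped envelope.

    Assumes a payload like {"after": {...}, "updated": "<wall nanos>.<logical>"}.
    Returns the row (None for a delete) and the wall-clock time in UTC.
    """
    msg = json.loads(value)
    wall, _, _logical = msg["updated"].partition(".")
    # Split nanoseconds into whole seconds plus a remainder to avoid float rounding.
    seconds, nanos = divmod(int(wall), 1_000_000_000)
    ts = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=nanos // 1000)
    return msg["after"], ts


if __name__ == "__main__":
    # Hypothetical message for the tpcc history table; the values are made up.
    sample = b'{"after": {"h_w_id": 1, "h_amount": 10.0}, "updated": "1675880000123456789.0000000000"}'
    row, committed_at = decode_changefeed_message(sample)
    print(row, committed_at.isoformat())
```

The logical counter is discarded here because it only orders events that share the same wall-clock nanosecond.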


All that remains is to generate a workload. We are going to use the TPC-C workload bundled with the cockroach binary. In a new terminal window, run the following two commands:

Generate Sample Data

cockroach workload fixtures import tpcc --warehouses=10 "postgresql://<user>@<Cockroach Cloud Dedicated url>:26257/tpcc?sslmode=verify-full&sslrootcert=/path/certs/cluster-ca.crt"     


Execute the Workload

cockroach workload run tpcc --warehouses=10 --ramp=3m --duration=1h "postgresql://<user>@<Cockroach Cloud Dedicated url>:26257/tpcc?sslmode=verify-full&sslrootcert=/path/certs/cluster-ca.crt"


Verify

Finally, confirm that the messages are arriving in Azure Event Hubs. In the Azure portal, navigate to each Event Hub, where you can see the message counters changing:

Changing message counters

The simplest way to view messages in the Event Hubs is to click the "Process data" option on the Event Hub page, then select the "Enable real-time insights from events" option.

Enable real-time insights from events

A SQL editor window then opens and loads the messages:

SQL Editor window will open and load messages
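The same messages can also be read by any Kafka client. Unlike the changefeed sink URI, a Kafka client takes the SAS connection string verbatim as sasl.password, because it is a plain config value rather than part of a URL. A minimal sketch assuming Python 3.9+ and librdkafka-style property names; kafka_client_config and the "cdc-verify" group ID are hypothetical.

```python
def kafka_client_config(conn_str: str, group_id: str = "cdc-verify") -> dict[str, str]:
    """librdkafka-style settings for reading Event Hubs over the Kafka protocol.

    The SAS connection string is passed through unencoded as sasl.password.
    """
    # The namespace host sits between "sb://" and the trailing "/" of Endpoint.
    endpoint = next(
        seg.partition("=")[2]
        for seg in conn_str.split(";")
        if seg.startswith("Endpoint=")
    )
    host = endpoint.removeprefix("sb://").rstrip("/")
    return {
        "bootstrap.servers": f"{host}:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": "$ConnectionString",
        "sasl.password": conn_str,  # raw connection string, no URL encoding
        "group.id": group_id,  # hypothetical consumer group name
        "auto.offset.reset": "earliest",  # start from the oldest retained event
    }
```

Passing this dictionary to confluent_kafka.Consumer and subscribing to history should read the same messages the portal shows; treat that as an untested assumption.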

This is how you can leverage existing CockroachDB capabilities with services that aren't natively supported, such as Azure Event Hubs. Hopefully, you've found this a viable workaround until Event Hubs becomes a first-class citizen in CockroachDB.
