CockroachDB TIL: Volume 8

This is my series of articles covering short "today I learned" topics as I work with CockroachDB. Read the previous installments:

Topics

Topic 1: Generate Workload Data

The standard approach to generating workload data, say for tpcc workload, has always been the following:

 
cockroach workload init tpcc "postgresql://user:password@127.0.0.1:26257?sslmode=require"


This will generate the sample data and then you'd run the workload generator.

Another approach, but saving the individual insert statements to an output file, is basically replaying the entire load and having access to the load data.

 
cockroach gen example-data tpcc > generated.sql


This has the added benefit of having access to the individual schema changes and insert statements used to generate the sample tables. Another bonus is that you don't even need CockroachDB running in the background: this command generates the sample data offline.

Let's peek into the generated SQL file:

 
CREATE DATABASE IF NOT EXISTS tpcc;
SET DATABASE=tpcc;
DROP TABLE IF EXISTS "warehouse";
CREATE TABLE "warehouse" (
                w_id        integer       not null primary key,
                w_name      varchar(10)   not null,
                w_street_1  varchar(20)   not null,
                w_street_2  varchar(20)   not null,
                w_city      varchar(20)   not null,
                w_state     char(2)       not null,
                w_zip       char(9)       not null,
                w_tax       decimal(4,4)  not null,
                w_ytd       decimal(12,2) not null
        );
INSERT INTO "warehouse" VALUES (0,'8','17','13','11','SF','640911111',0.080600,300000.000000);
...
INSERT INTO "customer" VALUES (32,1,0,'8RsaCXoEzmssaF9','OE','BARPRIOUGHT','m9cdLXe0YhgLRr','wsmd68P2bEl','Agrnp8ueWNXJpBB0','PC','308211111','9473294232201446','2006-01-02 15:04:05','GC',50000.000000,0.043600,-10.000000,10.000000,1,0,'ObpVWo1BahdejZrKB2O3Hzk13xWSP8P9fwb2ZjtZAs3NbYdihFxFime6B6Adnt5jrXvRR7OGYhlpdljbDvShaRF4E9zNHsJ7ZvyiJ3n2X1f4fJoMgn5buTDyUmQupcYMoPylHqYo89SqHqQ4HFVNpmnIWHyIowzQN2r4uSQJ8PYVLLLZk9Epp6cNEnaVrN3JXcrBCOuRRSlC0zvh9lctkhRvAvE5H6TtiDNPEJrcjAUOegvQ1Ol7SuF7jPf275wNDlEbdC58hrunlPfhoY1dORoIgb0VnxqkqbEWTXujHUOvCRfqCdVyc8gRGMfAd4nWB1rXYANQ0fa6ZQJJI2uTeFFazaVwxnN13XunKGV6AwCKxhJQVgXWaljKLJ7r175FAuGYFLyxJvnAUXEp2watyJTTtfENexKnKSQN6');
INSERT INTO "customer" VALUES (33,1,0,'mMmp6NHnwiw','OE','BARPRIABLE','Kdcgphy3v1U5yraP','xxELo5B1fcW8RsaCXoE','zmssaF9m9cdLXe0YhgLR','ZT','230811111','2947329423220144','2006-01-02 15:04:05','GC',50000.000000,0.431800,-10.000000,10.000000,1,0,'rwsmd68P2bElAgrnp8ueWNXJpBB0ObpVWo1BahdejZrKB2O3Hzk13xWSP8P9fwb2ZjtZAs3NbYdihFxFime6B6Adnt5jrXvRR7OGYhlpdljbDvShaRF4E9zNHsJ7ZvyiJ3n2X1f4fJoMgn5buTDyUmQupcYMoPylHqYo89SqHqQ4HFVNpmnIWHyIowzQN2r4uSQJ8PYVLLLZk9Epp6cNEnaVrN3JXcrBCOuRRSlC0zvh9lctkhRvAvE5H6TtiDNPEJrcjAUOegvQ1Ol7SuF7jPf275wNDlEbdC58hrunlPfhoY1dORoIgb0VnxqkqbEWTXujHUOvCRfqCdVyc8gRGMfAd4nWB1rXYANQ0fa6ZQJJI2uTeFFazaVwxnN13XunKGV6AwCKxhJQVgXWaljKLJ7r175FAuGYFLyxJvnAUXEp2watyJTTtfENexKnKSQN6vWniabVBVqad2oZO92wV1AnAKYTj7QrlNHQ');


You can also take that a step further by sending the standard output directly to the CockroachDB client. In this case, however, you do need a running cluster.

 
cockroach gen example-data tpcc | cockroach sql --url "postgresql://demo:demo45171@127.0.0.1:26257/defaultdb?sslmode=require"


There may or may not be a reason you'd want to leverage this, but it seems useful to have access to the schema and data in case you'd like to have control over the input data prior to loading the tables.

Topic 2: Convert Hex Keys to Human-Readable Form

Imagine you have to diagnose rows in CockroachDB stored in ranges. Typically using something like SHOW RANGES will output keys in hex representation. For example:

 
demo@127.0.0.1:26257/movr> SELECT start_key, end_key FROM [SHOW RANGES FROM TABLE rides] LIMIT 2;
                                  start_key                                 |                                  end_key
----------------------------------------------------------------------------+----------------------------------------------------------------------------
  NULL                                                                      | /"amsterdam"/"\xc5\x1e\xb8Q\xeb\x85@\x00\x80\x00\x00\x00\x00\x00\x01\x81"
  /"amsterdam"/"\xc5\x1e\xb8Q\xeb\x85@\x00\x80\x00\x00\x00\x00\x00\x01\x81" | /"boston"/"8Q\xeb\x85\x1e\xb8B\x00\x80\x00\x00\x00\x00\x00\x00n"


This form may make sense to our engineers, but for the common folks, we have to defer to some conversion mechanisms. There is however an easy way to look up the underlying value in human-readable form.

Copying the hex string, using a b prefix to decode into bytes (BYTEA) type and cast it to uuid.

 
SELECT b'\xc5\x1e\xb8Q\xeb\x85@\x00\x80\x00\x00\x00\x00\x00\x01\x81'::uuid;


 
                  uuid
----------------------------------------
  c51eb851-eb85-4000-8000-000000000181


I will have to remember this as I find myself debugging replica placements quite often.

Topic 3: Show All K/V Pairs Within the Requested Range

Related to the previous topic, imagine you'd like to look at a particular range within a table, and then look at all of the K/V pairs within that range. You can do it with a function I just learned about: crdb_internal.list_sql_keys_in_range(range_id).

 
SELECT range_id FROM [SHOW RANGES FROM TABLE rides];


 
  range_id
------------
        46
        61
        60
        59
        66
        58
       125
       124
       134


Let's look at range 66:

 
SELECT * FROM crdb_internal.list_sql_keys_in_range(66);


 
...
  /Table/55/1/"paris"/"J\xd3r\xf0b\xb3G\x17\xbbu\x89\x82\x92\x01\xfb\xef"/0             | 72c2f2f70a360570617269731ca633f13c3b664c37bed99d78fdc981051c1e428845cd02453da1cd6cfa1a0f68b21619313832343620416c6c656e204c6f646765204170742e20353228f0848ba60cf0d7d18f03250434890384
  /Table/55/1/"paris"/"N\xce\xf7\xc1\x81aM\xf8\xaeϫ\x8b\x9f\xa0\xc23"/0                 | 53ce68560a360570617269731c3933b51754af4cdab98544bf1ef46ab71cbdf58ff0b2314e59a4b2a2aa8bdf8e6816163738333034204b6174686572696e6520536b7977617928ec848ba60ce0faa898052504348a24b8
...


Now combining the knowledge from the previous topic, you can decode the key associated with the row.

Apparently, this function has been available since CockroachDB 20.2.

Topic 4: Using pgpass With the cockroach Binary

We've covered pgpass in this series a few times: first introduced in TIL 2, and later in TIL 6. Needless to say, this is a topic I'm excited about! The Cockroach Labs engineering team recently switched from the deprecated lib/pq to a more modern jackc/pgx library across the board. This task allows our team to move the innovation forward and deprecate some of the primitive code paths. With that, there are many new capabilities we can gain right out of the gate. The two I am most excited about are the ability to leverage pgpass with the cockroach binary and potentially allowing cockroach workload command to work with gssapi, which I will investigate in a separate article. That said, today, I am going to demonstrate a way to use pgpass natively with the cockroach binary. The setup for pgpass is identical whether you're using a psql or the cockroach client (use the TIL 2 article as a guideline to set it up). I am using a CockroachDB Serverless instance I spun up with our brand new ccloud CLI.

 
ccloud cluster create serverless artem-serverless
ccloud cluster sql artem-serverless
Retrieving cluster info: succeeded
 Downloading cluster cert to /Users/artem/.postgresql/root.crt: succeeded
Retrieving SQL user list: succeeded
No SQL users found. Create one now?: Y
Create a new SQL user:
New username: artem
New password: *************
Confirm password: *************
Looking up cluster ID: succeeded
Creating SQL user: succeeded
Success! Created SQL user 
 name: artem 
 cluster: artem-serverless 
Starting CockroachDB SQL shell...
#
# Welcome to the CockroachDB SQL shell.
...
artem@free-tier11.gcp-us-east1.cockroachlabs.cloud:26257/defaultdb>


Create a pgpass file in the format hostname:port:database:username:password using the connection properties. Note that the database name can be found with the following command:

 
ccloud cluster sql artem-serverless --connection-url
Retrieving cluster info: succeeded
 Downloading cluster cert to /Users/artem/.postgresql/root.crt: succeeded
postgresql://free-tier11.gcp-us-east1.cockroachlabs.cloud:26257/defaultdb?options=--cluster%3Dartem-serverless-1013&sslmode=verify-full&sslrootcert=%2FUsers%2Fartem%2F.postgresql%2Froot.crt


In our case it is artem-serverless-1013.defaultdb. The pgpass file will look like so:

free-tier11.gcp-us-east1.cockroachlabs.cloud:26257:artem-serverless-1013.defaultdb:artem:password

 
chmod 600 ~/.pgpass
export PGPASSFILE=/filepath/.pgpass


As a sanity check, let's try with the psql client.

 
psql -h free-tier11.gcp-us-east1.cockroachlabs.cloud -p 26257 -d artem-serverless-1013.defaultdb                                               
psql (14.3, server 13.0.0)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

artem-serverless-1013.defaultdb=> 


And the moment you've been waiting for:

 
cockroach sql --host=free-tier11.gcp-us-east1.cockroachlabs.cloud:26257 --database=artem-serverless-1013.defaultdb --user artem


 
#
# Welcome to the CockroachDB SQL shell.
...
artem@free-tier11.gcp-us-east1.cockroachlabs.cloud:26257/defaultdb> 


Another way we can do this is with the following command:

 
cockroach sql --url "postgresql://artem@free-tier11.gcp-us-east1.cockroachlabs.cloud:26257/artem-serverless-1013.defaultdb?sslmode=verify-full"


The key part of this approach is to get the database name right, i.e. artem-serverless-1013.defaultdb, and not rely on the other form defaultdb?options=--cluster%3Dartem-serverless-1013. This commonly trips me up!

Topic 5: Get Table Size From CLI

A question we get often is how to get table or database size in the CLI, similar to how our DBConsole reports in a web browser. I started looking at this issue when a community Slack user asked whether CockroachDB has a similar command to Postgres's metacommand \l+. Here's a sample output of the said command:

 
                                                           List of databases
   Name    | Owner | Encoding | Collate | Ctype | Access privileges |  Size   | Tablespace |                Description                 
-----------+-------+----------+---------+-------+-------------------+---------+------------+--------------------------------------------
 postgres  | artem | UTF8     | C       | C     |                   | 8665 kB | pg_default | default administrative connection database
 template0 | artem | UTF8     | C       | C     | =c/artem         +| 8697 kB | pg_default | unmodifiable empty database
           |       |          |         |       | artem=CTc/artem   |         |            | 
 template1 | artem | UTF8     | C       | C     | artem=CTc/artem  +| 8665 kB | pg_default | default template for new databases
           |       |          |         |       | =c/artem          |         |            | 
(3 rows)


The quickest way (and in my opinion, the easiest) to get the size of the database is to run a query similar to the following:

 
WITH x AS (select table_name, sum(range_size_mb) AS size from [show ranges from database movr] group by table_name)                                                                            SELECT SUM(size) FROM x;


I'm trying to get a confirmation internally whether this is logically correct; but in the meantime, here's a query provided by engineering to get table sizes:

 
WITH range_stats AS (
SELECT table_name, index_name, range_id, crdb_internal.range_stats(start_key) AS stats
FROM crdb_internal.ranges_no_leases
WHERE database_name = 'movr'
)
SELECT table_name, index_name, round(sum((stats->>'key_bytes')::INT + (stats->>'val_bytes')::INT)/1024/1024) AS MiB, count(*)
FROM range_stats
GROUP BY table_name, index_name
ORDER BY MiB desc;


Our DBConsole is scraping the Prometheus endpoint to come up with a database size. For the purposes of this topic, it is outside the scope of this conversation. I filed an issue to add a metacommand to present the size in an easy way; until then, feel free to use the methods above to get the information.

One other observation about the issue is that neither method works in CockroachDB Serverless. That's probably why Serverless UI chose to scrape the Prometheus endpoint instead. The reason it doesn't work is that the queries above require more privileged access than the tenant user is allowed. For curiosity's sake, here's the output you may see:

 
artem@free-tier11.gcp-us-east1.cockroachlabs.cloud:26257/defaultdb> show ranges from table test;             
ERROR: RangeIterator failed to seek to /Meta2/"\x00": rpc error: code = Unauthenticated desc = requested key /Meta2/"\x00" not fully contained in tenant keyspace /Tenant/164{0-1}


We are aware of the issue and are actively working on it.

Happy querying!

 

 

 

 

Top