How To Use Geo-Partitioning to Comply With Data Regulations and Deliver Low Latency Globally

2024-12-02

In today's interconnected world, application users can span multiple countries and continents. Maintaining low latency across distant geographies while dealing with data regulatory requirements can be a challenge. The geo-partitioning feature of distributed SQL databases can help solve that challenge by pinning user data to the required locations.

So, let’s explore how you can deploy a geo-partitioned database cluster that complies with data regulations and delivers low latency across multiple regions using YugabyteDB Managed.

Deploying a Geo-Partitioned Cluster Using YugabyteDB Managed

YugabyteDB is an open-source distributed SQL database built on PostgreSQL. You can deploy a geo-partitioned cluster within minutes using YugabyteDB Managed, the DBaaS version of YugabyteDB.

How To Use Geo-Partitioning to Comply With Data Regulations and Deliver Low Latency Globally

Getting started with a geo-partitioned YugabyteDB Managed cluster is easy. Simply follow the below:

Select the Multi-region Deployment option. When creating a dedicated YugabyteDB Managed cluster, choose the “multi-region” option to ensure your data is distributed across multiple regions.
Set the Data Distribution Mode to “partitioned." Select the "partition by region" data distribution option so that you can pin data to specific geographical locations.
Choose target cloud regions. Place database nodes in the cloud regions of your choice. In this blog, we spread data across two regions - South Carolina (us-east1) and Frankfurt (europe-west3).

Once you've set up a geo-partitioned YugabyteDB Managed cluster, you can connect to it and create tables with partitioned data.

Create a Geo-Partitioned Table

To demonstrate how geo-partitioning improves latency and data regulation compliance, let's take a look at an example Account table.

First, create PostgreSQL tablespaces that let you pin data to the YugabyteDB nodes in the USA (usa_tablespace) or in Europe (europe_tablespace):

     SQL 
   
 
 
   CREATE TABLESPACE usa_tablespace WITH (
    replica_placement = '{"num_replicas": 3, "placement_blocks":
  [
    {"cloud":"gcp","region":"us-east1","zone":"us-east1-c","min_num_replicas":1},
    {"cloud":"gcp","region":"us-east1","zone":"us-east1-d","min_num_replicas":1},
    {"cloud":"gcp","region":"us-east1","zone":"us-east1-b","min_num_replicas":1}
  ]}'
);

CREATE TABLESPACE europe_tablespace WITH (
    replica_placement = '{"num_replicas": 3, "placement_blocks":
  [
    {"cloud":"gcp","region":"europe-west3","zone":"europe-west3-a","min_num_replicas":1},
    {"cloud":"gcp","region":"europe-west3","zone":"europe-west3-b","min_num_replicas":1},
    {"cloud":"gcp","region":"europe-west3","zone":"europe-west3-c","min_num_replicas":1}
  ]}'
); 
  

num_replicas: 3 - Each tablespace requires you to store a copy of data across 3 availability zones within a region. This lets you tolerate zone-level outages in the cloud.

Second, create the Account table and partition it by the country_code column:

     SQL 
   
 
 
   CREATE TABLE Account (
    id integer NOT NULL,
    full_name text NOT NULL,
    email text NOT NULL,
    phone text NOT NULL,
    country_code varchar(3)
)
PARTITION BY LIST (country_code); 
  

Third, define partitioned tables for USA and European records:

     SQL 
   
 
 
   CREATE TABLE Account_USA PARTITION 
OF Account (id, full_name, email, phone, country_code, 
            PRIMARY KEY (id, country_code))
FOR VALUES IN ('USA') TABLESPACE usa_tablespace;

CREATE TABLE Account_EU PARTITION 
OF Account (id, full_name, email, phone, country_code, 
            PRIMARY KEY (id, country_code))
FOR VALUES IN ('EU') TABLESPACE europe_tablespace; 
  

FOR VALUES IN ('USA') - If the country_code is equal to the ‘USA’, then the record is automatically placed or queried from the Account_USA partition that is stored in the usa_tablespace (the region in South Carolina).
FOR VALUES IN ('EU') - Otherwise, if the record belongs to the European Union (country_code is equal to 'EU'), then it’s stored in the Account_EU partition from the europe_tablespace (the region in Frankfurt).

Now, let’s examine the read-and-write latency when a user connects from the United States.

Latency When Connecting From the United States

Let’s open a client connection from Iowa (us-central1) to a database node located in South Carolina (us-east1) and insert a new Account record:

     SQL 
   
   INSERT INTO Account (id, full_name, email, phone, country_code) 
    VALUES (1, 'John Smith', 'john@gmail.com', '650-346-1234', 'USA');

As long as the country_code is 'USA', the record will be stored on the database node from South Carolina. The write and read latency will be approximately 30 milliseconds because the client requests need to travel between Iowa and South Carolina.

How To Use Geo-Partitioning to Comply With Data Regulations and Deliver Low Latency Globally

Next, let’s see what happens when we add and query an account with the country_code set to 'EU':

     SQL 
   
   INSERT INTO Account (id, full_name, email, phone, country_code) 
    VALUES (2, 'Emma Schmidt', 'emma@gmail.com', '49-346-23-1234', 'EU'); 

SELECT * FROM Account WHERE id=2 and country_code='EU';

Since this account must be stored in a European data center and must be transferred between the United States and Europe, the latency increases.

How To Use Geo-Partitioning to Comply With Data Regulations and Deliver Low Latency Globally

The latency for the INSERT (230 ms) is higher than for the SELECT (130 ms) because during the INSERT the record is replicated across three availability zones in Frankfurt.

The higher latency between the client connection in the USA and the database node in Europe signifies that the geo-partitioned cluster makes you compliant with data regulatory requirements. Even if the client from the USA connects to the US-based database node and writes/reads records of residents from the European Union, those records will always be stored/retrieved from database nodes in Europe.

Latency When Connecting From Europe

Let’s see how the latency improves if you open a client connection from Frankfurt (europe-west3) to the database node in the same region, and query the European record recently added from the USA:

How To Use Geo-Partitioning to Comply With Data Regulations and Deliver Low Latency Globally

This time the latency is as low as 3 milliseconds (vs. 130 ms when you queried the same record from the USA) because the record is stored in and retrieved from European data centers.

Adding and querying another European record also maintains low latency, as long as the data is not replicated to the United States.

     SQL 
   
   INSERT INTO Account (id, full_name, email, phone, country_code) 
    VALUES (3, 'Otto Weber', 'otto@gmail.com', '49-546-33-0034', 'EU');

SELECT * FROM Account WHERE id=3 and country_code='EU';

How To Use Geo-Partitioning to Comply With Data Regulations and Deliver Low Latency Globally

When accessing data stored in the same region, latency is significantly reduced. The result is a much better user experience while remaining compliant with data regulatory requirements.

Wrap Up

Geo-partitioning is an effective way to comply with data regulations and achieve global low latency. By deploying a geo-partitioned cluster using YugabyteDB Managed, it's possible to intelligently distribute data across regions, while maintaining high-performance querying capabilities.