How To Use Geo-Partitioning to Comply With Data Regulations and Deliver Low Latency Globally

In today's interconnected world, application users can span multiple countries and continents. Maintaining low latency across distant geographies while dealing with data regulatory requirements can be a challenge. The geo-partitioning feature of distributed SQL databases can help solve that challenge by pinning user data to the required locations.

So, let’s explore how you can deploy a geo-partitioned database cluster that complies with data regulations and delivers low latency across multiple regions using YugabyteDB Managed.

Deploying a Geo-Partitioned Cluster Using YugabyteDB Managed

YugabyteDB is an open-source distributed SQL database built on PostgreSQL. You can deploy a geo-partitioned cluster within minutes using YugabyteDB Managed, the DBaaS version of YugabyteDB.

Yugabyte DB: Create Cluster/Select regions

Getting started with a geo-partitioned YugabyteDB Managed cluster is easy. Simply follow the below: 

  1. Select the Multi-region Deployment option. When creating a dedicated YugabyteDB Managed cluster, choose the “multi-region” option to ensure your data is distributed across multiple regions.
  2. Set the Data Distribution Mode to “partitioned." Select the "partition by region" data distribution option so that you can pin data to specific geographical locations.
  3. Choose target cloud regions. Place database nodes in the cloud regions of your choice. In this blog, we spread data across two regions - South Carolina (us-east1) and Frankfurt (europe-west3).

Once you've set up a geo-partitioned YugabyteDB Managed cluster, you can connect to it and create tables with partitioned data.

Create a Geo-Partitioned Table

To demonstrate how geo-partitioning improves latency and data regulation compliance, let's take a look at an example Account table.

First, create PostgreSQL tablespaces that let you pin data to the YugabyteDB nodes in the USA (usa_tablespace) or in Europe (europe_tablespace):

SQL
 
CREATE TABLESPACE usa_tablespace WITH (
    replica_placement = '{"num_replicas": 3, "placement_blocks":
  [
    {"cloud":"gcp","region":"us-east1","zone":"us-east1-c","min_num_replicas":1},
    {"cloud":"gcp","region":"us-east1","zone":"us-east1-d","min_num_replicas":1},
    {"cloud":"gcp","region":"us-east1","zone":"us-east1-b","min_num_replicas":1}
  ]}'
);

CREATE TABLESPACE europe_tablespace WITH (
    replica_placement = '{"num_replicas": 3, "placement_blocks":
  [
    {"cloud":"gcp","region":"europe-west3","zone":"europe-west3-a","min_num_replicas":1},
    {"cloud":"gcp","region":"europe-west3","zone":"europe-west3-b","min_num_replicas":1},
    {"cloud":"gcp","region":"europe-west3","zone":"europe-west3-c","min_num_replicas":1}
  ]}'
);


Second, create the Account table and partition it by the country_code column:

SQL
 
CREATE TABLE Account (
    id integer NOT NULL,
    full_name text NOT NULL,
    email text NOT NULL,
    phone text NOT NULL,
    country_code varchar(3)
)
PARTITION BY LIST (country_code);


Third, define partitioned tables for USA and European records:

SQL
 
CREATE TABLE Account_USA PARTITION 
OF Account (id, full_name, email, phone, country_code, 
            PRIMARY KEY (id, country_code))
FOR VALUES IN ('USA') TABLESPACE usa_tablespace;

CREATE TABLE Account_EU PARTITION 
OF Account (id, full_name, email, phone, country_code, 
            PRIMARY KEY (id, country_code))
FOR VALUES IN ('EU') TABLESPACE europe_tablespace;


Now, let’s examine the read-and-write latency when a user connects from the United States.

Latency When Connecting From the United States

Let’s open a client connection from Iowa (us-central1) to a database node located in South Carolina (us-east1) and insert a new Account record:

SQL
 
INSERT INTO Account (id, full_name, email, phone, country_code) 
    VALUES (1, 'John Smith', 'john@gmail.com', '650-346-1234', 'USA');


As long as the country_code is 'USA', the record will be stored on the database node from South Carolina. The write and read latency will be approximately 30 milliseconds because the client requests need to travel between Iowa and South Carolina.

Connection to US nodes


Next, let’s see what happens when we add and query an account with the country_code set to 'EU':

SQL
 
INSERT INTO Account (id, full_name, email, phone, country_code) 
    VALUES (2, 'Emma Schmidt', 'emma@gmail.com', '49-346-23-1234', 'EU'); 

SELECT * FROM Account WHERE id=2 and country_code='EU';


Since this account must be stored in a European data center and must be transferred between the United States and Europe, the latency increases. 

Latency increase

The higher latency between the client connection in the USA and the database node in Europe signifies that the geo-partitioned cluster makes you compliant with data regulatory requirements. Even if the client from the USA connects to the US-based database node and writes/reads records of residents from the European Union, those records will always be stored/retrieved from database nodes in Europe.

Latency When Connecting From Europe

Let’s see how the latency improves if you open a client connection from Frankfurt (europe-west3) to the database node in the same region, and query the European record recently added from the USA:

Open a client connection from Frankfurt

This time the latency is as low as 3 milliseconds (vs. 130 ms when you queried the same record from the USA) because the record is stored in and retrieved from European data centers.

Adding and querying another European record also maintains low latency, as long as the data is not replicated to the United States.

SQL
 
INSERT INTO Account (id, full_name, email, phone, country_code) 
    VALUES (3, 'Otto Weber', 'otto@gmail.com', '49-546-33-0034', 'EU');

SELECT * FROM Account WHERE id=3 and country_code='EU';


Adding and querying another European record-

When accessing data stored in the same region, latency is significantly reduced. The result is a much better user experience while remaining compliant with data regulatory requirements.

Wrap Up

Geo-partitioning is an effective way to comply with data regulations and achieve global low latency. By deploying a geo-partitioned cluster using YugabyteDB Managed, it's possible to intelligently distribute data across regions, while maintaining high-performance querying capabilities.

 

 

 

 

Top