A Guide to Transparent Data Encryption in PostgreSQL
Introduction
In the past decade, data security has become one of the biggest concerns for both small and large organizations. With data breaches occurring every day, and legislators struggling to keep up with innovation, it is essential to protect sensitive data by every available means. The first defense mechanism of every organization should therefore be proper network access controls and firewalls that block unauthorized access and secure all endpoints.
However, security analysts should always work under the assumption that an attacker will find a way to break into a system. This is where encryption becomes crucial. Data encryption transforms data into an unreadable form (ciphertext) that can only be turned back into readable data with the right key. Encryption is not a new concept: people have used various techniques since ancient times to keep information out of the hands of enemies. However, encryption has come a long way in the past decade or two.
Transparent Data Encryption, or TDE, is used to secure the data at rest. In other words, it encrypts the data in a database to prevent an attacker from reading the data if they break the first line of defense.
How Does Transparent Data Encryption Work?
The inner workings of Transparent Data Encryption are fairly simple. TDE performs real-time encryption and decryption of data as it is written to and read from disk. When you first write data, TDE uses a database encryption key (DEK) to encrypt it.
The data is stored in encrypted form. Access to the database is still controlled by the usual authentication mechanisms, such as credentials or an IP allowlist. Whenever an authorized client requests data, TDE decrypts it as it is read, so the client receives plaintext without doing any decryption work itself.
In SQL Server's implementation, for example, the DEK is stored in the database boot record so that it is available during recovery. The DEK is a symmetric key and is protected by a certificate stored in the server's master database. Alternatively, it can be protected by an asymmetric key held in an Extensible Key Management (EKM) module.
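To make the write-path/read-path idea concrete, here is a minimal Python sketch of a storage layer that does three things:

- encrypts pages with a DEK on write;
- decrypts them on read;
- persists the DEK only in wrapped form, encrypted under a master key.

Everything here is an illustrative assumption. `TransparentStore` and its methods are hypothetical, and the SHA-256 keystream cipher is a toy stand-in for AES. It provides no integrity protection and must not be used in production.

```python
import hashlib
import os
from typing import Dict, Optional


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (stand-in for AES): XOR data with SHA-256(key|nonce|counter)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)  # fresh random nonce for every write
    return nonce + keystream_xor(key, nonce, plaintext)


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, ciphertext = blob[:16], blob[16:]
    return keystream_xor(key, nonce, ciphertext)


class TransparentStore:
    """Hypothetical TDE-style store: callers see plaintext, 'disk' holds ciphertext."""

    def __init__(self, master_key: bytes, wrapped_dek: Optional[bytes] = None):
        if wrapped_dek is None:
            # First start: generate a random 256-bit DEK and wrap it with the master key.
            wrapped_dek = encrypt(master_key, os.urandom(32))
        self.wrapped_dek = wrapped_dek  # only the wrapped DEK would be persisted
        self._dek = decrypt(master_key, wrapped_dek)  # unwrapped DEK stays in memory
        self.disk: Dict[int, bytes] = {}  # simulated on-disk pages

    def write(self, page_id: int, data: bytes) -> None:
        self.disk[page_id] = encrypt(self._dek, data)  # encrypt on the way to disk

    def read(self, page_id: int) -> bytes:
        return decrypt(self._dek, self.disk[page_id])  # decrypt on the way back


if __name__ == "__main__":
    master_key = os.urandom(32)
    store = TransparentStore(master_key)
    store.write(1, b"alice,4111-1111-1111-1111")
    print(store.read(1))  # the caller gets plaintext back
    print(store.disk[1] != b"alice,4111-1111-1111-1111")  # stored bytes differ
```

Restarting with the same master key and the saved wrapped DEK recovers the same DEK. In this toy version, a wrong master key silently yields a wrong DEK. Real systems use authenticated encryption so that such a mismatch is detected.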
One of the best things about TDE is that it helps you meet regulatory and industry requirements for protecting stored data. It also simplifies application development, because developers don't have to write their own encryption logic. With TDE, data can be encrypted using AES, 3DES, or another standard encryption algorithm.
Transparent Data Encryption does not guarantee the security of your data. It only makes it much harder for an attacker to turn stolen data into a readable form. Security is not always in our hands, but when it is, as in this case, we should make the best use of it. Using a 128-bit key turns the attacker's work into a herculean task: by commonly cited estimates, brute-forcing a 128-bit key would take around one billion billion years. An attacker is therefore more likely to turn to social engineering to obtain information about the key, so make sure your organization has strong defenses against social engineering.
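The "billion billion years" figure is easy to sanity-check with a little arithmetic. The Python sketch below makes two assumptions. First, a hypothetical attacker can test 10^12 keys per second. Second, that attacker finds the key, on average, after searching half the keyspace. The rate is an illustrative assumption, not a measured figure.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def expected_brute_force_years(key_bits: int, keys_per_second: float) -> float:
    """Expected years to find a key by exhaustive search.

    On average the correct key turns up after trying half of the 2**key_bits keys.
    """
    average_attempts = 2 ** (key_bits - 1)
    return average_attempts / keys_per_second / SECONDS_PER_YEAR


ASSUMED_RATE = 1e12  # hypothetical attacker speed: one trillion keys per second

for bits in (56, 128, 256):
    years = expected_brute_force_years(bits, ASSUMED_RATE)
    print(f"{bits:>3}-bit key: about {years:.1e} years")
```

At this assumed rate, a 56-bit key falls in about ten hours. A 128-bit key needs roughly 5 x 10^18 years, that is, several billion billion years.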
What Could Happen If We Don't Use Transparent Data Encryption?
Transparent Data Encryption is essential to ensure that data stays safe even if the outer layers of defense are broken. In today's IT landscape, organizations are attacked by hackers constantly. Even though we cannot completely prevent attacks, encryption is one of the best ways to keep data safe when an attack succeeds.
One of the first things every database administrator should check is whether data at rest is encrypted. Defense in depth (DiD) is a set of principles and techniques that help protect an organization's data even in a breach, and encryption is one of its core layers.
If we don't use Transparent Data Encryption, data at rest is at very high risk. Every security system should be designed on the assumption that an attacker can find their way into the network by one means or another. Without TDE, such a security plan falls apart quickly.
Data encryption has become a major part of security standards in the past decade. As a result, there haven't been many cases in which data was stored completely unencrypted. However, there have been cases in which the protection was not strong enough.
One such case was the Zynga data breach. According to this case study, Zynga stored users' passwords hashed with SHA-1. SHA-1 is a hashing algorithm (not an encryption algorithm), and it is no longer considered strong enough for protecting passwords. An attacker known as 'Gnosticplayers' breached the database and obtained data on millions of users.
The incident did not have to be so damaging. Strong Transparent Data Encryption could have stopped or delayed the attacker, and it could have helped contain many incidents similar to the Zynga breach. Today, most companies have realized the importance of TDE and actively use it to secure their data. Database providers such as IBM Cloudant and Azure SQL Database even encrypt data at rest by default.
What Kinds of Encryption Are Available?
The first thing you need to do is to decide on an encryption level for your PostgreSQL instance. However, before you can decide which balance of security considerations, performance, stability, and ease of use is the right one, let's review the overall architecture of PostgreSQL:
As shown above, user data is first received by the server from the client, then written to the server's cache, and finally flushed to disk.
The physical storage hierarchy of PostgreSQL is cluster -> tablespace -> database -> relation (a table, index, or other object).
From this, we can distinguish six levels at which data can be encrypted.
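The physical hierarchy maps onto directories inside the data directory. That is why cluster, tablespace, database, and relation are natural encryption boundaries. A relation in a non-default tablespace lives under pg_tblspc/<tablespace OID>/<version directory>/<database OID>/<relfilenode>. Here, pg_tblspc/<OID> is a symbolic link to the tablespace's LOCATION. The helper function and the OID values below are illustrative assumptions, not output from a real cluster.

```python
from pathlib import PurePosixPath


def relation_path(data_dir: str, tablespace_oid: int, version_dir: str,
                  database_oid: int, relfilenode: int) -> PurePosixPath:
    """Hypothetical helper: on-disk path of a relation in a non-default tablespace.

    cluster (data_dir) -> tablespace (pg_tblspc/<OID>, a symlink to LOCATION)
    -> version directory (PG_<major>_<catalog version>) -> database (<OID>)
    -> relation (<relfilenode>, the main data file).
    """
    return PurePosixPath(data_dir, "pg_tblspc", str(tablespace_oid),
                         version_dir, str(database_oid), str(relfilenode))


# Illustrative values only; real OIDs and the version directory differ per cluster.
print(relation_path("/var/lib/postgresql/data", 16385, "PG_16_202307071", 16384, 16390))
```

Encrypting at one level of this tree covers everything beneath it. For example, an encrypted tablespace covers all database and relation files under its directory.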
Client-Level Encryption
With client-level encryption, the user generates a key, and the client application encrypts individual fields before sending them to the server.
Pros: The encryption granularity is small, and the amount of encrypted data is controllable. The existing pgcrypto extension can be used to encrypt field data.
Cons: The cost of use is high, since existing applications must be adjusted and their INSERT statements modified. Also, because data is encrypted from the moment it is generated, it stays encrypted in the server's cache too (effectively cache-level encryption). As a result, performance is poor and indexes cannot be used on the encrypted fields.
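The index problem is easiest to see in code. In the minimal Python sketch below, a hypothetical client encrypts a field with a random nonce before "inserting" it, so two encryptions of the same value produce different ciphertexts. An index on the ciphertext column therefore cannot answer an equality lookup, and the client has to decrypt every row instead. The keystream cipher is a toy stand-in for a real algorithm, and the `rows` list stands in for a table.

```python
import hashlib
import os
from typing import List, Optional, Tuple


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (stand-in for AES): XOR with SHA-256(key|nonce|counter)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def client_encrypt(key: bytes, value: str) -> bytes:
    nonce = os.urandom(16)  # random nonce: equal plaintexts give different ciphertexts
    return nonce + keystream_xor(key, nonce, value.encode())


def client_decrypt(key: bytes, blob: bytes) -> str:
    return keystream_xor(key, blob[:16], blob[16:]).decode()


def find_by_email(rows: List[Tuple[int, bytes]], key: bytes, email: str) -> Optional[int]:
    """No index can help here: every row must be fetched and decrypted (a full scan)."""
    for row_id, encrypted_email in rows:
        if client_decrypt(key, encrypted_email) == email:
            return row_id
    return None


key = os.urandom(32)  # generated and held by the client, never sent to the server
rows = [(1, client_encrypt(key, "alice@example.com")),
        (2, client_encrypt(key, "bob@example.com"))]

print(client_encrypt(key, "alice@example.com") == rows[0][1])  # False
print(find_by_email(rows, key, "bob@example.com"))             # 2
```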
Server-Level Encryption
Here, the server does the encryption: an encrypted data type is defined on the server, and individual fields are encrypted there.
Pros: The cost of use is lower than with client-level encryption, because only the database needs to be adjusted and the application does not have to change. The encryption granularity is small, and the amount of encrypted data is controllable.
Cons: Like client-level encryption, this is cache-level encryption: performance is poor, and indexes cannot be used.
Cluster-Level Encryption
Cluster-level encryption encrypts the entire cluster; whether a cluster is encrypted is decided when it is initialized.
Pros: Simple architecture, low cost of use, and good performance. Encryption happens at the operating-system cache level: data is encrypted when cache pages are flushed to disk and decrypted when they are read from disk.
Cons: The encryption is coarse-grained. Every object inside the cluster is encrypted, including data that does not need protection, which degrades performance.
Tablespace-Level Encryption
Tablespace-level encryption assigns an encryption attribute to a specific tablespace, and every object stored inside that tablespace is encrypted.
Pros: Simple architecture, low cost of use, operating-system cache-level encryption, good performance, finer granularity than cluster-level encryption, better control over the amount of encrypted data, and good encryption efficiency.
Cons: The tablespace concept in PostgreSQL is not always clear, so users can easily misunderstand it. In addition, backup management is more costly.
Database-Level Encryption
Database-level encryption designates a specific database as encrypted, and all objects in that database are encrypted.
Pros: Simple architecture, low cost of use, operating-system cache-level encryption, finer granularity than cluster-level encryption, and high encryption efficiency.
Cons: The granularity is still not fine enough for every use case, and backup and recovery can be cumbersome.
Table-Level Encryption
Table-level encryption specifies individual objects (such as tables) to be encrypted.
Pros: Simple architecture, low cost of use, operating-system cache-level encryption, the finest granularity of the storage-level options, and high encryption efficiency.
Cons: Key management costs are higher, and development is more complex. Also, when many objects need to be encrypted, the cost of use is high.
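The four storage-level options differ mainly in which relations end up encrypted. The Python model below is a hypothetical illustration; the `Relation` and `EncryptionPolicy` classes are not part of any PostgreSQL API. A cluster-level policy covers everything, while tablespace-, database-, and table-level policies cover only the objects you name. Client- and server-level encryption work on individual fields, so they are left out.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class Relation:
    name: str
    database: str
    tablespace: str


@dataclass(frozen=True)
class EncryptionPolicy:
    level: str  # "cluster", "tablespace", "database", or "table"
    targets: FrozenSet[str] = field(default_factory=frozenset)

    def covers(self, rel: Relation) -> bool:
        """Return True if this policy would encrypt the given relation."""
        if self.level == "cluster":
            return True  # coarsest: every object in the cluster
        if self.level == "tablespace":
            return rel.tablespace in self.targets
        if self.level == "database":
            return rel.database in self.targets
        if self.level == "table":
            return rel.name in self.targets  # finest storage-level granularity
        raise ValueError(f"unknown level: {self.level}")


relations: List[Relation] = [
    Relation("customers", "shop", "tbsencrypt"),
    Relation("orders", "shop", "pg_default"),
    Relation("metrics", "monitoring", "pg_default"),
]

policies = [
    EncryptionPolicy("cluster"),
    EncryptionPolicy("tablespace", frozenset({"tbsencrypt"})),
    EncryptionPolicy("database", frozenset({"shop"})),
    EncryptionPolicy("table", frozenset({"orders"})),
]

for policy in policies:
    encrypted = [r.name for r in relations if policy.covers(r)]
    print(f"{policy.level:<10} -> {encrypted}")
```

Going down the list, each policy encrypts a smaller and more targeted set of relations. This mirrors the trade-off between simplicity and granularity described above.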
How to Implement Tablespace-Level Encryption
Each of these encryption levels is implemented differently, and there isn't one simple way to summarize them. To limit the scope of this tutorial, I will only discuss one. My personal go-to method is tablespace-level encryption, which is why I will focus on it today. The steps below use Fujitsu Enterprise Postgres (FEP), which provides tablespace-level TDE; community PostgreSQL does not include built-in TDE. The process is relatively straightforward and consists of the following steps:
1. Edit the Postgres configuration file postgresql.conf and update the settings below:
■ keystore_location: This tells the database where to store the encryption key.
■ tablespace_encryption_algorithm: This specifies the type of encryption you want to use - in this example, we set it to AES256 (Advanced Encryption Standard 256-bit).
2. Execute the SQL function pgx_set_master_key with a passphrase of your choice to create the master key - in this example, we use 'safe data' as the passphrase. This creates the file keystore.ks in the location specified in the previous step (see the inset image below).
3. Restart the server so that the new settings in postgresql.conf take effect - in this example, I included the option --keystore-passphrase in the command to tell Postgres to open the keystore upon restart.
4. Create a tablespace (I've called it tbsencrypt) and specify its physical location.
And that is all; data written to and associated with this tablespace from now on will be encrypted: table data, indexes, temp files, WAL, backups, and archive logs.
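For reference, step 1 amounts to two lines in postgresql.conf. The fragment below is a sketch. The parameter names are the ones from step 1, which are Fujitsu Enterprise Postgres (FEP) settings, and the AES256 value follows this example. The keystore directory is a hypothetical path that you would replace with your own.

```
# postgresql.conf: settings from step 1 (example values)
keystore_location = '/path/to/keystore'        # hypothetical directory; keystore.ks is created here
tablespace_encryption_algorithm = 'AES256'     # encryption algorithm used in this example
```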
What If Someone Gains Access to Your Environment?
Even if an attacker gains access to your system and reads the files that store your data, data in an encrypted tablespace remains protected. Without encryption, the same files would expose the data.
As stated earlier, TDE does not prevent attackers from obtaining data; rather, it prevents them from reading it. How safe is it, then? Can the data be decrypted?
As long as the key itself stays secret, an attacker's only option for dealing with this type of encrypted data is a brute-force attack. In a brute-force attack, the attacker tries every possible key until one works.
This is why the best way to secure your data is to increase the key length and, with it, the number of possible keys.
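To see why key length matters, here is a toy Python brute-force attack against a deliberately tiny 16-bit key. It assumes the attacker knows how the plaintext begins (a "known plaintext" such as a file header), which lets them recognize the right key. The keystream cipher is a hypothetical toy, not AES. The point is only that exhaustive search is trivial when the keyspace is small.

```python
import hashlib
from typing import Optional, Tuple


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key|counter) blocks."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def brute_force(ciphertext: bytes, known_prefix: bytes,
                key_bits: int = 16) -> Optional[Tuple[bytes, int]]:
    """Try every key in order; return (key, attempts) once the prefix decrypts correctly."""
    key_len = (key_bits + 7) // 8
    for candidate in range(2 ** key_bits):
        key = candidate.to_bytes(key_len, "big")
        if keystream_xor(key, ciphertext).startswith(known_prefix):
            return key, candidate + 1
    return None


secret_key = (4242).to_bytes(2, "big")  # a 16-bit key: only 65,536 possibilities
ciphertext = keystream_xor(secret_key, b"CONFIDENTIAL: payroll data")
found = brute_force(ciphertext, b"CONFIDENTIAL")
print(found)
```

This search finishes almost instantly. The same loop over 2**128 candidates would never finish in practice.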
An 8-bit encryption key allows only 256 possible keys, as the table below shows. While many companies consider 128-bit encryption to be sufficient, FEP (Fujitsu Enterprise Postgres) also offers 256-bit encryption, which allows vastly more possible keys.
Key size | Number of possible combinations |
1-bit | 2 |
8-bit | 256 |
16-bit | 65,536 |
64-bit | 1.8 x 10^19 |
128-bit*† | 3.4 x 10^38 |
192-bit* | 6.2 x 10^57 |
256-bit*† | 1.1 x 10^77 |
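The right-hand column is simply 2 raised to the key size. A few lines of Python reproduce it for the key sizes listed in the table.

```python
KEY_SIZES = [1, 8, 16, 64, 128, 192, 256]


def key_space(bits: int) -> int:
    """Number of distinct keys of the given length: 2**bits."""
    return 2 ** bits


for bits in KEY_SIZES:
    n = key_space(bits)
    # Small values print exactly; large ones in scientific notation.
    shown = f"{n:,}" if bits <= 16 else f"{n:.2e}"
    print(f"{bits:>3}-bit | {shown}")
```

With two decimals, the large entries come out as 1.84e+19, 3.40e+38, 6.28e+57, and 1.16e+77. The table truncates these to one decimal place.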
Even with the processing power of today's supercomputers, the time required to try every 128-bit key is far greater than the age of the Earth. I'm not a security expert, but I would still take a multi-layered approach to securing my data.
Conclusion
Transparent Data Encryption keeps data at rest safe. Even though encryption won't prevent a hacker from breaking into the system, it makes it impractical for an attacker to understand the information they steal. TDE uses a single database encryption key (DEK) to keep the data safe. Make sure this key is stored in a safe place so it can be recovered. If an attacker obtains the key, TDE becomes largely useless.