Automate Azure Databricks Unity Catalog Permissions at the Catalog Level

Disclaimer: All the views and opinions expressed in the blog belong solely to the author and not necessarily to the author's employer or any other group or individual. This article is not a promotion for any cloud/data management platform. All the images and code snippets are publicly available on the Azure/Databricks website.

What Is Unity Catalog in Databricks?

Databricks Unity Catalog is a data cataloging tool that helps manage and organize data across an organization in a simple, secure way. It allows companies to keep track of all their data, making it easier to find, share, and control who can access it. Unity Catalog works across different cloud storage systems and lets teams manage permissions, governance, and data access from one place, ensuring data is used safely and efficiently. 

How Privileges Work in the Unity Catalog Hierarchical Model

Securable objects in Unity Catalog

In Unity Catalog, privileges (permissions) work in a hierarchical model, meaning they follow a specific structure from top to bottom. Here's how it works in simple terms:

  1. At the Top (Metastore): This is like a big container that holds all the catalogs, schemas, and data. A metastore admin can manage everything inside it.
  2. Next (Catalog): Inside the metastore, there are catalogs. These are smaller containers that group related data. Privileges here control access to everything in that catalog.
  3. Inside the Catalog (Schema): Each catalog holds one or more schemas. Privileges at this level decide who can access the tables and views within a specific schema.
  4. At the Bottom (Tables/Views): Inside the schema, there are tables and views, which hold the actual data. Privileges here control who can read or modify specific pieces of data.

Privileges are inherited downward. For example, a privilege such as SELECT granted at the catalog level automatically applies to all current and future schemas and tables within it. Grants at lower levels can add privileges for a specific schema or table, but they cannot take inherited ones away.
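
To make the downward flow concrete, here is a toy model in plain Python. It is only a sketch of the idea, not how Databricks implements it: the `Securable` class and the object names (`sales`, `emea`, `orders`, `analysts`) are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Securable:
    """A node in a simplified catalog > schema > table hierarchy."""
    name: str
    parent: Optional["Securable"] = None
    # Maps a principal name to the privileges granted directly on this object.
    grants: dict = field(default_factory=dict)

    def grant(self, principal: str, privilege: str) -> None:
        self.grants.setdefault(principal, set()).add(privilege)

    def effective_privileges(self, principal: str) -> set:
        """Collect privileges granted here and on every ancestor."""
        found = set(self.grants.get(principal, set()))
        if self.parent is not None:
            found |= self.parent.effective_privileges(principal)
        return found


# Hypothetical objects, used only for illustration.
catalog = Securable("sales")
schema = Securable("emea", parent=catalog)
table = Securable("orders", parent=schema)

# A single grant on the catalog is visible from the table below it.
catalog.grant("analysts", "SELECT")
print(table.effective_privileges("analysts"))  # {'SELECT'}
```

A grant made directly on the schema or table would be added to the inherited set, which is why lower-level grants can only widen access in this model.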

When to Set Privileges at the Catalog Level

Set privileges at the catalog level when you want to control access to multiple schemas and tables/views within that catalog at once. This is useful in several situations:

  1. Broad Access Control: If you want to give users or teams access to all the schemas and data inside a catalog, setting privileges at the catalog level saves time. For example, granting a data analyst team access to all sales-related data in one go.
  2. Consistent Permissions: Granting at the catalog level ensures that everyone with access to the catalog has the same permissions across its schemas and tables. This helps keep things organized and avoids mismatches in access rules.
  3. Ease of Management: If you have many schemas under the same catalog and you don’t want to manage permissions for each one individually, setting privileges at the catalog level simplifies permission management.
  4. Department/Team-Based Access: If a catalog represents a specific department (like Marketing or Finance), you can set privileges at the catalog level to give that department access to all relevant data without managing each schema separately.
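
As a rough illustration of the management savings, the sketch below builds the GRANT statements needed for one team at the catalog level and at the schema level, then compares the counts. The helper functions and all names (`marketing`, `marketing_team`, the schema names) are hypothetical; the statements are only built as strings, not executed.

```python
def catalog_level_grants(catalog, principal, privileges):
    """One GRANT per privilege, issued once on the catalog."""
    return [
        f"GRANT {p} ON CATALOG `{catalog}` TO `{principal}`"
        for p in privileges
    ]


def schema_level_grants(catalog, schemas, principal, privileges):
    """One GRANT per privilege for every schema in the catalog."""
    return [
        f"GRANT {p} ON SCHEMA `{catalog}`.`{s}` TO `{principal}`"
        for s in schemas
        for p in privileges
    ]


# Hypothetical catalog layout, used only for illustration.
schemas = ["campaigns", "leads", "web_traffic"]
privs = ["USE SCHEMA", "SELECT"]

per_catalog = catalog_level_grants("marketing", "marketing_team", privs)
per_schema = schema_level_grants("marketing", schemas, "marketing_team", privs)
print(len(per_catalog), len(per_schema))  # 2 6
```

The schema-level count grows with every schema added to the catalog, while the catalog-level count stays fixed.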

Why We Need to Automate Unity Catalog Privileges

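Automating Unity Catalog privileges is important for several key reasons: it keeps access control consistent, reduces the chance of manual errors, and supports scalable governance as data and teams grow.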

How the Script Works

Prerequisites

Step 1: Declare the Variables 

To create a notebook in your Databricks workspace, click "+ New" in the sidebar, and then choose Notebook.

Create a notebook in the Databricks workspace

A blank notebook opens in the workspace. Make sure Python is selected as the notebook language.

Copy and paste the code snippet below into a notebook cell and run the cell.

Python
 
catalog = ''  # Specify the catalog name between the quotes
principals_arr = ''  # Specify comma-separated principals between the quotes (e.g., group names, usernames)
# Split on commas, trim surrounding spaces, and skip empty entries
principals = [p.strip() for p in principals_arr.split(',') if p.strip()]
privileges_arr = 'SELECT,BROWSE'  # Specify comma-separated privileges between the quotes (e.g., SELECT,BROWSE)
privileges = [p.strip() for p in privileges_arr.split(',') if p.strip()]
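
If the principal and privilege lists are typed by hand, a small helper can normalize them before they reach any SQL. The sketch below is one possible approach; `parse_csv` and `parse_privileges` are hypothetical helper names, not part of Databricks, and the sample inputs are made up.

```python
def parse_csv(raw: str) -> list:
    """Split a comma-separated string, trimming spaces and skipping empty items."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def parse_privileges(raw: str) -> list:
    """Like parse_csv, but upper-cases each privilege and removes duplicates."""
    seen = []
    for item in parse_csv(raw):
        # Collapse repeated inner spaces, e.g. "use  catalog" -> "USE CATALOG".
        name = " ".join(item.upper().split())
        if name not in seen:
            seen.append(name)
    return seen


print(parse_csv("data_engineers, analyst@example.com ,"))
# ['data_engineers', 'analyst@example.com']
print(parse_privileges("select, Browse,use  catalog,SELECT"))
# ['SELECT', 'BROWSE', 'USE CATALOG']
```

An empty input string yields an empty list, so a later loop over the result simply does nothing instead of issuing a grant to a blank principal.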


Step 2: Set the Catalog

Copy, paste, and run the code block below in a new cell or in the existing one.

Python
 
query = f"USE CATALOG `{catalog}`"
spark.sql(query) 
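
Because the catalog name is interpolated into a SQL string, it can help to quote it defensively. The sketch below wraps a name in backticks and doubles any backtick inside it, which is how Spark SQL escapes a backtick within a delimited identifier. `quote_identifier` and `use_catalog_statement` are hypothetical helper names, and `sales-dev` is a made-up catalog name.

```python
def quote_identifier(name: str) -> str:
    """Wrap a name in backticks, doubling any backtick inside it."""
    if not name or not name.strip():
        raise ValueError("identifier must not be empty")
    return "`" + name.replace("`", "``") + "`"


def use_catalog_statement(catalog_name: str) -> str:
    """Build the USE CATALOG statement for a safely quoted catalog name."""
    return f"USE CATALOG {quote_identifier(catalog_name)}"


print(use_catalog_statement("sales-dev"))  # USE CATALOG `sales-dev`
```

In a notebook, the returned string could be passed to `spark.sql(...)` in the same way as the `query` variable above. The empty-name check also catches the case where the `catalog` variable was left blank.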


Step 3: Loop Through the Principals and Privileges and Apply Grant at the Catalog

Copy, paste, and run the code block below in a new cell or in the existing one.

Python
 
for principal in principals:
    for privilege in privileges:
        # Privileges are SQL keywords (e.g., SELECT), so they are not backtick-quoted
        query = f"GRANT {privilege} ON CATALOG `{catalog}` TO `{principal}`"
        spark.sql(query)
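
The loop above can also be split into a function that builds the statements and one that runs them, which makes a dry run possible before anything changes. This is a sketch under assumptions: `build_grants` and `apply_grants` are hypothetical names, the principals and catalog are made up, and the executor is passed in as a callable so the same code works with or without a Spark session.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def build_grants(catalog: str, principals: Iterable[str],
                 privileges: Iterable[str]) -> List[str]:
    """Build one GRANT statement per (principal, privilege) pair."""
    privileges = list(privileges)
    return [
        f"GRANT {privilege} ON CATALOG `{catalog}` TO `{principal}`"
        for principal in principals
        for privilege in privileges
    ]


def apply_grants(statements: Iterable[str],
                 run_sql: Optional[Callable[[str], object]] = None
                 ) -> List[Tuple[str, str]]:
    """Run each statement with run_sql; with no run_sql, only print (dry run).

    Returns a list of (statement, error message) for statements that failed.
    """
    failures = []
    for statement in statements:
        if run_sql is None:
            print(statement)
            continue
        try:
            run_sql(statement)
        except Exception as exc:  # keep going so one bad principal does not stop the rest
            failures.append((statement, str(exc)))
    return failures


# Dry run with hypothetical names: prints four statements, executes nothing.
statements = build_grants("sales", ["analysts", "data_engineers"], ["SELECT", "BROWSE"])
apply_grants(statements)
```

In a Databricks notebook, calling `apply_grants(statements, run_sql=spark.sql)` would execute the statements, and the returned list shows which grants failed, for example because a principal does not exist.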


Validation

You can validate the privileges by opening the Databricks UI and navigating to "Catalog" in the Data Explorer. Once the catalog shows up in the Data section, click the catalog and go to the "Permissions" tab. You can now see all the privileges applied to the catalog.
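
The same check can be sketched in code by comparing the grants you expected against the grants that exist. The helper below is hypothetical and works on plain (principal, privilege) tuples; in a notebook, those rows might come from `SHOW GRANTS ON CATALOG`, and the column names mentioned in the comment are an assumption to verify against your workspace.

```python
def missing_grants(expected, actual_rows):
    """Return the expected (principal, privilege) pairs absent from actual_rows."""
    actual = {(principal, privilege.upper()) for principal, privilege in actual_rows}
    return sorted(
        pair for pair in set(expected)
        if (pair[0], pair[1].upper()) not in actual
    )


expected = [("analysts", "SELECT"), ("analysts", "BROWSE")]

# In a notebook, the rows might be collected like this (column names assumed):
#   rows = [(r["Principal"], r["ActionType"])
#           for r in spark.sql("SHOW GRANTS ON CATALOG `my_catalog`").collect()]
rows = [("analysts", "SELECT")]  # made-up sample data

print(missing_grants(expected, rows))  # [('analysts', 'BROWSE')]
```

An empty result means every expected grant was found on the catalog.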

Below is a publicly available screenshot that shows the Permissions tab on a Unity Catalog schema.

Permissions tab on a Unity Catalog schema.


Conclusion

Automating privilege management in Databricks Unity Catalog helps ensure consistent and efficient access control. The code provided demonstrates a practical way to assign catalog-level privileges, making it easier to manage permissions across users and groups. This approach reduces the chance of manual errors and supports scalable governance as data and teams grow. By implementing these methods, organizations can maintain better control over their data while simplifying ongoing management tasks.

 

 

 

 
