
Automating private connectivity for Databricks Serverless SQL Compute with Terraform and NCC

Figure: Target architecture. Azure Data Lake Gen2 (public access disabled) sits in the FFA TITAN VNet. Serverless SQL Compute runs in the Databricks-managed VNet and connects through managed private endpoints. NCC creates private endpoint rules for blob and dfs, then binds the configuration to the Databricks workspace.

Introduction

Why we needed secure Serverless SQL connectivity

When developing FFA TITAN 2.0, our managed cloud-native data platform, we made a strategic decision to shift from Azure Synapse to Azure Databricks. One of the core features for our clients is the ability to perform SQL-based data analysis. That's why FFA TITAN 2.0 has Databricks Serverless SQL Compute enabled by default.

Our platform is ISO27001 certified, which means we have strict security requirements. All data in motion must remain within the Azure network and must not traverse the public internet. To achieve this, we configured Azure Databricks with VNet Injection and deployed the other platform components in the same way, shielding them from public access.

However, deploying Databricks Serverless SQL Compute and enabling it to connect to other FFA TITAN platform resources in the platform's VNet required extra steps. In this blog, we walk you through how we automated this process.

Databricks network overview

Control plane, compute plane and serverless workloads

Azure Databricks operates out of a control plane and a compute plane. The compute plane comprises classic compute workloads and serverless workloads.

In this blog, our goal is to secure the serverless compute plane and connect it privately to our resources.

Three distinct network flows

[1] Users and applications to Azure Databricks network flow

[2] The control plane and classic compute plane network flow

[3] The serverless compute plane network flow

This is the flow we secure and privately connect in this blog.

Figure: Databricks network architecture (control plane, classic compute and serverless compute), showing users and applications, the Databricks control plane, classic compute workloads, serverless workloads and private resources.

Concepts

Serverless SQL Compute and VNet Injection

What is Databricks Serverless SQL Compute?

Databricks Serverless SQL Compute is a fully managed service that allows users to execute SQL queries without the need to manually manage clusters. It's optimized for interactive SQL workloads and scales automatically based on demand.

What is VNet Injection?

VNet Injection lets you deploy Azure Databricks compute into your own Azure Virtual Network (VNet). This ensures that traffic between Databricks and other Azure services, such as storage accounts or databases in the same VNet, goes over the Azure backbone instead of being exposed to the public internet.
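For context, VNet Injection is configured on the workspace resource itself. The sketch below uses the azurerm provider. It assumes the workspace lives in `var.resourcegroup_name` and that the VNet, the two subnets and their NSG associations already exist under the hypothetical names shown. It is an illustration, not FFA TITAN's actual workspace definition.

```hcl
# Hypothetical resources: azurerm_virtual_network.titan, azurerm_subnet.dbx_public,
# azurerm_subnet.dbx_private and the two NSG associations referenced below.
resource "azurerm_databricks_workspace" "ffa_titan_databricks_workspace" {
  name                = "your-databricks-workspace"
  resource_group_name = var.resourcegroup_name
  location            = var.location
  sku                 = "premium" # Premium tier is needed for Unity Catalog and serverless SQL

  custom_parameters {
    # VNet Injection: classic compute nodes are placed in our own VNet
    virtual_network_id  = azurerm_virtual_network.titan.id
    public_subnet_name  = azurerm_subnet.dbx_public.name
    private_subnet_name = azurerm_subnet.dbx_private.name

    # Secure cluster connectivity: cluster nodes get no public IP addresses
    no_public_ip = true

    public_subnet_network_security_group_association_id  = azurerm_subnet_network_security_group_association.dbx_public.id
    private_subnet_network_security_group_association_id = azurerm_subnet_network_security_group_association.dbx_private.id
  }
}
```

Note that this only affects classic compute. Serverless SQL Compute does not run in these subnets, which is the problem the rest of this guide solves.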

Challenge

Serverless SQL Compute networking

In classic compute workloads, Databricks clusters run in your own Azure subscription and, with VNet Injection, inside your own VNet. This lets them securely access Azure resources such as Azure Data Lake Storage or Azure SQL Database via private endpoints.

However, Serverless SQL Compute runs in a Databricks-managed VNet, so out of the box it has no access to your private resources. We therefore had to configure a private network connection to give it secure access to our platform's resources.

Solution

Managed Private Endpoints

Fortunately, Azure Databricks serverless compute uses a network setup similar to the managed virtual network of Azure Data Factory, which supports managed private endpoints. These endpoints create a secure connection between the Serverless SQL Compute managed network and other resources in your Azure environment.

Using Terraform, we automated the whole process. It creates the Databricks Network Connectivity Configuration (NCC), establishes and auto-approves the required managed private endpoint connections, and binds the NCC to our workspace.

Figure: Databricks Serverless SQL Compute networking. The private path from Serverless SQL to Azure Data Lake Gen2 runs through the Databricks NCC, its managed private endpoints and the workspace binding.

Terraforming secure connectivity

Platform components and Infrastructure-as-Code goals

We want Databricks Serverless SQL Compute to connect securely and privately to our Azure Data Lake Storage Gen2 account, so that it can run queries on external tables in Unity Catalog.

Platform components

| # | Platform component | Location | Public access |
|---|---|---|---|
| 1 | FFA Titan Azure Data Lake Storage Gen2 account | FFA Titan Azure VNet | Disabled |
| 2 | Databricks Serverless SQL Compute | Databricks-managed VNet | Disabled |
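The Terraform steps below refer to the storage account as `azurerm_storage_account.ffa_titan_adls_gen2`, but the guide does not show its definition. Here is a minimal sketch of an ADLS Gen2 account with public access disabled. The account name and the replication settings are assumptions.

```hcl
resource "azurerm_storage_account" "ffa_titan_adls_gen2" {
  name                     = "yourtitanadls" # hypothetical, must be globally unique
  resource_group_name      = var.resourcegroup_name
  location                 = var.location
  account_kind             = "StorageV2"
  account_tier             = "Standard"
  account_replication_type = "LRS" # assumption, choose what your platform requires

  # Hierarchical namespace turns the account into Azure Data Lake Storage Gen2
  is_hns_enabled = true

  # No public access: traffic has to arrive through private endpoints
  public_network_access_enabled = false
}
```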

IaC setup

| # | IaC goal | Language |
|---|---|---|
| 1 | Create the Databricks NCC in the Databricks account | Terraform |
| 2 | Create the blob private endpoint rule for platform component 1 | Terraform |
| 3 | Create the dfs private endpoint rule for platform component 1 | Terraform |
| 4 | Associate the Databricks NCC with the Databricks workspace | Terraform |
| 5 | Auto-approve the managed private endpoints on platform component 1, the FFA Titan Azure Data Lake Storage Gen2 account | Terraform / PowerShell |
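The Terraform snippets in the implementation use `var.location`, `var.subscription` and `var.resourcegroup_name` without declaring them. A minimal declaration could look like this. The descriptions reflect our reading of how each variable is used in the snippets.

```hcl
variable "location" {
  description = "Azure region of the Databricks workspace; the NCC is created in the same region"
  type        = string
}

variable "subscription" {
  description = "Azure subscription ID that contains the storage account"
  type        = string
}

variable "resourcegroup_name" {
  description = "Resource group that contains the storage account"
  type        = string
}
```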

Implementation

Terraform steps

Step 1: Create Databricks Network Connectivity Configuration (NCC)

This step creates an NCC that governs private endpoint creation and firewall enablement.

Terraform
resource "databricks_mws_network_connectivity_config" "ffa_titan_ncc" {
  provider = databricks.accounts
  name     = "your-databricks-ncc"
  region   = var.location # must match the region of the workspace the NCC is bound to
}
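All `databricks_mws_*` resources in this guide use `provider = databricks.accounts`, which is an account-level Databricks provider alias. The sketch below shows one way to declare it. The variable name `databricks_account_id` is hypothetical. Authentication is assumed to come from the environment, such as the `ARM_CLIENT_ID`, `ARM_CLIENT_SECRET` and `ARM_TENANT_ID` variables of a service principal that is a Databricks account admin.

```hcl
variable "databricks_account_id" {
  description = "Databricks account ID (hypothetical variable name)"
  type        = string
}

# Account-level provider used for NCC, private endpoint rules and bindings
provider "databricks" {
  alias      = "accounts"
  host       = "https://accounts.azuredatabricks.net"
  account_id = var.databricks_account_id
  # Credentials are read from the environment (e.g. ARM_CLIENT_ID,
  # ARM_CLIENT_SECRET, ARM_TENANT_ID for a service principal).
}
```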

Step 2: Create Blob Private Endpoint Rule

This creates a private endpoint rule for the blob subresource of our Azure Data Lake Gen2 storage account.

Terraform
resource "databricks_mws_ncc_private_endpoint_rule" "ffa_titan_ncc_per_blob" {
  provider = databricks.accounts
  network_connectivity_config_id = databricks_mws_network_connectivity_config.ffa_titan_ncc.network_connectivity_config_id
  resource_id                    = azurerm_storage_account.ffa_titan_adls_gen2.id
  group_id                       = "blob"
}

Step 3: Create DFS Private Endpoint Rule

This creates a private endpoint rule for the dfs subresource of our Azure Data Lake Gen2 storage account.

Terraform
resource "databricks_mws_ncc_private_endpoint_rule" "ffa_titan_ncc_per_dfs" {
  provider = databricks.accounts
  network_connectivity_config_id = databricks_mws_network_connectivity_config.ffa_titan_ncc.network_connectivity_config_id
  resource_id                    = azurerm_storage_account.ffa_titan_adls_gen2.id
  group_id                       = "dfs"
}
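Steps 2 and 3 differ only in `group_id`, so the two rules could also come from one resource with `for_each`. The sketch below is an alternative that replaces both blocks. Do not add it next to them. The resource name `ffa_titan_ncc_per` is hypothetical. The `moved` blocks let an existing deployment keep its rules instead of recreating them. Any other reference to the old addresses, such as a `depends_on`, must point to `databricks_mws_ncc_private_endpoint_rule.ffa_titan_ncc_per` instead.

```hcl
# One private endpoint rule per storage subresource
resource "databricks_mws_ncc_private_endpoint_rule" "ffa_titan_ncc_per" {
  provider = databricks.accounts
  for_each = toset(["blob", "dfs"])

  network_connectivity_config_id = databricks_mws_network_connectivity_config.ffa_titan_ncc.network_connectivity_config_id
  resource_id                    = azurerm_storage_account.ffa_titan_adls_gen2.id
  group_id                       = each.value
}

# Map the old resource addresses onto the new for_each instances in state
moved {
  from = databricks_mws_ncc_private_endpoint_rule.ffa_titan_ncc_per_blob
  to   = databricks_mws_ncc_private_endpoint_rule.ffa_titan_ncc_per["blob"]
}

moved {
  from = databricks_mws_ncc_private_endpoint_rule.ffa_titan_ncc_per_dfs
  to   = databricks_mws_ncc_private_endpoint_rule.ffa_titan_ncc_per["dfs"]
}
```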

Step 4: Associate NCC with Databricks Workspace

Bind the NCC to our Databricks workspace so that serverless compute in that workspace uses the NCC's private endpoint rules.

Terraform
resource "databricks_mws_ncc_binding" "ffa_titan_ncc_binding" {
  provider = databricks.accounts
  network_connectivity_config_id = databricks_mws_network_connectivity_config.ffa_titan_ncc.network_connectivity_config_id
  workspace_id                   = azurerm_databricks_workspace.ffa_titan_databricks_workspace.workspace_id
}

Step 5: Auto-approve Managed Private Endpoint (Blob and DFS)

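The private endpoint connections that Databricks creates on the storage account start in a Pending state. This step runs a PowerShell script from Terraform that approves them automatically.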

Terraform
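resource "null_resource" "approve_adls_private_endpoints" {
  # Re-run on every apply so newly created pending connections are approved
  triggers = {
    always_run = timestamp()
  }
  # Only run once both NCC private endpoint rules exist
  depends_on = [
    databricks_mws_ncc_private_endpoint_rule.ffa_titan_ncc_per_blob,
    databricks_mws_ncc_private_endpoint_rule.ffa_titan_ncc_per_dfs,
  ]
  provisioner "local-exec" {
    interpreter = ["pwsh", "-Command"] # run the approval script with PowerShell
    command     = ".'.ps1' -azureSubscriptionId '${var.subscription}' -azureResourceGroupName '${var.resourcegroup_name}' -azureResourceId '${azurerm_storage_account.ffa_titan_adls_gen2.id}'"
  }
}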

PowerShell approval script

PowerShell
param (
    $azureSubscriptionId,
    $azureResourceGroupName, # not needed below: --id fully identifies the storage account
    $azureResourceId
)

# read the service principal credentials from the ARM_* environment variables
$azureTenantId = $env:ARM_TENANT_ID
$azurePrincipalAppId = $env:ARM_CLIENT_ID
$azurePrincipalSecret = $env:ARM_CLIENT_SECRET

# connect to tenant
az login --service-principal -u $azurePrincipalAppId -p $azurePrincipalSecret --tenant $azureTenantId --output none

# select correct azure subscription
az account set --subscription $azureSubscriptionId

# list the private endpoint connections on the storage account
# (Out-String joins the CLI output lines into one JSON document)
$connections = az network private-endpoint-connection list --id $azureResourceId | Out-String | ConvertFrom-Json

# approve every connection that is still pending
foreach ($connection in $connections)
{
    $id = $connection.id
    $status = $connection.properties.privateLinkServiceConnectionState.status

    if ($status -eq "Pending")
    {
        Write-Host "$id is in a pending state, approving"
        az network private-endpoint-connection approve --id $id --description "Approved by FFA Titan Terraform"
    }
}
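To see from Terraform whether approval has taken effect, you can expose the `connection_state` attribute that the Databricks provider exports for each private endpoint rule. According to the provider documentation, a rule is only effective once its state is `ESTABLISHED`. The value is read during refresh, so right after approval it may still show the earlier state until the next `terraform plan` or `apply`. The output name is hypothetical.

```hcl
# Connection state Databricks reports for each managed private endpoint
output "adls_private_endpoint_connection_states" {
  value = {
    blob = databricks_mws_ncc_private_endpoint_rule.ffa_titan_ncc_per_blob.connection_state
    dfs  = databricks_mws_ncc_private_endpoint_rule.ffa_titan_ncc_per_dfs.connection_state
  }
}
```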

Conclusion

Secure private connectivity between Serverless SQL and Azure Data Lake Gen2

By running the Terraform code, we created secure, private connectivity between Databricks Serverless SQL Compute and our Azure Data Lake Gen2 account.

We can now run SQL queries against external tables in Unity Catalog whose data files sit in FFA Titan's storage account inside the FFA Titan VNet. Automating this process with Terraform ensures that our platform remains ISO27001 compliant while we use a fully managed SQL environment.
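The external tables also need a Unity Catalog storage credential and external location that point at the same storage account. That part is not covered by the steps above. The sketch below uses the workspace-level Databricks provider. It assumes a hypothetical access connector `azurerm_databricks_access_connector.titan`, whose managed identity has a data role on the storage account, and a hypothetical container named `data`.

```hcl
# Storage credential backed by the access connector's managed identity
resource "databricks_storage_credential" "titan_adls" {
  name = "titan-adls-credential"
  azure_managed_identity {
    access_connector_id = azurerm_databricks_access_connector.titan.id
  }
}

# External location for the data files of the external tables (abfss = dfs endpoint)
resource "databricks_external_location" "titan_adls" {
  name            = "titan-adls-data"
  url             = "abfss://data@${azurerm_storage_account.ffa_titan_adls_gen2.name}.dfs.core.windows.net/"
  credential_name = databricks_storage_credential.titan_adls.name
}
```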

Databricks Network Connectivity Configuration created

Managed private endpoint rules established

External tables in Unity Catalog can be queried

Figure: Databricks Network Connectivity Configuration status overview, showing the private endpoint rules for blob and dfs in the established state.

FAQ

Databricks NCC and Serverless SQL networking questions

Practical answers about Databricks Network Connectivity Configuration, Serverless SQL private endpoints, Terraform automation and secure access to Azure Data Lake Gen2.

What is Databricks Network Connectivity Configuration, or NCC?

Databricks Network Connectivity Configuration, often shortened to Databricks NCC, is an account-level configuration used to manage private connectivity for serverless compute. In this setup, NCC is used to create private endpoint rules and bind secure serverless connectivity to the Databricks workspace.

What does databricks_mws_network_connectivity_config do in Terraform?

The databricks_mws_network_connectivity_config resource creates the Databricks Network Connectivity Configuration in the Databricks account. This NCC becomes the container for private endpoint rules and is later associated with the Azure Databricks workspace.

What does databricks_mws_ncc_private_endpoint_rule do?

The databricks_mws_ncc_private_endpoint_rule resource creates a managed private endpoint rule inside the Databricks NCC. In this blog, it is used twice, once for the blob endpoint and once for the dfs endpoint of the Azure Data Lake Gen2 storage account.

What does databricks_mws_ncc_binding do?

The databricks_mws_ncc_binding resource associates the Databricks Network Connectivity Configuration with a Databricks workspace. This makes the NCC private endpoint rules available to the workspace that runs Serverless SQL Compute.

Why do Databricks Serverless SQL workloads need managed private endpoints?

Serverless SQL Compute runs in a Databricks-managed network, not directly inside the customer VNet. Managed private endpoints allow the serverless compute plane to privately connect to Azure resources such as Azure Data Lake Gen2 without exposing those resources to the public internet.

Is Databricks Serverless SQL the same as VNet Injection?

No. VNet Injection is used for classic compute workloads that run inside your own Azure VNet. Databricks Serverless SQL Compute runs in a Databricks-managed network, so private connectivity to your resources requires NCC and managed private endpoint rules.

How do you connect Databricks Serverless SQL to Azure Data Lake Gen2 privately?

You create a Databricks NCC, add private endpoint rules for the storage account subresources, usually blob and dfs, bind the NCC to the workspace, and approve the pending managed private endpoint connections on the Azure storage account.

Can Databricks NCC be automated with Terraform?

Yes. The setup can be automated with the Databricks Terraform provider using databricks_mws_network_connectivity_config, databricks_mws_ncc_private_endpoint_rule and databricks_mws_ncc_binding. The private endpoint approval can be handled separately with Azure CLI or PowerShell.

Which private endpoint rules are needed for Unity Catalog external tables on ADLS Gen2?

For Azure Data Lake Gen2, you typically need private endpoint rules for both blob and dfs. The dfs endpoint serves the Data Lake Storage Gen2 API used for abfss:// paths with the hierarchical namespace, while some storage operations may still go through the blob endpoint.

What prerequisites are required to execute the Terraform code in this blog?

You need an Azure subscription, an Azure Databricks workspace with Serverless SQL Compute enabled, a storage account, Terraform, the azurerm and databricks providers, Azure CLI, PowerShell and permissions to create Databricks NCC resources and approve private endpoint connections.

What permissions are required for Databricks NCC and managed private endpoints?

You need to be a Databricks account admin to create and manage NCC resources. You also need Azure RBAC permissions on the target Azure resources, and permission to approve private endpoint connections on the Azure Data Lake Gen2 storage account.

How does Databricks Serverless SQL pricing compare to Classic Compute?

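Serverless SQL is billed on usage in DBUs, with the underlying compute included in the rate. It starts quickly and stops when idle, which makes it cost-effective for variable workloads. With classic compute, you also pay for the Azure virtual machines in your own subscription for as long as the cluster runs, including idle time before auto-termination.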

Can Databricks Serverless SQL support cross-region private networking?

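Not through VNet peering. The serverless compute plane runs in a Databricks-managed network that you cannot peer with your own VNets. Private connectivity from serverless compute goes through NCC private endpoint rules, and an NCC can only be attached to workspaces in the same region. Please also consider the additional data transfer costs of any cross-region traffic.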

Need help?

Set up secure and compliant Databricks environments

For a secure and compliant Databricks Serverless SQL setup, follow these steps:

1. NCC: create a Databricks Network Connectivity Configuration.
2. Blob and DFS: create private endpoint rules for Azure Data Lake Gen2.
3. Workspace binding: associate the NCC with the Databricks workspace.
4. Approval: auto-approve the managed private endpoint connections.

Would you like assistance? Contact us for help with setting up secure and compliant Databricks environments.