
Databricks Serverless SQL Compute with Private VNet Connectivity

9 September 2024

Introduction

When developing FFA TITAN 2.0, our managed cloud-native data platform, we made a strategic decision to shift from Azure Synapse to Azure Databricks. One of the core features for our clients is the ability to perform SQL-based data analysis. That's why FFA TITAN 2.0 has Databricks Serverless SQL Compute enabled by default.

Our platform is ISO 27001 certified, which means we operate under strict security requirements: all data in motion must remain within the Azure network and must not traverse the public internet. To achieve this, we configured Azure Databricks with VNet Injection and deployed the other platform components in a similar fashion, shielding them from public access.

However, deploying Databricks Serverless SQL Compute and enabling it to connect to other FFA TITAN platform resources in the platform's VNet required extra steps. In this blog, we'll walk you through how we automated this process.

Databricks Network overview

Azure Databricks operates out of a control plane and a compute plane, where the latter comprises 'classic compute workloads' and 'serverless workloads'.

[Figure: Databricks network architecture]

We identify three distinct flows:

  • [1] - Users and applications to Azure Databricks network flow
  • [2] - The control plane and classic compute plane network flow
  • [3] - The serverless compute plane network flow

In this blog, our goal is to secure and privately connect the serverless compute plane to our resources.

What is Databricks Serverless SQL Compute?

Databricks Serverless SQL Compute is a fully managed service that allows users to execute SQL queries without the need to manually manage clusters. It's optimized for interactive SQL workloads and scales automatically based on demand.

What is VNet Injection?

VNet Injection allows you to securely deploy Databricks in an Azure Virtual Network (VNet). This ensures that all traffic between Databricks and other Azure services (like storage accounts or databases in the same VNet) goes through the Azure backbone, avoiding exposure to the public internet.
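For reference, a minimal sketch of VNet Injection using the azurerm Terraform provider. The VNet, subnet, and NSG association names are illustrative assumptions, not FFA TITAN's actual configuration:

```hcl
resource "azurerm_databricks_workspace" "ffa_titan" {
  name                = "ffa-titan-dbw"
  resource_group_name = azurerm_resource_group.ffa_titan.name
  location            = azurerm_resource_group.ffa_titan.location
  sku                 = "premium"

  custom_parameters {
    # Deploy the classic compute plane into your own VNet (VNet Injection)
    virtual_network_id                                   = azurerm_virtual_network.ffa_titan.id
    public_subnet_name                                   = azurerm_subnet.dbw_public.name
    private_subnet_name                                  = azurerm_subnet.dbw_private.name
    public_subnet_network_security_group_association_id  = azurerm_subnet_network_security_group_association.dbw_public.id
    private_subnet_network_security_group_association_id = azurerm_subnet_network_security_group_association.dbw_private.id
    # No public IPs on cluster nodes (secure cluster connectivity)
    no_public_ip                                         = true
  }
}
```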

The Challenge: Serverless SQL Compute networking

In classic compute workloads, Databricks clusters run within your own cloud account using VNet Injection. This allows you to securely access Azure resources like Data Lake or SQL Databases via private endpoints.

However, Serverless SQL Compute operates in a Databricks-managed VNet, which means it doesn’t have built-in access to your private resources out-of-the-box. This limitation required us to configure a private network connection to ensure secure access to our platform’s resources.

The Solution: Managed Private Endpoints

Fortunately, Azure Databricks uses a network setup similar to Azure Data Factory's, which allows the use of managed private endpoints. These endpoints create a secure connection between the Serverless SQL Compute managed network and resources within your own Azure environment.

[Figure: Databricks Serverless SQL Compute networking]

Using Terraform, we were able to automate the whole process: setting up a Databricks Network Connectivity Configuration (NCC), establishing and auto-approving the required managed private endpoints, and binding the NCC to our workspace. In the next section we'll show you how to set this up.

Terraforming Secure Databricks Serverless SQL Connectivity

We want Databricks Serverless SQL Compute to connect securely and privately to our Azure Data Lake Storage Gen 2 account, enabling it to query external tables in Unity Catalog. Before we start, let's clarify which FFA TITAN 2.0 platform components are involved:

#   Platform component                                 Location                  Public access
1   FFA Titan Azure Data Lake Storage Gen 2 account    FFA Titan Azure VNet      Disabled
2   Databricks Serverless SQL Compute                  Databricks-managed VNet   Disabled

Infrastructure-as-Code (IaC) Setup

To achieve secure and private connectivity between [1] and [2], our Infrastructure-as-Code needs to accomplish the following:

#   IaC goal                                                              Language
1   Create a Databricks NCC in the Databricks account                     Terraform
2   Create a blob private endpoint rule targeting [1]                     Terraform
3   Create a dfs private endpoint rule targeting [1]                      Terraform
4   Associate the Databricks NCC with the Databricks workspace instance   Terraform
5   Auto-approve the managed private endpoints on [1]                     Terraform / PowerShell

Step 1: Create Databricks Network Connectivity Configuration (NCC)

This step creates an NCC that governs private endpoint creation and firewall enablement.

resource "databricks_mws_network_connectivity_config" "ffa_titan_ncc" {
  provider = databricks.accounts
  name     = "your-databricks-ncc"
  region   = var.location
}
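The resource above uses an account-level provider alias (databricks.accounts), since NCCs are managed at the Databricks account level. A sketch of that provider configuration, assuming a databricks_account_id variable:

```hcl
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
    databricks = {
      source = "databricks/databricks"
    }
  }
}

# Account-level Databricks provider used for NCC resources
provider "databricks" {
  alias      = "accounts"
  host       = "https://accounts.azuredatabricks.net"
  account_id = var.databricks_account_id
}
```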

Step 2: Create Blob Private Endpoint Rule

This creates a private endpoint rule for blob storage on our Azure Data Lake.
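A sketch of this rule using the Databricks provider's databricks_mws_ncc_private_endpoint_rule resource; the storage account reference (azurerm_storage_account.ffa_titan) is an assumed name for your own Data Lake resource:

```hcl
resource "databricks_mws_ncc_private_endpoint_rule" "blob" {
  provider                       = databricks.accounts
  network_connectivity_config_id = databricks_mws_network_connectivity_config.ffa_titan_ncc.network_connectivity_config_id
  # Azure resource ID of the FFA Titan Data Lake storage account (assumed resource name)
  resource_id                    = azurerm_storage_account.ffa_titan.id
  # 'blob' targets the storage account's blob sub-resource
  group_id                       = "blob"
}
```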

Step 3: Create DFS Private Endpoint Rule

This creates a private endpoint rule for DFS on our Azure Data Lake.
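The DFS rule mirrors the blob rule, differing only in the targeted sub-resource. As above, the storage account reference is an assumed name:

```hcl
resource "databricks_mws_ncc_private_endpoint_rule" "dfs" {
  provider                       = databricks.accounts
  network_connectivity_config_id = databricks_mws_network_connectivity_config.ffa_titan_ncc.network_connectivity_config_id
  # Azure resource ID of the FFA Titan Data Lake storage account (assumed resource name)
  resource_id                    = azurerm_storage_account.ffa_titan.id
  # 'dfs' targets the ADLS Gen 2 (hierarchical namespace) sub-resource
  group_id                       = "dfs"
}
```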

Step 4: Associate NCC with Databricks Workspace

Link the NCC to our Databricks Workspace to enforce private connectivity.
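A sketch of the binding with databricks_mws_ncc_binding, assuming the numeric workspace ID is available as a variable:

```hcl
resource "databricks_mws_ncc_binding" "ffa_titan" {
  provider                       = databricks.accounts
  network_connectivity_config_id = databricks_mws_network_connectivity_config.ffa_titan_ncc.network_connectivity_config_id
  # Numeric ID of the Databricks workspace (assumed variable)
  workspace_id                   = var.databricks_workspace_id
}
```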

Step 5: Auto-approve Managed Private Endpoint (Blob and DFS)

This step ensures the automatic approval of the private endpoint connection.

param (
    [Parameter(Mandatory = $true)] [string] $azureSubscriptionId,
    [Parameter(Mandatory = $true)] [string] $azureResourceId
)

# Read the service principal credentials from the environment
$azureTenantId        = $Env:ARM_TENANT_ID
$azurePrincipalAppId  = $Env:ARM_CLIENT_ID
$azurePrincipalSecret = $Env:ARM_CLIENT_SECRET

# Connect to the tenant as the service principal
az login --service-principal -u $azurePrincipalAppId -p $azurePrincipalSecret --tenant $azureTenantId

# Select the correct Azure subscription
az account set --subscription $azureSubscriptionId

# List all private endpoint connections on the target resource
$connections = az network private-endpoint-connection list --id $azureResourceId | ConvertFrom-Json

# Approve every pending managed private endpoint connection
foreach ($connection in $connections)
{
    $status = $connection.properties.privateLinkServiceConnectionState.status

    if ($status -eq "Pending")
    {
        Write-Host "$($connection.id) is in a pending state, approving..."
        az network private-endpoint-connection approve --id $connection.id --description "Approved by FFA Titan Terraform"
    }
}
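One way to wire this script into Terraform is a null_resource with a local-exec provisioner. A sketch, assuming the script is saved as approve-private-endpoint.ps1 and the rules from steps 2 and 3 are named blob and dfs:

```hcl
resource "null_resource" "approve_private_endpoints" {
  # Re-run the approval whenever the private endpoint rules change
  triggers = {
    blob_rule = databricks_mws_ncc_private_endpoint_rule.blob.rule_id
    dfs_rule  = databricks_mws_ncc_private_endpoint_rule.dfs.rule_id
  }

  provisioner "local-exec" {
    interpreter = ["pwsh", "-Command"]
    command     = "./approve-private-endpoint.ps1 -azureSubscriptionId '${var.subscription_id}' -azureResourceId '${azurerm_storage_account.ffa_titan.id}'"
  }
}
```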

Conclusion

By running the Terraform code, we successfully created secure, private connectivity between Databricks Serverless SQL Compute and our Azure Data Lake Storage Gen 2 account, as shown in the NCC status overview below:

[Figure: NCC status overview]

This setup enables us to run SQL queries on external tables in Unity Catalog, with data files stored in FFA Titan's VNet. Automating this process with Terraform ensures that our platform remains ISO27001 compliant while leveraging a fully managed SQL environment.

Follow these steps for a secure and compliant Databricks Serverless SQL setup. Need assistance? Contact us for help setting up secure and compliant Databricks environments!

FAQ

How does Databricks Serverless SQL pricing compare to Classic Compute?

Serverless SQL is usage-based, charging per second, making it cost-effective for variable workloads. Classic Compute involves hourly pricing and may incur costs even when idle.

Can Databricks Serverless SQL support cross-region private networking?

Yes, Serverless SQL can support cross-region networking using Azure’s global VNet peering. This allows secure private traffic between regions. However, please consider additional costs.

What prerequisites are required to execute the code in this blog?


Tooling
- Azure Subscription: Active subscription with necessary permissions.
- Azure Databricks: Workspace with Serverless SQL Compute enabled.
- Terraform: Installed locally or in CI/CD, compatible with Azure/Databricks providers.
- Azure CLI: Installed for managing Azure resources and approving private endpoints.
- PowerShell: For running the private endpoint approval script.

Variables
- Azure Subscription & Resource Group IDs: Required for resource deployment.
- Azure Region: Databricks and other resources' region (e.g., eastus).
- Databricks Workspace & NCC IDs: Workspace and Network Connectivity Configuration IDs.
- Storage Account ID: ID for the Azure Data Lake Gen 2 storage account.
- Tenant ID, Client ID, Client Secret: Credentials for Azure service principal.

Libraries
- Terraform providers:
  - azurerm: for Azure resource management.
  - databricks: for Databricks configuration.
- Azure CLI: support for az network private-endpoint-connection.

Permissions
- Azure RBAC: Contributor/Owner access to Azure resources.
- Databricks: Permissions to create/manage NCCs.
- Private Link Approval: Role permissions for approving private endpoints.

Networking Setup
- VNet/Subnet: Preconfigured VNet and subnet for VNet Injection.
- Private Endpoints: Enabled private endpoints in the target Azure region.

Sjors Otten
Management

“Insights without action is worthless”

Sjors Otten is a pragmatic and passionate data & analytics architect. He excels in leveraging the untapped potential of your organization’s data. Sjors has a solid background in Business Informatics and Software Development.

With his years of experience at all levels of IT, Sjors is the go-to person for breaking down business and IT strategies into workable, understandable data & analytics solutions for every level of your organization, while maintaining alignment with the defined corporate strategies.


Food For Analytics © 2024 All rights reserved