

Openlayer integrates with Databricks so you can run data quality tests directly on your Databricks tables. The integration authenticates with a personal access token (PAT) over a secure Databricks connection, giving you auditable, token-based access without usernames or passwords.

Prerequisites

To follow this guide, you need:
  • A Databricks account and workspace with SQL warehouses enabled
  • Permissions to create and use a personal access token (PAT)
  • A table in Databricks you want to monitor (with timestamp and unique ID columns recommended)
  • An Openlayer project with monitoring mode enabled

Setup Guide

Step 1: Generate a personal access token

In your Databricks workspace:
  1. Go to User Settings → Developer → Access Tokens.
  2. Click Generate new token.
  3. Copy and store the PAT securely — you will provide it when connecting Openlayer.
See Databricks documentation for details.
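If you script the token handoff, keep the PAT out of source control by reading it from the environment. A minimal sketch; the `dapi` prefix check reflects the common format of Databricks workspace PATs and is an assumption, not a guarantee:

```python
import os
import re

def load_databricks_pat(env_var: str = "DATABRICKS_TOKEN") -> str:
    """Read the PAT from an environment variable instead of hard-coding it."""
    token = os.environ.get(env_var, "")
    if not token:
        raise RuntimeError(f"Set {env_var} to your Databricks personal access token")
    # Workspace PATs typically look like 'dapi' followed by a hex string (assumption).
    if not re.fullmatch(r"dapi[0-9a-f]{32,}", token):
        raise ValueError("Token does not look like a Databricks PAT")
    return token
```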

Step 2: Collect connection details

You will need:
  • Hostname: your workspace URL (e.g. https://dbc-247310bd-93fc.cloud.databricks.com)
  • Port: typically 443
  • SQL Warehouse endpoint: path to the warehouse, e.g. /sql/1.0/warehouses/<warehouse-id>
  • Personal access token (PAT): generated in step 1
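The four values above can be collected and sanity-checked before you paste them into Openlayer. A sketch that normalizes the hostname, since the workspace URL may be given with or without the `https://` scheme (the validation rules here are assumptions, not Openlayer's documented behavior):

```python
from urllib.parse import urlparse

def databricks_connection_config(hostname: str, http_path: str,
                                 token: str, port: int = 443) -> dict:
    """Assemble Databricks SQL warehouse connection details into one dict."""
    # Accept either a bare hostname or a full https:// URL.
    host = urlparse(hostname).hostname or hostname
    if not http_path.startswith("/sql/"):
        raise ValueError("Expected a warehouse path like /sql/1.0/warehouses/<warehouse-id>")
    return {"server_hostname": host, "port": port,
            "http_path": http_path, "access_token": token}
```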

Step 3: Connect inside Openlayer

In your Openlayer workspace:
  1. Go to Data sources and select Databricks.
  2. Click Connect.
  3. Fill in the fields:
  • Hostname: your workspace URL (e.g. https://dbc-247310bd-93fc.cloud.databricks.com)
  • Port: usually 443
  • SQL Warehouse endpoint: path to your warehouse
  • Personal access token: the PAT you generated in step 1
  • Name: a descriptive label for this connection

Step 4: Configure your table

After the connection is created, select the table to monitor:
  • Catalog: Databricks catalog containing the table
  • Schema: schema containing the table
  • Table: table name (e.g. workspace.openlayer_demo.landing_inferences)
  • Timestamp column: column used to order/filter rows (e.g. timestamp)
  • Unique ID column: column identifying unique rows (e.g. inference_id)
  • Data source name: a descriptive label in Openlayer
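To sanity-check the settings above, you can run the kind of query Openlayer needs: recent rows ordered by the timestamp column and keyed by the unique ID column. A sketch that only builds the SQL string, using the example names from this guide:

```python
def recent_rows_query(table: str, timestamp_col: str, id_col: str,
                      limit: int = 100) -> str:
    """Build a query that fetches the most recent rows from the monitored table."""
    return (
        f"SELECT {id_col}, {timestamp_col} "
        f"FROM {table} "
        f"ORDER BY {timestamp_col} DESC "
        f"LIMIT {limit}"
    )
```

Run the resulting query in a SQL warehouse query editor (or through the Databricks SQL connector) to confirm rows come back before connecting the table.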

Optional: ML-specific settings

If the table contains ML outputs, you can provide additional context:
  • Class names
  • Feature names
  • Categorical feature names
  • Predictions column
This enables Openlayer to run ML-aware tests such as drift detection and performance monitoring.
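The ML-specific settings map naturally to a small config. The field names below mirror the bullets above, but the exact schema Openlayer expects (and the column and class values) are illustrative assumptions:

```python
# Hypothetical shape for the optional ML settings; field names mirror the
# form fields above, but the exact schema and values are assumptions.
ml_settings = {
    "class_names": ["negative", "positive"],
    "feature_names": ["age", "balance", "country"],
    "categorical_feature_names": ["country"],
    "predictions_column": "predictions",
}
```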

Troubleshooting

  • Authentication errors → verify that your PAT is valid and not expired.
  • Connection errors → confirm the hostname, port, and SQL warehouse endpoint are correct.
  • Empty results → check that the timestamp column is populated and you’ve selected the correct table.
  • Permission errors → ensure your PAT user has access to the warehouse and the target tables.
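When scripting against the warehouse, the checklist above can double as a first-pass diagnostic. A sketch that maps common error text to the hints above; the message substrings are assumptions about typical driver errors, not exact Databricks strings:

```python
def troubleshoot_hint(error_message: str) -> str:
    """Map an error message to the most likely fix from the checklist above."""
    msg = error_message.lower()
    if "token" in msg or "401" in msg or "unauthorized" in msg:
        return "Authentication error: verify that your PAT is valid and not expired."
    if "resolve" in msg or "refused" in msg or "timeout" in msg:
        return "Connection error: confirm the hostname, port, and SQL warehouse endpoint."
    if "permission" in msg or "403" in msg or "forbidden" in msg:
        return "Permission error: ensure your PAT user can access the warehouse and tables."
    return "Check that the timestamp column is populated and the correct table is selected."
```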