Databricks
This guide explains how to use datasets available in a Databricks IPython kernel with Openlayer.
Convert a Spark dataframe to a pandas dataframe
The Databricks IPython kernel is an environment for interacting with a Spark cluster. Consequently, the only assumption this guide makes is that your datasets can be read as Spark dataframes.
Openlayer currently accepts datasets in two formats: pandas dataframes and CSV files. The first step, therefore, is to get the data you wish to use into one of these formats.
Databricks uses Delta Lake as the default format for tables. To read a Delta table and convert it to a pandas dataframe, you can use code like the one below:
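A minimal sketch of this step, assuming you are in a Databricks notebook where the `spark` session is predefined. The table name `"my_catalog.my_schema.my_table"` is a placeholder; replace it with your own table.

```python
# Read a Delta table from the metastore as a Spark dataframe,
# then convert it to pandas with .toPandas().
# Note: .toPandas() collects the full dataset onto the driver node,
# so make sure the table fits in driver memory.

def delta_table_to_pandas(spark, table_name):
    """Read a Delta table and return it as a pandas dataframe."""
    spark_df = spark.read.table(table_name)
    return spark_df.toPandas()

# In a Databricks notebook, where `spark` already exists:
# df = delta_table_to_pandas(spark, "my_catalog.my_schema.my_table")
```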
Alternatively, if your dataset is saved in your Databricks environment as a file, you can read it and convert it to a pandas dataframe with code like the following:
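For example, a CSV file stored in DBFS can be read directly with pandas. The path `"/dbfs/FileStore/my_dataset.csv"` below is a placeholder: on Databricks clusters, files under `dbfs:/` are exposed to local file APIs through the `/dbfs` mount.

```python
# Read a CSV file from DBFS into a pandas dataframe.
# For non-CSV formats (e.g., Parquet), you can instead read the file
# with spark.read and call .toPandas() on the resulting Spark dataframe.
import pandas as pd

def dbfs_csv_to_pandas(path):
    """Load a CSV file (e.g., from the /dbfs mount) as a pandas dataframe."""
    return pd.read_csv(path)

# df = dbfs_csv_to_pandas("/dbfs/FileStore/my_dataset.csv")
```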
Upload to Openlayer
With the dataset as a pandas dataframe, you can upload it to the platform in either development or monitoring mode.