{ "cells": [ { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { "byteLimit": 2048000, "rowLimit": 10000 }, "inputWidgets": {}, "nuid": "58fab4bb-231e-48cf-8ed4-fc15a1b22845", "showTitle": false, "title": "" } }, "source": [ "
This notebook adds information related to the following requirements:
Download this notebook in ipynb format here.
\n", "N/A
" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "9d425d7d-6963-4712-8b0b-f082aa43e8ab", "showTitle": false, "title": "" } }, "source": [ "\n", "You can use a serving endpoint to serve models from the Databricks Model Registry or from Unity Catalog.
Endpoints expose the underlying models as scalable REST API endpoints using serverless compute. This means the endpoints and associated compute resources are fully managed by Databricks and will not appear in your cloud account.
A serving endpoint can consist of one or more MLflow models from the Databricks Model Registry, called served models.
A serving endpoint can have at most ten served models.
You can configure traffic settings to define how requests should be routed to your served models behind an endpoint.
Additionally, you can configure the scale of resources that should be applied to each served model.
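The traffic and scaling settings described above can be sketched as a configuration payload for the Databricks Serving Endpoints REST API. This is an illustration only: the endpoint name, model name, and versions are placeholders, and the structure assumes the documented `served_models` / `traffic_config` shape of that API.

```python
# Hypothetical serving-endpoint configuration with two served models
# sharing traffic behind one endpoint. All names and versions are
# placeholders; the field layout follows the Databricks Serving
# Endpoints REST API (served_models, traffic_config, routes).
endpoint_config = {
    "name": "my-endpoint",
    "config": {
        "served_models": [
            {
                "model_name": "my-model",
                "model_version": "1",
                "workload_size": "Small",          # scale of resources per served model
                "scale_to_zero_enabled": True,
            },
            {
                "model_name": "my-model",
                "model_version": "2",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            },
        ],
        # Traffic settings: how requests are routed across served models.
        "traffic_config": {
            "routes": [
                {"served_model_name": "my-model-1", "traffic_percentage": 80},
                {"served_model_name": "my-model-2", "traffic_percentage": 20},
            ]
        },
    },
}

# The traffic percentages across all routes must sum to 100.
total = sum(
    r["traffic_percentage"]
    for r in endpoint_config["config"]["traffic_config"]["routes"]
)
print(total)  # → 100
```

Such a payload would typically be sent to the workspace's serving-endpoints API (for example with `requests.post`), but the exact call is omitted here since it depends on the workspace URL and authentication.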
\n", "For more information about how to create a model serving enpoint using MLflow, see this video.
" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "af303554-0625-435f-96e3-3aa2b0e71983", "showTitle": false, "title": "" } }, "source": [ "The purpose of a served model is to provide predictions in real-time. When users or anyone/any service make a request to the endpoint to get predictions, he/it should not have to wait for a cluster to start, results should be provided instantly. Serving endpoints use serverless compute. See this page
" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "78054f4d-c49b-4b63-a8d5-7ee9e8d64c9a", "showTitle": false, "title": "" } }, "source": [ "\n", "Hereafter is the minimal Python code to use to get predictions from a served model. Model can be either in Production stage or Staging stage, the way to get predictions is the same.
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, "inputWidgets": {}, "nuid": "395a7524-a272-42e4-9920-7f685de5c714", "showTitle": false, "title": "" } }, "outputs": [], "source": [ "# this is to get a temporary token. Best is to create a token within Databricks interface\n", "token = mlflow.utils.databricks_utils._get_command_context().apiToken().get()\n", "#\n", "# With the token, create the authorization header for the subsequent REST calls\n", "headers = {\"Authorization\": f\"Bearer {token}\"}\n", "#\n", "# get endpoint at which to execute the request\n", "api_url = mlflow.utils.databricks_utils.get_webapp_url()\n", "#\n", "# create url\n", "url = f\"{api_url}/model/Alternatively, sample url or code (Curl/Python) to make a request and get predictions from a served model is provided in the Serving UI (source: this video):
\n", "Containers are suitable for real-time production deployments due to their ease of management, lightweight characteristics, and scalable capabilities facilitated by services like Kubernetes.
" ] } ], "metadata": { "application/vnd.databricks.v1+notebook": { "dashboards": [], "language": "python", "notebookMetadata": { "mostRecentlyExecutedCommandWithImplicitDF": { "commandId": 1158789969180638, "dataframes": [ "_sqldf" ] }, "pythonIndentUnit": 2 }, "notebookName": "Databricks-ML-professional-S03c-Real-Time", "widgets": {} }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 4 }