<h4 style="font-variant-caps: small-caps;font-size:35pt;">Databricks-ML-professional-S03c-Real-Time</h4>

<div style='background-color:black;border-radius:5px;border-top:1px solid'></div>
<br/>
<p>This Notebook adds information related to the following requirements:</p><br/>
<b>Real-time:</b>
<ul>
<li>Describe the benefits of using real-time inference for a small number of records or when fast prediction computations are needed</li>
<li>Identify JIT feature values as a need for real-time deployment</li>
<li>Describe model serving deploys and endpoint for every stage</li>
<li>Identify how model serving uses one all-purpose cluster for a model deployment</li>
<li>Query a Model Serving enabled model in the Production stage and Staging stage</li>
<li>Identify how cloud-provided RESTful services in containers is the best solution for production-grade real-time deployments</li>
</ul>
<br/>
<p><b>Download this notebook in ipynb format <a href="Databricks-ML-professional-S03c-Real-Time.ipynb">here</a>.</b></p>
<br/>
<div style='background-color:black;border-radius:5px;border-top:1px solid'></div>

<a id="realtime"></a>
<div style='background-color:rgba(30, 144, 255, 0.1);border-radius:5px;padding:2px;'>
<span style="font-variant-caps: small-caps;font-weight:700">1. Describe the benefits of using real-time inference for a small number of records or
when fast prediction computations are needed</span></div>
<ul>
<li>Provides on-demand responses</li>
<li>Generates predictions for a small number of records with fast results (e.g. results in milliseconds)</li>
<li>Relies on a REST API: a REST endpoint must be created, for example an MLflow model serving endpoint</li>
<li>Delivers real-time or near real-time predictions</li>
<li>Has the lowest latency but also the highest cost, because it requires dedicated serving infrastructure</li>
<li>Users send data to the model through the REST API and the model predicts the target in real time</li>
<li>Represents roughly 5-10% of use cases</li>
<li>Example use cases: finance (fraud detection), mobile, ad tech</li></ul>
<div style="display:block;text-align:center"><img width="500px" src="https://i.ibb.co/rxzz2vS/databricks-ml-pro-latency.png"/></div>

<div style='background-color:rgba(30, 144, 255, 0.1);border-radius:5px;padding:2px;'>
<span style="font-variant-caps: small-caps;font-weight:700">2. Identify JIT feature values as a need for real-time deployment</span></div>
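<p>Just-in-time (JIT) features are feature values that only become available at the moment a prediction is requested, so they cannot be precomputed in a batch job. A model that depends on such values has to be deployed for real-time inference.</p>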
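<p>To illustrate a feature that cannot be precomputed, the sketch below merges stored (batch) features with values derived from the incoming request itself. The function name, the field names and the fraud-style scenario are hypothetical and only serve the illustration.</p>

```python
from datetime import datetime


def build_feature_vector(stored_features: dict, request: dict) -> dict:
    """Merge precomputed features with features only known at request time."""
    now = datetime.fromisoformat(request["timestamp"])
    last_txn = datetime.fromisoformat(stored_features["last_txn_timestamp"])
    return {
        # Precomputed in batch (e.g. read from a feature table)
        "avg_amount_30d": stored_features["avg_amount_30d"],
        # Just-in-time values: they depend on the request being scored
        "amount": request["amount"],
        "seconds_since_last_txn": (now - last_txn).total_seconds(),
    }


# Hypothetical stored features and incoming request
stored = {"avg_amount_30d": 42.0, "last_txn_timestamp": "2024-01-01T12:00:00+00:00"}
incoming = {"amount": 950.0, "timestamp": "2024-01-01T12:00:30+00:00"}
features = build_feature_vector(stored, incoming)
print(features)
```

<p>Because <code>amount</code> and <code>seconds_since_last_txn</code> only exist once the request arrives, a nightly batch job could not have produced them.</p>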

<a id="modelservingendpoints"></a>
<div style='background-color:rgba(30, 144, 255, 0.1);border-radius:5px;padding:2px;'>
<span style="font-variant-caps: small-caps;font-weight:700">3. Describe model serving deploys and endpoint for every stage</span></div>
<p><i>You can use a serving endpoint to serve models from the Databricks Model Registry or from Unity Catalog.</i></p><p><i>Endpoints expose the underlying models as scalable REST API endpoints using serverless compute. This means the endpoints and associated compute resources are fully managed by Databricks and will not appear in your cloud account.</i></p><p><i>A serving endpoint can consist of one or more MLflow models from the Databricks Model Registry, called served models.</i></p><p><i>A serving endpoint can have at most ten served models.</i></p><p><i>You can configure traffic settings to define how requests should be routed to your served models behind an endpoint.</i></p><p><i>Additionally, you can configure the scale of resources that should be applied to each served model.</i></p><p><a href="https://docs.databricks.com/api/workspace/servingendpoints" target="_blank">source</a></p>
<p>For more information about how to create a model serving endpoint using MLflow, see <a href="https://customer-academy.databricks.com/learn/course/1522/play/9706/real-time-demo" target="_blank">this video</a>.</p>

<ul>
<li>A model needs to be <b>logged</b> and <b>registered</b> in MLflow before it can be linked to a serving endpoint</li>
</ul>
<i>See previous chapters and/or <a href="https://customer-academy.databricks.com/learn/course/1522/play/9706/real-time-demo" target="_blank">this video</a>.</i>

<ul>
<li>The model(s) to serve are selected at endpoint creation by choosing each model's name and version</li>
<img width="1000px" src="https://i.ibb.co/vdgL1mD/servingendpointcreation1.png"/>
<li><b>Up to 10 models can be served</b> and the <b>percentage of traffic</b> routed to each of them is <b>configurable</b>:</li>
<img width="1000px" src="https://i.ibb.co/1ngQgKP/multiplemodels.png"/>
</ul>
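<p>The two constraints above (at most 10 served models, configurable traffic split) can be checked before an endpoint is created. The sketch below only builds and validates a configuration dictionary; it does not call Databricks. The helper name is hypothetical, and the field names (<code>served_models</code>, <code>traffic_config</code>, <code>routes</code>, <code>traffic_percentage</code>) as well as the rule that percentages sum to 100 are assumptions to verify against the serving endpoints API reference.</p>

```python
MAX_SERVED_MODELS = 10  # limit quoted from the documentation excerpt


def build_endpoint_config(models: list) -> dict:
    """Build an endpoint config from entries such as
    {"model_name": "m", "model_version": "1", "traffic_percentage": 80}."""
    if not 1 <= len(models) <= MAX_SERVED_MODELS:
        raise ValueError(f"Expected 1 to {MAX_SERVED_MODELS} served models, got {len(models)}")
    # Assumption: the traffic percentages of all routes must add up to 100
    total = sum(m["traffic_percentage"] for m in models)
    if total != 100:
        raise ValueError(f"Traffic percentages must sum to 100, got {total}")

    served_models, routes = [], []
    for m in models:
        # Assumed naming convention for a served model: "<model_name>-<model_version>"
        served_name = f'{m["model_name"]}-{m["model_version"]}'
        served_models.append({
            "name": served_name,
            "model_name": m["model_name"],
            "model_version": m["model_version"],
        })
        routes.append({
            "served_model_name": served_name,
            "traffic_percentage": m["traffic_percentage"],
        })
    return {"served_models": served_models, "traffic_config": {"routes": routes}}


# Hypothetical example: two versions of the same model with an 80/20 split
config = build_endpoint_config([
    {"model_name": "my_model", "model_version": "1", "traffic_percentage": 80},
    {"model_name": "my_model", "model_version": "2", "traffic_percentage": 20},
])
print(config["traffic_config"])
```

<p>Validating the split locally gives a clearer error message than a rejected API call, which is useful when several model versions share one endpoint.</p>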

<ul>
<li>A newly created endpoint is disabled. It becomes active once it has been enabled.</li>
</ul>
<b>The following lines enable a serving endpoint:</b>

In [None]:
import mlflow
import requests
#
# Get a temporary token from the notebook context (this relies on a private MLflow helper).
# Creating a token within the Databricks interface is the better option.
token = mlflow.utils.databricks_utils._get_command_context().apiToken().get()
#
# With the token, create the authorization header for the subsequent REST calls
headers = {"Authorization": f"Bearer {token}"}
#
# Get the workspace URL against which the request is executed
api_url = mlflow.utils.databricks_utils.get_webapp_url()
#
# Build the URL of the "enable endpoint" API
url = f"{api_url}/api/2.0/mlflow/endpoints/enable"
#
# Send the request to enable the endpoint and raise on an HTTP error status
response = requests.post(url, headers=headers, json={"registered_model_name": "<model_name>"})
response.raise_for_status()
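<p>An endpoint that has just been enabled may need some time before it can answer requests, so a caller typically waits for it. The sketch below shows a generic polling loop. How the state is read depends on the API in use, so it is passed in as a callable; <code>get_state</code>, the <code>"READY"</code> value and the timings are hypothetical placeholders, not Databricks API names.</p>

```python
import time


def wait_until_ready(get_state, ready_state="READY", timeout_sec=300, poll_sec=5, sleep=time.sleep):
    """Poll get_state() until it returns ready_state or the timeout expires."""
    deadline = time.monotonic() + timeout_sec
    while True:
        state = get_state()
        if state == ready_state:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint still in state {state!r} after {timeout_sec}s")
        sleep(poll_sec)


# Usage with a fake state source standing in for a real status request
states = iter(["PENDING", "PENDING", "READY"])
final_state = wait_until_ready(lambda: next(states), poll_sec=0)
print(final_state)
```

<p>In a notebook, <code>get_state</code> would wrap whichever status request the serving API provides and return the state string it reports.</p>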

<ul>
<li>Users who need to create a model serving endpoint in MLflow need <b>cluster creation permission</b>.</li>
</ul>

<a id="allpurpose"></a>
<div style='background-color:rgba(30, 144, 255, 0.1);border-radius:5px;padding:2px;'>
<span style="font-variant-caps: small-caps;font-weight:700">4. Identify how model serving uses one all-purpose cluster for a model deployment</span></div>
<p>The purpose of a served model is to provide predictions in real time. A user or service that sends a request to the endpoint should not have to wait for a cluster to start: results should be returned immediately. With legacy MLflow Model Serving, enabling serving for a registered model creates one dedicated all-purpose cluster that hosts the deployed versions of that model. Current Model Serving endpoints use serverless compute instead. See <a href="https://learn.microsoft.com/en-us/azure/databricks/serverless-compute/" target="_blank">this page</a>.</p>

<a id="querymodelserving"></a>
<div style='background-color:rgba(30, 144, 255, 0.1);border-radius:5px;padding:2px;'>
<span style="font-variant-caps: small-caps;font-weight:700">5. Query a Model Serving enabled model in the Production stage and Staging stage</span></div>
<p>The following minimal Python code gets predictions from a served model. The model can be in either the Production or the Staging stage; predictions are requested the same way in both cases.</p>

In [None]:
import mlflow
import requests
#
# Get a temporary token from the notebook context (this relies on a private MLflow helper).
# Creating a token within the Databricks interface is the better option.
token = mlflow.utils.databricks_utils._get_command_context().apiToken().get()
#
# With the token, create the authorization header for the subsequent REST calls
headers = {"Authorization": f"Bearer {token}"}
#
# Get the workspace URL against which the request is executed
api_url = mlflow.utils.databricks_utils.get_webapp_url()
#
# Build the scoring URL. Assumption: the legacy MLflow Model Serving URL format, in which the
# segment after the model name is a stage ("Production" or "Staging") or a version number
stage = "Production"
url = f"{api_url}/model/<model_name>/{stage}/invocations"
#
# Data to score must be in pandas "split" format. As an example, predict on X_test (a pandas DataFrame)
ds_dict = X_test.to_dict(orient="split")
#
# Request predictions and raise on an HTTP error status
response = requests.post(url, headers=headers, json=ds_dict)
response.raise_for_status()
#
# Predictions as parsed JSON
response.json()
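<p>To make the request body and the stage-specific URLs concrete, the sketch below builds both without pandas or a network call. The helper names and the example workspace URL are hypothetical. The URL pattern with a stage segment is assumed from legacy MLflow Model Serving, and the <code>dataframe_split</code> wrapper is the input format expected by MLflow 2.x scoring servers; both should be checked against the serving version in use.</p>

```python
import json


def invocation_url(api_url, model_name, stage):
    """Scoring URL for a model stage (assumed legacy MLflow Model Serving pattern)."""
    if stage not in ("Production", "Staging"):
        raise ValueError(f"Unexpected stage: {stage}")
    return f"{api_url.rstrip('/')}/model/{model_name}/{stage}/invocations"


def split_payload(columns, rows, wrap=False):
    """Mimic DataFrame.to_dict(orient="split"), minus the index, from plain lists."""
    payload = {"columns": list(columns), "data": [list(r) for r in rows]}
    # MLflow 2.x scoring servers expect the payload under a "dataframe_split" key
    return {"dataframe_split": payload} if wrap else payload


# Same model name, one URL per stage (hypothetical workspace URL)
for stage in ("Production", "Staging"):
    print(invocation_url("https://example.cloud.databricks.com", "my_model", stage))

# Two records with two hypothetical feature columns
body = split_payload(["sepal_length", "sepal_width"], [[5.1, 3.5], [6.2, 2.9]], wrap=True)
print(json.dumps(body))
```

<p>Only the stage segment of the URL changes between Production and Staging; the headers and the request body stay identical.</p>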

<p>Alternatively, a <b>sample URL and sample code (curl/Python)</b> for requesting predictions from a served model are provided in the Serving UI <i>(source: <a href="https://customer-academy.databricks.com/learn/course/1522/play/9706/real-time-demo" target="_blank">this video</a>)</i>:</p>
<img width="1000px" src="https://i.ibb.co/KrQtDNZ/serving.png"/>

<a id="cloudprovidedrestfulservices"></a>
<div style='background-color:rgba(30, 144, 255, 0.1);border-radius:5px;padding:2px;'>
<span style="font-variant-caps: small-caps;font-weight:700">6. Identify how cloud-provided RESTful services in containers is the best solution for
production-grade real-time deployments</span></div>
<p>Containers are suitable for real-time production deployments due to their ease of management, lightweight characteristics, and scalable capabilities facilitated by services like Kubernetes.</p>
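<p>As a rough illustration of what runs inside such a container, the sketch below exposes a prediction function over HTTP using only the Python standard library. The model (a fixed linear rule), the <code>/invocations</code> path, the port and all function names are placeholders; a production service would load a real model and rely on the cloud provider for scaling and routing.</p>

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def predict(rows):
    """Placeholder model: a fixed linear rule standing in for a trained model."""
    return [round(0.5 * a + 2.0 * b, 6) for a, b in rows]


def handle_predict(body: bytes):
    """Turn a JSON request body into (HTTP status, JSON response bytes)."""
    try:
        rows = json.loads(body)["data"]
        result = {"predictions": predict(rows)}
        return 200, json.dumps(result).encode()
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, missing "data" key or rows of the wrong shape
        return 400, json.dumps({"error": str(exc)}).encode()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/invocations":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port=8080):
    """Create the HTTP server; the container entry point would call serve_forever() on it."""
    return ThreadingHTTPServer(("0.0.0.0", port), PredictHandler)


# The request-handling logic can be exercised without opening a socket
status, payload = handle_predict(b'{"data": [[1.0, 2.0]]}')
print(status, payload.decode())
```

<p>Keeping the request logic in a plain function separate from the HTTP handler makes it testable outside the container, while the container image only needs to start the server.</p>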