Databricks-ML-professional-S04a-Drift-Types
This Notebook adds information related to the following requirements:
Drift Types:
- Compare and contrast label drift and feature drift
- Identify scenarios in which feature drift and/or label drift are likely to occur
- Describe concept drift and its impact on model efficacy
Download this notebook at format ipynb here.
1. Compare and contrast label drift and feature drift
- Feature drift: a change related to features. Model was trained on a train set at given moment. The time going, data is evolving and the data the model was trained on is not reflecting anymore the current data. Consequently, the model is not able to predict correctly new data. Feature drift occurs when the distribution of input features (the data the model uses to make predictions) changes over time.
- Label drift: a change related to target. Model was trained on a train set at given moment. The time going, data is evolving and the data the model was trained on is not reflecting anymore the current data. Consequently, the model is not able to predict correctly new data. Label drift occurs when the distribution of the target variable (the variable the model is predicting) changes over time.
Best to avoid these issues is to regularly re-train models taking into account new data.
2. Identify scenarios in which feature drift and/or label drift are likely to occur
Feature drift examples:
- Seasonal Changes: In an e-commerce recommendation system, user preferences may change based on seasons. For example, clothing preferences might vary between summer and winter.
- Technological Changes: In applications involving sensor data, a software or hardware upgrade might change the way features are recorded or measured.
- Policy Changes: If a model is used in a financial system, changes in regulations or policies may lead to shifts in feature distributions.
Label drift examples:
- Changing User Behavior: In a spam detection system, user behavior might change over time. What was considered spam previously might be accepted or vice versa.
- Evolving Trends: In a stock market prediction model, market conditions may change due to evolving economic trends, impacting the distribution of labels (stock price movements)
- Policy or Business Changes: A customer churn prediction model might face label drift if there are changes in how customer churn is defined or measured by the business.
3. Describe concept drift and its impact on model efficacy
Concept drift requires more in depth comprehensive analysis of the entire end-to-end pipeline. When concept drift appears, it might mean that the model may not be applicable to the use case anymore. It might eventually be necessary to consider alternative approaches and solutions and redesign the MLOps architecture or redesign the ML algorithms. This should be a work to do in collaboration with the business team.