Drift Detection

Drift detection is a crucial part of ML observability. Like other TurboML components, it runs as a continuous streaming process. In this notebook, we'll see how to compute data drift (univariate and multivariate) and model drift.

By default, univariate drift detection uses the Adaptive Windowing method, and multivariate drift detection uses a PCA-based reconstruction method.

import pandas as pd
import turboml as tb
transactions_df = pd.read_csv("data/transactions.csv").reset_index()
labels_df = pd.read_csv("data/labels.csv").reset_index()
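try:
    # Upload the transactions DataFrame as a new dataset.
    transactions = tb.PandasDataset(
        dataset_name="transactions_drift",
        key_field="index",
        dataframe=transactions_df,
        upload=True,
    )
except Exception:
    # The upload failed, presumably because the dataset already exists, so load it by name.
    transactions = tb.PandasDataset(dataset_name="transactions_drift")

try:
    # Same pattern for the labels dataset.
    labels = tb.PandasDataset(
        dataset_name="labels_drift", key_field="index", dataframe=labels_df, upload=True
    )
except Exception:
    labels = tb.PandasDataset(dataset_name="labels_drift")
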
model = tb.RCF(number_of_trees=50)
numerical_fields = [
    "transactionAmount",
    "localHour",
    "isProxyIP",
    "physicalItemCount",
    "digitalItemCount",
]
features = transactions.get_input_fields(numerical_fields=numerical_fields)
label = labels.get_label_field(label_field="is_fraud")
deployed_model = model.deploy(name="drift_demo", input=features, labels=label)

We can register univariate drift by passing a numerical_field and, optionally, a label. By default, the label is the same as the numerical_field.

transactions.register_univariate_drift(numerical_field="transactionAmount")
transactions.register_univariate_drift(
    label="demo_uv_drift", numerical_field="physicalItemCount"
)
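
For intuition about the Adaptive Windowing idea named above as the univariate default, here is a minimal sketch in plain Python. It is not TurboML's implementation: the class name SimpleAdaptiveWindow, its parameters, and the synthetic stream are all hypothetical, and it checks every split of the window directly instead of using the compressed bucket structure of the full algorithm. It assumes the monitored values are scaled to [0, 1].

```python
import math
from collections import deque


class SimpleAdaptiveWindow:
    """Simplified adaptive-windowing detector (illustrative, hypothetical names).

    Assumes input values lie in [0, 1], which the cut threshold relies on.
    """

    def __init__(self, delta=0.002, max_window=500):
        self.delta = delta  # confidence parameter: smaller means fewer alarms
        self.window = deque(maxlen=max_window)

    def update(self, value):
        """Add one value; return True if old data was dropped (drift signal)."""
        self.window.append(value)
        drift = False
        # Drop the oldest values while some split of the window looks inconsistent.
        while self._has_inconsistent_split():
            self.window.popleft()
            drift = True
        return drift

    def _has_inconsistent_split(self):
        n = len(self.window)
        if n < 10:
            return False
        total = sum(self.window)
        delta_prime = self.delta / n
        head_sum = 0.0
        for i, value in enumerate(self.window, start=1):
            head_sum += value
            n0, n1 = i, n - i
            if n0 < 5 or n1 < 5:
                continue  # skip splits with too few points on either side
            mean0 = head_sum / n0
            mean1 = (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(mean0 - mean1) > eps_cut:
                return True
        return False


# Synthetic stream whose mean jumps at position 200.
detector = SimpleAdaptiveWindow()
stream = [0.1] * 200 + [0.9] * 200
drift_positions = [i for i, v in enumerate(stream) if detector.update(v)]
```

In this sketch, drift_positions stays empty while the stream is stable and starts filling shortly after the jump, once enough new values have arrived for the two sub-window means to differ by more than the threshold.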

For multivariate drift, a label is required.

transactions.register_multivariate_drift(
    label="demo_mv_drift", numerical_fields=numerical_fields
)
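
To illustrate the general idea behind a PCA-based reconstruction score, the sketch below fits PCA on reference data and measures how poorly new data is reconstructed from the retained components. This is an assumption-laden illustration using NumPy, not TurboML's implementation: the function names fit_pca and reconstruction_error and the synthetic data are hypothetical.

```python
import numpy as np


def fit_pca(reference, n_components):
    """Fit PCA on reference rows; return (feature means, principal components)."""
    mean = reference.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(batch, mean, components):
    """Mean squared error after projecting onto the components and back."""
    centered = batch - mean
    reconstructed = centered @ components.T @ components
    return float(np.mean((centered - reconstructed) ** 2))


# Synthetic reference data: the second feature is roughly twice the first.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
reference = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=500)])
mean, components = fit_pca(reference, n_components=1)

# Synthetic drifted data: the relationship between the two features flips sign.
y = rng.normal(size=500)
drifted = np.column_stack([y, -2 * y + rng.normal(scale=0.1, size=500)])

reference_score = reconstruction_error(reference, mean, components)
drifted_score = reconstruction_error(drifted, mean, components)
```

Each feature's marginal distribution is similar in both batches, but the drifted batch breaks the correlation that the retained component captured, so its reconstruction error is much larger. This is the kind of change that per-feature (univariate) checks can miss.
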

# Register drift detection on the deployed model (model drift).
deployed_model.add_drift()

import matplotlib.pyplot as plt
 
 
def plot_drift(drifts):
    # Plot the drift score of each returned record, in order.
    plt.plot([drift["record"].score for drift in drifts])

We can fetch drift results using either the label or the numerical_field(s).

plot_drift(transactions.get_univariate_drift(numerical_field="transactionAmount"))
plot_drift(transactions.get_univariate_drift(label="demo_uv_drift"))
plot_drift(transactions.get_multivariate_drift(label="demo_mv_drift"))
plot_drift(deployed_model.get_drifts())