Drift Detection
Drift detection is a crucial part of ML observability. As is the case with other components, drift detection in TurboML is a continuous streaming process. In this notebook, we'll see how to compute data drift (univariate and multivariate) and model drift.
For univariate drift detection, TurboML uses the Adaptive Windowing (ADWIN) method by default. For multivariate drift detection, the default is a PCA-based reconstruction method.
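To give a feel for what adaptive windowing does, here is a minimal pure-Python sketch. It is not TurboML's implementation: the class name SimpleAdwin, its parameters, and the exhaustive search over split points are assumptions made for illustration (the published ADWIN algorithm compresses the window into buckets instead of scanning every split), and the Hoeffding-style bound assumes inputs scaled to [0, 1].

```python
import math
from collections import deque


class SimpleAdwin:
    """Simplified adaptive-windowing change detector (illustrative only)."""

    def __init__(self, delta=0.002, max_window=500, min_split=5):
        self.delta = delta          # confidence parameter of the cut test
        self.min_split = min_split  # smallest sub-window size considered
        self.window = deque(maxlen=max_window)

    def update(self, value):
        """Add one value in [0, 1]; return True if a change was detected."""
        self.window.append(value)
        values = list(self.window)
        n = len(values)
        total = sum(values)
        head_sum = 0.0
        for i in range(1, n):
            head_sum += values[i - 1]
            n0, n1 = i, n - i
            if n0 < self.min_split or n1 < self.min_split:
                continue
            mean0 = head_sum / n0
            mean1 = (total - head_sum) / n1
            # Hoeffding-style threshold using the harmonic size of both halves.
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
            if abs(mean0 - mean1) > eps:
                # The older half no longer matches the recent data: drop it.
                for _ in range(n0):
                    self.window.popleft()
                return True
        return False


# Synthetic stream: the mean jumps from 0.2 to 0.8 at position 100.
stream = [0.2] * 100 + [0.8] * 100
detector = SimpleAdwin()
detections = [i for i, x in enumerate(stream) if detector.update(x)]
```

The window grows while the data looks stationary and shrinks when the means of its older and newer parts differ by more than the threshold, which is why the detector reacts a short while after the jump rather than at the exact position.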
import turboml as tb
import pandas as pd
transactions_df = pd.read_csv("data/transactions.csv").reset_index()
labels_df = pd.read_csv("data/labels.csv").reset_index()
transactions = tb.PandasDataset(
    dataset_name="transactions_drift",
    key_field="index",
    dataframe=transactions_df,
    upload=True,
)
labels = tb.PandasDataset(
    dataset_name="labels_drift", key_field="index", dataframe=labels_df, upload=True
)
model = tb.RCF(number_of_trees=50)
numerical_fields = [
    "transactionAmount",
    "localHour",
    "isProxyIP",
    "physicalItemCount",
    "digitalItemCount",
]
features = transactions.get_input_fields(numerical_fields=numerical_fields)
label = labels.get_label_field(label_field="is_fraud")
deployed_model = model.deploy(name="drift_demo", input=features, labels=label)
We can register univariate drift by passing numerical_field and, optionally, a label. By default, the label is the same as numerical_field.
transactions.register_univariate_drift(numerical_field="transactionAmount")
transactions.register_univariate_drift(
    label="demo_uv_drift", numerical_field="physicalItemCount"
)
For multivariate drift, a label is required.
transactions.register_multivariate_drift(
    label="demo_mv_drift", numerical_fields=numerical_fields
)
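The idea behind a PCA-based reconstruction score can be sketched in pure Python as follows. This is an assumption-laden illustration, not TurboML's code: the helper names fit_first_component and reconstruction_error are hypothetical, only one principal component is kept, and the component is found by power iteration rather than a full eigendecomposition. Rows that follow the correlation structure seen during fitting reconstruct well, while rows from a shifted distribution leave a large residual.

```python
import math
import random


def fit_first_component(rows, iterations=100):
    """Return (mean, unit direction) of the first principal component."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration converges to the dominant eigenvector of cov.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def reconstruction_error(row, mean, component):
    """Squared distance between a row and its projection onto the component."""
    centered = [x - m for x, m in zip(row, mean)]
    score = sum(c * v for c, v in zip(centered, component))
    residual = [c - score * v for c, v in zip(centered, component)]
    return sum(r * r for r in residual)


rng = random.Random(0)
# Reference data: second feature is roughly twice the first.
reference = [[x, 2 * x + rng.gauss(0, 0.05)]
             for x in (rng.uniform(-1, 1) for _ in range(200))]
# Drifted data: the relationship between the two features flips sign.
drifted = [[x, -2 * x + rng.gauss(0, 0.05)]
           for x in (rng.uniform(-1, 1) for _ in range(200))]

mean, comp = fit_first_component(reference)
ref_err = sum(reconstruction_error(r, mean, comp) for r in reference) / len(reference)
drift_err = sum(reconstruction_error(r, mean, comp) for r in drifted) / len(drifted)
```

Note that each feature's marginal range is unchanged between the two datasets here; only their joint relationship moves, which is the kind of change a univariate detector can miss.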
# Register model drift on the deployed model.
deployed_model.add_drift()
import matplotlib.pyplot as plt
def plot_drift(drifts):
    plt.plot([drift["record"].score for drift in drifts])
We can use either label or numerical_field(s) to fetch drift results.
plot_drift(transactions.get_univariate_drift(numerical_field="transactionAmount"))
plot_drift(transactions.get_univariate_drift(label="demo_uv_drift"))
plot_drift(transactions.get_multivariate_drift(label="demo_mv_drift"))
plot_drift(deployed_model.get_drifts())
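Besides plotting, it can be handy to pull the scores into a plain list and flag the positions that exceed a cutoff. The sketch below assumes the fetched results have the same shape plot_drift relies on (each item exposes drift["record"].score). The helper names, the stand-in records built with SimpleNamespace, and the 0.5 threshold are all hypothetical; a sensible cutoff depends on the detector and the data.

```python
from types import SimpleNamespace


def drift_scores(drifts):
    """Extract scores in arrival order, using the same access as plot_drift."""
    return [drift["record"].score for drift in drifts]


def indices_above(scores, threshold):
    """Return the positions whose score is strictly greater than threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]


# Stand-in records so the example runs without a TurboML backend.
fake_drifts = [
    {"record": SimpleNamespace(score=s)} for s in (0.0, 0.1, 0.9, 0.2, 1.0)
]
scores = drift_scores(fake_drifts)
flagged = indices_above(scores, threshold=0.5)
```

With real data, the argument to drift_scores would be the result of one of the fetch calls above, for example transactions.get_univariate_drift(label="demo_uv_drift").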