Drift Detection

Drift detection is a crucial part of ML observability. As is the case with other components, drift detection in TurboML is a continuous streaming process. In this notebook, we'll see how to compute data drift (univariate and multivariate) and model drift.

For univariate drift detection, TurboML uses the Adaptive Windowing method by default. For multivariate drift detection, the default is a PCA-based reconstruction method.
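
To give a feel for what adaptive windowing does, here is a simplified sketch in plain Python. It is not TurboML's implementation: the class name `SimpleAdaptiveWindow`, the `delta`, `max_window`, and `min_side` parameters, and the assumption that inputs are scaled to [0, 1] are all illustrative choices. The detector keeps a window of recent values and drops the older portion whenever two sub-windows have means that differ by more than a Hoeffding-style bound.

```python
import math
from collections import deque


class SimpleAdaptiveWindow:
    """Simplified adaptive-windowing drift detector (illustrative only).

    Assumes every input value lies in [0, 1].
    """

    def __init__(self, delta=0.002, max_window=500, min_side=5):
        self.delta = delta        # confidence parameter; smaller means fewer alarms
        self.min_side = min_side  # minimum number of values on each side of a split
        self.window = deque(maxlen=max_window)

    def update(self, value):
        """Add one value; return True if the window was cut (drift signalled)."""
        self.window.append(value)
        drift = False
        while self._cut_once():
            drift = True
        return drift

    def _cut_once(self):
        """Try every split point; drop the older side at the first significant one."""
        values = list(self.window)
        n = len(values)
        total = sum(values)
        left_sum = 0.0
        for split in range(1, n):
            left_sum += values[split - 1]
            n0, n1 = split, n - split
            if n0 < self.min_side or n1 < self.min_side:
                continue
            mean0 = left_sum / n0
            mean1 = (total - left_sum) / n1
            # Hoeffding-style threshold based on the harmonic mean of the two sizes.
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
            if abs(mean0 - mean1) > eps:
                for _ in range(n0):
                    self.window.popleft()
                return True
        return False


# Synthetic stream: the mean jumps from 0.2 to 0.8 at position 200.
stream = [0.2] * 200 + [0.8] * 200
detector = SimpleAdaptiveWindow()
alarms = [i for i, x in enumerate(stream) if detector.update(x)]
first_alarm = alarms[0]
```

On this synthetic stream the first alarm fires shortly after the change point, once enough post-change values have accumulated to exceed the threshold.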

    import turboml as tb
    import pandas as pd

    # Load the data; reset_index() adds an "index" column that is used as the key field.
    transactions_df = pd.read_csv("data/transactions.csv").reset_index()
    labels_df = pd.read_csv("data/labels.csv").reset_index()

    # Upload both dataframes as TurboML datasets.
    transactions = tb.PandasDataset(
        dataset_name="transactions_drift",
        key_field="index",
        dataframe=transactions_df,
        upload=True,
    )
    labels = tb.PandasDataset(
        dataset_name="labels_drift", key_field="index", dataframe=labels_df, upload=True
    )

    # Deploy an RCF model on the numerical features so that model drift can be tracked later.
    model = tb.RCF(number_of_trees=50)
    numerical_fields = [
        "transactionAmount",
        "localHour",
        "isProxyIP",
        "physicalItemCount",
        "digitalItemCount",
    ]
    features = transactions.get_input_fields(numerical_fields=numerical_fields)
    label = labels.get_label_field(label_field="is_fraud")
    deployed_model = model.deploy(name="drift_demo", input=features, labels=label)

We can register univariate drift by passing a numerical_field and, optionally, a label. If no label is given, it defaults to the name of the numerical_field.

    transactions.register_univariate_drift(numerical_field="transactionAmount")
    transactions.register_univariate_drift(
        label="demo_uv_drift", numerical_field="physicalItemCount"
    )

For multivariate drift, providing a label is required.

    transactions.register_multivariate_drift(
        label="demo_mv_drift", numerical_fields=numerical_fields
    )
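
    # Register drift tracking on the deployed model itself (model drift).
    deployed_model.add_drift()

    import matplotlib.pyplot as plt


    def plot_drift(drifts):
        """Plot the drift score of each returned record, in order."""
        plt.plot([drift["record"].score for drift in drifts])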

We can fetch drift results using either the label or the numerical_field(s) given at registration.

    plot_drift(transactions.get_univariate_drift(numerical_field="transactionAmount"))
    plot_drift(transactions.get_univariate_drift(label="demo_uv_drift"))
    plot_drift(transactions.get_multivariate_drift(label="demo_mv_drift"))
    plot_drift(deployed_model.get_drifts())
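
The multivariate scores above come from the default PCA-based reconstruction method. As a rough illustration of that idea (not TurboML's actual implementation), the sketch below fits principal components on a reference sample with NumPy and scores new rows by their reconstruction error. The function names `fit_pca` and `reconstruction_error`, the synthetic data, and the choice of a single component are assumptions made for this example.

```python
import numpy as np


def fit_pca(reference, n_components):
    """Return the feature means and the top principal directions of `reference`."""
    mean = reference.mean(axis=0)
    centered = reference - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(batch, mean, components):
    """Mean squared error per row after projecting onto the components and back."""
    centered = batch - mean
    projected = centered @ components.T
    reconstructed = projected @ components
    return np.mean((centered - reconstructed) ** 2, axis=1)


rng = np.random.default_rng(0)


def make_batch(n_rows, sign):
    """Three features driven by one latent factor; `sign` flips the second one."""
    latent = rng.normal(size=(n_rows, 1))
    noise = 0.05 * rng.normal(size=(n_rows, 3))
    return np.hstack([latent, sign * 2.0 * latent, -latent]) + noise


reference = make_batch(500, sign=1.0)   # data the PCA is fitted on
in_dist = make_batch(200, sign=1.0)     # same feature relationships as the reference
drifted = make_batch(200, sign=-1.0)    # relationship between features has changed

mean, components = fit_pca(reference, n_components=1)
err_in_dist = reconstruction_error(in_dist, mean, components).mean()
err_drifted = reconstruction_error(drifted, mean, components).mean()
```

Rows that follow the reference correlations reconstruct almost perfectly, while rows whose feature relationships have changed produce a much larger error, which is what makes reconstruction error usable as a multivariate drift score.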