Model Explanations using iXAI

The iXAI package can be used in combination with TurboML to provide incremental explanations for the models being trained.

We start by importing TurboML, pandas, the IncrementalPFI explainer from the ixai package, and the relevant metrics, utilities, and synthetic datasets from river.

    import turboml as tb
    import pandas as pd
    from ixai.explainer import IncrementalPFI
    from river.metrics import Accuracy
    from river.utils import Rolling
    from river.datasets.synth import Agrawal
    from river.datasets.synth import ConceptDriftStream

We define the number of samples for the model to train on.

We also initialize a concept drift data stream using the Agrawal synthetic dataset from river: the stream starts with one Agrawal classification function and gradually drifts to a second one, with the transition centered at the midpoint of the samples.

    n_samples = 150_000
    stream = Agrawal(classification_function=1, seed=42)
    drift_stream = Agrawal(classification_function=2, seed=42)
    stream = ConceptDriftStream(
        stream,
        drift_stream,
        position=int(n_samples * 0.5),
        width=int(n_samples * 0.1),
        seed=42,
    )
    feature_names = list([x_0 for x_0, _ in stream.take(1)][0].keys())
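
As a quick, optional check (purely illustrative, not required for the rest of the walkthrough), we can peek at a single sample from the stream to see the feature dictionary and label that river yields.

    # Peek at one (features, label) pair from the drifting stream.
    sample_x, sample_y = next(iter(stream.take(1)))
    print(feature_names)  # keys of the Agrawal feature dictionary
    print(sample_x, sample_y)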

Batch DataFrames of features and labels are constructed from the stream defined above to train our model.

    features_list = []
    labels_list = []
    
    for features, label in stream:
        if len(features_list) == n_samples:
            break
        features_list.append(features)
        labels_list.append(label)
    
    features_df = pd.DataFrame(features_list).reset_index()
    labels_df = pd.DataFrame(labels_list, columns=["label"]).reset_index()
    numerical_fields = feature_names
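
Before converting these DataFrames, an optional check like the one below (illustrative only) confirms that the features and labels line up row for row and share the "index" column created by reset_index(), which we use as the key field later.

    # Optional: both DataFrames should have n_samples rows and an "index" column.
    print(features_df.shape, labels_df.shape)
    print(features_df.head())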

We use the PandasDataset class provided by TurboML to convert the DataFrame into a compatible dataset.

As part of defining the dataset, we specify the column to be used as the primary key.

Then, we get the relevant features from our dataset as defined by the numerical_fields list.

    dataset_full = tb.PandasDataset(
        dataframe=features_df, key_field="index", streaming=False
    )
    labels_full = tb.PandasDataset(dataframe=labels_df, key_field="index", streaming=False)
    features = dataset_full.get_input_fields(numerical_fields=numerical_fields)
    label = labels_full.get_label_field(label_field="label")

We will train a Hoeffding Tree Classifier for this task.

    model = tb.HoeffdingTreeClassifier(n_classes=2)
    model_learned = model.learn(features, label)

Once the model has finished training, we get ready to deploy it so that it can be used for prediction.

To begin with, we re-define our datasets to now support streaming data, and get the relevant features as before.

    dataset_full = tb.PandasDataset(
        dataset_name="agrawal_model_explanation",
        dataframe=features_df,
        key_field="index",
        upload=True,
    )
    labels_full = tb.PandasDataset(
        dataset_name="labels_model_explanation",
        key_field="index",
        dataframe=labels_df,
        upload=True,
    )
    features = dataset_full.get_input_fields(numerical_fields=numerical_fields)
    label = labels_full.get_label_field(label_field="label")

We specify that the model being deployed is to be used only for prediction using the predict_only parameter of the deploy() method.

    deployed_model = model_learned.deploy(
        name="demo_model_ixai", input=features, labels=label, predict_only=True
    )

Now, the get_endpoints() method is used to fetch an endpoint to which inference requests will be sent.

    model_endpoints = deployed_model.get_endpoints()

We define model_function as a wrapper for the inference requests being sent to the deployed model such that the outputs are compatible with iXAI's explanations API.

    import requests
    
    
    def model_function(x):
        resp = requests.post(
            model_endpoints[0], json=x, headers=tb.common.api.headers
        ).json()
        resp["output"] = resp.pop("predicted_class")
        return resp
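
As a quick, illustrative check, we can call the wrapper on a single stream sample; iXAI expects a dictionary whose "output" key holds the model's prediction.

    # The wrapper returns the endpoint's response with the prediction exposed
    # under the "output" key, which is what IncrementalPFI consumes.
    x_check, _ = next(iter(stream.take(1)))
    print(model_function(x_check))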

We instantiate the IncrementalPFI class from iXAI with our prediction function defined above, along with the relevant fields from the dataset and the loss function to calculate the feature importance values.

    incremental_pfi = IncrementalPFI(
        model_function=model_function,
        loss_function=Accuracy(),
        feature_names=numerical_fields,
        smoothing_alpha=0.001,
        n_inner_samples=5,
    )

Finally, we loop through the stream for the first 10000 samples, updating our metric and incremental_pfi after each encountered sample.

At every 1000th step, we print out the metric with the feature importance values.

    training_metric = Rolling(Accuracy(), window_size=1000)
    for n, (x_i, y_i) in enumerate(stream, start=1):
        if n == 10000:
            break

        # Query the deployed model, update the rolling accuracy, and update the
        # incremental PFI explainer with the new sample.
        y_i_pred = model_function(x_i)["output"]
        training_metric.update(y_i, y_i_pred)
        incremental_pfi.explain_one(x_i, y_i)

        if n % 1000 == 0:
            print(
                f"{n}: Accuracy: {training_metric.get()} PFI: {incremental_pfi.importance_values}"
            )
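
After the loop finishes, incremental_pfi.importance_values holds the latest PFI estimate for each feature. As an illustrative follow-up, we can rank these values to see which features the explainer currently considers most important.

    # Sort the final incremental PFI values in descending order of importance.
    final_importance = incremental_pfi.importance_values  # dict: feature name -> PFI value
    for feature, value in sorted(final_importance.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{feature}: {value:.4f}")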