Post-Deployment ML
Model Explanations

Model Explanations using iXAI

Open In Colab

The iXAI package can be used in combination with TurboML to provide incremental explanations for the models being trained.

We start by installing the ixai package and importing it along with the relevant synthetic datasets from river.

!pip install river git+https://github.com/mmschlk/iXAI
import pandas as pd
import turboml as tb
from ixai.explainer import IncrementalPFI
from river.metrics import Accuracy
from river.utils import Rolling
from river.datasets.synth import Agrawal
from river.datasets.synth import ConceptDriftStream

We define the number of samples for the model to train on.

We also initialize a concept drift data stream using the Agrawal synthetic dataset from river.

n_samples = 150_000
stream = Agrawal(classification_function=1, seed=42)
drift_stream = Agrawal(classification_function=2, seed=42)
stream = ConceptDriftStream(
    stream,
    drift_stream,
    position=int(n_samples * 0.5),
    width=int(n_samples * 0.1),
    seed=42,
)
feature_names = list([x_0 for x_0, _ in stream.take(1)][0].keys())
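
Before building the batch DataFrames, it can help to peek at a single sample to see what the Agrawal features look like. The snippet below is just an optional sanity check and is not required for the rest of the walkthrough.

# Inspect one (features, label) pair from the drifting stream
sample_features, sample_label = next(iter(stream.take(1)))
print(feature_names)  # Agrawal generator features, e.g. salary, commission, age, ...
print(sample_features, sample_label)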

Batch DataFrames of features and labels are constructed from the stream defined above to train our model.

features_list = []
labels_list = []
 
for features, label in stream:
    if len(features_list) == n_samples:
        break
    features_list.append(features)
    labels_list.append(label)
 
features_df = pd.DataFrame(features_list).reset_index()
labels_df = pd.DataFrame(labels_list, columns=["label"]).reset_index()
numerical_fields = feature_names
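
As a quick, optional sanity check (not part of the original flow), we can confirm that the features and labels line up and share the index column that will serve as the primary key:

# Both frames carry the "index" column created by reset_index(); TurboML uses it as the key below
print(features_df.shape, labels_df.shape)
assert len(features_df) == len(labels_df) == n_samples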

We use the PandasDataset class provided by TurboML to convert the DataFrame into a compatible dataset.

As part of defining the dataset, we specify the column to be used for primary keys.

Then, we get the relevant features from our dataset as defined by the numerical_fields list.

dataset_full = tb.PandasDataset(
    dataframe=features_df, key_field="index", streaming=False
)
labels_full = tb.PandasDataset(dataframe=labels_df, key_field="index", streaming=False)
features = dataset_full.get_input_fields(numerical_fields=numerical_fields)
label = labels_full.get_label_field(label_field="label")

We use a Hoeffding Tree Classifier for this task and train it on the batch dataset defined above.

model = tb.HoeffdingTreeClassifier(n_classes=2)
model_learned = model.learn(features, label)

Once the model has finished training, we get ready to deploy it so that it can be used for prediction.

To begin with, we re-define our dataset to now support streaming data, and get the relevant features as before.

 
# Create and upload the streaming datasets; if they already exist, reuse them by name.
try:
    dataset_full = tb.PandasDataset(
        dataset_name="agrawal_model_explanation",
        dataframe=features_df,
        key_field="index",
        upload=True,
    )
except:
    dataset_full = tb.PandasDataset(dataset_name="agrawal_model_explanation")
 
try:
    labels_full = tb.PandasDataset(
        dataset_name="labels_model_explanation",
        key_field="index",
        dataframe=labels_df,
        upload=True,
    )
except:
    labels_full = tb.PandasDataset(dataset_name="labels_model_explanation")
features = dataset_full.get_input_fields(numerical_fields=numerical_fields)
label = labels_full.get_label_field(label_field="label")

We specify that the model being deployed is to be used only for prediction using the predict_only parameter of the deploy() method.

deployed_model = model_learned.deploy(
    name="demo_model_ixai", input=features, labels=label, predict_only=True
)

Now, the get_endpoints() method is used to fetch an endpoint to which inference requests will be sent.

model_endpoints = deployed_model.get_endpoints()

We define model_function as a wrapper for the inference requests being sent to the deployed model such that the outputs are compatible with iXAI's explanations API.

import requests
 
 
def model_function(x):
    resp = requests.post(
        model_endpoints[0], json=x, headers=tb.common.api.headers
    ).json()
    resp["output"] = resp.pop("predicted_class")
    return resp
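
Before wiring this into iXAI, it is worth calling model_function once on a raw sample from the stream to confirm the response contains the renamed output field. This is an optional check and assumes the deployed endpoint accepts a plain JSON dict of feature values, exactly as model_function sends it.

# Optional: verify the wrapper's output format on a single sample
sample_x, _ = next(iter(stream.take(1)))
print(model_function(sample_x))  # expect a dict containing an "output" key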

We instantiate the IncrementalPFI class from iXAI with our prediction function defined above, along with the relevant fields from the dataset and the loss function to calculate the feature importance values.

incremental_pfi = IncrementalPFI(
    model_function=model_function,
    loss_function=Accuracy(),
    feature_names=numerical_fields,
    smoothing_alpha=0.001,
    n_inner_samples=5,
)

Finally, we loop through the stream for the first 10000 samples, updating our metric and incremental_pfi after each encountered sample.

At every 1000th step, we print out the metric with the feature importance values.

training_metric = Rolling(Accuracy(), window_size=1000)
for n, (x_i, y_i) in enumerate(stream, start=1):
    if n == 10000:
        break
 
    # Query the deployed model for a prediction and update the rolling accuracy
    y_i_pred = model_function(x_i)["output"]
    training_metric.update(y_true=y_i, y_pred=y_i_pred)
 
    # Update the incremental feature importance estimates with the new sample
    incremental_pfi.explain_one(x_i, y_i)
 
    if n % 1000 == 0:
        print(
            f"{n}: Accuracy: {training_metric.get()} PFI: {incremental_pfi.importance_values}"
        )
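
Once the loop finishes, the latest importance estimates can be read straight off the explainer. The ranking sketch below assumes importance_values is a plain dict mapping feature names to their current PFI scores, as printed above.

# Rank features by their final incremental PFI estimates (highest first)
final_importances = sorted(
    incremental_pfi.importance_values.items(), key=lambda kv: kv[1], reverse=True
)
for feature, importance in final_importances:
    print(f"{feature}: {importance:.4f}")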