Model Explanations using iXAI
The iXAI module can be used in combination with TurboML to provide incremental explanations for the models being trained. We start by importing the ixai package and the relevant datasets from river.
import turboml as tb
import pandas as pd
from ixai.explainer import IncrementalPFI
from river.metrics import Accuracy
from river.utils import Rolling
from river.datasets.synth import Agrawal
from river.datasets.synth import ConceptDriftStream
Next, we define the number of samples for the model to train on and initialize a concept drift data stream using the Agrawal synthetic dataset from river.
n_samples = 150_000
stream = Agrawal(classification_function=1, seed=42)
drift_stream = Agrawal(classification_function=2, seed=42)
stream = ConceptDriftStream(
    stream,
    drift_stream,
    position=int(n_samples * 0.5),
    width=int(n_samples * 0.1),
    seed=42,
)
feature_names = list([x_0 for x_0, _ in stream.take(1)][0].keys())
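The combined stream transitions gradually from classification_function 1 to classification_function 2, centered at sample 75,000 with a transition width of 15,000 samples. For the Agrawal generator, feature_names should contain the nine input attributes (salary, commission, age, elevel, car, zipcode, hvalue, hyears, loan).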
A batch DataFrame is constructed from the stream defined above to train our model.
features_list = []
labels_list = []
for features, label in stream:
    if len(features_list) == n_samples:
        break
    features_list.append(features)
    labels_list.append(label)
# reset_index() adds an "index" column, which we will use as the primary key.
features_df = pd.DataFrame(features_list).reset_index()
labels_df = pd.DataFrame(labels_list, columns=["label"]).reset_index()
numerical_fields = feature_names
We use the PandasDataset class provided by TurboML to convert the DataFrame into a compatible dataset. As part of defining the dataset, we specify the column to be used for primary keys. Then, we get the relevant features from our dataset as defined by the numerical_fields list.
dataset_full = tb.PandasDataset(
    dataframe=features_df, key_field="index", streaming=False
)
labels_full = tb.PandasDataset(dataframe=labels_df, key_field="index", streaming=False)
features = dataset_full.get_input_fields(numerical_fields=numerical_fields)
label = labels_full.get_label_field(label_field="label")
We will train a Hoeffding Tree Classifier for this task.
model = tb.HoeffdingTreeClassifier(n_classes=2)
model_learned = model.learn(features, label)
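Before deploying, it can help to sanity-check the trained model by scoring the batch dataset. The snippet below is a minimal sketch that assumes your TurboML version exposes a predict() helper on batch-trained models; consult the TurboML batch API docs for the exact interface.
# Optional sanity check; a predict() helper on batch-trained models is assumed here.
outputs = model_learned.predict(features)
print(outputs.head())  # assuming a DataFrame-like result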
Once the model has finished training, we get ready to deploy it so that it can be used for prediction. To begin with, we re-define our datasets to support streaming data, and get the relevant features as before.
dataset_full = tb.PandasDataset(
    dataset_name="agrawal_model_explanation",
    dataframe=features_df,
    key_field="index",
    upload=True,
)
labels_full = tb.PandasDataset(
    dataset_name="labels_model_explanation",
    key_field="index",
    dataframe=labels_df,
    upload=True,
)
features = dataset_full.get_input_fields(numerical_fields=numerical_fields)
label = labels_full.get_label_field(label_field="label")
We specify that the model being deployed is to be used only for prediction, using the predict_only parameter of the deploy() method.
deployed_model = model_learned.deploy(
    name="demo_model_ixai", input=features, labels=label, predict_only=True
)
Now, the get_endpoints() method is used to fetch an endpoint to which inference requests will be sent.
model_endpoints = deployed_model.get_endpoints()
We define model_function as a wrapper for the inference requests sent to the deployed model, so that the outputs are compatible with iXAI's explanations API.
import requests


def model_function(x):
    resp = requests.post(
        model_endpoints[0], json=x, headers=tb.common.api.headers
    ).json()
    # Expose the predicted class under the "output" key for iXAI compatibility.
    resp["output"] = resp.pop("predicted_class")
    return resp
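It can help to smoke-test this wrapper on a single sample before handing it to iXAI. This optional check re-samples the synthetic stream, which restarts deterministically from its seed:
# Optional smoke test: send one sample through the wrapper.
x_sample, _ = next(iter(stream.take(1)))
print(model_function(x_sample))  # the response should now include an "output" key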
We instantiate the IncrementalPFI class from iXAI with the prediction function defined above, along with the relevant fields from the dataset and the loss function used to calculate the feature importance values.
incremental_pfi = IncrementalPFI(
    model_function=model_function,
    loss_function=Accuracy(),
    feature_names=numerical_fields,
    smoothing_alpha=0.001,
    n_inner_samples=5,
)
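As we read the iPFI parameters, smoothing_alpha controls the exponential smoothing of the incremental importance estimates, and n_inner_samples sets how many samples are drawn to approximate the marginal feature distribution; see the ixai documentation for the authoritative definitions.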
Finally, we loop through the first 10,000 samples of the stream, updating our metric and incremental_pfi after each encountered sample. At every 1000th step, we print out the metric along with the feature importance values.
training_metric = Rolling(Accuracy(), window_size=1000)
for n, (x_i, y_i) in enumerate(stream, start=1):
    if n > 10000:
        break
    # Update the rolling accuracy with the model's prediction, then the explainer.
    y_pred = model_function(x_i)["output"]
    training_metric.update(y_i, y_pred)
    incremental_pfi.explain_one(x_i, y_i)
    if n % 1000 == 0:
        print(
            f"{n}: Accuracy: {training_metric.get()} PFI: {incremental_pfi.importance_values}"
        )
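Since the concept drift in this stream is centered at sample 75,000, the 10,000 samples processed here all lie before the drift; extending the loop past the drift region should let you watch the PFI values shift as the underlying concept changes.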