Python Model: PySAD Example

In this example, we use the PySAD package to monitor anomalies in our streaming data.

import turboml as tb

We start off by installing and importing the pysad package along with its dependencies.

!pip install pysad mmh3==2.5.1

import pandas as pd
import numpy as np
from pysad.models import xStream

Model Definition

TurboML's built-in PythonModel lets you define custom models that run on the platform.

Here we define PySADModel as a wrapper around PySAD's xStream model, making sure to implement the instance methods required of a Python model: init_imports, learn_one, and predict_one.

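import turboml.common.pytypes as types


class PySADModel:
    def __init__(self):
        # Wrap PySAD's xStream streaming anomaly detector.
        self.model = xStream()

    def init_imports(self):
        # Import the modules that the other methods rely on.
        from pysad.models import xStream
        import numpy as np

    def learn_one(self, input: types.InputData):
        # Update the detector with the numeric features of one record.
        self.model = self.model.fit_partial(np.array(input.numeric))

    def predict_one(self, input: types.InputData, output: types.OutputData):
        # Score one record and write the anomaly score to the output.
        score = self.model.score_partial(np.array(input.numeric))
        output.set_score(score)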
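
Before deploying, it can help to see how the three methods fit together on a stream. The following is a minimal local sketch, not TurboML's runtime: FakeInput and FakeOutput are hypothetical stand-ins for types.InputData and types.OutputData, RunningZScoreModel is a toy scorer substituted for xStream so the sketch needs only the standard library, and the score-before-learn ordering in run_stream is an assumption made for illustration.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class FakeInput:
    """Hypothetical stand-in for types.InputData; only `numeric` is modelled."""
    numeric: list


@dataclass
class FakeOutput:
    """Hypothetical stand-in for types.OutputData; stores the score it is given."""
    score: float = 0.0

    def set_score(self, score):
        self.score = score


class RunningZScoreModel:
    """Toy streaming scorer exposing the same method names as PySADModel."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's method)

    def init_imports(self):
        # Nothing to import: this stand-in uses only the standard library.
        pass

    def learn_one(self, input):
        # Update the running mean and variance with the first numeric feature.
        x = input.numeric[0]
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def predict_one(self, input, output):
        # Score = distance from the running mean in standard deviations.
        if self.n < 2:
            output.set_score(0.0)
            return
        std = sqrt(self.m2 / (self.n - 1))
        score = abs(input.numeric[0] - self.mean) / std if std > 0 else 0.0
        output.set_score(score)


def run_stream(model, values):
    """Score each value before learning from it, and return the scores."""
    model.init_imports()
    scores = []
    for v in values:
        record = FakeInput(numeric=[v])
        out = FakeOutput()
        model.predict_one(record, out)
        model.learn_one(record)
        scores.append(out.score)
    return scores


stream_scores = run_stream(RunningZScoreModel(), [10.0, 11.0, 9.0, 10.0, 50.0])
```

The last value lies far from the earlier ones, so it receives the largest score in stream_scores. Swapping in PySADModel would follow the same pattern, provided the stand-in objects expose whatever fields the model reads.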

Now, we create a custom venv so that the model defined above has access to all of its dependencies. PySAD requires mmh3==2.5.1, as per its docs.

venv = tb.setup_venv("my_pysad_venv", ["mmh3==2.5.1", "pysad", "numpy"])
venv.add_python_class(PySADModel)

Dataset

We choose our standard FraudDetection dataset, using the to_online method to push it to the platform.

transactions = tb.datasets.FraudDetectionDatasetFeatures().to_online(
    id="transactions", load_if_exists=True
)
labels = tb.datasets.FraudDetectionDatasetLabels().to_online(
    id="transaction_labels", load_if_exists=True
)

numerical_fields = [
    "transactionAmount",
    "localHour",
    "isProxyIP",
    "digitalItemCount",
    "physicalItemCount",
]

features = transactions.get_model_inputs(numerical_fields=numerical_fields)
label = labels.get_model_labels(label_field="is_fraud")

Model Deployment

Now, we deploy our model and extract its outputs.

pysad_model = tb.Python(class_name=PySADModel.__name__, venv_name=venv.name)

deployed_model_pysad = pysad_model.deploy("pysad_model", input=features, labels=label)

outputs = deployed_model_pysad.get_outputs()

len(outputs)

Evaluation

Finally, we can use any of PySAD's metrics to quantify how well the model's anomaly scores agree with the ground-truth labels. Here we use AUROCMetric, pairing each output's score with the corresponding is_fraud label.

from pysad.evaluation import AUROCMetric

auroc = AUROCMetric()
labels_df = labels.preview_df
for output, y in zip(
    outputs, labels_df["is_fraud"].tolist()[: len(outputs)], strict=False
):
    auroc.update(y, output.score)

auroc.get()
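
To make the number returned above easier to interpret, here is a small pure-Python sketch of what AUROC measures: the fraction of (anomaly, normal) pairs in which the anomaly receives the higher score, with ties counted as half. This is an illustrative helper written for this example, not PySAD's implementation, and its pairwise loop is only practical for small inputs.

```python
def auroc(labels, scores):
    """Fraction of (anomaly, normal) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Two normal records (label 0) and two anomalies (label 1).
example_auroc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example_auroc)  # 0.75: three of the four pairs are ranked correctly
```

A value of 0.5 corresponds to scores that rank anomalies no better than chance, and 1.0 to scores that place every anomaly above every normal record.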
