Python Model: PySAD Example
In this example, we use the PySAD package to monitor anomalies in our streaming data.
import turboml as tb
We start off by installing and importing the pysad package along with its dependencies.
!pip install pysad mmh3==2.5.1
import pandas as pd
import numpy as np
from pysad.models import xStream
Model Definition
TurboML's inbuilt PythonModel can be used to define custom models which are compatible with TurboML.
Here we define PySADModel as a wrapper using PySAD's xStream model, making sure to properly implement the required instance methods for a Python model.
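The contract described above is small: learn_one updates the model with a single record, and predict_one writes a score for a single record. The sketch below illustrates that call pattern locally, without TurboML or PySAD installed. FakeInput, FakeOutput, and RunningMeanModel are hypothetical stand-ins invented for this illustration; the toy scorer is a simple distance from a running mean, not xStream.

```python
from dataclasses import dataclass


@dataclass
class FakeInput:
    """Hypothetical stand-in for types.InputData; only `numeric` is modelled."""
    numeric: list


@dataclass
class FakeOutput:
    """Hypothetical stand-in for types.OutputData; keeps the last score set."""
    score: float = 0.0

    def set_score(self, value: float) -> None:
        self.score = value


class RunningMeanModel:
    """Toy streaming scorer (not xStream): score = L1 distance from the running mean."""

    def __init__(self):
        self.count = 0
        self.mean = None

    def learn_one(self, input):
        # Update the per-feature running mean with one record.
        x = [float(v) for v in input.numeric]
        if self.mean is None:
            self.mean = [0.0] * len(x)
        self.count += 1
        self.mean = [m + (v - m) / self.count for m, v in zip(self.mean, x)]

    def predict_one(self, input, output):
        # Score one record without updating the model.
        if self.mean is None:
            output.set_score(0.0)
            return
        dist = sum(abs(float(v) - m) for v, m in zip(input.numeric, self.mean))
        output.set_score(dist)


model = RunningMeanModel()
for row in ([1.0, 2.0], [1.2, 1.8], [0.8, 2.2]):
    model.learn_one(FakeInput(numeric=row))

typical, unusual = FakeOutput(), FakeOutput()
model.predict_one(FakeInput(numeric=[1.0, 2.0]), typical)
model.predict_one(FakeInput(numeric=[9.0, 9.0]), unusual)
print(typical.score < unusual.score)  # the far-away record gets the higher score
```

The PySAD-backed wrapper defined next follows the same two-method shape, with xStream doing the scoring.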
import turboml.common.pytypes as types
class PySADModel:
    def __init__(self):
        self.model = xStream()

    def init_imports(self):
        # Imports needed by the methods below.
        from pysad.models import xStream
        import numpy as np

    def learn_one(self, input: types.InputData):
        # Update the model with a single record's numeric features.
        self.model = self.model.fit_partial(np.array(input.numeric))

    def predict_one(self, input: types.InputData, output: types.OutputData):
        # Score a single record and write the anomaly score to the output.
        score = self.model.score_partial(np.array(input.numeric))
        output.set_score(score)
Now, we create a custom venv so that the custom model defined above has access to all the required dependencies. PySAD requires mmh3==2.5.1, as per its docs.
venv = tb.setup_venv("my_pysad_venv", ["mmh3==2.5.1", "pysad", "numpy"])
venv.add_python_class(PySADModel)
Dataset
We choose our standard FraudDetection dataset, using the to_online method to push it to the platform.
transactions = tb.datasets.FraudDetectionDatasetFeatures().to_online(
    id="transactions", load_if_exists=True
)
labels = tb.datasets.FraudDetectionDatasetLabels().to_online(
    id="transaction_labels", load_if_exists=True
)
numerical_fields = [
    "transactionAmount",
    "localHour",
    "isProxyIP",
    "digitalItemCount",
    "physicalItemCount",
]
features = transactions.get_model_inputs(numerical_fields=numerical_fields)
label = labels.get_model_labels(label_field="is_fraud")
Model Deployment
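Before deploying, it can help to picture what a single record might look like by the time it reaches learn_one. The sketch below is only an illustration: it assumes the values in input.numeric arrive in the order listed in numerical_fields, which this example does not confirm. The record values, the transactionID column, and the to_numeric_vector helper are all made up for the illustration.

```python
numerical_fields = [
    "transactionAmount",
    "localHour",
    "isProxyIP",
    "digitalItemCount",
    "physicalItemCount",
]

# Made-up record for illustration only; not taken from the dataset.
record = {
    "transactionID": "example-0001",  # hypothetical non-feature column, ignored below
    "transactionAmount": 59.95,
    "localHour": 14,
    "isProxyIP": 0,
    "digitalItemCount": 1,
    "physicalItemCount": 0,
}


def to_numeric_vector(row, fields):
    """Hypothetical helper: pick the listed fields, in order, and cast each to float."""
    return [float(row[name]) for name in fields]


vector = to_numeric_vector(record, numerical_fields)
print(vector)
```

Under that assumption, the wrapper's np.array(input.numeric) call would turn such a list into a five-element feature vector for xStream.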
Now, we deploy our model and extract its outputs.
pysad_model = tb.Python(class_name=PySADModel.__name__, venv_name=venv.name)
deployed_model_pysad = pysad_model.deploy("pysad_model", input=features, labels=label)
outputs = deployed_model_pysad.get_outputs()
len(outputs)
Finally, we can use any of PySAD's metrics to quantify how well the model's anomaly scores separate fraudulent transactions from legitimate ones. Here we use AUROC.
from pysad.evaluation import AUROCMetric
auroc = AUROCMetric()
labels_df = labels.preview_df
for output, y in zip(
    outputs, labels_df["is_fraud"].tolist()[: len(outputs)], strict=False
):
    auroc.update(y, output.score)
auroc.get()
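To make the resulting number easier to interpret, here is a minimal sketch of what AUROC measures: the probability that a randomly chosen positive (fraud) record receives a higher score than a randomly chosen negative one, with ties counted as half. This is a plain-Python pairwise version written for clarity, not PySAD's implementation, and the toy labels and scores are invented for the illustration.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented toy data: 1 marks an anomaly, 0 a normal record.
toy_labels = [0, 0, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.2, 0.9]
result = auroc(toy_labels, toy_scores)
print(round(result, 4))  # 5 of 6 pairs are ranked correctly
```

A value near 0.5 means the scores rank anomalies no better than chance, while a value near 1.0 means anomalies are almost always scored above normal records.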