Python Model: PySAD Example

In this example, we use the PySAD package to monitor anomalies in our streaming data.

We start off by installing and importing the pysad package along with its dependencies.

!pip install pysad mmh3==2.5.1
import numpy as np
import pandas as pd
import turboml as tb
from pysad.models import xStream

Model Definition

TurboML's built-in PythonModel can be used to define custom models that are compatible with TurboML.

Here we define PySADModel as a wrapper around PySAD's xStream model, implementing the instance methods a Python model requires: init_imports, learn_one, and predict_one.

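import turboml.common.pytypes as types


class PySADModel:
    def __init__(self):
        # xStream is PySAD's streaming anomaly detector.
        self.model = xStream()

    def init_imports(self):
        # Import the libraries the model needs at runtime.
        from pysad.models import xStream
        import numpy as np

    def learn_one(self, input: types.InputData):
        # fit_partial updates the model with one instance and returns the model.
        self.model = self.model.fit_partial(np.array(input.numeric))

    def predict_one(self, input: types.InputData, output: types.OutputData):
        # score_partial returns the anomaly score for one instance.
        score = self.model.score_partial(np.array(input.numeric))
        output.set_score(score)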
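
Before deploying, it can help to exercise the learn_one / predict_one contract locally. The sketch below is not TurboML or PySAD code: FakeInput and FakeOutput are hypothetical stand-ins that assume only what PySADModel uses (a numeric attribute and a set_score method), and RunningZScoreModel is a toy detector swapped in for xStream so the example runs without any dependencies. Scoring each row before learning from it is a choice made for this sketch, not a statement about how the platform orders the calls.

```python
import math


class FakeInput:
    """Hypothetical stand-in for types.InputData: only exposes `numeric`."""

    def __init__(self, numeric):
        self.numeric = list(numeric)


class FakeOutput:
    """Hypothetical stand-in for types.OutputData: only records the score."""

    def __init__(self):
        self.score = None

    def set_score(self, score):
        self.score = float(score)


class RunningZScoreModel:
    """Toy detector with the same method shape as PySADModel (not xStream)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's method)

    def init_imports(self):
        pass  # this toy model needs no third-party imports

    def learn_one(self, input):
        x = sum(input.numeric)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def predict_one(self, input, output):
        x = sum(input.numeric)
        std = math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0
        # Distance from the running mean in standard deviations; 0 until
        # the model has seen enough data to estimate a spread.
        score = abs(x - self.mean) / std if std > 0 else 0.0
        output.set_score(score)


def score_stream(model, rows):
    """Score each row, then learn from it, and return the scores."""
    scores = []
    for row in rows:
        out = FakeOutput()
        model.predict_one(FakeInput(row), out)
        model.learn_one(FakeInput(row))
        scores.append(out.score)
    return scores
```

Calling score_stream(RunningZScoreModel(), rows) on a handful of similar rows followed by one extreme row should give the extreme row the largest score. The same harness can be pointed at PySADModel if its array-based inputs are adapted accordingly.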

Now, we create a custom venv so that the model defined above has access to all of its dependencies. PySAD requires mmh3==2.5.1, according to its documentation.

venv = tb.setup_venv("my_pysad_venv", ["mmh3==2.5.1", "pysad", "numpy"])
venv.add_python_class(PySADModel)

Dataset

We define our dataset using TurboML's PandasDataset class, and set upload=True to specify that it is meant for streaming data.

transactions_df = pd.read_csv("data/transactions.csv").reset_index()
labels_df = pd.read_csv("data/labels.csv").reset_index()
try:
    transactions = tb.PandasDataset(
        dataset_name="transactions_pysad",
        key_field="index",
        dataframe=transactions_df,
        upload=True,
    )
except Exception:
    # Upload failed (for example, the dataset already exists); load it by name.
    transactions = tb.PandasDataset(dataset_name="transactions_pysad")

try:
    labels = tb.PandasDataset(
        dataset_name="labels_pysad", key_field="index", dataframe=labels_df, upload=True
    )
except Exception:
    # Same fallback for the labels dataset.
    labels = tb.PandasDataset(dataset_name="labels_pysad")

numerical_fields = [
    "transactionAmount",
    "localHour",
    "isProxyIP",
    "digitalItemCount",
    "physicalItemCount",
]
features = transactions.get_input_fields(numerical_fields=numerical_fields)
label = labels.get_label_field(label_field="is_fraud")
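
To connect the field selection to the model code: PySADModel reads input.numeric and converts it to an array. The sketch below illustrates the assumed mapping in plain Python, namely that the values of the selected numerical fields arrive in the order they were listed. That ordering is an assumption, to_numeric_vector is a hypothetical helper rather than a TurboML function, and the record values are made up.

```python
numerical_fields = [
    "transactionAmount",
    "localHour",
    "isProxyIP",
    "digitalItemCount",
    "physicalItemCount",
]


def to_numeric_vector(record, fields):
    """Pick the selected fields from a record, in order, as floats."""
    return [float(record[name]) for name in fields]


# Made-up record; "someOtherColumn" stands for any column that was not selected.
record = {
    "transactionAmount": 42.5,
    "localHour": 13,
    "isProxyIP": 0,
    "digitalItemCount": 1,
    "physicalItemCount": 2,
    "someOtherColumn": "ignored",
}

vector = to_numeric_vector(record, numerical_fields)
```

Under this assumption the model sees a five-element vector per record, and columns outside numerical_fields never reach it.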

Model Deployment

Now, we deploy our model and extract its outputs.

pysad_model = tb.Python(class_name=PySADModel.__name__, venv_name=venv.name)
deployed_model_pysad = pysad_model.deploy("pysad_model", input=features, labels=label)
outputs = deployed_model_pysad.get_outputs()
len(outputs)

Evaluation

Finally, we can use any of PySAD's metrics to quantify how well the model's anomaly scores match the fraud labels. Here we use AUROC.

from pysad.evaluation import AUROCMetric
auroc = AUROCMetric()
 
for output, y in zip(
    outputs, labels_df["is_fraud"].tolist()[: len(outputs)], strict=False
):
    auroc.update(y, output.score)
auroc.get()
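
To make the metric concrete: AUROC is the probability that a randomly chosen anomalous record receives a higher score than a randomly chosen normal one, with ties counted as half. The pure-Python sketch below computes that quantity directly. auroc_from_scores is a hypothetical helper written for illustration, not part of PySAD, and its pairwise loop is quadratic in the number of records, so it is only suitable for small checks.

```python
def auroc_from_scores(y_true, scores):
    """Pairwise AUROC: fraction of (anomaly, normal) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # anomaly scored above normal record
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))
```

A value of 1.0 means every anomaly outscored every normal record, while 0.5 is what random scores would give on average.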