Python Model: PySAD Example
In this example, we use the PySAD package to monitor anomalies in our streaming data. We start off by installing and importing the pysad package along with its dependencies.
```python
# Notebook shell command: install PySAD and the mmh3 version it requires
!pip install pysad mmh3==2.5.1

import turboml as tb
import pandas as pd
import numpy as np
from pysad.models import xStream
```
Model Definition
TurboML's built-in PythonModel interface can be used to define custom models that are compatible with TurboML. Here we define PySADModel as a wrapper around PySAD's xStream model, making sure to properly implement the instance methods a Python model requires: init_imports, learn_one and predict_one.
```python
import turboml.common.pytypes as types


class PySADModel:
    def __init__(self):
        # Streaming anomaly detector from PySAD
        self.model = xStream()

    def init_imports(self):
        # Import the libraries the model needs
        from pysad.models import xStream
        import numpy as np

    def learn_one(self, input: types.InputData):
        # Update the detector with one record's numeric features
        self.model = self.model.fit_partial(np.array(input.numeric))

    def predict_one(self, input: types.InputData, output: types.OutputData):
        # Score one record; higher scores mean more anomalous records
        score = self.model.score_partial(np.array(input.numeric))
        output.set_score(score)
```
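To check the learn_one/predict_one contract offline, before creating a venv or deploying, a minimal sketch like the following can help. It assumes only what PySADModel itself uses: an input object with a `numeric` list and an output object with `set_score`. The `InputData` and `OutputData` classes here are hypothetical stand-ins, not TurboML's types, and a running z-score detector (Welford's online mean and variance) replaces xStream so the sketch needs only the standard library.

```python
import math
from dataclasses import dataclass


@dataclass
class InputData:
    """Hypothetical stand-in for types.InputData: only the numeric features."""
    numeric: list[float]


@dataclass
class OutputData:
    """Hypothetical stand-in for types.OutputData: records the anomaly score."""
    score: float = 0.0

    def set_score(self, score: float) -> None:
        self.score = score


class RunningZScoreModel:
    """Same learn_one/predict_one shape as PySADModel, with a simple detector.

    Keeps a per-feature running mean and variance (Welford's algorithm) and
    scores a record by its largest absolute z-score across features.
    """

    def __init__(self):
        self.n = 0
        self.mean: list[float] = []
        self.m2: list[float] = []

    def learn_one(self, input: InputData) -> None:
        x = input.numeric
        if self.n == 0:
            self.mean = [0.0] * len(x)
            self.m2 = [0.0] * len(x)
        self.n += 1
        for i, v in enumerate(x):
            delta = v - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (v - self.mean[i])

    def predict_one(self, input: InputData, output: OutputData) -> None:
        if self.n < 2:
            # Not enough history to estimate a spread yet
            output.set_score(0.0)
            return
        score = 0.0
        for i, v in enumerate(input.numeric):
            std = math.sqrt(self.m2[i] / (self.n - 1))
            if std > 0:  # constant features carry no anomaly signal here
                score = max(score, abs(v - self.mean[i]) / std)
        output.set_score(score)


model = RunningZScoreModel()
for row in ([10.0, 1.0], [12.0, 1.0], [11.0, 1.0]):
    model.learn_one(InputData(numeric=row))
out = OutputData()
model.predict_one(InputData(numeric=[50.0, 1.0]), out)
print(out.score)  # 39.0: first feature has mean 11 and sample std 1
```

Swapping xStream back in only changes the bodies of `learn_one` and `predict_one`; the calling pattern stays the same.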
Now, we create a custom venv so that the custom model defined above has access to all the required dependencies. PySAD requires mmh3==2.5.1, as per its documentation.
```python
venv = tb.setup_venv("my_pysad_venv", ["mmh3==2.5.1", "pysad", "numpy"])
venv.add_python_class(PySADModel)
```
Dataset
We define our datasets using TurboML's PandasDataset class and set upload=True to upload the data to TurboML as streaming data.
```python
# reset_index() adds an "index" column that serves as each record's key
transactions_df = pd.read_csv("data/transactions.csv").reset_index()
labels_df = pd.read_csv("data/labels.csv").reset_index()

transactions = tb.PandasDataset(
    dataset_name="transactions_pysad",
    key_field="index",
    dataframe=transactions_df,
    upload=True,
)
labels = tb.PandasDataset(
    dataset_name="labels_pysad", key_field="index", dataframe=labels_df, upload=True
)

# Numeric features passed to the model (read as input.numeric)
numerical_fields = [
    "transactionAmount",
    "localHour",
    "isProxyIP",
    "digitalItemCount",
    "physicalItemCount",
]
features = transactions.get_input_fields(numerical_fields=numerical_fields)
label = labels.get_label_field(label_field="is_fraud")
```
Model Deployment
Now, we deploy our model and extract its outputs.
```python
pysad_model = tb.Python(class_name=PySADModel.__name__, venv_name=venv.name)
deployed_model_pysad = pysad_model.deploy("pysad_model", input=features, labels=label)

outputs = deployed_model_pysad.get_outputs()
len(outputs)
```
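This example does not spell out how the deployed model interleaves scoring and learning. A common convention in streaming anomaly detection is prequential (test-then-train) processing: each record is scored by the current model and only then used for learning, giving one score per record. The sketch below simulates that order locally under that assumption; `InputData`, `OutputData`, `MeanDistanceModel` and `run_prequential` are hypothetical names for illustration, not TurboML APIs.

```python
from dataclasses import dataclass


@dataclass
class InputData:
    """Hypothetical stand-in for types.InputData."""
    numeric: list[float]


@dataclass
class OutputData:
    """Hypothetical stand-in for types.OutputData."""
    score: float = 0.0

    def set_score(self, score: float) -> None:
        self.score = score


class MeanDistanceModel:
    """Toy detector: score = distance of the first feature from its running mean."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def learn_one(self, input: InputData) -> None:
        self.n += 1
        self.mean += (input.numeric[0] - self.mean) / self.n

    def predict_one(self, input: InputData, output: OutputData) -> None:
        output.set_score(abs(input.numeric[0] - self.mean) if self.n else 0.0)


def run_prequential(model, rows):
    """Score each row with the model as it stands, then let the model learn it."""
    scores = []
    for row in rows:
        x = InputData(numeric=row)
        out = OutputData()
        model.predict_one(x, out)  # score before the model has seen this row
        model.learn_one(x)         # then update the model with the row
        scores.append(out.score)
    return scores


scores = run_prequential(MeanDistanceModel(), [[10.0], [10.0], [10.0], [40.0]])
print(scores)  # [0.0, 0.0, 0.0, 30.0]
```

Scoring before learning keeps each score honest: a record never influences the model that judges it.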
Evaluation
Finally, we evaluate the model with one of PySAD's metrics. Here we use AUROCMetric, which measures how well the anomaly scores separate the transactions labeled as fraudulent from the rest.
```python
from pysad.evaluation import AUROCMetric

auroc = AUROCMetric()
# Outputs are paired with labels by position, assuming both follow record order
for output, y in zip(
    outputs, labels_df["is_fraud"].tolist()[: len(outputs)], strict=False
):
    auroc.update(y, output.score)
auroc.get()
```
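AUROC can be read as the probability that a randomly chosen fraudulent transaction receives a higher anomaly score than a randomly chosen legitimate one, with ties counting as half. The from-scratch sketch below computes that quantity with the pairwise (Mann-Whitney) formulation; it illustrates what the metric measures and is not PySAD's implementation. The function name `auroc_pairwise` is hypothetical.

```python
def auroc_pairwise(labels, scores):
    """Area under the ROC curve via pairwise comparisons (Mann-Whitney U).

    Fraction of (positive, negative) pairs where the positive (label 1)
    scores higher than the negative (label 0); ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


result = auroc_pairwise([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(result)  # 0.75: 3 of the 4 positive/negative pairs are ranked correctly
```

The pairwise loop is O(P·N), which is fine for illustration; a sort-based computation scales better on long streams. AUROC is undefined when only one class is present, so the slice of labels paired with the outputs must contain both fraudulent and legitimate transactions.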