Python Model: PySAD Example

In this example, we use the PySAD package to monitor anomalies in our streaming data.

We start off by installing and importing the pysad package along with its dependencies.

    !pip install pysad mmh3==2.5.1
    import turboml as tb
    import pandas as pd
    import numpy as np
    from pysad.models import xStream

Model Definition

TurboML's built-in PythonModel lets you define custom models that are compatible with TurboML.

Here we define PySADModel as a wrapper using PySAD's xStream model, making sure to properly implement the required instance methods for a Python model.

    import turboml.common.pytypes as types
    
    
    class PySADModel:
        def __init__(self):
            self.model = xStream()
    
        def init_imports(self):
            from pysad.models import xStream
            import numpy as np
    
        def learn_one(self, input: types.InputData):
            self.model = self.model.fit_partial(np.array(input.numeric))
    
        def predict_one(self, input: types.InputData, output: types.OutputData):
            score = self.model.score_partial(np.array(input.numeric))
            output.set_score(score)
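To make the required methods concrete, here is a minimal sketch that exercises the same method names (init_imports, learn_one, predict_one) locally, without TurboML or PySAD. Everything in it is hypothetical and for illustration only: StubInput and StubOutput stand in for types.InputData and types.OutputData and model only the `numeric` attribute and `set_score` call used above, RunningZScoreModel is a toy detector rather than xStream, and run_stream scores each row before learning from it, which is one common ordering for streaming evaluation and not necessarily what TurboML does.

```python
import math
from dataclasses import dataclass


@dataclass
class StubInput:
    """Hypothetical stand-in for types.InputData; only `numeric` is modeled."""
    numeric: list


class StubOutput:
    """Hypothetical stand-in for types.OutputData; records the score it is given."""

    def __init__(self):
        self.score = None

    def set_score(self, score):
        self.score = score


class RunningZScoreModel:
    """Toy detector (not xStream) exposing the same method names as PySADModel."""

    def __init__(self):
        self.n = 0
        self.mean = []
        self.m2 = []  # running sum of squared deviations, one entry per feature

    def init_imports(self):
        # Nothing to import here; kept only to mirror the interface.
        pass

    def learn_one(self, input):
        x = input.numeric
        if not self.mean:
            self.mean = [0.0] * len(x)
            self.m2 = [0.0] * len(x)
        self.n += 1
        # Welford's online update of the per-feature mean and variance.
        for i, v in enumerate(x):
            delta = v - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (v - self.mean[i])

    def predict_one(self, input, output):
        if self.n < 2:
            output.set_score(0.0)  # not enough data to estimate a variance
            return
        total = 0.0
        for i, v in enumerate(input.numeric):
            std = math.sqrt(self.m2[i] / (self.n - 1))
            total += abs(v - self.mean[i]) / std if std > 0 else 0.0
        # Score is the mean absolute z-score across features.
        output.set_score(total / len(input.numeric))


def run_stream(model, rows):
    """Score each row, then learn from it; return the list of scores."""
    scores = []
    for row in rows:
        out = StubOutput()
        model.predict_one(StubInput(numeric=row), out)
        scores.append(out.score)
        model.learn_one(StubInput(numeric=row))
    return scores


rows = [[10.0, 1.0], [11.0, 1.0], [9.0, 1.0], [10.0, 1.0], [500.0, 1.0]]
scores = run_stream(RunningZScoreModel(), rows)
```

The last row is far from everything seen before it, so it receives the largest score. A harness like this can be handy for checking that a wrapper's methods accept and return what you expect before deploying it.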

Now, we create a custom venv so that the model defined above has access to all of its dependencies. According to the PySAD documentation, PySAD requires mmh3==2.5.1, so we pin that version.

    venv = tb.setup_venv("my_pysad_venv", ["mmh3==2.5.1", "pysad", "numpy"])
    venv.add_python_class(PySADModel)

Dataset

We define our dataset using TurboML's PandasDataset class, and set upload=True to specify that it is meant for streaming data.

    transactions_df = pd.read_csv("data/transactions.csv").reset_index()
    labels_df = pd.read_csv("data/labels.csv").reset_index()
    transactions = tb.PandasDataset(
        dataset_name="transactions_pysad",
        key_field="index",
        dataframe=transactions_df,
        upload=True,
    )
    labels = tb.PandasDataset(
        dataset_name="labels_pysad", key_field="index", dataframe=labels_df, upload=True
    )
    numerical_fields = [
        "transactionAmount",
        "localHour",
        "isProxyIP",
        "digitalItemCount",
        "physicalItemCount",
    ]
    features = transactions.get_input_fields(numerical_fields=numerical_fields)
    label = labels.get_label_field(label_field="is_fraud")

Model Deployment

Now, we deploy our model and extract its outputs.

    pysad_model = tb.Python(class_name=PySADModel.__name__, venv_name=venv.name)
    deployed_model_pysad = pysad_model.deploy("pysad_model", input=features, labels=label)
    outputs = deployed_model_pysad.get_outputs()
    len(outputs)

Evaluation

Finally, we can use any of PySAD's metrics to quantify how well the model's anomaly scores agree with the known labels. Here we use AUROC, comparing each output score against the corresponding is_fraud label.

    from pysad.evaluation import AUROCMetric
    auroc = AUROCMetric()
    
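    # Pair each model output with its label. The labels are truncated to the
    # number of outputs received. zip(..., strict=...) requires Python 3.10+.
    for output, y in zip(
        outputs, labels_df["is_fraud"].tolist()[: len(outputs)], strict=False
    ):
        auroc.update(y, output.score)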
    auroc.get()
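As an aid to interpreting the number, AUROC can be read as the probability that a randomly chosen anomalous record receives a higher score than a randomly chosen normal one, with ties counted as half. The sketch below computes it directly from that definition in plain Python. It is an illustration of the metric, not PySAD's implementation; pairwise_auroc is a hypothetical name, and the quadratic loop is only suitable for small inputs.

```python
def pairwise_auroc(labels, scores):
    """Fraction of (anomaly, normal) pairs in which the anomaly scores higher.

    Ties count as half a win. Labels are expected to be 0 (normal) or 1 (anomaly).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four (anomaly, normal) pairs are ranked correctly.
value = pairwise_auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(value)  # 0.75
```

A value of 0.5 corresponds to scores that rank anomalies no better than chance, and 1.0 to scores that rank every anomaly above every normal record.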