TurboML Ibis Quickstart
import turboml as tb
import pandas as pd
import ibis
# reset_index() adds an "index" column, which is used below as the key field when
# registering the datasets.
transactions_df = pd.read_csv("data/transactions.csv").reset_index()
labels_df = pd.read_csv("data/labels.csv").reset_index()
# Register both datasets with the platform, keyed on the "index" column created above.
transactions = tb.PandasDataset(
    dataset_name="transactions_ibis",
    key_field="index",
    dataframe=transactions_df,
    upload=True,
)
labels = tb.PandasDataset(
    dataset_name="labels_ibis", key_field="index", dataframe=labels_df, upload=True
)
The following cells show how to define features in Ibis. The table parameter of the create_ibis_features function takes the Ibis expression used to prepare the features.
# Build an Ibis table expression from the registered dataset.
table = transactions.to_ibis()
# A scalar Python UDF, applied element-wise to the column it is given.
@ibis.udf.scalar.python()
def add_one(x: float) -> float:
    return x + 1
# Add a derived column using the UDF.
table = table.mutate(updated_transaction_amount=add_one(table.transactionAmount))
# Rolling sum of the updated amount over the preceding 100 rows of each window partition
# (ordered by timestamp), plus a flag for a mismatch between the IP country code and the
# billing country code.
agged = table.select(
    total_transaction_amount=table.updated_transaction_amount.sum().over(
        window=ibis.window(preceding=100, following=0, group_by=[table.index]),
        order_by=table.timestamp,
    ),
    index=table.index,
    is_potential_fraud=(
        table.ipCountryCode != table.paymentBillingCountryCode.lower()
    ).ifelse(1, 0),
    ipCountryCode=table.ipCountryCode,
    paymentBillingCountryCode=table.paymentBillingCountryCode,
)
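Before registering the expression with the platform, you can optionally sanity-check the logic with plain Ibis against an in-memory copy of the DataFrame. This uses Ibis's default local backend (assuming DuckDB is installed) and is independent of TurboML.
# Optional local check of the country-mismatch flag using Ibis's default local backend.
# This is plain Ibis on an in-memory table, not a TurboML API.
local_t = ibis.memtable(transactions_df)
local_check = local_t.select(
    index=local_t.index,
    is_potential_fraud=(
        local_t.ipCountryCode != local_t.paymentBillingCountryCode.lower()
    ).ifelse(1, 0),
)
local_check.execute().head()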
transactions.feature_engineering.create_ibis_features(agged)
transactions.feature_engineering.get_local_features()
We need to tell the platform to start computing all pending features for the given topic. This is done by calling the materialize_ibis_features function.
transactions.feature_engineering.materialize_ibis_features()
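Once materialization has started, you can pull a snapshot of the computed features to verify them. This assumes the get_materialized_features() helper from the standard quickstart is also available for Ibis features; if your SDK version differs, skip this step.
# Snapshot of materialized features (assumes get_materialized_features() is available,
# as in the standard quickstart).
materialized = transactions.feature_engineering.get_materialized_features()
materialized.head()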
# Random Cut Forest anomaly detector over the two materialized features.
model = tb.RCF(number_of_trees=50)
numerical_fields = ["total_transaction_amount", "is_potential_fraud"]
features = transactions.get_input_fields(numerical_fields=numerical_fields)
label = labels.get_label_field(label_field="is_fraud")
deployed_model_rcf = model.deploy(name="demo_model_ibis", input=features, labels=label)
outputs = deployed_model_rcf.get_outputs()
len(outputs)
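The deployed model scores records as they stream in, so the number of outputs can differ between invocations. If you want to wait until a minimum number of outputs is available, a minimal polling sketch using only get_outputs() from above:
import time
# Poll until the deployed model has produced at least 10 outputs;
# get_outputs() returns whatever has been scored so far.
while len(outputs) < 10:
    time.sleep(1)
    outputs = deployed_model_rcf.get_outputs()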
sample_output = outputs[-1]
sample_output
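Each output exposes the model's anomaly score via the "record" field (the same field the plot below uses), so the score can be read off directly.
# Anomaly score of the most recent output.
sample_output["record"].score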
import matplotlib.pyplot as plt
# Plot the anomaly score produced for each processed record.
plt.plot([output["record"].score for output in outputs])
model_endpoints = deployed_model_rcf.get_endpoints()
model_endpoints
# Use the last transaction as a single query datapoint for the REST endpoint.
model_query_datapoint = (
    transactions_df[["index", "ipCountryCode", "paymentBillingCountryCode"]]
    .iloc[-1]
    .to_dict()
)
model_query_datapoint
import requests
# Send the datapoint to the model's REST endpoint, using the TurboML client's API headers.
resp = requests.post(
    model_endpoints[0], json=model_query_datapoint, headers=tb.common.api.headers
)
resp.json()
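For anything beyond a quick check, it is worth surfacing HTTP errors explicitly before decoding the body. This is standard requests usage with the same endpoint and headers as above.
# Same request with basic error handling (plain requests; nothing TurboML-specific).
resp = requests.post(
    model_endpoints[0], json=model_query_datapoint, headers=tb.common.api.headers
)
resp.raise_for_status()  # raise on 4xx/5xx instead of decoding an error body
resp.json()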
Batch Inference on Models
While the above method is better suited to individual requests, we can also perform batch inference on the model. We use the get_inference function for this purpose.
outputs = deployed_model_rcf.get_inference(transactions_df)
outputs
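Assuming get_inference returns one result per input row, a quick consistency check is to compare lengths.
# Sanity check: one batch output per input row (an assumption about get_inference's shape).
len(outputs) == len(transactions_df)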