Feature Engineering - Python UDFs
import turboml as tb
import pandas as pd
transactions = pd.read_csv("data/transactions.csv").reset_index()
transactions = tb.PandasDataset(
    dataset_name="transactions_udf",
    key_field="index",
    dataframe=transactions,
    upload=True,
)
Simple User Defined Function
To create a user-defined function (UDF), first write a separate Python file that contains the function and the imports it uses. The function should process its input and return a value. The example below takes a value and returns its sine.
# Read the UDF source file; its contents are passed to TurboML as a string later.
with open("udf_sine_of_amount.py") as f:
    myfunction_contents = f.read()
print(myfunction_contents)
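The contents of `udf_sine_of_amount.py` are printed but not reproduced in this text. A minimal sketch of what such a file might contain is shown here, assuming the function is named `myfunction` (the name later passed as `function_name`) and relies on NumPy (the only entry in `libraries`). The body is an illustration, not the original file.

```python
import numpy as np


def myfunction(x):
    # Assumed body: return the sine of a single numeric input (in radians).
    return np.sin(x)
```

The file holds both the import and the function, so it can be shipped as one self-contained string.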
User Defined Functions - Multiple Input Example
The function above is very simple. A UDF can also be more involved: it can take multiple inputs, perform string processing, and so on.
# Read the multi-input UDF source file as a string.
with open("udf_transaction_location_overlap.py") as f:
    my_complex_function_contents = f.read()
print(my_complex_function_contents)
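The file `udf_transaction_location_overlap.py` is likewise not reproduced here. As a rough sketch, a two-input string UDF could look like the following. The function name `my_complex_function` and the two arguments match the later `create_udf_features` call, but the comparison logic and the handling of missing values are assumptions made for illustration.

```python
def my_complex_function(x, y):
    # Assumption: a missing country code counts as "no overlap".
    if x is None or y is None:
        return 0
    # Compare the two codes case-insensitively, ignoring surrounding whitespace.
    return 1 if x.strip().lower() == y.strip().lower() else 0
```

This version uses only built-in string methods, which is consistent with the empty `libraries` list used when the feature is registered.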
Rich User Defined Functions
%pip install psycopg_pool psycopg['binary'] psycopg2-binary
# Read the rich UDF source file (a class definition) as a string.
with open("rich_udf.py") as f:
    my_rich_function_contents = f.read()
print(my_rich_function_contents)
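The contents of `rich_udf.py` are not shown in this text. Judging from the later `create_rich_udf_features` call, a rich UDF is a class that is constructed once from a list of initializer arguments and then invoked per row. The sketch below illustrates only that pattern: `InMemoryLookup`, its parameter names, the method name `func`, and the in-memory table are all hypothetical stand-ins, and it does not use the TurboML base class or a real Postgres connection.

```python
class InMemoryLookup:
    """Hypothetical stand-in for a stateful lookup class such as PostgresLookup."""

    def __init__(self, user, host, port, dbname):
        # A real implementation would open a connection pool here, once,
        # instead of reconnecting for every row. Parameter names are assumed.
        self.conninfo = f"user={user} host={host} port={port} dbname={dbname}"
        # Assumed toy data standing in for a database table.
        self._table = {0: 10, 1: 20}

    def func(self, index):
        # Per-row call: look up the key, fall back to 0 when it is absent.
        return self._table.get(index, 0)


lookup = InMemoryLookup("root", "risingwave", "4566", "dev")
print(lookup.func(1))
```

Keeping the expensive setup in `__init__` and the cheap work in the per-row method is the main reason to use a class rather than a plain function.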
Feature Engineering using User Defined Functions (UDF)
Make sure every library you specify is pip-installable under the name you give: use the package's distribution name rather than its import name. For example, if the UDF uses a sklearn function, the library to install is "scikit-learn" (not "sklearn").
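To make the naming point concrete, here is a small helper that translates import names into pip names before they are passed as `libraries`. The helper `to_pip_names` and the mapping are illustrative only and are not part of TurboML. The listed pairs are well-known cases where the two names differ.

```python
# Import name -> pip distribution name, for a few packages where the two differ.
IMPORT_TO_PIP = {
    "sklearn": "scikit-learn",
    "cv2": "opencv-python",
    "PIL": "Pillow",
    "yaml": "PyYAML",
    "bs4": "beautifulsoup4",
}


def to_pip_names(import_names):
    """Translate import names to pip names; unknown names pass through unchanged."""
    return [IMPORT_TO_PIP.get(name, name) for name in import_names]


libraries = to_pip_names(["numpy", "sklearn"])
print(libraries)  # ['numpy', 'scikit-learn']
```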
transactions.feature_engineering.create_udf_features(
    new_feature_name="sine_of_amount",
    argument_names=["transactionAmount"],
    function_name="myfunction",
    function_file_contents=myfunction_contents,
    libraries=["numpy"],
)
transactions.feature_engineering.create_udf_features(
    new_feature_name="transaction_location_overlap",
    argument_names=["ipCountryCode", "paymentBillingCountryCode"],
    function_name="my_complex_function",
    function_file_contents=my_complex_function_contents,
    libraries=[],
)
transactions.feature_engineering.create_rich_udf_features(
    new_feature_name="lookup_feature",
    argument_names=["index"],
    function_name="lookup",
    class_file_contents=my_rich_function_contents,
    libraries=["psycopg_pool", "psycopg['binary']", "psycopg2-binary"],
    class_name="PostgresLookup",
    dev_initializer_arguments=["root", "risingwave", "4566", "dev"],
    prod_initializer_arguments=["root", "risingwave", "4566", "dev"],
)
transactions.feature_engineering.get_local_features()
transactions.feature_engineering.materialize_features(
    ["sine_of_amount", "transaction_location_overlap", "lookup_feature"]
)
transactions.feature_engineering.get_materialized_features()