Feature Engineering - Python UDFs

    import turboml as tb
    import pandas as pd
    transactions = pd.read_csv("data/transactions.csv").reset_index()
    transactions = tb.PandasDataset(
        dataset_name="transactions_udf",
        key_field="index",
        dataframe=transactions,
        upload=True,
    )

Simple User Defined Function

To create a user defined function, first write a separate Python file containing the function together with the imports it uses. The function should process its input and return a value. The example below is a simple function that takes a value and returns its sine.

    myfunction_contents = open("udf_sine_of_amount.py").read()
    print(myfunction_contents)
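
The contents of udf_sine_of_amount.py are not reproduced on this page. As a rough sketch, such a file might look like the following, assuming the function is called myfunction (the name used when the feature is registered) and relies on numpy. The parameter name x is an assumption.

```python
# Hypothetical contents of udf_sine_of_amount.py (the real file is not shown here).
import numpy as np


def myfunction(x):
    """Return the sine of a single numeric value."""
    return np.sin(x)
```

Note that the import sits in the same file as the function, as required by the description above.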

User Defined Functions - Multiple Input Example

The user defined function above is very simple. A UDF can also be more involved: it can take multiple inputs, perform string processing, and so on.

    my_complex_function_contents = open("udf_transaction_location_overlap.py").read()
    print(my_complex_function_contents)
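
The file udf_transaction_location_overlap.py is likewise not shown here. One plausible sketch of a two-input UDF with string processing, assuming it compares the two country codes and returns 1 on a match, is given below. The parameter names and the exact matching rule are assumptions, not the original implementation.

```python
# Hypothetical contents of udf_transaction_location_overlap.py (the real file is not shown here).


def my_complex_function(ip_country, billing_country):
    """Return 1 if the two country codes match (ignoring case and whitespace), else 0."""
    # Treat missing or non-string values as a non-match rather than raising.
    if not isinstance(ip_country, str) or not isinstance(billing_country, str):
        return 0
    same = ip_country.strip().lower() == billing_country.strip().lower()
    return 1 if same else 0
```

For example, my_complex_function("US", "us ") evaluates to 1, while my_complex_function("US", "GB") evaluates to 0.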

Rich User Defined Functions

    %pip install psycopg_pool psycopg['binary'] psycopg2-binary
    my_rich_function_contents = open("rich_udf.py").read()
    print(my_rich_function_contents)
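
The contents of rich_udf.py are not shown on this page. The sketch below illustrates only the general pattern a rich UDF suggests: a class that sets up state once in its initializer and then answers a per-row lookup. It is a stand-in under stated assumptions. It uses the standard-library sqlite3 module with an in-memory table instead of a Postgres connection pool, the table name lookup_table and its rows are invented for illustration, and the meaning of the four initializer arguments (user, host, port, database) is a guess. The platform may also require a specific base class or method name, which this sketch does not model.

```python
# Hypothetical stand-in for rich_udf.py: sqlite3 replaces the Postgres connection pool.
import sqlite3


class PostgresLookup:
    """Holds a database connection created once and reuses it for every row."""

    def __init__(self, user, host, port, dbname):
        # Assumed meaning of the four initializer arguments: user, host, port, database.
        # A real implementation would open a psycopg connection pool with these values.
        self.conninfo = f"user={user} host={host} port={port} dbname={dbname}"
        self.conn = sqlite3.connect(":memory:")
        # Illustrative table and rows, invented for this sketch.
        self.conn.execute(
            "CREATE TABLE lookup_table (id INTEGER PRIMARY KEY, value REAL)"
        )
        self.conn.executemany(
            "INSERT INTO lookup_table (id, value) VALUES (?, ?)",
            [(0, 1.5), (1, 2.5)],
        )

    def lookup(self, index):
        """Return the stored value for `index`, or 0.0 when no row matches."""
        row = self.conn.execute(
            "SELECT value FROM lookup_table WHERE id = ?", (index,)
        ).fetchone()
        return row[0] if row is not None else 0.0


udf = PostgresLookup("root", "risingwave", "4566", "dev")
```

Keeping the connection in the instance means the expensive setup happens once rather than on every row, which is the main reason to prefer a class over a plain function for this kind of feature.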

Feature Engineering using User Defined Functions (UDF)

Make sure the specified libraries are pip-installable and are listed under their package names. For example, if the UDF uses a scikit-learn function, the library to install is "scikit-learn" (not "sklearn", which is only the import name).
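
The difference between import names and pip package names can be illustrated with a small helper. This is only a sketch: the mapping IMPORT_TO_PIP and the function pip_names are hypothetical names introduced here, and the mapping covers just a few well-known cases.

```python
# Illustrative mapping from import names to pip package names (not exhaustive).
IMPORT_TO_PIP = {
    "sklearn": "scikit-learn",
    "cv2": "opencv-python",
    "yaml": "PyYAML",
    "PIL": "Pillow",
}


def pip_names(import_names):
    """Translate import names into pip package names, leaving unknown names unchanged."""
    return [IMPORT_TO_PIP.get(name, name) for name in import_names]


libraries = pip_names(["numpy", "sklearn"])
print(libraries)  # ['numpy', 'scikit-learn']
```

The resulting list is in the form expected by the libraries argument.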

    transactions.feature_engineering.create_udf_features(
        new_feature_name="sine_of_amount",
        argument_names=["transactionAmount"],
        function_name="myfunction",
        function_file_contents=myfunction_contents,
        libraries=["numpy"],
    )
    transactions.feature_engineering.create_udf_features(
        new_feature_name="transaction_location_overlap",
        argument_names=["ipCountryCode", "paymentBillingCountryCode"],
        function_name="my_complex_function",
        function_file_contents=my_complex_function_contents,
        libraries=[],
    )
    transactions.feature_engineering.create_rich_udf_features(
        new_feature_name="lookup_feature",
        argument_names=["index"],
        function_name="lookup",
        class_file_contents=my_rich_function_contents,
        libraries=["psycopg_pool", "psycopg['binary']", "psycopg2-binary"],
        class_name="PostgresLookup",
        dev_initializer_arguments=["root", "risingwave", "4566", "dev"],
        prod_initializer_arguments=["root", "risingwave", "4566", "dev"],
    )
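    # Preview the newly defined features on the local data.
    transactions.feature_engineering.get_local_features()
    # Submit the features to the platform so they are computed there.
    transactions.feature_engineering.materialize_features(
        ["sine_of_amount", "transaction_location_overlap", "lookup_feature"]
    )
    # Retrieve the materialized features.
    transactions.feature_engineering.get_materialized_features()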