LLM Embeddings

One of the most important ways to model NLP tasks is to use pre-trained language model embeddings. This notebook covers how to download pre-trained models, use them to generate text embeddings, and build ML models on top of these embeddings using TurboML. We'll demonstrate this on an SMS spam classification use-case.

Getting the dataset

    from river import datasets
    import pandas as pd
    import turboml as tb

    # Stream the SMS Spam dataset from river.
    dataset = datasets.SMSSpam()
    dataset

    # Collect the streamed samples into feature and label records.
    dict_list_x = []
    dict_list_y = []
    for x, y in dataset:
        dict_list_x.append(x)
        dict_list_y.append({"label": float(y)})

    # Build DataFrames; reset_index() adds an "index" column that we use as the key field.
    df_features = pd.DataFrame.from_dict(dict_list_x).reset_index()
    df_labels = pd.DataFrame.from_dict(dict_list_y).reset_index()
    df_features
    df_labels

    # Upload features and labels to TurboML as datasets keyed on "index".
    features = tb.PandasDataset(
        dataset_name="sms_spam_features",
        key_field="index",
        dataframe=df_features,
        upload=True,
    )
    labels = tb.PandasDataset(
        dataset_name="sms_spam_labels", key_field="index", dataframe=df_labels, upload=True
    )

    # Select the textual input field and the label field used for training.
    model_features = features.get_input_fields(textual_fields=["body"])
    model_label = labels.get_label_field(label_field="label")
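
If you want to sanity-check the raw schema the snippet above relies on, each sample from river's SMSSpam stream is a (features, label) pair in which the features dict holds a single textual "body" field and the label is a boolean spam flag. A minimal peek, just for illustration:

    # Each sample is ({"body": "<message text>"}, bool), where True marks a spam message.
    x, y = next(iter(datasets.SMSSpam()))
    print(x["body"][:80], "| spam:", y)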

Downloading pre-trained models

Huggingface Hub (https://huggingface.co/models) is one of the largest collections of pre-trained language models. It also has native integration with the GGUF format (https://huggingface.co/docs/hub/en/gguf). This format is quickly becoming the standard for saving and loading models, and popular open-source projects like llama.cpp and GPT4All use it. TurboML also uses the GGUF format to load pre-trained models. Here's how you can specify a model from Huggingface Hub, and TurboML will download and convert it into the right format.

We also support quantization of the model during conversion. The supported options are "f32", "f16", "bf16", "q8_0", and "auto", where "f32" is float32, "f16" is float16, "bf16" is bfloat16, "q8_0" is Q8_0 (8-bit quantization), and "auto" selects the highest-fidelity 16-bit float type based on the type of the first loaded tensor. "auto" is the default option.
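
For example, if disk footprint matters more than fidelity, you could request 8-bit quantization instead. This is just a sketch; only the quantization argument differs from the call we actually use below:

    # Hypothetical alternative: same model, converted with Q8_0 quantization.
    gguf_model_q8 = tb.acquire_hf_model_as_gguf("BAAI/bge-small-en-v1.5", "q8_0")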

For this notebook, we'll use the https://huggingface.co/BAAI/bge-small-en-v1.5 model, with "f16" quantization.

    # Download the model from Huggingface Hub and convert it to GGUF with "f16" quantization.
    gguf_model = tb.acquire_hf_model_as_gguf("BAAI/bge-small-en-v1.5", "f16")
    gguf_model

Once we have converted the pre-trained model, we can use it to generate embeddings. Here's how:

    # Deploy an embedding model backed by the converted GGUF model.
    embedding_model = tb.LLAMAEmbedding(gguf_model_id=gguf_model)
    deployed_model = embedding_model.deploy(
        "bert_embedding", input=model_features, labels=model_label
    )

    # Fetch the streaming outputs and inspect the embedding of the first record.
    outputs = deployed_model.get_outputs()
    embedding = outputs[0].get("record").embeddings
    print(
        "Length of the embedding vector is:",
        len(embedding),
        ". The first 5 values are:",
        embedding[:5],
    )
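
As a quick illustration of what these vectors can be used for downstream, here is a small sketch, plain numpy and nothing TurboML-specific, that assumes outputs contains at least two records and compares two messages by cosine similarity:

    import numpy as np

    # Pull the embedding vectors of the first two processed messages.
    vec_a = np.array(outputs[0].get("record").embeddings)
    vec_b = np.array(outputs[1].get("record").embeddings)

    # A cosine similarity close to 1.0 means the two messages point in a similar
    # direction in embedding space, i.e. they are semantically similar.
    similarity = np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))
    print("Cosine similarity between the first two messages:", similarity)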

But embeddings alone don't solve our use-case! We ultimately need a classification model for spam detection. We can build a pre-processor that converts all our text data into numerical embeddings, and then pass these numerical values to a classifier model.

    # Chain a llama.cpp-based embedding pre-processor with an SGT classifier.
    model = tb.LlamaCppPreProcessor(base_model=tb.SGTClassifier(), gguf_model_id=gguf_model)
    deployed_model = model.deploy(
        "bert_sgt_classifier", input=model_features, labels=model_label
    )

    # Inspect the first classification output.
    outputs = deployed_model.get_outputs()
    outputs[0]