Pipeline Components
LLAMA Embedding

Loads a pre-trained language model in the GGUF format and invokes it on the textual features of the input to produce embeddings for them.

Parameters

  • gguf_model_id(str) → A model id issued by tb.acquire_hf_model_as_gguf.

  • max_tokens_per_input(int) → The maximum number of tokens to consider in the input text. Tokens beyond this limit will be truncated. Default is 512.
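The truncation rule for max_tokens_per_input can be illustrated with a standalone sketch. Note that whitespace splitting here is a stand-in assumption; the actual model applies its own subword tokenizer before truncating.

```python
def truncate_tokens(text: str, max_tokens_per_input: int = 512) -> list[str]:
    # Stand-in tokenizer: a real GGUF model uses a subword tokenizer,
    # but whitespace splitting is enough to illustrate the rule.
    tokens = text.split()
    # Tokens beyond the limit are dropped before embedding.
    return tokens[:max_tokens_per_input]

# Only the first 3 tokens survive when the limit is 3.
tokens = truncate_tokens("one two three four five", max_tokens_per_input=3)
```

Everything past the limit is discarded rather than processed in a second pass, so long inputs lose their tails.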

Example Usage

We can create an instance of the LLAMAEmbedding model and deploy it like this:

import turboml as tb

# Fetch the model from Hugging Face, convert it to GGUF (f16 precision),
# and create the embedding component from the returned model id.
gguf_model_id = tb.acquire_hf_model_as_gguf("BAAI/bge-small-en-v1.5", "f16")
embedding = tb.LLAMAEmbedding(gguf_model_id=gguf_model_id, max_tokens_per_input=512)