LLAMA Embedding
Use the GGUF format to load pre-trained language models, then invoke them on the textual features in the input to generate embeddings.
Parameters

- `gguf_model_id` (`List[int]`) → A model id issued by `tb.acquire_hf_model_as_gguf`.
- `max_tokens_per_input` (`int`) → The maximum number of tokens to consider in the input text; tokens beyond this limit are truncated. Default: 512.
Example Usage
We can create and deploy an instance of the LLAMAEmbedding model like this:
```python
import turboml as tb

embedding = tb.LLAMAEmbedding(
    gguf_model_id=tb.acquire_hf_model_as_gguf("BAAI/bge-small-en-v1.5", "f16"),
    max_tokens_per_input=512,
)
```