minimum required for Auto Embeddings
Install extra dependencies in the SDK:
pip install arize[AutoEmbeddings]
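After installing, it can help to confirm that the extra dependencies are importable. The following is a minimal sketch, not part of the Arize SDK: the helper name missing_packages is hypothetical, and the package list is an assumption about what the AutoEmbeddings extra pulls in, so adjust it to your environment.

```python
import importlib.util

# Assumption: packages the AutoEmbeddings extra is expected to provide.
# The exact list is not stated in this page; edit it to match your setup.
ASSUMED_PACKAGES = ["arize", "transformers", "torch"]


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(ASSUMED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All assumed packages are importable.")
```

The check only looks up each package without importing it, so it is quick to run even when large libraries are installed.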

The EmbeddingGenerator Class

EmbeddingGenerator is the Arize class for generating embedding vectors. Import it from arize.pandas.embeddings:
from arize.pandas.embeddings import EmbeddingGenerator

Methods

from_use_case
Pass in use_case and further options that depend on the use case.

Argument      Description
use_case      UseCases.NLP.SEQUENCE_CLASSIFICATION, UseCases.NLP.SUMMARIZATION, or UseCases.CV.IMAGE_CLASSIFICATION
model_name    Refer to Supported Models

list_pretrained_models
Returns an up-to-date table of the supported models.
EmbeddingGenerator.list_pretrained_models()

Code Example

from arize.pandas.embeddings import EmbeddingGenerator, UseCases

# Example CV: df is an existing pandas DataFrame whose "local_path" column holds image file paths
generator = EmbeddingGenerator.from_use_case(
    use_case=UseCases.CV.IMAGE_CLASSIFICATION,
    model_name="google/vit-base-patch16-224-in21k",
    batch_size=100
)
df["image_vector"] = generator.generate_embeddings(
    local_image_path_col=df["local_path"]
)

# Example NLP: df is an existing pandas DataFrame whose "text" column holds the input strings
generator = EmbeddingGenerator.from_use_case(
    use_case=UseCases.NLP.SEQUENCE_CLASSIFICATION,
    model_name="distilbert-base-uncased",
    tokenizer_max_length=512,
    batch_size=100
)
df["text_vector"] = generator.generate_embeddings(text_col=df["text"])
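
Because the CV example reads images from local paths, it may be worth checking that every path exists before generating embeddings. The following is a minimal sketch under that assumption: find_missing_paths is a hypothetical helper, not part of the Arize SDK, and it uses only the standard library.

```python
import os


def find_missing_paths(paths):
    """Return (position, path) pairs for entries that are not existing files."""
    missing = []
    for position, path in enumerate(paths):
        # Non-string values (None, NaN) and empty strings count as missing.
        if not isinstance(path, str) or not os.path.isfile(path):
            missing.append((position, path))
    return missing
```

With the DataFrame from the example above, it could be called as find_missing_paths(df["local_path"].tolist()). The reported positions are list positions, not DataFrame index labels.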