# Embeddings

> Auto-generate embeddings for text, images, and structured data using pre-trained models. Required for UMAP visualizations and drift detection.

## Key Capabilities

* Pre-trained models for common use cases
* Batch processing for efficiency
* Automatic handling of tokenization and preprocessing
* Support for custom models

## What is an Embedding?

Embeddings are vector representations of data. They appear throughout modern
deep learning: in transformers, recommendation engines, the intermediate layers
of deep neural networks, encoders, and decoders.
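
To make "vector representation" concrete, the following minimal sketch compares toy embeddings with cosine similarity. The 4-dimensional vectors and the `cosine_similarity` helper are invented for illustration; real embeddings come from a model and usually have hundreds of dimensions.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy embeddings, made up for this example (not produced by any model).
embeddings = {
    "excellent quality": np.array([0.9, 0.1, 0.0, 0.2]),
    "great product": np.array([0.8, 0.2, 0.1, 0.3]),
    "late delivery": np.array([0.0, 0.9, 0.7, 0.1]),
}

sim_related = cosine_similarity(
    embeddings["excellent quality"], embeddings["great product"]
)
sim_unrelated = cosine_similarity(
    embeddings["excellent quality"], embeddings["late delivery"]
)

# Semantically similar inputs should score higher than unrelated ones.
print(f"related: {sim_related:.3f}, unrelated: {sim_unrelated:.3f}")
```

Inputs with similar meaning end up close together in the vector space, which is what makes embeddings useful for comparison and clustering.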

## Why Embeddings for Analyzing Deep Learning Models?

Drift in unstructured data such as images or text is hard to measure. The
statistical measures typically used for drift in structured data do not extend
to unstructured data, because detecting drift there requires understanding how
the relationships inside the data itself have changed. Embeddings encode those
relationships as numeric vectors, so drift can be measured as a change in the
distribution of embeddings over time.
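
One simple way to quantify drift on embeddings is to compare where two sets of vectors sit in the embedding space. The sketch below measures the Euclidean distance between the centroids of a baseline set and a production set. It uses synthetic data, and `centroid_drift` is a hypothetical helper written for this example; it is an illustration of the idea, not necessarily the metric Arize computes.

```python
import numpy as np


def centroid_drift(baseline: np.ndarray, production: np.ndarray) -> float:
    """Euclidean distance between the mean vectors of two embedding sets.

    Both inputs have shape (n_samples, embedding_dim).
    """
    return float(np.linalg.norm(baseline.mean(axis=0) - production.mean(axis=0)))


# Synthetic embeddings: random vectors standing in for model output.
rng = np.random.default_rng(seed=0)
embedding_dim = 8
baseline = rng.normal(loc=0.0, scale=1.0, size=(500, embedding_dim))
same_distribution = rng.normal(loc=0.0, scale=1.0, size=(500, embedding_dim))
shifted = rng.normal(loc=0.5, scale=1.0, size=(500, embedding_dim))

drift_none = centroid_drift(baseline, same_distribution)
drift_shifted = centroid_drift(baseline, shifted)

# The shifted set should sit measurably farther from the baseline centroid.
print(f"no shift: {drift_none:.3f}, shifted: {drift_shifted:.3f}")
```

A centroid distance only captures a shift in the mean; changes in spread or cluster structure would need a richer comparison.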

Embeddings are also required to use Arize's UMAP visualizations.

## Quick Start

```python
import pandas as pd
from arize.embeddings import EmbeddingGenerator, UseCases

# List available models
print(EmbeddingGenerator.list_pretrained_models())

# Create example data
df = pd.DataFrame({
    "text": [
        "The product quality is excellent.",
        "Shipping was delayed by 3 days.",
        "Customer service was very helpful.",
    ],
})

# Generate embeddings for NLP
generator = EmbeddingGenerator.from_use_case(
    use_case=UseCases.NLP.SEQUENCE_CLASSIFICATION,
    model_name="distilbert-base-uncased",
    tokenizer_max_length=512,
    batch_size=100,
)

df["text_vector"] = generator.generate_embeddings(text_col=df["text"])
```
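
After generating a vector column, it can be worth confirming that every row holds a 1-D numeric vector of the same length before using it downstream. The sketch below is one way to do that with plain pandas and NumPy. `check_embedding_column` is a hypothetical helper, not part of the Arize SDK, and the DataFrame uses stand-in vectors so the example runs without downloading a model.

```python
import numpy as np
import pandas as pd


def check_embedding_column(df: pd.DataFrame, col: str) -> int:
    """Return the shared vector dimension, or raise ValueError if rows disagree."""
    dims = set()
    for vec in df[col]:
        arr = np.asarray(vec, dtype=float)
        if arr.ndim != 1:
            raise ValueError(f"expected 1-D vectors in {col!r}, got shape {arr.shape}")
        if not np.all(np.isfinite(arr)):
            raise ValueError(f"non-finite values found in {col!r}")
        dims.add(arr.shape[0])
    if len(dims) != 1:
        raise ValueError(f"inconsistent or missing vector sizes in {col!r}: {sorted(dims)}")
    return dims.pop()


# Stand-in vectors (assumption: a real run would use the generator's output).
demo = pd.DataFrame({
    "text": ["first", "second", "third"],
    "text_vector": [np.zeros(4), np.ones(4), np.full(4, 0.5)],
})

vector_dim = check_embedding_column(demo, "text_vector")
print(f"all vectors have dimension {vector_dim}")
```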

## Supported Use Cases

| Use Case                                 | Model Types               |
| ---------------------------------------- | ------------------------- |
| `UseCases.NLP.SEQUENCE_CLASSIFICATION`   | BERT, DistilBERT, RoBERTa |
| `UseCases.NLP.SUMMARIZATION`             | BART, T5, Pegasus         |
| `UseCases.CV.IMAGE_CLASSIFICATION`       | ResNet, VGG, EfficientNet |
| `UseCases.CV.OBJECT_DETECTION`           | YOLO, Faster R-CNN        |
| `UseCases.STRUCTURED.TABULAR_EMBEDDINGS` | Custom tabular encoders   |
