Natural Language Processing (NLP)

NLP Model Overview

Text Classification Models predict the categories a piece of text might belong to.

NLP Cases	Expected Fields	Performance Metrics
NLP Classification	*prediction label, actual label, prediction score, actual score	Accuracy, Recall, Precision, FPR, FNR, F1, Sensitivity, Specificity
NLP NER	*prediction label, actual label, prediction score, actual score	Accuracy, Recall, Precision, FPR, FNR, F1, Sensitivity, Specificity

*All classification variant specifications apply to the NLP model type, with the addition of embeddings.
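To make the metrics in the table concrete, here is a minimal sketch that derives them from predicted and actual labels using the textbook confusion-matrix definitions. The function name `binary_classification_metrics` and the sample labels are hypothetical, and this is not necessarily how the platform computes these values internally.

```python
def binary_classification_metrics(predicted, actual, positive_label):
    """Compute confusion-matrix metrics, treating one label as the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive_label and a == positive_label)
    fp = sum(1 for p, a in pairs if p == positive_label and a != positive_label)
    fn = sum(1 for p, a in pairs if p != positive_label and a == positive_label)
    tn = sum(1 for p, a in pairs if p != positive_label and a != positive_label)

    def ratio(numerator, denominator):
        # A metric is undefined (None) when its denominator is zero.
        return numerator / denominator if denominator else None

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    if precision is None or recall is None or precision + recall == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,  # recall is also called sensitivity
        "specificity": ratio(tn, tn + fp),
        "fpr": ratio(fp, fp + tn),
        "fnr": ratio(fn, fn + tp),
        "f1": f1,
    }


# Hypothetical sample data: one of each confusion-matrix outcome.
predicted = ["positive", "positive", "neutral", "neutral"]
actual = ["positive", "neutral", "positive", "neutral"]
metrics = binary_classification_metrics(predicted, actual, "positive")
print(metrics)
```

For multi-class models, the same calculation can be repeated once per label, treating that label as the positive class.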

Code Example

The EmbeddingColumnNames class constructs your embedding objects. You can log them into the platform using a dictionary that maps the embedding feature names to the embedding objects. See our API reference for more details.

Python Pandas
Python Single Record
UI Import JSON Input
Import for API

Example Row

text_vector	text	prediction_label	actual_label	prediction_score	actual_score	Timestamp
`[4.0, 5.0, 6.0, 7.0]`	`"This is a test sentence"`	`positive`	`neutral`	`0.3`	`1`	`1618590882`

from arize.pandas.logger import Client, Schema
from arize.utils.types import ModelTypes, Environments, EmbeddingColumnNames, TypedColumns

API_KEY = 'ARIZE_API_KEY'
SPACE_ID = 'YOUR SPACE ID'
arize_client = Client(space_id=SPACE_ID, api_key=API_KEY)


# Declare which columns are the feature columns
feature_column_names=[
    "MERCHANT_TYPE", 
    "ENTRY_MODE", 
    "STATE", 
    "MEAN_AMOUNT", 
    "STD_AMOUNT", 
    "TX_AMOUNT",
]

# feature & tag columns can be optionally defined with typing:
tag_columns = TypedColumns(
    inferred=["name"],
    to_int=["zip_code", "age"]
)

# Declare embedding feature columns
embedding_feature_column_names = {
    # Dictionary keys will be the name of the embedding feature in the app
    "embedding_display_name": EmbeddingColumnNames(
        vector_column_name="text_vector",  # column name of the vectors, required
        data_column_name="text", # column name of the raw data vectors are representing, optional
    )
}

# Define the Schema, including embedding information
schema = Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="PREDICTION",
    prediction_score_column_name="PREDICTION_SCORE",
    actual_label_column_name="ACTUAL",
    actual_score_column_name="ACTUAL_SCORE",
    feature_column_names=feature_column_names,
    embedding_feature_column_names=embedding_feature_column_names,
    tag_column_names=tag_columns,
)

# Log the dataframe with the schema mapping
# test_dataframe is your pandas DataFrame containing the columns named in the schema
response = arize_client.log(
    model_id="sample-model-1",
    model_version="v1",
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
    dataframe=test_dataframe,
    schema=schema,
)

NLP Embedding Features

Arize supports logging the embedding features associated with the text the model is acting on, along with the text itself, using the EmbeddingColumnNames object.

The vector_column_name should be the name of the column where the embedding vectors are stored. An embedding vector is the dense vector representation of the unstructured input. ⚠️ Note: embedding features are not sparse vectors.
The data_column_name should be the name of the column where the raw text associated with each vector is stored. This is the field typically used for NLP use cases. The column can contain either strings (full sentences) or lists of strings (token arrays).

{ 
    "embedding_display_name": EmbeddingColumnNames(
        vector_column_name="text_vector", 
        data_column_name="text" 
    ) 
}
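The constraints above (a dense numeric vector, and raw data that is either a string or a list of strings) can be checked before logging. The following is a minimal sketch under those stated constraints only; `check_embedding` is a hypothetical helper, not part of the Arize SDK, and the platform may apply further validation of its own.

```python
from numbers import Real


def check_embedding(vector, data=None):
    """Return a list of problems found; an empty list means the pair looks loggable."""
    problems = []

    if isinstance(vector, dict):
        # A mapping such as {index: value} suggests a sparse encoding.
        problems.append("vector is a mapping, which suggests a sparse encoding")
    else:
        values = list(vector)
        if not values:
            problems.append("vector is empty")
        elif not all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
            problems.append("vector contains non-numeric entries")

    if data is not None:
        is_sentence = isinstance(data, str)
        is_tokens = isinstance(data, (list, tuple)) and all(
            isinstance(token, str) for token in data
        )
        if not (is_sentence or is_tokens):
            problems.append("data must be a string or a list of strings")

    return problems


# A full sentence and a token array are both acceptable forms of raw data.
print(check_embedding([4.0, 5.0, 6.0, 7.0], "This is a test sentence"))
print(check_embedding([4.0, 5.0, 6.0, 7.0], ["This", "is", "a", "test", "sentence"]))
```

Running a check like this over each row before calling the logger surfaces malformed vectors early, rather than after an upload attempt.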

See here for more information on embeddings and options for generating them.

import numpy as np

from arize.api import Client
from arize.utils.types import (
    ArizeTypes,
    Embedding,
    Environments,
    ModelTypes,
    TypedValue,
)

API_KEY = 'ARIZE_API_KEY'
SPACE_ID = 'YOUR SPACE ID'
arize_client = Client(space_id=SPACE_ID, api_key=API_KEY)

# Example features; features & tags can be optionally defined with typing
features = {
    'state': 'ca',
    'city': 'berkeley',
    'merchant_name': 'Peets Coffee',
    'pos_approved': TypedValue(value=False, type=ArizeTypes.INT),
    'item_count': 10,
    'merchant_type': 'coffee shop',
    'charge_amount': TypedValue(value=20.11, type=ArizeTypes.FLOAT),
}
    
# Example embedding features
embedding_features = {
    "nlp_embedding": Embedding(
        vector=np.array([4.0, 5.0, 6.0, 7.0]),
        data="This is a test sentence",
    ),
}

# Log data into the Arize platform
response = arize_client.log(
    model_id='sample-model-1',
    model_version='v1',
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
    prediction_label="not fraud",
    prediction_score=0.3,
    actual_label="fraud",
    actual_score=1,
    features=features,
    embedding_features=embedding_features,
)

NLP Embedding Features

Arize supports logging the embedding features associated with the text the model is acting on, along with the text itself, using the Embedding object.

The embedding vector is the dense vector representation of the unstructured input. ⚠️ Note: embedding features are not sparse vectors.
The embedding data is the raw data associated with the vector. This is the field typically used for NLP use cases, since it accepts either a string (a full sentence) or a list of strings (a token array).

{
    "embedding_display_name": Embedding(
        vector=np.array([4.0, 5.0, 6.0, 7.0]),
        data="This is a test sentence",
    )
}

See here for more information on embeddings and options for generating them.

When configuring an embedding in the UI using File Import

"embedding_features": [{
    "my_feature": { # required, my_feature is the name of the feature
        "vector": "vector_col", # required, vector_col is the column name of the vector
        "raw_data": "raw_data_col", # optional
        "link_to_data": "link_to_data_col" # optional
    }
}]

Example file schema with embedding features

{
"prediction_id": "prediction_id",
"timestamp": "timestamp",
"tags": "tag/",
"prediction_score": "prediction_score",
"prediction_label": "prediction_label",
"actual_label": "actual_label",
"actual_score": "actual_score",
"shap_values": "shap/",
"version": "version", #lookup the column "version" in the file
"batch_id": "batch_id",
"exclude": [
    "<column1 name>",
    "<column2 name>"
],
"embedding_features": [
    {
    "embedding_1": {
        "vector": "vector_column_1",
        "raw_data": "raw_data_column_1",
        "link_to_data": "link_to_data_column"
    }
    }
]
}
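Hand-written schemas are prone to small slips such as missing commas, and the `#` annotations shown in the examples are not valid JSON. One way to avoid both is to build the schema as a Python dictionary and serialize it. This is a minimal sketch that mirrors the field names from the example above; `embedding_entry` is a hypothetical helper, and the column names are placeholders.

```python
import json


def embedding_entry(feature_name, vector_col, raw_data_col=None, link_to_data_col=None):
    """Build one embedding_features entry; only the vector column is required."""
    columns = {"vector": vector_col}
    if raw_data_col is not None:
        columns["raw_data"] = raw_data_col
    if link_to_data_col is not None:
        columns["link_to_data"] = link_to_data_col
    return {feature_name: columns}


# Keys follow the example file schema; values are the column names in your file.
file_schema = {
    "prediction_id": "prediction_id",
    "timestamp": "timestamp",
    "prediction_label": "prediction_label",
    "prediction_score": "prediction_score",
    "actual_label": "actual_label",
    "actual_score": "actual_score",
    "embedding_features": [
        embedding_entry(
            "embedding_1",
            "vector_column_1",
            raw_data_col="raw_data_column_1",
        ),
    ],
}

# json.dumps guarantees syntactically valid JSON output.
schema_text = json.dumps(file_schema, indent=4)
print(schema_text)
```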

When configuring an embedding in the UI using the API

"embeddingFeatures": [{
"featureName": "my_feature",
"vectorCol": "vector_col",
"rawDataCol": "raw_data_col",
"linkToDataCol": "link_to_data_col"
}]
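The API format carries the same information as the file-import format, with the feature name moved into a `featureName` field and the column keys written in camelCase. As an illustration of that correspondence, the sketch below converts one shape into the other. The function `to_api_embedding_features` is hypothetical, and the key mapping is inferred only from the two examples on this page.

```python
def to_api_embedding_features(file_entries):
    """Convert file-import style embedding entries into the API style shown above."""
    api_entries = []
    for entry in file_entries:
        for feature_name, columns in entry.items():
            # The vector column is required; the other two are optional.
            api_entry = {
                "featureName": feature_name,
                "vectorCol": columns["vector"],
            }
            if "raw_data" in columns:
                api_entry["rawDataCol"] = columns["raw_data"]
            if "link_to_data" in columns:
                api_entry["linkToDataCol"] = columns["link_to_data"]
            api_entries.append(api_entry)
    return api_entries


file_entries = [
    {
        "my_feature": {
            "vector": "vector_col",
            "raw_data": "raw_data_col",
            "link_to_data": "link_to_data_col",
        }
    }
]
api_payload = {"embeddingFeatures": to_api_embedding_features(file_entries)}
print(api_payload)
```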

Example file schema with embedding features

prediction_id: prediction_id
timestamp: timestamp
features: feature/
tags: tag/
prediction_score: prediction_score
prediction_label: prediction_label
actual_label: actual_label
actual_score: actual_score
shap_values: shap/
version: version #lookup the column "version" in the file
batch_id: batch_id
exclude: #leave empty to omit column exclusions
embedding_features: #leave empty to omit embeddings
