
# Azure Blob Storage

> Set up an import job to ingest data into Arize from Azure

Set up an import job to log inference files to Arize. Users generally find a sweet spot around a few hundred thousand to a million rows in each file, with the total file limit being 1GB.
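
As a rough planning aid, the sketch below splits a dataset's row count into per-file row counts before export. The function name `plan_file_row_counts` and the 500,000-row default are assumptions picked from the range above, not Arize requirements, and the sketch does not check the 1GB size limit.

```python
from typing import List


def plan_file_row_counts(total_rows: int, rows_per_file: int = 500_000) -> List[int]:
    """Split total_rows into per-file row counts of at most rows_per_file."""
    if total_rows < 0 or rows_per_file <= 0:
        raise ValueError("total_rows must be >= 0 and rows_per_file must be > 0")
    full_files, remainder = divmod(total_rows, rows_per_file)
    counts = [rows_per_file] * full_files
    if remainder:
        # The last file holds whatever rows are left over.
        counts.append(remainder)
    return counts


print(plan_file_row_counts(1_200_000))  # [500000, 500000, 200000]
```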

## Step 1. Get The Storage Container Name & Prefix

Create a blob storage container and folder *(optional)* where you would like Arize to pull your model's inferences.

<Info>
  For example, you might set up a container named `bucket1` and a folder `/click-thru-rate/production/v1/` that contains CSV files of your model inferences.

  In this example, your bucket name is `bucket1` and your prefix is `click-thru-rate/production/v1/`.
</Info>

There are multiple ways to structure model data. To easily ingest model inference data from storage, adopt a standardized directory structure across all models.

## Step 2. Add the Arize Service Principal

Follow the steps to download the Azure CLI: [https://learn.microsoft.com/en-us/cli/azure/install-azure-cli](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli)

Add the Arize Service Principal by referencing our application id:

```bash theme={null}
az ad sp create --id eb6cb4d2-f42d-4ef6-bacb-2417d3086e47
```
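
The command prints a JSON description of the new Service Principal, including the object ID needed when granting the role in Step 3. A minimal sketch for pulling that value out of saved output follows. It assumes the object ID is reported in an `id` field (as in Microsoft Graph based Azure CLI versions) or an `objectId` field (older versions). The helper name and the sample object ID are hypothetical.

```python
import json


def extract_object_id(sp_json: str) -> str:
    """Return the object ID from the JSON printed for a service principal."""
    sp = json.loads(sp_json)
    # Assumption: newer CLI output uses "id", older output uses "objectId".
    object_id = sp.get("id") or sp.get("objectId")
    if not object_id:
        raise KeyError("no object ID field found in service principal JSON")
    return object_id


# Hypothetical sample output; your object ID will be unique to your account.
sample_output = json.dumps({
    "appId": "eb6cb4d2-f42d-4ef6-bacb-2417d3086e47",
    "id": "00000000-0000-0000-0000-000000000000",
})
print(extract_object_id(sample_output))
```

Note that the object ID is different from the application ID shown in the command above.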

## Step 3. Grant Role To The Arize Service Principal

<Tabs>
  <Tab title="Azure Portal">
    Find the storage account that your container was created under and click "Access Control".

    <Frame caption="">
      <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/6feb9df4-image.jpeg" />
    </Frame>

    Go to "Role Assignments" and click "Add"

    <Frame caption="">
      <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/5ca556ee-image.jpeg" />
    </Frame>

    Search for "Storage Blob Data Reader" and click on it

    <Frame caption="">
      <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/a1caca2a-image.jpeg" />
    </Frame>

    Click "Next" and check "Assign access to: User, group, or service principal". Click on "Select Members" and search for "Arize".

    <Frame caption="">
      <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/71d17972-image.jpeg" />
    </Frame>

    Click on "Review + Assign"

    <Frame caption="">
      <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/1f255429-image.jpeg" />
    </Frame>

    Ensure our Service Principal appears as having the "Storage Blob Data Reader" role

    <Frame caption="">
      <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/dee2edb6-image.jpeg" />
    </Frame>
  </Tab>

  <Tab title="Azure CLI">
    Run the Azure CLI command below, substituting the following environment variables:

    * `${OBJECT_ID}`: The object ID returned from creating the Arize Service Principal (**not** the same as the application ID, and will be unique for your account)

    <Frame caption="">
      <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/52b9e14e-image.jpeg" />
    </Frame>

    * `${YOUR_SUBSCRIPTION_ID}`: The Azure subscription ID for your storage account.

    * `${YOUR_RESOURCE_GROUP}`: The resource group your storage account resides in.

    <Frame caption="">
      <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/fdd79367-image.jpeg" />
    </Frame>

    * `${YOUR_STORAGE_ACCOUNT_NAME}`: The name of your storage account.

    ```bash theme={null}
    az role assignment create \
    --role "Storage Blob Data Reader" \
    --assignee-object-id ${OBJECT_ID} \
    --assignee-principal-type ServicePrincipal \
    --scope /subscriptions/${YOUR_SUBSCRIPTION_ID}/resourceGroups/${YOUR_RESOURCE_GROUP}/providers/Microsoft.Storage/storageAccounts/${YOUR_STORAGE_ACCOUNT_NAME}
    ```
  </Tab>
</Tabs>
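
The `--scope` value in the Azure CLI tab is a resource ID assembled from the subscription ID, resource group, and storage account name. As an illustration, it could be built as follows. The function name and the sample values are hypothetical.

```python
def storage_account_scope(subscription_id: str, resource_group: str,
                          storage_account_name: str) -> str:
    """Build the resource ID used as the role assignment scope."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{storage_account_name}"
    )


# Hypothetical values for illustration only.
print(storage_account_scope(
    "00000000-0000-0000-0000-000000000000",
    "my-resource-group",
    "mystorageaccount",
))
```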

## Step 4. Select Azure Storage

Navigate to the 'Upload Data' page on the left navigation bar in the Arize platform. From there, select the 'Azure Blob Storage' card to begin a new file import job.

<Frame caption="">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/2fc9301f-image.jpeg" />
</Frame>

Fill in the file path where you would like Arize to pull your model's inferences. Arize will automatically infer your bucket name and prefix.

<Frame caption="">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/e99583cb-image.jpeg" />
</Frame>

Also specify your Azure AD Tenant ID and Azure Storage Account Name. You can find the Tenant ID in the Azure portal:

Search for "Azure Active Directory" (named "Microsoft Entra ID" in current versions of the portal).

<Frame caption="">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/0844b504-image.jpeg" />
</Frame>

Take note of your tenant ID:

<Frame caption="">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/3458bd70-image.jpeg" />
</Frame>

In this example, you might have a bucket and folder named `azure://example-demo-bucket/click-thru-rate/production/v1/` that contains Parquet files of your model inferences. Your bucket name is `example-demo-bucket` and your prefix is `click-thru-rate/production/v1/`.
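
To make the split concrete, here is a small sketch of how a file path of this form breaks down into a bucket name and a prefix. It only illustrates the idea and is not Arize's implementation. The function name `split_blob_path` is hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_blob_path(path: str) -> Tuple[str, str]:
    """Split azure://<bucket>/<prefix> into (bucket, prefix)."""
    parsed = urlparse(path)
    if parsed.scheme != "azure" or not parsed.netloc:
        raise ValueError(f"expected azure://<bucket>/<prefix>, got {path!r}")
    # The bucket is the first component; the prefix is everything after it.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_blob_path(
    "azure://example-demo-bucket/click-thru-rate/production/v1/"
)
print(bucket)  # example-demo-bucket
print(prefix)  # click-thru-rate/production/v1/
```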

<Info>
  The file structure can account for various model environments (training, production, etc.) and locations of ground truth. In addition, Azure Blob Storage import is recursive: it includes all nested subdirectories within the specified bucket prefix, regardless of their number or depth.
</Info>
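
One way to picture recursive import: any blob whose name starts with the prefix is in scope, however deeply it is nested. The sketch below illustrates that rule on a list of names. It is an assumption about the observable behavior described above, not Arize's implementation, and the blob names are hypothetical.

```python
from typing import Iterable, List


def blobs_under_prefix(blob_names: Iterable[str], prefix: str) -> List[str]:
    """Return blob names that fall under the prefix at any depth."""
    return [name for name in blob_names if name.startswith(prefix)]


names = [
    "click-thru-rate/production/v1/11-19-2022.parquet",
    "click-thru-rate/production/v1/region/us/11-19-2022.parquet",
    "click-thru-rate/training/v1/11-19-2022.parquet",
]
# Both production files match, including the nested one.
print(blobs_under_prefix(names, "click-thru-rate/production/v1/"))
```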

**File Directory Example**

There are multiple ways to structure your file directory. If actuals and predictions can be sent together, store them in the same file and import them through a single file importer job.

In the case of delayed actuals, we recommend that you separate your predictions and actuals into **separate folders** and load this data through two separate file importer jobs. Learn more [here](/ax/machine-learning/machine-learning/concepts-ml/how-to-send-delayed-actuals).

```bash theme={null}
azure://bucket1/click-thru-rate/production/prediction/
├── 11-19-2022.parquet
├── 11-20-2022.parquet
└── 11-21-2022.parquet

azure://bucket1/click-thru-rate/production/actuals/
├── 12-1-2022.parquet # same prediction id column, model, and space as the corresponding prediction
├── 12-2-2022.parquet
└── 12-3-2022.parquet
```
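
If you generate these files from a scheduled export, a small helper can keep the folder layout consistent. The sketch below follows the naming shown in the example above. The function name, the `kind` values, and the date format are assumptions based on that example, not a convention Arize requires.

```python
from datetime import date


def import_blob_name(model_prefix: str, kind: str, day: date,
                     extension: str = "parquet") -> str:
    """Build a blob name such as <prefix>/prediction/11-19-2022.parquet."""
    if kind not in ("prediction", "actuals"):
        raise ValueError("kind must be 'prediction' or 'actuals'")
    # Month-day-year without zero padding, matching the example listing.
    file_name = f"{day.month}-{day.day}-{day.year}.{extension}"
    return f"{model_prefix.rstrip('/')}/{kind}/{file_name}"


print(import_blob_name("click-thru-rate/production", "prediction", date(2022, 11, 19)))
print(import_blob_name("click-thru-rate/production", "actuals", date(2022, 12, 1)))
```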

## Step 5. Add Proof Of Ownership To Your Container

In your container metadata, add an entry with the key `arize_ingestion_key` and the value provided in the Arize UI.

* **In Arize UI:** Copy the `arize_ingestion_key` value.

* **In Azure UI:** Navigate to your Container -> Settings -> Metadata.

<Frame caption="Click on Metadata and fill out the key value pair defined in the Arize UI">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/d0820999-image.jpeg" />
</Frame>
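
If you script this step instead of using the portal, keep in mind that Azure's Set Container Metadata operation replaces all existing metadata on the container, so existing entries should be merged with the new key before writing. A minimal sketch of that merge follows. The function names and sample values are hypothetical, and the `key=value` formatting is meant for tools that accept space-separated pairs, such as `az storage container metadata update --metadata`.

```python
from typing import Dict, List


def metadata_with_ingestion_key(existing: Dict[str, str],
                                ingestion_key_value: str) -> Dict[str, str]:
    """Return a copy of the existing metadata with arize_ingestion_key set."""
    merged = dict(existing)  # copy so the caller's dict is not modified
    merged["arize_ingestion_key"] = ingestion_key_value
    return merged


def metadata_cli_args(metadata: Dict[str, str]) -> List[str]:
    """Format metadata as sorted key=value strings."""
    return [f"{key}={value}" for key, value in sorted(metadata.items())]


# Hypothetical existing metadata and ingestion key value.
merged = metadata_with_ingestion_key({"team": "ml"}, "abc123")
print(metadata_cli_args(merged))  # ['arize_ingestion_key=abc123', 'team=ml']
```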

## Step 6a. Define Your Model Schema

Model schema parameters organize your model inference data for ingestion into Arize. When configuring your schema, be sure that the column names in the schema match the column headers in your data.

You can either **use a form** or a simple **JSON-based schema** to specify the column mapping.

Arize supports **CSV, Parquet, Avro**, and **Apache Arrow**. See [here](/ax/machine-learning/machine-learning/how-to-ml/upload-data-to-arize/sending-data-faq#what-are-the-expected-data-types-for-my-file-type) for a list of the expected data types by input type.

<Frame caption="">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/ce247e5e-image.jpeg" />
</Frame>

<Frame caption="">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/00de9283-image.jpeg" />
</Frame>

<Frame caption="">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/c07d5ae7-image.jpeg" />
</Frame>

Learn more about Schema fields [here](https://arize.com/docs/ax/machine-learning/machine-learning/concepts-ml/model-schema-reference).

## Step 6b. Validate Your Model Schema

Once you fill in your applicable predictions, actuals, and model inputs, click 'Validate Schema' to visualize your model schema in the Arize UI. Check that your column names and corresponding data match for a successful import job.

<Frame caption="">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/aa1d250a-image.jpeg" />
</Frame>
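
Before validating in the UI, you can catch column name mismatches locally. The sketch below compares the column names referenced by a schema mapping against the header row of a CSV file. The schema keys, column names, and function name are hypothetical placeholders, so consult the schema reference for the actual field names.

```python
import csv
import io
from typing import Dict, List


def missing_columns(schema: Dict[str, str], csv_text: str) -> List[str]:
    """Return schema-referenced column names absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return sorted({column for column in schema.values() if column not in header})


# Hypothetical mapping of schema fields to file column names.
schema = {
    "prediction_id": "prediction_id",
    "timestamp": "prediction_ts",
    "prediction_label": "predicted_label",
    "actual_label": "actual_label",
}
csv_text = "prediction_id,prediction_ts,predicted_label\n1,2022-11-19,click\n"
print(missing_columns(schema, csv_text))  # ['actual_label']
```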


Once finished, your import job will be created and will start polling your bucket for files.

<Info>
  If your model receives delayed actuals, connect your predictions and actuals using the same **prediction ID**, which links your data together in the Arize platform. Arize regularly checks your data source for both predictions and actuals, and ingests them separately as they become available. Learn more [here](/ax/machine-learning/machine-learning/how-to-ml/upload-data-to-arize/sending-data-faq#delayed-actuals-q-and-a).
</Info>
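
To illustrate how a shared prediction ID links records, the sketch below attaches actuals to predictions by ID. Arize performs this linking for you during ingestion. The code only demonstrates the concept, and the column names `prediction_id`, `prediction_label`, and `actual_label` are hypothetical.

```python
from typing import Dict, List, Optional


def attach_actuals(predictions: List[Dict[str, str]],
                   actuals: List[Dict[str, str]]) -> List[Dict[str, Optional[str]]]:
    """Join actuals onto predictions using the shared prediction_id."""
    actual_by_id = {row["prediction_id"]: row["actual_label"] for row in actuals}
    return [
        # Predictions without a matching actual get None until one arrives.
        {**row, "actual_label": actual_by_id.get(row["prediction_id"])}
        for row in predictions
    ]


predictions = [
    {"prediction_id": "a1", "prediction_label": "click"},
    {"prediction_id": "b2", "prediction_label": "no_click"},
]
actuals = [{"prediction_id": "a1", "actual_label": "click"}]
for row in attach_actuals(predictions, actuals):
    print(row)
```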

## Step 7. Check Job Status

Arize will attempt a dry run to validate your job for any access, schema, or record-level errors. If the dry run is successful, you can proceed to create the import job. From there, you will be taken to the 'Job Status' tab.

All **active jobs** will regularly sync new data from your data source with Arize. You can view the job details by clicking on the job ID, which reveals more information about the job.

<Frame caption="Job Status tab showing job listings">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/05e23085-image.jpeg" />
</Frame>

To pause, delete, or edit your file schema, click on 'Job Options'.

* **Delete** a job if it is no longer needed or if you made an error connecting to the wrong bucket. This will set your job status as 'deleted' in Arize.

* **Pause** a job if you have a set cadence to update your table. This way, you can 'start job' when you know there will be new data to reduce query costs. This will set your job status as 'inactive' in Arize.

* **Edit** a file schema if you have added, renamed, or missed a column in the original schema declaration.

## Step 8. Troubleshoot Import Job

An import job may run into a few problems. Use the dry run and job details UI to troubleshoot and quickly resolve data ingestion issues.

#### Validation Errors

If there is an error validating a file against the model schema, Arize will surface an **actionable** error message. From there, click on the 'Fix Schema' button to adjust your model schema.

<Frame caption="">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/ef8d0ee8-image.jpeg" />
</Frame>

#### Dry Run File/Table Passes But The Job Fails

If your dry run is successful but your job fails, click on the job ID to view the **job details**. The job details show the file path or query ID, the last import job, potential errors, and error locations.

<Frame caption="">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/7e073dc0-image.jpeg" />
</Frame>

Once you've identified the job failure point, fix the file errors and reupload the file to Arize with a new name.
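
Because the corrected file needs a new name, a small helper can derive one from the original blob name. This is a minimal sketch. The function name and the `-fixed` suffix are assumptions, and any naming that differs from the original file should work.

```python
from pathlib import PurePosixPath


def renamed_for_reupload(blob_name: str, suffix: str = "-fixed") -> str:
    """Insert a suffix before the file extension to create a new blob name."""
    path = PurePosixPath(blob_name)
    return str(path.with_name(f"{path.stem}{suffix}{path.suffix}"))


print(renamed_for_reupload("click-thru-rate/production/prediction/11-19-2022.parquet"))
# click-thru-rate/production/prediction/11-19-2022-fixed.parquet
```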
