# Google Cloud Storage (GCS)

> Set up an import job to ingest data into Arize from GCS

<Info>
  If you prefer to use Terraform, jump to [Applying Bucket Policy & Tag via Terraform](/ax/machine-learning/machine-learning/integrations-ml/gcs-example#applying-bucket-policy-and-tag-via-terraform)
</Info>

Set up an import job to log inference files to Arize. Arize checks for file updates every 10 seconds.

There is a tradeoff between file size and ingestion efficiency; files of a few hundred thousand to a million rows each strike an efficient balance between the two.

Please reach out to [support@arize.com](mailto:support@arize.com) for guidance on the optimal file size and structure for your use case.
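One way to stay within the suggested range is to split a large export into fixed-size row ranges before writing files. The sketch below assumes a target of 500,000 rows per file, which is simply one value inside the range above. `row_ranges` is a hypothetical helper, and the commented pandas lines assume your data is already in a DataFrame named `df`.

```python
def row_ranges(total_rows: int, rows_per_file: int = 500_000) -> list[tuple[int, int]]:
    """Return half-open (start, end) row ranges, one per output file."""
    if rows_per_file <= 0:
        raise ValueError("rows_per_file must be positive")
    return [
        (start, min(start + rows_per_file, total_rows))
        for start in range(0, total_rows, rows_per_file)
    ]


# Example: 1.2 million rows split into files of at most 500,000 rows.
print(row_ranges(1_200_000))
# [(0, 500000), (500000, 1000000), (1000000, 1200000)]

# With pandas (assumed installed), each range could be written as one file:
# for i, (start, end) in enumerate(row_ranges(len(df))):
#     df.iloc[start:end].to_parquet(f"inferences-part-{i}.parquet", index=False)
```

The last file simply holds the remainder, so no file exceeds the target size.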

## **Step 1. Select Google Storage**

From the left navigation bar in the Arize platform, open the 'Upload Data' page. From there, select the 'GCS' card to begin **a new file import job.**

<Frame caption="Step 1: Pick GCS">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/8183588d-image.jpeg" />
</Frame>

## **Step 2. Specify File Path**

**Get File Path In GCS:**

In the Google Cloud console, navigate within your project to the folder you wish to ingest, then click a file to copy its file path.

<Frame caption="Copy File Path From GCS">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/Claud.gif" />
</Frame>

**In Arize UI:**

Paste the file path from which you want Arize to pull your model's inferences. Arize automatically infers your bucket name and prefix (based on the folder).

**Note:** The file path must start with `gs://`

<Frame caption="Example File Path In Arize UI">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/Claud1.gif" />
</Frame>

In this simple example, your GCS path is `gs://docs-example-bucket/production/`, which contains Parquet files of your model inferences.
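To see how a `gs://` path breaks down into the bucket name and prefix mentioned above, here is a minimal sketch. It is an illustration, not Arize's parser; following the text, it assumes the prefix is the folder portion of the path, so a path that points at a single file yields that file's folder.

```python
def parse_gcs_path(path: str) -> tuple[str, str]:
    """Split a gs:// URI into (bucket, folder prefix)."""
    scheme = "gs://"
    if not path.startswith(scheme):
        raise ValueError("GCS paths must start with gs://")
    bucket, _, rest = path[len(scheme):].partition("/")
    if not bucket:
        raise ValueError("missing bucket name")
    # Keep everything up to and including the last "/" (the folder).
    prefix = rest[: rest.rfind("/") + 1]
    return bucket, prefix


print(parse_gcs_path("gs://docs-example-bucket/production/"))
# ('docs-example-bucket', 'production/')
print(parse_gcs_path("gs://docs-example-bucket/production/12-1-2022.parquet"))
# ('docs-example-bucket', 'production/')
```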

<Info>
  Your file structure can separate model environments (training, production, etc.) and the locations of ground truth. GCS bucket import is also recursive: it includes all nested subdirectories under the specified bucket prefix, regardless of their number or depth.
</Info>
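GCS object names are flat strings, so a "folder" is just a shared name prefix, and a recursive import covers every object whose name starts with that prefix. The sketch below illustrates which objects such a match includes; the object names are made up. With the `google-cloud-storage` client, `Client.list_blobs(bucket, prefix=...)` called without a `delimiter` also lists objects recursively in this way.

```python
def objects_under_prefix(object_names: list[str], prefix: str) -> list[str]:
    """Return every object whose name starts with prefix, at any depth."""
    return [name for name in object_names if name.startswith(prefix)]


names = [
    "production/12-1-2022.parquet",
    "production/region-a/12-1-2022.parquet",
    "production/region-a/hourly/00.parquet",
    "training/12-1-2022.parquet",
]
# Includes both nested region-a/ objects; excludes the training/ object.
print(objects_under_prefix(names, "production/"))
```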

**Example 1: Predictions & Actuals Stored in Separate Folders (different prefixes)**

This example stores model predictions and actuals in **separate files**, which helps when actuals arrive later than predictions (delayed actuals). Learn more [here](/ax/machine-learning/machine-learning/how-to-ml/upload-data-to-arize/sending-data-faq#delayed-actuals-q-and-a).

```text theme={null}
gs://bucket1/click-thru-rate/production/
├── prediction-folder/
│   ├── 12-1-2022.parquet  # this file can contain multiple versions
│   ├── 12-2-2022.parquet
│   └── 12-3-2022.parquet
└── actuals-folder/
    ├── 12-1-2022.parquet
    ├── 12-2-2022.parquet
    └── 12-3-2022.parquet
```
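If you write one file per day into each folder, a small helper can keep prediction and actual files under their separate prefixes. This sketch reproduces the naming in the tree above (`M-D-YYYY.parquet`). Object names exclude the bucket (`bucket1`); the helper name and the date format are illustrative choices, not an Arize requirement.

```python
from datetime import date

BASE_PREFIX = "click-thru-rate/production"


def daily_object_name(folder: str, day: date, base: str = BASE_PREFIX) -> str:
    """Build an object name such as click-thru-rate/production/prediction-folder/12-1-2022.parquet."""
    # Month and day without zero padding, matching the example tree.
    return f"{base}/{folder}/{day.month}-{day.day}-{day.year}.parquet"


day = date(2022, 12, 1)
print(daily_object_name("prediction-folder", day))
# click-thru-rate/production/prediction-folder/12-1-2022.parquet
print(daily_object_name("actuals-folder", day))
# click-thru-rate/production/actuals-folder/12-1-2022.parquet
```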

**Example 2: Production & Training Stored in Separate Folders**

This example separates **model environments** *(production and training).*

```text theme={null}
gs://bucket1/click-thru-rate/v1/
├── production-folder/
│   ├── 12-1-2022.parquet
│   ├── 12-2-2022.parquet
│   └── 12-3-2022.parquet
└── training-folder/
    ├── 12-1-2022.parquet
    ├── 12-2-2022.parquet
    └── 12-3-2022.parquet
```

<Info>
  Arize supports up to three layers of wildcards in the file path.
</Info>
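The exact wildcard syntax Arize accepts is not described on this page. As an assumption for illustration only, the sketch below treats `*` as matching within a single path segment and counts each segment that contains `*` as one wildcard layer, using Python's standard `fnmatch` module per segment. Confirm the actual rules in the Arize UI or with support.

```python
from fnmatch import fnmatchcase

MAX_WILDCARD_LAYERS = 3  # from the note above; how layers are counted is an assumption


def wildcard_layers(pattern: str) -> int:
    """Count path segments that contain a '*' wildcard."""
    return sum("*" in segment for segment in pattern.split("/"))


def matches(pattern: str, object_name: str) -> bool:
    """Match segment by segment so '*' never crosses a '/' boundary."""
    pattern_parts = pattern.split("/")
    name_parts = object_name.split("/")
    if len(pattern_parts) != len(name_parts):
        return False
    return all(fnmatchcase(name, pat) for pat, name in zip(pattern_parts, name_parts))


pattern = "click-thru-rate/*/prediction-folder/*.parquet"
assert wildcard_layers(pattern) <= MAX_WILDCARD_LAYERS
print(matches(pattern, "click-thru-rate/production/prediction-folder/12-1-2022.parquet"))  # True
print(matches(pattern, "click-thru-rate/production/actuals-folder/12-1-2022.parquet"))  # False
```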

## **Step 3. Add GCS Project ID**

The GCS project ID uniquely identifies your Google Cloud project. See [Google Cloud's documentation](https://cloud.google.com/resource-manager/docs/creating-managing-projects#identifying_projects) for steps on how to find this ID.
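Google Cloud distinguishes a project's name, ID, and number, and this field needs the ID. The sketch below checks a value against Google's documented project ID format (6 to 30 lowercase letters, digits, or hyphens, starting with a letter and not ending with a hyphen). It checks format only, not whether the project exists; the sample values are illustrative.

```python
import re

# Project ID format: starts with a lowercase letter, 6-30 characters total,
# only lowercase letters, digits, or hyphens, and no trailing hyphen.
PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(project_id: str) -> bool:
    return PROJECT_ID_RE.fullmatch(project_id) is not None


print(is_valid_project_id("production-269901"))  # True
print(is_valid_project_id("My Project"))  # False: uppercase letter and space
```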

<Frame caption="Example Project ID In Arize UI">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/054fba0f-image.jpeg" />
</Frame>

## **Step 4. Add Proof of Ownership To Your GCS Bucket**

Label your GCS bucket with the key `arize-ingestion-key` and the value provided in the Arize UI. For more details, see the Google Cloud docs on [Using Bucket Labels](https://cloud.google.com/storage/docs/using-bucket-labels).

**1) In the Arize UI:** Copy the `arize-ingestion-key` value.

<Frame caption="Copy Arize Ingestion Key From Arize UI">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/575b67d2-image.jpeg" />
</Frame>

**2) In the Google Cloud console:** Navigate to Cloud Storage.

You will see a list of your buckets. Find the bucket whose name matches the one set in your job (Step 2), click its more options button, and edit the bucket's **labels** to add the `arize-ingestion-key` label:

* Key: `arize-ingestion-key`

* Value: the `arize-ingestion-key` value copied from the Arize UI

<Frame caption="">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/b06a1a2e-image.jpeg" />
</Frame>

<Frame caption="">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/4abb3494-image.jpeg" />
</Frame>

<Frame caption="">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/eb90ec47-image.jpeg" />
</Frame>
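A stray space or other invalid character copied along with the value will cause the label update to be rejected. The sketch below checks a key/value pair against the ASCII subset of Cloud Storage's label rules (at most 63 characters of lowercase letters, digits, underscores, or dashes; keys must start with a lowercase letter). The value `abc123-example` is a placeholder, not a real ingestion key. If you prefer the command line, `gcloud storage buckets update gs://BUCKET_NAME --update-labels=arize-ingestion-key=VALUE` sets the same label.

```python
import re

# ASCII subset of the Cloud Storage bucket label rules. Cloud Storage also
# allows international characters, which this sketch does not handle.
LABEL_KEY_RE = re.compile(r"[a-z][a-z0-9_-]{0,62}")
LABEL_VALUE_RE = re.compile(r"[a-z0-9_-]{0,63}")


def check_label(key: str, value: str) -> list[str]:
    """Return a list of problems with a bucket label; empty means it looks valid."""
    problems = []
    if not LABEL_KEY_RE.fullmatch(key):
        problems.append(f"invalid label key: {key!r}")
    if not LABEL_VALUE_RE.fullmatch(value):
        problems.append(f"invalid label value: {value!r}")
    return problems


print(check_label("arize-ingestion-key", "abc123-example"))  # []
print(check_label("arize-ingestion-key", " abc123"))  # ["invalid label value: ' abc123'"]
```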

## **Step 5. Grant Arize Access Privileges**

Create a **custom role** by copying the command from the 'Custom IAM Role' field in the Arize UI.

<Frame caption="Copy Custom IAM Role From Arize UI">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/470ab368-image.jpeg" />
</Frame>

Paste and run the copied [gcloud](https://cloud.google.com/iam/docs/creating-custom-roles#iam-custom-roles-create-gcloud) command in Google Cloud Shell. Be sure to set `--project` to your project ID.

<Frame caption="Start the Google Cloud Shell">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/bd9bb3e5-image.jpeg" />
</Frame>

<Frame caption="Create the IAM Role">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/0d447c7c-image.jpeg" />
</Frame>

```bash theme={null}
# Replace YOUR_PROJECT_ID with your Google Cloud project ID.
gcloud iam roles create arizeFileImporter \
  --project=YOUR_PROJECT_ID \
  --title="Arize FileImporter" \
  --description="Custom IAM role for Arize FileImporter" \
  --permissions=storage.buckets.get,storage.objects.get,storage.objects.list \
  --stage=ALPHA
```

**Grant Arize access to the custom role**

Copy the command from the 'Apply IAM Permission' field in the Arize UI.

<Frame caption="Copy IAM Permission From Arize UI">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/45055ce9-image.jpeg" />
</Frame>

Paste and run the copied [gsutil](https://cloud.google.com/storage/docs/gsutil/commands/iam) command in Google Cloud Shell. Be sure to replace `YOUR-PROJECT` in the role path with your project ID, and replace the example bucket `gs://gcs-fileimporter-demo` with your own bucket.

<Frame caption="Apply the IAM Permissions">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/91f072f9-image.jpeg" />
</Frame>

```bash theme={null}
gsutil iam ch serviceAccount:fileimporter@production-269901.iam.gserviceaccount.com:projects/YOUR-PROJECT/roles/arizeFileImporter gs://gcs-fileimporter-demo
```
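To confirm the grant took effect, you can print the bucket's policy with `gsutil iam get gs://YOUR-BUCKET`, which outputs JSON containing a `bindings` list of `role`/`members` entries. The sketch below checks such a policy for the Arize binding. The sample policy is made up for illustration, and `YOUR-PROJECT` is the same placeholder used in the command above.

```python
import json

ARIZE_MEMBER = "serviceAccount:fileimporter@production-269901.iam.gserviceaccount.com"
ROLE = "projects/YOUR-PROJECT/roles/arizeFileImporter"  # replace YOUR-PROJECT


def has_binding(policy: dict, role: str, member: str) -> bool:
    """True if any binding in the IAM policy grants role to member."""
    return any(
        binding.get("role") == role and member in binding.get("members", [])
        for binding in policy.get("bindings", [])
    )


# Illustrative policy, shaped like the JSON printed by `gsutil iam get`.
policy = json.loads("""
{
  "bindings": [
    {"role": "roles/storage.legacyBucketOwner", "members": ["projectOwner:YOUR-PROJECT"]},
    {"role": "projects/YOUR-PROJECT/roles/arizeFileImporter",
     "members": ["serviceAccount:fileimporter@production-269901.iam.gserviceaccount.com"]}
  ],
  "etag": "CAE="
}
""")
print(has_binding(policy, ROLE, ARIZE_MEMBER))  # True
```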

## **Step 6a. Define Your Model Schema**

Model schema parameters tell Arize how to organize the model inference data it ingests. When configuring your schema, make sure each schema field maps to the matching column header in your data.

Specify the column mapping either with a **form** or with a simple **JSON-based schema**.

Arize supports **CSV, Parquet, Avro**, and **Apache Arrow.** Refer [here](/ax/machine-learning/machine-learning/how-to-ml/upload-data-to-arize/sending-data-faq#what-are-the-expected-data-types-for-my-file-type) for a list of the expected data types by file type.
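Before validating in the UI, you can check locally that every column your mapping references actually exists in the file header. In the sketch below, the `schema` dict and its field names are hypothetical and do not reflect Arize's JSON schema format; only the column names on the right-hand side matter. For Parquet files, `pyarrow.parquet.read_schema(path).names` returns the column names in a similar way.

```python
import csv
import io

# Hypothetical mapping from schema fields to column names in your file.
schema = {
    "prediction_id": "prediction_id",
    "timestamp": "prediction_ts",
    "prediction_label": "predicted_click",
    "actual_label": "clicked",
}


def missing_columns(schema: dict[str, str], header: list[str]) -> list[str]:
    """Return mapped column names that do not appear in the file header."""
    present = set(header)
    return [column for column in schema.values() if column not in present]


# Read only the header row of a CSV file (here, an in-memory example).
sample_csv = "prediction_id,prediction_ts,predicted_click,clicked\nabc,1670000000,1,0\n"
header = next(csv.reader(io.StringIO(sample_csv)))
print(missing_columns(schema, header))  # []
print(missing_columns(schema, ["prediction_id", "prediction_ts"]))  # ['predicted_click', 'clicked']
```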

<Frame caption="Set up model configurations">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/ce247e5e-image.jpeg" />
</Frame>

<Frame caption="Map your file using form inputs">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/00de9283-image.jpeg" />
</Frame>

<Frame caption="Map your file using a JSON schema">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/c07d5ae7-image.jpeg" />
</Frame>

Learn more about Schema fields [here](https://arize.com/docs/ax/machine-learning/machine-learning/concepts-ml/model-schema-reference).

## **Step 6b. Validate Your Model Schema**

After you fill in the applicable prediction, actual, and model input fields, click 'Validate Schema' to preview your model schema in the Arize UI. Check that each column name matches its corresponding data so that the import job succeeds.

<Frame caption="File Preview in UI">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/f9c9843d-image.jpeg" />
</Frame>

Once finished, your import job will be created and will start polling your bucket for files.

<Info>
  If your model receives delayed actuals, connect your predictions and actuals using the same **prediction ID**, which links your data together in the Arize platform. Arize regularly checks your data source for both predictions and actuals, and ingests them separately as they become available. Learn more [here](/ax/machine-learning/machine-learning/how-to-ml/upload-data-to-arize/sending-data-faq#delayed-actuals-q-and-a).
</Info>
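Conceptually, the shared prediction ID works like a join key: an actual that arrives later is attached to the earlier prediction with the same ID. The sketch below illustrates that linkage with made-up records and column names; it is not how Arize implements ingestion internally.

```python
# Illustrative records; in practice these come from separate files.
predictions = [
    {"prediction_id": "a1", "predicted_click": 1},
    {"prediction_id": "a2", "predicted_click": 0},
]
actuals = [
    {"prediction_id": "a2", "clicked": 1},  # arrives later, in a separate file
]


def link_by_prediction_id(predictions, actuals):
    """Attach each actual to the prediction with the same prediction_id."""
    actual_by_id = {row["prediction_id"]: row for row in actuals}
    linked = []
    for pred in predictions:
        actual = actual_by_id.get(pred["prediction_id"])
        # Predictions without an actual yet keep a None placeholder.
        linked.append({**pred, "clicked": actual["clicked"] if actual else None})
    return linked


for row in link_by_prediction_id(predictions, actuals):
    print(row)
# {'prediction_id': 'a1', 'predicted_click': 1, 'clicked': None}
# {'prediction_id': 'a2', 'predicted_click': 0, 'clicked': 1}
```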

## **Step 7. Check Job Status**

Arize performs a dry run to check your job for access, schema, or record-level errors. If the dry run succeeds, you can proceed to create the import job, after which you are taken to the 'Job Status' tab.

All active jobs regularly sync new data from your data source with Arize. Click a job ID to view more details about that job.

<Frame caption="Job Status tab showing job listings">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/05e23085-image.jpeg" />
</Frame>

To pause or delete a job, or to edit its file schema, click 'Job Options'.

* **Delete** a job if it is no longer needed or if you made an error connecting to the wrong bucket. This will set your job status as 'deleted' in Arize.

* **Pause** a job if you update your files on a set cadence. You can then click 'start job' when you know new data is available, which reduces query costs. This will set your job status as 'inactive' in Arize.

* **Edit** a file schema if you have added, renamed, or missed a column in the original schema declaration.

## Step 8. Troubleshoot Import Job

An import job may run into a few problems. Use the dry run and job details UI to troubleshoot and quickly resolve data ingestion issues.

#### Validation Errors

If a file fails validation against the model schema, Arize surfaces an **actionable** error message. From there, click the 'Fix Schema' button to adjust your model schema.

<Frame caption="">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/c4fd00b6-image.jpeg" />
</Frame>

#### Dry Run File/Table Passes But The Job Fails

If your dry run succeeds but your job fails, click the job ID to view the **job details**, including the file path or query ID, the last import job, potential errors, and error locations.

<Frame caption="">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/d55514b9-image.jpeg" />
</Frame>

Once you've identified where the job failed, fix the errors in the file and re-upload it to your bucket under a new name.
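When re-uploading a corrected file, a consistent renaming scheme makes it easy to tell the fixed copy apart from the original. The `-vN` suffix below is a hypothetical convention, not something Arize requires; any new, unique object name serves the same purpose.

```python
from pathlib import PurePosixPath


def renamed_for_reupload(object_name: str, attempt: int = 2) -> str:
    """Return a new object name such as production/12-1-2022-v2.parquet."""
    path = PurePosixPath(object_name)
    # Keep the folder and extension; add a version suffix to the file stem.
    return str(path.with_name(f"{path.stem}-v{attempt}{path.suffix}"))


print(renamed_for_reupload("production/12-1-2022.parquet"))
# production/12-1-2022-v2.parquet
```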

## Applying Bucket Policy & Tag via Terraform

```hcl theme={null}
resource "google_storage_bucket" "arize-example-bucket" {
  // (optional) uniform_bucket_level_access = true
  name     = "arize-example-bucket"
  project  = google_project.development.project_id
  location = "US" // required; set this to your bucket's location
  labels = {
    "arize-ingestion-key" = "value_from_arize_ui"
  }
}

resource "google_project_iam_custom_role" "arize-example-bucket" {
  description = "permission to view storage bucket, and view and list objects"
  permissions = [
    "storage.buckets.get",
    "storage.objects.get",
    "storage.objects.list"
  ]
  project = google_project.development.project_id
  role_id = "FileImporterViewer"
  title   = "File Importer Viewer"
  stage   = "ALPHA"
}

resource "google_storage_bucket_iam_binding" "arize-example-bucket-iam-binding" {
  bucket = google_storage_bucket.arize-example-bucket.name
  // Resolves to projects/<PROJECT_ID>/roles/FileImporterViewer
  role = google_project_iam_custom_role.arize-example-bucket.name
  members = [
    "serviceAccount:fileimporter@production-269901.iam.gserviceaccount.com",
  ]
}
```

<Info>
  Applying IAM permissions with a binding ([`google_storage_bucket_iam_binding`](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/storage_bucket_iam#google_storage_bucket_iam_binding)) manages only the specified role, and the binding could be lost in complex Terraform deployments. An alternative is to manage the bucket's full IAM policy ([`google_storage_bucket_iam_policy`](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/storage_bucket_iam#google_storage_bucket_iam_policy)). This is the most authoritative configuration method and suits large enterprise deployments of Google Cloud services.

  More details: [https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/storage\_bucket\_iam](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/storage_bucket_iam)

  **Note**: If you manage IAM privileges for the Arize service account through a policy configuration, verify the configuration carefully. Because a policy replaces the bucket's entire existing IAM policy, a mistake can lock users out of their own resources.
</Info>
