# Sync traces with Data Fabric

> Seamlessly sync traces to external data sources

<Info>
  Data Fabric is currently available through a waitlist. Reach out to Arize Support to get started.
</Info>

## What is Data Fabric?

Data Fabric automatically syncs production trace data, evaluations, and annotations to your cloud storage every 60 minutes in Iceberg format, where your data warehouse can query it directly. This gives you a single, always-current source of truth, and access to the raw trace data lets teams use it for analytics and custom workflows.

<Frame>
  ![](https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/data-fabric-1.avif)
</Frame>

Data Fabric is a feature of [adb, the Arize GenAI-native datastore](https://arize.com/adb/). Learn more about adb:

* [Introducing adb](https://arize.com/blog/introducing-adb-arizes-proprietary-olap-database/)
* [Realtime Ingestion at Scale](https://arize.com/blog/adb-database-realtime-ingestion/)
* [adb Benchmarks](https://arize.com/blog/adb-benchmarks/)

## Why is this better than a standard data export?

* **No lock-in:** Your data is always available to you. You can move it to the data warehouse of your choice and use it in the tools you already rely on.
* **Single source of truth, always:** No need to maintain separate copies or manage export jobs.
* **Automatic updates:** Data syncs every 60 minutes. Any updates from evaluations or annotations, even on months-old data, are captured in the next sync regardless of timestamp.
* **Query-ready format:** Data is stored in Iceberg format for direct querying in BigQuery, Databricks, and other data warehouses.
* **Time-partitioned:** Data uses Hive-style partitioning for efficient time-based queries.
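
Hive-style partitioning encodes partition values in directory names (`key=value/`), which lets a query engine skip every directory outside a time filter instead of scanning the whole table. The bash sketch below illustrates that pruning step. The `date=YYYY-MM-DD` key and the example paths are hypothetical, not the documented Data Fabric layout; inspect your bucket for the actual partition keys.

```bash theme={null}
# Sketch of Hive-style partition pruning over object paths.
# ASSUMPTION: the partition key `date` and the paths used below are
# hypothetical; Data Fabric's actual layout may differ.

# Read paths on stdin; print only those whose date=YYYY-MM-DD segment
# falls within [start, end], inclusive.
prune_by_date() {
  local start="$1" end="$2" path d
  local re='(^|/)date=([0-9]{4}-[0-9]{2}-[0-9]{2})(/|$)'
  while IFS= read -r path; do
    [[ "$path" =~ $re ]] || continue   # skip paths without a date partition
    d="${BASH_REMATCH[2]}"
    # ISO-8601 dates sort lexicographically, so string comparison is enough.
    if [[ ! "$d" < "$start" && ! "$d" > "$end" ]]; then
      printf '%s\n' "$path"
    fi
  done
}

# Usage (hypothetical paths):
# printf '%s\n' ns/proj/date=2024-05-01/f ns/proj/date=2024-05-02/f |
#   prune_by_date 2024-05-02 2024-05-31
```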

## How does Data Fabric work?

* Data Fabric is available only for Enterprise accounts. Reach out to Arize Support ([support@arize.com](mailto:support@arize.com)) if you'd like trial access.
* To set up Data Fabric, you must have write permissions to your target cloud storage bucket and at least one tracing project in your space.
* A connector links one or more tracing projects to a filepath within your bucket. When you create a connector, you specify a bucket and namespace, as well as the projects to sync. You can add any number of projects to one connector or create one connector per project. Each connector must have a unique filepath.
* Once you've created your connectors and added your projects, your data syncs automatically every 60 minutes, including any historical data that has changed.
* Project syncs can be paused, resumed, or deleted as needed.
* **Supported Blobstore Providers:** Google Cloud Storage (GCS) and Amazon S3
  * Coming Soon: Azure Storage
* **Supported Table Providers:** BigQuery (with GCS), Databricks Unity Catalog
  * Coming Soon: Snowflake
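
Because each connector needs its own filepath, you might check a planned set of paths for collisions before creating connectors. A minimal sketch; `duplicate_connector_paths` is a hypothetical helper, and treating `bucket/prefix` and `bucket/prefix/` as the same path is an assumption.

```bash theme={null}
# Hypothetical pre-flight check: report filepaths planned for more than one
# connector. Input: one bucket/prefix path per line on stdin.
# ASSUMPTION: paths that differ only by trailing slashes count as the same.
duplicate_connector_paths() {
  # Drop trailing slashes, then print every path that occurs more than once.
  sed 's:/*$::' | sort | uniq -d
}

# Usage:
# printf '%s\n' my-bucket/arize/prod my-bucket/arize/staging my-bucket/arize/prod/ |
#   duplicate_connector_paths
```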

## Setting up Data Fabric

### Step 1: Create a Data Connector

1. Navigate to **Settings > Data Fabric** in your space
2. Click **New Connector**
3. Fill out the basic connector information:
   1. **Connector Name:** A descriptive name for your connector
   2. **Select Projects:** Choose which tracing projects to sync. You can change this selection later.

### Step 2: Configure Cloud Storage

1. **Select Data Storage:** Choose your cloud storage provider.
2. **File Path:** Enter your bucket path in the format shown below.

<Tabs>
  <Tab title="Google Cloud Storage">
    Enter your GCS path: `my-data-bucket/arize-sync/production`
  </Tab>

  <Tab title="Amazon S3">
    Enter your S3 path: `my-bucket/optional-prefix`
  </Tab>
</Tabs>
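
If you copy a path from your cloud console, it may include a `gs://` or `s3://` scheme or a trailing slash. The hypothetical helpers below strip both to produce the `bucket/prefix` form shown above, and extract the bucket name, which the permission steps that follow apply to. This assumes the field expects no scheme, as in the examples.

```bash theme={null}
# Hypothetical helpers for preparing the File Path value.
# ASSUMPTION: the field expects "bucket/prefix" without a gs:// or s3://
# scheme, as in the examples above.

# Strip an optional gs:// or s3:// scheme and any trailing slashes.
normalize_path() {
  local p="$1"
  p="${p#gs://}"
  p="${p#s3://}"
  while [[ "$p" == */ ]]; do p="${p%/}"; done
  printf '%s\n' "$p"
}

# The bucket name is the first path segment; labels, tags, and bucket IAM
# apply to the bucket itself, not to the prefix.
bucket_of() {
  local p
  p="$(normalize_path "$1")"
  printf '%s\n' "${p%%/*}"
}

# Usage:
# normalize_path gs://my-data-bucket/arize-sync/production/  # my-data-bucket/arize-sync/production
# bucket_of my-data-bucket/arize-sync/production              # my-data-bucket
```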

### Step 3: Set Up Permissions

<Tabs>
  <Tab title="Google Cloud Storage">
    1. **Label Your Bucket:** In the GCS bucket, set a bucket **label** with a key of `arize-ingestion-key` and the corresponding value copied from the setup dialog. This proves ownership of the bucket.

    * Key: `arize-ingestion-key`
    * Value: See setup dialog
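
    You can also set the label with the Google Cloud CLI. A minimal sketch, assuming `gcloud` is installed and authenticated; `label_bucket` is a hypothetical wrapper around `gcloud storage buckets update --update-labels`. Labels apply to the whole bucket, so pass the bucket name rather than the full file path.

    ```bash theme={null}
    # Hypothetical wrapper: label a GCS bucket with the Arize ingestion key.
    # Arguments: <bucket-name> <value-from-setup-dialog>
    label_bucket() {
      local bucket="$1" value="$2"
      gcloud storage buckets update "gs://${bucket}" \
        --update-labels="arize-ingestion-key=${value}"
    }

    # Usage:
    # label_bucket my-data-bucket VALUE_FROM_SETUP_DIALOG
    ```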

    2. **Create IAM Role:** Run the following command, replacing `YOUR_PROJECT_ID`, to create a custom IAM role.

    ```bash theme={null}
    gcloud iam roles create arizeDataFabric \
        --project=YOUR_PROJECT_ID \
        --title="Arize Data Fabric Role" \
        --description="Custom IAM role for Arize Data Fabric" \
        --permissions=storage.buckets.get,storage.objects.get,storage.objects.list,storage.objects.create,storage.objects.update,storage.objects.delete \
        --stage=ALPHA
    ```

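    3. **Apply IAM Permissions:** Grant the custom role to the Arize service account on your bucket. Bucket IAM applies to the whole bucket, so pass the bucket URL (for example, `gs://my-data-bucket`) rather than the full file path.

    ```bash theme={null}
    gsutil iam ch serviceAccount:arize-data-fabric@production-269901.iam.gserviceaccount.com:projects/YOUR_PROJECT_ID/roles/arizeDataFabric gs://YOUR_BUCKET_NAME
    ```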
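
    Before clicking **Validate**, you can print the bucket's labels and IAM policy to confirm both changes took effect. `show_gcs_setup` is a hypothetical helper; the `arize-ingestion-key` label and a binding of the Arize service account to `projects/YOUR_PROJECT_ID/roles/arizeDataFabric` should both appear in its output.

    ```bash theme={null}
    # Hypothetical helper: show the bucket's labels and IAM policy.
    # Argument: <bucket-name>
    show_gcs_setup() {
      local bucket="$1"
      # arize-ingestion-key should be listed among the labels.
      gcloud storage buckets describe "gs://${bucket}" --format="default(labels)"
      # The Arize service account should be bound to the arizeDataFabric role.
      gcloud storage buckets get-iam-policy "gs://${bucket}"
    }

    # Usage:
    # show_gcs_setup my-data-bucket
    ```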
  </Tab>

  <Tab title="Amazon S3">
    1. **Tag Your Bucket:** In the S3 bucket, add a bucket **tag** with a key of `arize-ingestion-key` and the corresponding value copied from the setup dialog. This proves ownership of the bucket.

    * Key: `arize-ingestion-key`
    * Value: See setup dialog

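    You can add the tag in the AWS Console or with the AWS CLI. Note that `put-bucket-tagging` replaces the bucket's entire tag set, so include any tags the bucket already has in `TagSet`: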

    ```bash theme={null}
    aws s3api put-bucket-tagging \
        --bucket YOUR_BUCKET_NAME \
        --tagging 'TagSet=[{Key=arize-ingestion-key,Value=YOUR_VALUE_FROM_SETUP_DIALOG}]'
    ```
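
    To confirm the tag before validating, you can read it back. A minimal sketch, assuming the AWS CLI is configured for the bucket's account; `check_s3_tag` is a hypothetical helper built on `aws s3api get-bucket-tagging` with a JMESPath `--query`. `get-bucket-tagging` returns an error if the bucket has no tags at all.

    ```bash theme={null}
    # Hypothetical helper: compare the bucket's arize-ingestion-key tag
    # with the value from the setup dialog.
    # Arguments: <bucket-name> <value-from-setup-dialog>
    check_s3_tag() {
      local bucket="$1" expected="$2" actual
      actual="$(aws s3api get-bucket-tagging --bucket "$bucket" \
        --query "TagSet[?Key=='arize-ingestion-key'] | [0].Value" \
        --output text)" || return 1
      if [[ "$actual" == "$expected" ]]; then
        echo "OK: arize-ingestion-key is set"
      else
        echo "Mismatch: found '${actual}'" >&2
        return 1
      fi
    }

    # Usage:
    # check_s3_tag YOUR_BUCKET_NAME YOUR_VALUE_FROM_SETUP_DIALOG
    ```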

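    2. **Create IAM Policy:** Create an IAM policy that grants the Arize Data Fabric service role access to your bucket. Replace `YOUR_BUCKET_NAME` with your bucket name and save the result as `policy.json`, which the next step reads.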

    ```json theme={null}
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:ListBucket",
            "s3:GetBucketLocation",
            "s3:GetBucketTagging"
          ],
          "Resource": [
            "arn:aws:s3:::YOUR_BUCKET_NAME",
            "arn:aws:s3:::YOUR_BUCKET_NAME/*"
          ]
        }
      ]
    }
    ```
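
    To make that substitution from the command line instead of by hand, save the policy above unchanged as `policy.template.json` (a hypothetical file name) and render it with `sed`. `render_s3_policy` is a hypothetical helper; by default it writes the `policy.json` file that the `put-role-policy` command reads.

    ```bash theme={null}
    # Hypothetical helper: fill in the bucket name in a saved copy of the policy.
    # Arguments: <bucket-name> [template-file] [output-file]
    render_s3_policy() {
      local bucket="$1"
      local template="${2:-policy.template.json}" out="${3:-policy.json}"
      # Replace every YOUR_BUCKET_NAME placeholder (both resource ARNs).
      sed "s/YOUR_BUCKET_NAME/${bucket}/g" "$template" > "$out"
    }

    # Usage:
    # render_s3_policy my-bucket
    ```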

    3. **Apply IAM Permissions:** Attach the policy to the Arize Data Fabric IAM role. The role ARN is provided in the setup dialog.

    ```bash theme={null}
    aws iam put-role-policy \
        --role-name arize-data-fabric \
        --policy-name arize-data-fabric-s3-access \
        --policy-document file://policy.json
    ```

    <Info>
      The exact Arize IAM role ARN is provided in the Data Fabric setup dialog. Use this ARN when configuring trust policies or cross-account access.
    </Info>
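
    The Info above mentions trust policies. If your setup uses a role in your own account (such as `arize-data-fabric`) that Arize assumes, that role needs a trust policy naming the Arize role ARN from the setup dialog. The sketch below writes a basic `sts:AssumeRole` trust policy. The overall flow is an assumption, and Arize may require additional conditions (for example, an external ID), so confirm the exact trust policy with the setup dialog or Arize Support.

    ```bash theme={null}
    # ASSUMPTION: a role in your account that Arize assumes via sts:AssumeRole.
    # Confirm the required trust policy with the setup dialog or Arize Support.
    # Arguments: <arize-role-arn-from-setup-dialog> [output-file]
    write_trust_policy() {
      local arn="$1" out="${2:-trust.json}"
      local fmt='{\n  "Version": "2012-10-17",\n  "Statement": [\n'
      fmt+='    {\n      "Effect": "Allow",\n'
      fmt+='      "Principal": { "AWS": "%s" },\n'
      fmt+='      "Action": "sts:AssumeRole"\n    }\n  ]\n}\n'
      # shellcheck disable=SC2059  # the format string is fixed above
      printf "$fmt" "$arn" > "$out"
    }

    # Usage:
    # write_trust_policy ARIZE_ROLE_ARN_FROM_SETUP_DIALOG
    # aws iam create-role --role-name arize-data-fabric \
    #     --assume-role-policy-document file://trust.json
    ```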
  </Tab>
</Tabs>

### Step 4: Validate and Start Sync

1. **Validate:** Click **Validate** to verify your configuration.
2. **Start Syncing:** Once validated, click **Start Job** to begin syncing. The first sync begins immediately and then repeats every 60 minutes.
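
Because sync time varies, you may want to wait until a project's Iceberg metadata file exists before creating tables. A minimal sketch, assuming `gcloud` or the AWS CLI is installed and authenticated; `object_exists` and `wait_for_metadata` are hypothetical helpers, and the `metadata/latest.metadata.json` path follows the table examples in Step 5.

```bash theme={null}
# Hypothetical helper: succeed if an object exists at a gs:// or s3:// URI.
object_exists() {
  case "$1" in
    gs://*) gcloud storage ls "$1" >/dev/null 2>&1 ;;
    s3://*) aws s3 ls "$1" >/dev/null 2>&1 ;;
    *) echo "unsupported URI: $1" >&2; return 2 ;;
  esac
}

# Poll until the URI exists, checking every <interval> seconds (default 300).
wait_for_metadata() {
  local uri="$1" interval="${2:-300}"
  case "$uri" in
    gs://*|s3://*) ;;
    *) echo "unsupported URI: ${uri}" >&2; return 2 ;;
  esac
  until object_exists "$uri"; do
    echo "Not yet available: ${uri}; retrying in ${interval}s" >&2
    sleep "$interval"
  done
  echo "Found: ${uri}"
}

# Usage:
# wait_for_metadata gs://your-bucket/path/namespace/project-name/metadata/latest.metadata.json
```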

### Step 5: Set Up Query Tables

1. **Wait for the initial sync to complete:** Sync time depends on the size and shape of your data and may vary.
2. **Create tables:** Once your data is syncing, create external tables to query it directly. For each synced project, create an external table in Iceberg or Delta format:

<Tabs>
  <Tab title="BigQuery">
    ```sql theme={null}
    CREATE EXTERNAL TABLE `your-project.your-dataset.your-table`
    OPTIONS (
       format = 'ICEBERG',
       uris = ['gs://your-bucket/path/namespace/project-name/metadata/latest.metadata.json']
    );
    ```
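
    Since each synced project needs its own table, you can generate the statements instead of writing them by hand. `bq_external_table_ddl` is a hypothetical helper; naming each table after its project with dashes replaced by underscores is an arbitrary choice, and the URI follows the layout in the example above. Paste the output into the BigQuery console.

    ```bash theme={null}
    # Hypothetical helper: print one CREATE EXTERNAL TABLE statement per project.
    # Arguments: <gcp-project.dataset> <bucket/path/namespace> <project-name>...
    bq_external_table_ddl() {
      local dataset="$1" prefix="$2"
      shift 2
      local project table
      for project in "$@"; do
        # Arbitrary naming choice: project name with dashes as underscores.
        table="${project//-/_}"
        printf "CREATE EXTERNAL TABLE \`%s.%s\`\n" "$dataset" "$table"
        printf "OPTIONS (\n"
        printf "  format = 'ICEBERG',\n"
        printf "  uris = ['gs://%s/%s/metadata/latest.metadata.json']\n" "$prefix" "$project"
        printf ");\n"
      done
    }

    # Usage:
    # bq_external_table_ddl your-project.your-dataset your-bucket/path/namespace project-a project-b
    ```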
  </Tab>

  <Tab title="Databricks">
    Once the first sync finishes, follow the standard Unity Catalog pattern for governing external S3 data: create a storage credential, create an external location that uses it, then register the Iceberg or Delta table at that location.

    1. **Create a storage credential** that wraps the IAM role with access to the Arize sync bucket. See the Databricks [storage credentials docs](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-storage-credentials) for the full reference.
       ```sql theme={null}
       CREATE STORAGE CREDENTIAL arize_data_fabric_cred
       WITH AWS_IAM_ROLE 'arn:aws:iam::123456789012:role/arize-data-fabric'
       COMMENT 'Credential for the Arize Data Fabric S3 sync bucket';
       ```
    2. **Create an external location** that binds the credential to the Arize sync prefix in your bucket.
       ```sql theme={null}
       CREATE EXTERNAL LOCATION arize_traces_loc
       URL 's3://my-bucket/arize-sync/production'
       WITH (CREDENTIAL arize_data_fabric_cred)
       COMMENT 'Arize Data Fabric trace sync output';
       ```
    3. **Register the external table** at the project's table directory under that location. For a Delta table, `LOCATION` is the table's root directory rather than a metadata file.
       ```sql theme={null}
       CREATE TABLE my_catalog.arize_traces.agent_project_a
       USING DELTA
       LOCATION 's3://my-bucket/arize-sync/production/namespace/agent_project_a';
       ```

    Arize provides compatibility with both Delta and Iceberg formats without disrupting your existing Databricks pipelines. Unity Catalog handles governance from there, enforcing permissions, tracking end-to-end lineage, and maintaining audit logs across both formats.
  </Tab>

  <Tab title="Snowflake / Athena">
    <Info>
      Support for Snowflake and Athena is coming soon. Reach out to Arize Support for early access.
    </Info>
  </Tab>
</Tabs>

## Frequently Asked Questions

* **How often does data sync?** Data syncs every 60 minutes automatically.
* **Can I sync multiple spaces to the same bucket?** Yes, you can configure multiple connectors to write to the same bucket using different namespaces or prefixes. For stronger access isolation, we recommend using separate buckets or prefixes per space — see [Access Control Best Practices](/ax/security-and-settings/data-fabric/access-control) for details.
* **What happens if I delete a project that's being synced?** The sync will stop for that project, but existing data in your storage will remain.
* **Can I change the sync frequency?** Currently, the sync frequency is fixed at 60 minutes and cannot be customized. Customization is coming soon.
* **Is historical data included in the sync?** Yes, all historical data is included in the initial sync, and any changes to historical data between syncs will be included in the next sync.
* **What's the difference between Data Fabric and manual exports?** Data Fabric provides automatic, continuous syncing with evaluations and annotations, while exports are manual snapshots at a point in time.
