March 2024 - Arize AX Docs

What’s New

March 28, 2024

Pre-Joined Evals in Arize

Arize now supports LLM assisted evals that have been generated by the Arize Phoenix evals package. Use evals to determine the performance of your LLM application across dimensions such as Hallucination, Toxicity, QA Correctness and more. Evals can also be run on a job and sent to Arize on a regular cadence. See our docs here to get started with Evals in Arize, with more releases coming to Evals soon.

GPT-4V(ision) Integration in Prompt Playground

March 18, 2024 Arize now offers multi-modal support with GPT-4V allowing users to pass an image as part of the request to OpenAI.

Custom LLM Endpoint Support in Prompt Playground

Connect custom or third-party Large Language Models seamlessly. Test and compare different LLMs to identify optimal configurations. Learn more → Note: This feature is gated - please contact support@arize.com for access. Endpoint must conform to OpenAI ChatCompletion or Completions format.

Enhancements

March 28, 2024

deleteData Endpoint

This update allows users to self-serve data deletion through GraphQL. Learn more →

Area Under the Curve (AUC) as a Custom Metric

We now support AUC in custom metrics. Learn more →

Python SDK v7.12.0

Users can now send evals and spans together via the log_spans method of the Arize Pandas Client
On-prem users can pass a path to certificate files or disable the TLS verification.

Learn about Python SDK fixes and improvements here.

📚 New Content

LLM Observability Certification: Search & Retrieval Course
LLM Benchmarks & Retrieval for RAG Systems
Numeric Evals: Why You Should Not Use for LLM-As-a-Judge
Klick Health: Q&A on Healthcare LLM Use Cases
Ragas: How To Evaluate and Analyze Your RAG Pipeline
Needle in a Haystack LLM: New Research
RAG Evaluation: How-To Troubleshoot LLMs and Retrieval-Augmented Generation with Retrieval and Response Metrics
Phi 2
Mistral’s 8x7b
RAG vs Fine Tuning
Sora AI from OpenAI
Tutorial: Everything You Need to Set Up a SQL Router Query Engine for Text-To-SQL
LLM Task Evaluations vs Model Evals
Anthropic Claude 3: Performance and Review
Cerebral Valley on “How Arize Is Expanding the Field of AI Observability”
Paper Read: Reinforcement Learning In An Era of LLMs

​What’s New

​Pre-Joined Evals in Arize

​GPT-4V(ision) Integration in Prompt Playground

​Custom LLM Endpoint Support in Prompt Playground

​Enhancements

​deleteData Endpoint

​Area Under the Curve (AUC) as a Custom Metric

​Python SDK v7.12.0

​📚 New Content