Introducing Arize Copilot

Published July 11, 2024

If you used Microsoft Office in the early days, you probably remember Clippy. Clippy was an animated paper clip and go-to assistant for all things Microsoft Office. It provided users with tips, help, and shortcuts for using Microsoft Office applications. Well, it got us thinking…what if there was a Clippy but AI…for AI?

Introducing Arize AI Copilot, the first AI Assistant for AI. Copilot stands out by providing an intelligent, integrated solution for model and application improvement. It reduces manual effort, accelerates troubleshooting, and offers advanced tools for LLM development and data curation, making it an invaluable assistant for data scientists and AI engineers.

Elevate Your AI Workflow with Arize Copilot

Arize Copilot revolutionizes your workflow by integrating traditional processes and automating complex tasks. Copilot surfaces relevant information and suggests actions, reducing the need for multiple steps and manual effort.

Versatile Skill Set: Copilot offers high-level model insights, data quality analysis, and LLM-specific functionalities like evaluation summarization and retrieval process troubleshooting.
Advanced LLM Development: Copilot identifies issues and patterns in your evaluation results, suggesting pre-built or custom evaluations.
Prompt Optimization: In the prompt playground, Copilot optimizes your prompts based on specific concerns or evaluation data.
Powerful Data Curation: Use Copilot’s AI Search to curate data with natural language queries combined with traditional filters.

Explore these powerful features with Arize Copilot and transform the way you develop and optimize your models and application.

Unleashing the Power of Copilot: Workflows and Specifics

Get Model Insights

Have you ever run into an issue in production and felt overwhelmed by where to start? There are so many different factors that can cause an issue. Getting to the root of the problem can be painful, frustrating, and sometimes a complete time suck. With Copilot, that’s no longer the case. Copilot has skills that allow you to easily identify issues so you can efficiently manage your model’s performance and take action quickly. Simply ask Copilot for insights, and it will provide you with a high-level analysis of your model’s performance metrics, including trends over time, prediction volumes, and prediction drift. Once you have your high-level insights, you can start diving in with our many other debugging tools designed to help you isolate issues without taking a ton of manual steps.

ai copilot get model insights

Prompt Optimization

A lot of LLM application development revolves around getting your prompt just right so that your application behaves as expected. The process involves a ton of back-and-forth testing, iterating, and observing how your changes affect the outcome. It’s exhausting. Instead of implementing numerous manual changes, what if the AI optimized itself? That’s where Copilot comes in. Copilot is the ultimate prompt optimization tool. Simply prompt Copilot with your goals or concerns, and it will look at a sample of data and optimize the prompt to address those goals or concerns. You can iterate with Copilot, adding more criteria as you go until you are happy with the provided template. Then, take that to our playground, where you can test the prompt on your chosen dataset to observe the results. No more iterating in code, running manual tests, and consulting notebooks or docs to understand if your changes were successful or introduced a regression. Copilot takes the lift off of having to get the best prompt and best responses, while Arize Prompt Playground provides the perfect testing infrastructure.

prompt optimization ai assistant copilot llm
Pro tip: The prompt optimization flow works great on eval templates too 😉

Build a Custom Eval

One of the most challenging aspects of building an LLM application is assessing performance. Human annotation is costly and time-consuming, and even user feedback can be sparse. For this reason, using an LLM Judge has become a popular technique to evaluate LLM applications.

If you’ve decided to use an LLM as a judge but don’t know where to start with defining the evaluation criteria, Copilot eliminates this concern. Copilot can suggest one of our pre-built Phoenix templates for you. Sometimes these pre-built templates aren’t suitable for specialized tasks, but no worries. We’ve built a Custom Eval builder to help you create a custom evaluation for your task. Simply specify your goal or let Copilot analyze your data and make suggestions. Copilot will do the rest, creating tailored evaluations for your application.

build a custom eval

AI Search

Have you ever found yourself completely overwhelmed by your data, struggling to find the exact piece of information you need? Whether you’re developing a cutting-edge LLM application or refining a traditional machine learning model, data search and curation can often feel like finding a needle in a haystack. This is where the power of Copilot’s AI Search comes into play.

Imagine you’re working on an LLM application that generates customer service responses. One day, you notice a spike in negative feedback and need to understand why. With traditional methods, you’d manually sift through countless records, trying to identify patterns or specific instances of “angry responses.” This process is not only time-consuming but also prone to human error.

Now, picture having an intelligent assistant that allows you to search and curate your data using natural language queries. You simply ask Copilot to find “angry responses,” and it quickly identifies relevant traces, bringing those critical data points to the forefront. This not only saves you valuable time but also ensures you’re working with the most pertinent data, leading to more accurate and effective solutions.

The AI Search feature empowers users to search and curate their data effortlessly, using natural language queries. Combined with traditional filters, it enables seamless data management, ensuring you can quickly locate and utilize relevant data for your models.

copilot ai search

Step Into the Future of AI Development with Arize Copilot

Arize Copilot is more than just a tool—it’s your partner in development. By integrating advanced AI capabilities, Copilot not only simplifies your daily tasks but also propels you towards exceptional levels of efficiency and insight. Ready to transform how you work with AI? Join us now on the Arize platform. Start exploring with Copilot today!

📚View Documentation

For more assistance, feel free to contact our support team at support@arize.com or join our community.

Additional Recent Releases

Of course, Copilot is just one update among many premiering onstage today. Recent updates to the Arize platform include:

Datasets: Datasets are collections of examples that are used to run experiments, label, or evaluate the performance of an application. Users can easily add selected examples to a new or existing dataset.
Experiments: In AI development, it’s hard to understand how a change will affect performance. This breaks the dev flow, making iteration more guesswork than engineering. With Experiments, users can apply a change such as a prompt template change, retrieval approach change, or even LLM change, and apply it across a dataset for evaluation prior to deploying the change into production.
Tasks: Online LLM Evals: As your application scales and the number of production logs increases, it can be cumbersome to manage manually manage your data. Tasks let you create and run automated actions on your LLM spans. Users can now set up Tasks to automate actions on data, with Online LLM Evaluations (continuous evals) being the first supported task type. Look out for new Task templates in the coming weeks.
LLM Guardrails: Guardrails for LLMs ensure real-time safety, context management, compliance, and user experience. Guardrails provide immediate correction of inappropriate content, and can be applied to either user input messages (e.g. jailbreak attempts) or LLM output messages (e.g. answer relevance). If a message in a LLM chat fails a Guard, then the Guard will take a corrective action, either providing a default response to the user or prompting the LLM to generate a new response.
Prompt Playground: Prompt Playground has gotten a redesign to allow for better experimentation and customization of prompt templates and variables. Users can now chain together a series of system and user messages to test the chatbot on a specific example, adding a list of input Variables in {mustache} notation and specifying their values in the Variables column.

And more!