Co-authored by Anthony Abercrombie (AI Solutions Engineer), Lucas Moehlenbrock (AI Solutions Engineer), and John Gilhuly (Developer Advocate).
Working with GraphQL can feel overwhelming, especially when you're navigating massive schemas that run to tens of thousands of lines. Writing queries against them by hand is time-consuming and error-prone, a common pain point for developers. Recognizing this, our solutions team entered a recent hackathon with one clear goal: build a smarter, easier way to generate and optimize GraphQL queries directly from natural-language prompts.
Over an intense weekend, we built an AI-powered GraphQL agent capable of transforming plain language requests into fully optimized, executable GraphQL queries. Not only does this agent save significant time, but it also ensures accuracy by validating queries against complex schemas — some exceeding 75,000 tokens when introspected.
While the version we created was for Arize specifically, we are proud to debut an open source version so anyone can use it to construct GraphQL queries that are validated against a GraphQL API of their choice.
Here’s the GitHub repo that shows you how to plug this MCP server into Cursor or Claude Desktop.
What It Is and How to Use It
This agent lets you construct GraphQL queries that are validated against a GraphQL API of your choice. Our internal build targeted the Arize API, but you can point the agent at any GraphQL endpoint; the only real difference when connecting it to another service is how package loading works. For hands-on instructions, see the GitHub repo.

Why We Built It
The Arize GraphQL schema unifies everything AI engineering teams care about with regard to observability — LLM traces, prompt templates, token‑level usage data, and evaluator labels such as hallucination or toxicity — into one strongly‑typed graph. This breadth lets users pose nuanced questions (“Show me hallucination rate by feature slice over time”) that would be nearly impossible through a generic REST endpoint.
That richness, however, makes the schema large and complex; writing queries and mutations is a frequent pain point for customers. Our agent dynamically retrieves context from the schema graph so it can assemble valid queries and mutations on the fly.
GraphQL schemas can easily exceed 75,000 tokens when introspected, which makes stuffing the entire schema into an LLM’s context window impractical. Vector-based RAG doesn’t help either: chunking the schema leaves the model with partial, disconnected type information. To solve this, we taught the agent to traverse the schema graph directly, extracting only the fields and types it needs for the request at hand.
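To make the traversal idea concrete, here is a minimal sketch in pure Python. It walks introspection-style type definitions and collects only the types reachable from one root field. The tiny inline schema is a made-up example (not the Arize schema), and the real agent’s retrieval logic is considerably richer than this:

```python
# Sketch: traverse a GraphQL schema graph and keep only the types
# reachable from one root field. The inline schema below is a toy
# example for illustration, not a real API's introspection output.

def named_type(type_ref):
    """Unwrap NON_NULL / LIST wrappers to reach the named type."""
    while type_ref.get("ofType"):
        type_ref = type_ref["ofType"]
    return type_ref["name"]

def collect_types(schema_types, root_field, query_type="Query"):
    """Return only the type definitions reachable from one root field."""
    by_name = {t["name"]: t for t in schema_types}
    field = next(f for f in by_name[query_type]["fields"]
                 if f["name"] == root_field)
    needed, stack = {}, [named_type(field["type"])]
    while stack:
        name = stack.pop()
        if name in needed or name not in by_name:
            continue  # skip scalars and already-visited types
        needed[name] = by_name[name]
        for f in by_name[name].get("fields", []):
            stack.append(named_type(f["type"]))
    return needed

# Toy introspection-style data: Query.model -> Model -> Trace
SCHEMA_TYPES = [
    {"name": "Query", "fields": [
        {"name": "model", "type": {"name": "Model", "ofType": None}},
        {"name": "user", "type": {"name": "User", "ofType": None}},
    ]},
    {"name": "Model", "fields": [
        {"name": "traces", "type": {
            "name": None, "kind": "LIST",
            "ofType": {"name": "Trace", "ofType": None}}},
    ]},
    {"name": "Trace", "fields": [
        {"name": "latencyMs", "type": {"name": "Float", "ofType": None}},
    ]},
    {"name": "User", "fields": [
        {"name": "email", "type": {"name": "String", "ofType": None}},
    ]},
]

print(sorted(collect_types(SCHEMA_TYPES, "model")))  # ['Model', 'Trace']
```

Even in this toy case, the unrelated `User` type is never pulled into context; at the scale of a 75,000-token schema, that pruning is what keeps the prompt small enough to be useful.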
How We Used Arize AX In Building This Agent
We tested and instrumented the agent in Arize AX from day one. That let us:
- trace and observe the full execution path of the agent;
- track key metadata and automate evaluations at every decision point;
- experiment and iterate quickly on LLM prompts.
Hybrid instrumentation gave us exactly the metadata we needed. After mapping the agent architecture, we tagged the components we cared about and could run evaluations effortlessly. We used those evals to spot places where the agent produced an incorrect query or mutation, then added those cases as few‑shot examples in the prompts. Trace information also powered a query‑optimization loop that greatly boosted accuracy when converting natural‑language requests into GraphQL.
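The feedback loop described above, where failing cases spotted in traces are folded back into the prompt as few-shot examples, can be sketched roughly as follows. All function and field names here are hypothetical, not the actual agent code:

```python
# Hypothetical sketch of the eval-driven prompt loop: failed query
# generations (with their corrected queries) become few-shot examples.

BASE_PROMPT = "Translate the user's request into a valid GraphQL query.\n"

def build_prompt(few_shot_examples):
    """Append corrected failure cases to the base prompt as few-shots."""
    sections = [BASE_PROMPT]
    for ex in few_shot_examples:
        sections.append(
            f"Request: {ex['request']}\nQuery:\n{ex['corrected_query']}\n")
    return "\n".join(sections)

def update_few_shots(eval_results, few_shot_examples):
    """Add each failing case, plus its corrected query, to the pool."""
    for result in eval_results:
        if not result["passed"]:
            few_shot_examples.append({
                "request": result["request"],
                "corrected_query": result["corrected_query"],
            })
    return few_shot_examples

# One failing eval result feeds one new few-shot example.
examples = update_few_shots(
    [{"passed": False,
      "request": "hallucination rate by day",
      "corrected_query": "query { ... }"}],  # query elided for brevity
    [],
)
prompt = build_prompt(examples)
print("hallucination rate by day" in prompt)  # True
```

In practice the `eval_results` would come from evaluations run over Arize AX traces rather than a hard-coded list, but the shape of the loop is the same: evaluate, collect failures, re-prompt.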
Getting Started Yourself
Everything described here is open‑sourced. The GitHub repo (linked above) walks you through wiring the MCP server into Cursor or Claude Desktop and covers the small package‑loading tweak you’ll need.
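For orientation, Cursor reads MCP server definitions from a `.cursor/mcp.json` file, and a stdio server entry generally looks like the sketch below. The server name, command, and path here are placeholders; use the exact values from the repo’s README:

```json
{
  "mcpServers": {
    "graphql-agent": {
      "command": "python",
      "args": ["/path/to/server.py"]
    }
  }
}
```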
With that in place, you can prompt in plain language and watch the agent return fully validated, executable GraphQL, with no more combing through tens of thousands of schema lines by hand.