Generating synthetic datasets is a powerful way to test and refine your agent or LLM application. This is especially true when real-world data is limited, sensitive, or hard to collect. By guiding an LLM to produce structured examples, you can quickly build datasets that cover complex multi-step cases and edge cases such as typos or out-of-scope queries. This tutorial covers different strategies for dataset generation and shows how they can be used to run experiments and test evaluators. Specifically, it outlines how to: generate synthetic benchmark datasets to test evaluator accuracy and coverage; use few-shot examples to guide LLM generation toward more consistent outputs; create agent-specific datasets that cover happy paths, edge cases, and adversarial scenarios; and upload datasets to Phoenix and run experiments to validate your evaluators.
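As a quick preview of the few-shot approach, generating structured examples usually means prompting an LLM with a target schema plus a handful of seed examples, then validating what comes back. The sketch below is illustrative only: the seed examples, field names (`query`, `intent`, `is_edge_case`), and the simulated model response are all hypothetical, and in practice `raw` would come from whichever model client you use.

```python
import json

# Hypothetical few-shot seed examples that show the model the desired structure,
# including one deliberate edge case with typos.
FEW_SHOT = [
    {"query": "How do I reset my password?", "intent": "account", "is_edge_case": False},
    {"query": "pasword resett plz", "intent": "account", "is_edge_case": True},
]

def build_prompt(n: int, topic: str) -> str:
    """Assemble a generation prompt: instructions followed by few-shot examples."""
    examples = "\n".join(json.dumps(e) for e in FEW_SHOT)
    return (
        f"Generate {n} JSON objects, one per line, with keys "
        f'"query", "intent", and "is_edge_case", about {topic}. '
        "Include some typos and out-of-scope queries as edge cases.\n"
        f"Examples:\n{examples}\n"
    )

def parse_examples(raw: str) -> list[dict]:
    """Validate model output: keep only well-formed rows with the expected keys."""
    rows = []
    for line in raw.strip().splitlines():
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue  # drop malformed lines rather than failing the whole batch
        if {"query", "intent", "is_edge_case"} <= row.keys():
            rows.append(row)
    return rows

# In practice `raw` would be the LLM's response to build_prompt(...); here we
# simulate one good row and one malformed line to show the validation step.
raw = '{"query": "cancel my subscription", "intent": "billing", "is_edge_case": false}\nnot json'
dataset = parse_examples(raw)
```

Validating each row before it enters the dataset keeps one malformed generation from corrupting the whole batch, which matters once you scale generation to hundreds of examples.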