Automated dataset curation

As teams collect more LLM traces, it becomes tedious to sift through them manually to curate high-quality datasets. Automated Dataset Curation lets you define rules that automatically add new examples to a dataset whenever incoming traces match your criteria. It's a flexible way to keep datasets fresh without hand-curating every example.

Curate dataset from evaluation labels

After setting up an evaluation task on a project, you can include a post-processing step that automatically adds examples to a dataset based on the evaluation label. For example, if you want to create a dataset of challenging examples where the production LLM hallucinated, you can add all the spans labeled "hallucinated" to your dataset.

Select "Auto Add Spans to Dataset", then enter your evaluation criteria to filter for the appropriate spans.
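The rule behind this step can be sketched in plain Python. This is a hypothetical illustration, not the product's SDK: the `Span` class, its field names, and `curate_by_label` are all assumptions made for the example.

```python
# Hypothetical sketch of label-based curation; `Span`, its fields, and
# `curate_by_label` are illustrative, not a real SDK API.
from dataclasses import dataclass


@dataclass
class Span:
    input: str
    output: str
    eval_label: str  # label assigned by the evaluation task


def curate_by_label(spans, dataset, label="hallucinated"):
    """Append every span whose evaluation label matches to the dataset."""
    for span in spans:
        if span.eval_label == label:
            dataset.append({"input": span.input, "output": span.output})
    return dataset


spans = [
    Span("What is 2+2?", "4", "factual"),
    Span("Who won in 2031?", "Team X won.", "hallucinated"),
]
dataset = curate_by_label(spans, [])
# dataset now contains only the span labeled "hallucinated"
```

The same post-processing step runs automatically on each new evaluated trace, so the dataset grows as matching spans arrive.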

Curate dataset from filters

Alternatively, instead of using an evaluation label, you can add to a dataset any example that meets basic filter criteria, such as a high token count in the LLM output, high latency, or a call to a specific tool.

Here, simple heuristic filters are used to automatically add matching spans to a dataset.
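Conceptually, each filter is just a predicate over a span, and a span is curated when any predicate matches. The sketch below is an assumption-laden illustration: the field names (`output_tokens`, `latency_ms`, `tools`) and thresholds are invented for the example, not a real trace schema.

```python
# Hypothetical sketch of filter-based curation; field names and thresholds
# are illustrative assumptions, not a real trace schema.
filters = [
    lambda s: s["output_tokens"] > 512,    # high token count in the output
    lambda s: s["latency_ms"] > 2000,      # high latency
    lambda s: "web_search" in s["tools"],  # a specific tool was called
]


def should_curate(span):
    """Return True if any heuristic filter matches the span."""
    return any(f(span) for f in filters)


span = {"output_tokens": 120, "latency_ms": 3500, "tools": []}
should_curate(span)  # True: latency exceeds the 2000 ms threshold
```

Treating filters as a list of predicates makes it easy to combine criteria: add or remove entries without touching the curation logic itself.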
