Labeling Queues

Sometimes human annotations are needed to capture details that automated evaluations might overlook. Labeling queues make this possible by letting subject matter experts or third-party annotators review and score data against criteria you define. You can then use these human insights to build golden datasets for fine-tuning or to uncover where human and LLM evaluations diverge.

To use labeling queues, you need:

  1. A dataset you want to annotate

  2. Annotator users in your space

  3. Annotation criteria

Inviting an Annotator

On the settings page, you can invite annotators by adding them as users with the Annotator account role. They will receive an email inviting them to your space and prompting them to set a password.

Creating a Labeling Queue

Once you’ve created a dataset of traces to evaluate, you can set up a labeling queue under the Labeling Queues tab and assign it to your annotation team.

When creating the queue, you can define detailed instructions, select the appropriate annotation configs, choose the assignment method, and specify which team members should receive the task.

The columns that annotators label appear on datasets as namespaced annotation columns (e.g. annotation.hallucination). The latest annotation value for a specific row is namespaced as latest.userannotation, which is useful for experiments when multiple annotators label the same dataset.
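To illustrate how these namespaced columns might be consumed downstream, here is a minimal Python sketch. It assumes the annotated dataset has been exported as a CSV; the file name and the llm_eval.hallucination column used for comparison are hypothetical and not part of the product.

```python
import pandas as pd

# A minimal sketch, assuming the annotated dataset was exported as a CSV
# with namespaced annotation columns. File and column names are illustrative.
df = pd.read_csv("annotated_dataset.csv")

# Per-criterion human labels live under the annotation.* namespace.
hallucination_scores = df["annotation.hallucination"]

# The most recent annotation per row is exposed under the latest.* namespace,
# which is convenient when several annotators label the same dataset.
latest_labels = df["latest.userannotation"]

# Example: find rows where human and automated (LLM) evaluations diverge.
# "llm_eval.hallucination" is a hypothetical column holding LLM-assigned scores.
if "llm_eval.hallucination" in df.columns:
    disagreement = df[latest_labels != df["llm_eval.hallucination"]]
    print(f"{len(disagreement)} rows where human and LLM evaluations diverge")
```

A subset like this can serve as the starting point for a golden dataset or for auditing your automated evaluators.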

Labeling Data as an Annotator

Under the Labeling Queues tab, annotators see the labeling queues they have been assigned. This includes the data they need to annotate, along with the label or score they need to provide. Your datasets can contain text, images, and links.

Annotators can review each record and add their annotations directly within the interface. Progress for each labeling queue is tracked and displayed next to the assigned queue, making it easy to monitor completion.
