Evaluate models online

In this guide, we will demonstrate how to create and use online evaluators to observe the performance of your models.

Create an online evaluator

Prerequisites

  1. You need to have access to the beta preview of evaluations.
  2. You also need to have a project created - if not, please first follow our project creation guides.
  3. Finally, you need at least a few datapoints in your project. Use the Editor to generate some datapoints if you don't have any yet.

To set up an online evaluator:

  1. Go to the Evaluations tab of your project.
  2. Select + New Evaluator and choose Online.
The online evaluators example library.

  3. From the library of example evaluators, choose Valid JSON for this guide. You'll see a pre-populated evaluator with Python code that checks whether the output of your model is valid JSON.
The evaluator editor.

  4. In the debug console at the bottom of the dialog, click Random selection. The console will be populated with five datapoints from your project.
The debug console populated with datapoints from your project.

  5. Click the Run button at the far right of one of the datapoint rows. After a moment, you'll see the Result column populated with a True or False.
The Valid JSON evaluator returned True for this particular datapoint, indicating the text output by the model was grammatically correct JSON.

  6. Explore the datapoint dictionary in the table to help understand what is available on the Python object passed to your function (see the sketch after this list).
  7. Click Create in the bottom left of the dialog.
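
As a rough illustration, a Valid JSON evaluator along these lines might look like the sketch below. The exact shape of the datapoint dictionary, including the "output" field name, is an assumption here; use the debug console to confirm what your project actually passes in.

```python
import json

def evaluator(datapoint):
    # Assumption: the generated text lives under an "output" key.
    # Inspect the datapoint dictionary in the debug console to confirm.
    output = datapoint["output"]
    try:
        json.loads(output)  # raises if the text is not valid JSON
        return True
    except json.JSONDecodeError:
        return False
```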

Activate an evaluator for a project

  1. On the new Valid JSON evaluator in the Evaluations tab, toggle the switch to on. The evaluator is now activated for the current project.
Activating the new evaluator to run automatically on your project.

  2. Go to the Editor and generate some fresh datapoints with your model.
  3. In the Data tab, you'll see the newly logged datapoints. The Valid JSON evaluator runs automatically on these new datapoints, and the results are displayed in the table.
The Data table includes a column for each activated evaluator in your project. Each activated evaluator runs on any new datapoints in your project.

Track the performance of models

Prerequisites

  1. A Humanloop project with a reasonable amount of data.
  2. An evaluator activated in that project.

To track the performance of different model configs in your project:

  1. Go to the Dashboard tab.
  2. In the table of model configs at the bottom, choose a subset of the project's model configs.
  3. Use the graph controls at the top of the page to select the date range and time granularity of interest.
  4. Review the relative performance of the model configs for each activated evaluator shown in the graphs.

Note: Available Modules

The following Python modules are available to be imported in your evaluators:

  • math
  • random
  • datetime
  • json (useful for validating JSON grammar as per the example above)
  • jsonschema (useful for more fine-grained validation of JSON output - see the in-app example and the sketch below)
  • sqlglot (useful for validating SQL query grammar)
  • requests (useful to make further LLM calls as part of your evaluation - see the in-app example for a suggestion of how to get started).
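
As an illustration of the kind of fine-grained check jsonschema enables, the sketch below validates the model output against an expected schema. The schema and the "output" field name are illustrative assumptions, not part of the in-app example.

```python
import json
from jsonschema import ValidationError, validate

# Illustrative schema - replace with the structure your model is expected to produce.
EXPECTED_SCHEMA = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["answer"],
}

def evaluator(datapoint):
    # Assumption: the generated text lives under an "output" key.
    try:
        parsed = json.loads(datapoint["output"])
        validate(instance=parsed, schema=EXPECTED_SCHEMA)
        return True
    except (json.JSONDecodeError, ValidationError):
        return False
```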

What’s Next

Try out creating an evaluator that uses an LLM to assess more complex criteria.
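
A very rough sketch of that idea is shown below, using requests to call an LLM API and return a numeric score. The endpoint, model name, API-key handling, and prompt are all illustrative assumptions; the in-app example is the recommended starting point.

```python
import requests

def evaluator(datapoint):
    # Assumption: the generated text lives under an "output" key.
    prompt = (
        "Rate the following model output for helpfulness on a scale of 1 to 10. "
        "Reply with a single integer only.\n\n" + datapoint["output"]
    )
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": "Bearer YOUR_API_KEY"},  # placeholder key
        json={
            "model": "gpt-3.5-turbo",
            "messages": [{"role": "user", "content": prompt}],
        },
    )
    score_text = response.json()["choices"][0]["message"]["content"]
    return int(score_text.strip())
```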