Evaluate models online
In this guide, we will demonstrate how to create and use online evaluators to observe the performance of your models.
Create an online evaluator
Prerequisites
- You need to have access to the beta preview of evaluations.
- You also need to have a project created - if not, please first follow our project creation guides.
- Finally, you need at least a few datapoints in your project. Use the Editor to generate some datapoints if you don't have any yet.
To set up an online evaluator:
- Go to the Evaluations tab of one of your projects
- Select + New Evaluator and choose Online

The online evaluators example library.
- From the library of example evaluators, we'll choose Valid JSON for this guide. You'll see a pre-populated evaluator with Python code that checks that the output of your model is valid JSON (a minimal sketch of this kind of check is shown after these steps).

The evaluator editor.
- In the debug console at the bottom of the dialog, click Random selection. The console will be populated with five datapoints from your project.

The debug console populated with datapoints from your project.
- Click the Run button at the far right of one of the datapoint rows. After a moment, you'll see the Result column populated with a True or False.

The Valid JSON evaluator returned True for this particular datapoint, indicating the text output by the model was grammatically correct JSON.
- Explore the datapoint dictionary in the table to help understand what is available on the Python object passed to your function.
- Click Create in the bottom left of the dialog.
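For reference, here is a minimal sketch of what a Valid JSON evaluator might look like. The function name, signature, and the assumption that the model's text is available under an `output` key are illustrative; use the debug console to confirm the actual fields on the datapoint dictionary.

```python
import json

def evaluator(datapoint):
    # Assumption: the model's generated text is exposed as datapoint["output"].
    # Inspect the datapoint dictionary in the debug console to confirm the field name.
    output = datapoint["output"]
    try:
        json.loads(output)
        return True  # The output parsed as valid JSON.
    except json.JSONDecodeError:
        return False  # The output was not valid JSON.
```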
Activate an evaluator for a project
- On the new Valid JSON Evaluation Function in the Evaluations tab, toggle the switch to on. The function is now activated for the current project.

Activating the new evaluator to run automatically on your project.
- Go to the Editor, and generate some fresh datapoints with your model.
- Over in the Data tab, you'll see the newly logged datapoints. The Valid JSON evaluator runs automatically on these new datapoints, and the results are displayed in the table.

The Data table includes a column for each activated evaluator in your project. Each activated evaluator runs on any new datapoints in your project.
Track the performance of models
Prerequisites
- A Humanloop project with a reasonable amount of data.
- An Evaluation Function activated in that project.
To track the performance of different model configs in your project:
- Go to the Dashboard tab.
- In the table of model configs at the bottom, choose a subset of the project's model configs.
- Use the graph controls at the top of the page to select the date range and time granularity of interest.
- Review the relative performance of the model configs for each activated evaluator shown in the graphs.
Note: Available Modules
The following Python modules are available to be imported in your evaluators:
- math
- random
- datetime
- json (useful for validating JSON grammar as per the example above)
- jsonschema (useful for more fine-grained validation of JSON output - see the in-app example and the sketch below)
- sqlglot (useful for validating SQL query grammar)
- requests (useful to make further LLM calls as part of your evaluation - see the in-app example for a suggestion of how to get started)
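As a hedged illustration of the jsonschema module, the sketch below validates that the model's JSON output matches an expected structure. The schema and the `output` field name are assumptions for illustration; the in-app example may differ.

```python
import json
import jsonschema

# Hypothetical schema for illustration: expects an object with a string "answer" field.
SCHEMA = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}

def evaluator(datapoint):
    # Assumption: the generated text is available as datapoint["output"].
    try:
        parsed = json.loads(datapoint["output"])
        jsonschema.validate(instance=parsed, schema=SCHEMA)
        return True
    except (json.JSONDecodeError, jsonschema.ValidationError):
        return False
```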
Try creating an evaluator that uses an LLM to assess more complex criteria.
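As a starting point, here is a rough sketch of that idea using the requests module to call an LLM API. The endpoint, model name, prompt, and API key handling are all illustrative assumptions; see the in-app example for the recommended approach.

```python
import requests

def evaluator(datapoint):
    # Assumption: an OpenAI-style chat completions endpoint and an API key
    # available to the evaluator; both are placeholders here.
    prompt = (
        "Rate whether the following response is polite and on-topic. "
        "Answer only 'True' or 'False'.\n\nResponse:\n" + datapoint["output"]
    )
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": "Bearer YOUR_API_KEY"},  # placeholder key
        json={
            "model": "gpt-3.5-turbo",
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=30,
    )
    verdict = response.json()["choices"][0]["message"]["content"].strip()
    return verdict.lower().startswith("true")
```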