Getting Started with the API

Welcome to the Humanloop API documentation! This page aims to get you up and running with the Humanloop API as fast as possible.

You can find detailed API reference information in the tabs at the top of this page, but this tutorial will touch only on a few of the most common tasks involving the API. The code examples are incremental and build on each other.

Below you'll see examples of the entire Humanloop lifecycle, from authenticating against the API, to obtaining predictions from an ML model, to downloading your labelled dataset at the end.

Let's get started!

Step 1: Authentication

The first thing to do is configure your Python script to communicate with the Humanloop API.

The easiest way to do this is to set the X-API-KEY header on any request you make to the API, as outlined below.

You can find your API key on the profile page when you are signed in. If you don't already have a Humanloop account, feel free to create one now.

For detailed authentication documentation, see this page.

import requests
import json

base_url = "https://api.humanloop.com"
headers = {
    "Content-Type": "application/json",
    "X-API-KEY": "<INSERT YOUR API KEY HERE>",
}
# use the email associated with your Humanloop account
project_owner = "<INSERT YOUR USER EMAIL HERE>"
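Since every later step makes a raw `requests` call with these headers, it can be convenient to wrap the calls so that failures (for example a 401 from a missing or invalid X-API-KEY) raise descriptive errors instead of silently returning an error body. The `api_url` and `post_json` helpers below are not part of the Humanloop API, just a small sketch you might adapt:

```python
import requests

base_url = "https://api.humanloop.com"
headers = {
    "Content-Type": "application/json",
    "X-API-KEY": "<INSERT YOUR API KEY HERE>",
}


def api_url(path: str) -> str:
    """Join the base URL and an endpoint path, normalising slashes."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


def post_json(path: str, payload: dict) -> dict:
    """POST a JSON payload and raise a descriptive error on failure.

    This is a hypothetical convenience wrapper, not part of any Humanloop SDK.
    """
    response = requests.post(url=api_url(path), json=payload, headers=headers)
    # Raise an HTTPError carrying the status code if the API rejected the
    # request, e.g. 401 for a bad key or 422 for an invalid payload.
    response.raise_for_status()
    return response.json()
```

With this in place, the later examples could be written as `post_json("datasets", dataset_request)` and so on.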

Step 2: Creating a Dataset

Now that the client is authenticated, it's possible to create a dataset.

Datasets are an important concept in the Humanloop platform: they form the basis for labelling and model training. You can create them in the Humanloop webapp from CSV or JSON. In this API example we will use JSON.

A dataset consists of a collection of fields which hold metadata about the dataset's columns, along with a list of datapoints which have specific values for each field.

If they are available, it can be helpful to include your own unique identifiers for your datapoints, so that you can easily correlate any annotations and predictions created by Humanloop back to your own system. In the example below, this is the external_id field.

For details of the dataset creation POST endpoint, see this page. For larger datasets, you can also use a PUT request to add data incrementally.

dataset_request = {
    "name": "Extraction dataset",
    "description": "With existing examples for review.",
    "fields": [
        {"name": "text", "data_type": "text"},
        {"name": "sentiment", "data_type": "categorical"},
        {"name": "external_id", "data_type": "text"},
        {"name": "entities", "data_type": "character_offsets"},
    ],
    "data": [
        {
            "text": "Hi Apple! Please can you refund this item?",
            "sentiment": "neutral",
            # external_id is declared as a text field above, so pass IDs as strings
            "external_id": "1233456432354",
            "entities": [{"start": 3, "end": 8, "label": "ORG"}],
        },
        {
            "text": "Amazon, I love this so much!",
            "sentiment": "positive",
            "external_id": "1233456432355",
            "entities": [{"start": 0, "end": 6, "label": "ORG"}],
        },
    ],
}
dataset_fields = requests.post(
    url=f"{base_url}/datasets", data=json.dumps(dataset_request), headers=headers
).json()["fields"]
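For larger datasets, the incremental PUT mentioned above lets you send the datapoints in batches rather than one huge request. The chunking below is plain Python; the exact PUT path and `{"data": ...}` payload shape are assumptions to verify against the dataset endpoint reference:

```python
import json

import requests

base_url = "https://api.humanloop.com"
headers = {
    "Content-Type": "application/json",
    "X-API-KEY": "<INSERT YOUR API KEY HERE>",
}


def chunked(items: list, size: int) -> list:
    """Split a list of datapoints into consecutive batches of at most `size`."""
    return [items[i : i + size] for i in range(0, len(items), size)]


def upload_in_batches(dataset_id: str, datapoints: list, batch_size: int = 500):
    # NOTE: the PUT path and payload shape here are assumptions made for this
    # sketch; confirm both against the dataset endpoint documentation.
    for batch in chunked(datapoints, batch_size):
        requests.put(
            url=f"{base_url}/datasets/{dataset_id}",
            data=json.dumps({"data": batch}),
            headers=headers,
        )
```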

Step 3: Creating a Project

A Humanloop project is made up of one or more datasets, a team of annotators and a model. As your team begins to annotate the data, a model is trained in real time and used to prioritise which data your annotators should focus on next (see https://humanloop.com/blog/why-you-should-be-using-active-learning).

The project inputs specify the dataset fields you wish to show to your annotators and the model. These are what the model will eventually use to make decisions about new, unseen datapoints.

The project output specifies the type of model you wish to train and the corresponding label taxonomy. For example, in the sentiment classification example below, the model outputs sentiment labels for unseen texts. Single-label classification, multi-label classification and span extraction projects are all currently supported on the platform.

If your dataset has a field with existing annotations, you can use it to warm-start your project, as shown in the following examples. If you want your team to first review these existing annotations in Humanloop, set review_existing_annotations to True; otherwise they will be used automatically to train an initial model.

You can update your project with more data either by connecting another dataset or simply by adding additional datapoints to your existing dataset. Alternatively, you can submit tasks for your model and/or team to complete (see our Human-in-the-loop tutorial for more information).

For project creation POST endpoint details, see this page.


def get_field_id_by_name(name: str, fields):
    """Helper method for parsing field_id from dataset.fields given the name"""
    return [field for field in fields if field["name"] == name][0]["id"]

classification_project_request = {
    "name": "Sentiment classification project",
    "inputs": [
        {
            "name": "text",
            "data_type": "text",
            "description": "Text to classify.",
            "data_sources": [
                {"field_id": get_field_id_by_name("text", dataset_fields)}
            ],
        },
        {
            "name": "external_id",
            "data_type": "text",
            "description": "The ID from your system.",
            "display_only": True,
            "data_sources": [
                {"field_id": get_field_id_by_name("external_id", dataset_fields)}
            ],
        },
    ],
    "outputs": [
        {
            "name": "sentiment",
            "description": "Sentiment of the text",
            "task_type": "classification",
            # link existing labels from your dataset using data_sources, otherwise
            # manually specify an array of labels here
            "data_sources": [
                {"field_id": get_field_id_by_name("sentiment", dataset_fields)}
            ],
        }
    ],
    "guidelines": "Insert your markdown annotator guidelines here",
    "users": [project_owner],
    "review_existing_annotations": False,
}

classification_project_id = requests.post(
    url=f"{base_url}/projects",
    data=json.dumps(classification_project_request),
    headers=headers,
).json()["id"]

print(f"Navigate to https://app.humanloop.com/projects/{classification_project_id}")

Step 4: Making predictions

Provided you have at least 10 annotated datapoints, a pre-trained NLP model is fine-tuned on your data and available within minutes of creating your project. You can continuously improve this model simply by annotating more data; new versions of the model will train and evaluate in the background while you annotate.

For prediction POST endpoint details, see this page.

prediction_request = {
    "data": [
        {
            "text": "The sound quality of these headphones were so disappointing",
            # pass external_id as a string, matching the text field declared above
            "external_id": "1754535653",
        },
        {
            "text": "OMG! if only my phone camera could be this good",
            "external_id": "1711535653",
        },
    ]
}

prediction_response = requests.post(
    url=f"{base_url}/projects/{classification_project_id}/predict",
    data=json.dumps(prediction_request),
    headers=headers,
).json()

print(prediction_response)
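The exact shape of the prediction response is best checked in the API reference. Purely as an illustration, assuming each prediction carries the datapoint's external_id and a predicted label (the sample_response below is invented for this sketch, not real API output), you could join the predictions back to records in your own system like this:

```python
# Hypothetical response shape, invented only to illustrate the join;
# inspect prediction_response from the real API for the actual schema.
sample_response = [
    {"external_id": "1754535653", "sentiment": "negative"},
    {"external_id": "1711535653", "sentiment": "positive"},
]

# Index predictions by your own identifier so they can be correlated
# back to the records in your system.
predictions_by_id = {item["external_id"]: item["sentiment"] for item in sample_response}
print(predictions_by_id["1754535653"])
```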

Step 5: Downloading your labelled data

Sometimes the most useful outcome from the Humanloop platform is a labelled dataset. It's very easy to download your fully labelled data using a code sample like the one below.

For data download GET endpoint details, see this page.

data_response = requests.get(
    url=f"{base_url}/projects/{classification_project_id}/labels",
    headers=headers,
).json()

print(data_response)
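If you want to keep a local copy, the downloaded labels can be written straight to disk with the standard library. Here data_response is stubbed with a small placeholder, since its real contents come from the GET request above:

```python
import json

# Placeholder standing in for the API response; in practice use the
# data_response returned by the labels GET request.
data_response = {
    "labels": [{"text": "Amazon, I love this so much!", "sentiment": "positive"}]
}

# Write the labelled data to a JSON file on disk.
with open("labelled_data.json", "w") as f:
    json.dump(data_response, f, indent=2)

# Reload to confirm the file round-trips cleanly.
with open("labelled_data.json") as f:
    reloaded = json.load(f)
```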