Examples
GCP Examples
Nested frame classification with Claude 3.5 Sonnet
The goals of this example are:
- Create an editor agent that automatically adds frame-level classifications.
- Demonstrate how to use the OntologyDataModel for classifications.
Setup
To get set up, you must:
- Create a virtual python environment
- Install necessary dependencies
- Get an Anthropic API key
- Set up Encord authentication
First, create the virtual environment. Before proceeding, ensure you can authenticate with Anthropic and with Encord (see links in list above).
python -m venv venv
source venv/bin/activate
python -m pip install encord-agents anthropic
export ANTHROPIC_API_KEY="<your_api_key>"
export ENCORD_SSH_KEY_FILE="/path/to/your/private/key"
Project setup
We are using a Project with the following Ontology:
See the ontology JSON
{
"objects": [],
"classifications": [
{
"id": "1",
"featureNodeHash": "TTkHMtuD",
"attributes": [
{
"id": "1.1",
"featureNodeHash": "+1g9I9Sg",
"type": "text",
"name": "scene summary",
"required": false,
"dynamic": false
}
]
},
{
"id": "2",
"featureNodeHash": "xGV/wCD0",
"attributes": [
{
"id": "2.1",
"featureNodeHash": "k3EVexk7",
"type": "radio",
"name": "is there a person in the frame?",
"required": false,
"options": [
{
"id": "2.1.1",
"featureNodeHash": "EkGwhcO4",
"label": "yes",
"value": "yes",
"options": [
{
"id": "2.1.1.1",
"featureNodeHash": "mj9QCDY4",
"type": "text",
"name": "What is the person doing?",
"required": false
}
]
},
{
"id": "2.1.2",
"featureNodeHash": "37rMLC/v",
"label": "no",
"value": "no",
"options": []
}
],
"dynamic": false
}
]
}
]
}
To construct the exact same ontology, you can do:
import json
from encord.objects.ontology_structure import OntologyStructure
from encord_agents.core.utils import get_user_client
encord_client = get_user_client()
structure = OntologyStructure.from_dict(json.loads("{the_json_above}"))
ontology = encord_client.create_ontology(
    title="Your ontology title",
    structure=structure
)
print(ontology.ontology_hash)
Your Ontology can be any Ontology containing classifications. Attach your Ontology to a Project with visual content (images, image groups, or videos).
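Before creating the Ontology, it can help to check which attributes the JSON actually defines and how they nest. The following is a minimal sketch using only the standard library. The `list_attribute_names` helper is hypothetical (it is not part of encord or encord-agents), and `ontology_json` is a trimmed copy of the JSON above.

```python
import json

# Trimmed copy of the ontology JSON shown above (only the fields used here).
ontology_json = """
{
  "objects": [],
  "classifications": [
    {"id": "1", "featureNodeHash": "TTkHMtuD", "attributes": [
      {"id": "1.1", "featureNodeHash": "+1g9I9Sg", "type": "text",
       "name": "scene summary"}
    ]},
    {"id": "2", "featureNodeHash": "xGV/wCD0", "attributes": [
      {"id": "2.1", "featureNodeHash": "k3EVexk7", "type": "radio",
       "name": "is there a person in the frame?",
       "options": [
         {"id": "2.1.1", "featureNodeHash": "EkGwhcO4", "label": "yes",
          "options": [
            {"id": "2.1.1.1", "featureNodeHash": "mj9QCDY4", "type": "text",
             "name": "What is the person doing?"}
          ]},
         {"id": "2.1.2", "featureNodeHash": "37rMLC/v", "label": "no",
          "options": []}
       ]}
    ]}
  ]
}
"""


def list_attribute_names(node, depth=0):
    """Hypothetical helper: yield (depth, name) for every named attribute."""
    if "name" in node:
        yield depth, node["name"]
        depth += 1
    # Attributes hang off classifications; nested attributes hang off options.
    for child in node.get("attributes", []) + node.get("options", []):
        yield from list_attribute_names(child, depth)


ontology = json.loads(ontology_json)
names = [
    item
    for classification in ontology["classifications"]
    for item in list_attribute_names(classification)
]
for depth, name in names:
    print("  " * depth + name)
```

The indentation of the printed names mirrors the nesting: the text attribute under the "yes" option appears one level below the radio question.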
The goal is to trigger an agent that transforms a labeling task from Figure A to Figure B, as shown below. (Hint: Click the images and use the keyboard arrows to toggle between them.)
The full code for agent.py
Let's go through the code section by section.
First, we import dependencies and set up the Project:
Info
Make sure to insert your Project's hash here.
import os
from anthropic import Anthropic
from encord.objects.ontology_labels_impl import LabelRowV2
from numpy.typing import NDArray
from typing_extensions import Annotated
from encord_agents.core.ontology import OntologyDataModel
from encord_agents.core.utils import get_user_client
from encord_agents.core.video import Frame
from encord_agents.gcp import Depends, editor_agent
from encord_agents.gcp.dependencies import FrameData, dep_single_frame
client = get_user_client()
project = client.get_project("<your_project_hash>")
Next, we create a data model and a system prompt based on the Project Ontology that will tell Claude how to structure its response:
data_model = OntologyDataModel(project.ontology_structure.classifications)
system_prompt = f"""
You're a helpful assistant that's supposed to help fill in json objects
according to this schema:
```json
{data_model.model_json_schema_str}
```
Please only respond with valid json.
"""
See the result of data_model.model_json_schema_str for the given example
{
"$defs": {
"IsThereAPersonInTheFrameRadioModel": {
"properties": {
"feature_node_hash": {
"const": "k3EVexk7",
"description": "UUID for discrimination. Must be included in json as is.",
"enum": [
"k3EVexk7"
],
"title": "Feature Node Hash",
"type": "string"
},
"choice": {
"description": "Choose exactly one answer from the given options.",
"discriminator": {
"mapping": {
"37rMLC/v": "#/$defs/NoNestedRadioModel",
"EkGwhcO4": "#/$defs/YesNestedRadioModel"
},
"propertyName": "feature_node_hash"
},
"oneOf": [
{
"$ref": "#/$defs/YesNestedRadioModel"
},
{
"$ref": "#/$defs/NoNestedRadioModel"
}
],
"title": "Choice"
}
},
"required": [
"feature_node_hash",
"choice"
],
"title": "IsThereAPersonInTheFrameRadioModel",
"type": "object"
},
"NoNestedRadioModel": {
"properties": {
"feature_node_hash": {
"const": "37rMLC/v",
"description": "UUID for discrimination. Must be included in json as is.",
"enum": [
"37rMLC/v"
],
"title": "Feature Node Hash",
"type": "string"
},
"title": {
"const": "no",
"default": "Constant value - should be included as-is.",
"enum": [
"no"
],
"title": "Title",
"type": "string"
}
},
"required": [
"feature_node_hash"
],
"title": "NoNestedRadioModel",
"type": "object"
},
"SceneSummaryTextModel": {
"properties": {
"feature_node_hash": {
"const": "+1g9I9Sg",
"description": "UUID for discrimination. Must be included in json as is.",
"enum": [
"+1g9I9Sg"
],
"title": "Feature Node Hash",
"type": "string"
},
"value": {
"description": "Please describe the image as accurate as possible focusing on 'scene summary'",
"maxLength": 1000,
"minLength": 0,
"title": "Value",
"type": "string"
}
},
"required": [
"feature_node_hash",
"value"
],
"title": "SceneSummaryTextModel",
"type": "object"
},
"WhatIsThePersonDoingTextModel": {
"properties": {
"feature_node_hash": {
"const": "mj9QCDY4",
"description": "UUID for discrimination. Must be included in json as is.",
"enum": [
"mj9QCDY4"
],
"title": "Feature Node Hash",
"type": "string"
},
"value": {
"description": "Please describe the image as accurate as possible focusing on 'What is the person doing?'",
"maxLength": 1000,
"minLength": 0,
"title": "Value",
"type": "string"
}
},
"required": [
"feature_node_hash",
"value"
],
"title": "WhatIsThePersonDoingTextModel",
"type": "object"
},
"YesNestedRadioModel": {
"properties": {
"feature_node_hash": {
"const": "EkGwhcO4",
"description": "UUID for discrimination. Must be included in json as is.",
"enum": [
"EkGwhcO4"
],
"title": "Feature Node Hash",
"type": "string"
},
"what_is_the_person_doing": {
"$ref": "#/$defs/WhatIsThePersonDoingTextModel",
"description": "A text attribute with carefully crafted text to describe the property."
}
},
"required": [
"feature_node_hash",
"what_is_the_person_doing"
],
"title": "YesNestedRadioModel",
"type": "object"
}
},
"properties": {
"scene_summary": {
"$ref": "#/$defs/SceneSummaryTextModel",
"description": "A text attribute with carefully crafted text to describe the property."
},
"is_there_a_person_in_the_frame": {
"$ref": "#/$defs/IsThereAPersonInTheFrameRadioModel",
"description": "A mutually exclusive radio attribute to choose exactly one option that best matches to the give visual input."
}
},
"required": [
"scene_summary",
"is_there_a_person_in_the_frame"
],
"title": "ClassificationModel",
"type": "object"
}
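To make the schema more concrete, here is a hand-written reply of the shape it asks for, together with a very small sanity check. This is a sketch, not the validation that OntologyDataModel performs: the `looks_valid` helper and the text values are invented for illustration, and only the feature node hashes are taken from the schema above.

```python
import json

# A hand-written response of the shape the schema above asks for.
# The text values are invented; the hashes come from the schema.
example_response = """
{
  "scene_summary": {
    "feature_node_hash": "+1g9I9Sg",
    "value": "A cyclist rides along a tree-lined street."
  },
  "is_there_a_person_in_the_frame": {
    "feature_node_hash": "k3EVexk7",
    "choice": {
      "feature_node_hash": "EkGwhcO4",
      "what_is_the_person_doing": {
        "feature_node_hash": "mj9QCDY4",
        "value": "Riding a bicycle."
      }
    }
  }
}
"""

# Top-level keys the schema lists under "required".
REQUIRED_KEYS = {"scene_summary", "is_there_a_person_in_the_frame"}
# Radio options from the discriminator mapping: hash -> required nested key.
RADIO_OPTIONS = {"EkGwhcO4": "what_is_the_person_doing", "37rMLC/v": None}


def looks_valid(payload: dict) -> bool:
    """Hypothetical sanity check; much weaker than real schema validation."""
    if not REQUIRED_KEYS.issubset(payload):
        return False
    choice = payload["is_there_a_person_in_the_frame"].get("choice", {})
    option_hash = choice.get("feature_node_hash")
    if option_hash not in RADIO_OPTIONS:
        return False
    nested_key = RADIO_OPTIONS[option_hash]
    return nested_key is None or nested_key in choice


payload = json.loads(example_response)
print(looks_valid(payload))
```

Note how the nested text attribute only appears when the "yes" option (hash "EkGwhcO4") is chosen, which is what the discriminator in the schema encodes.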
We also need an Anthropic API client to communicate with Claude:
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
anthropic_client = Anthropic(api_key=ANTHROPIC_API_KEY)
Finally, we define our editor agent:
@editor_agent()
def agent(
    frame_data: FrameData,
    lr: LabelRowV2,
    content: Annotated[NDArray, Depends(dep_single_frame)],
):
    frame = Frame(frame_data.frame, content=content)
    message = anthropic_client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system=system_prompt,
        messages=[
            {
                "role": "user",
                "content": [frame.b64_encoding(output_format="anthropic")],
            }
        ],
    )
    try:
        classifications = data_model(message.content[0].text)
        for clf in classifications:
            clf.set_for_frames(frame_data.frame, confidence=0.5, manual_annotation=False)
            lr.add_classification_instance(clf)
    except Exception:
        import traceback
        traceback.print_exc()
        print(f"Response from model: {message.content[0].text}")
    lr.save()
The agent:
1. Gets the frame content automatically using the dep_single_frame dependency
2. Queries Claude with the frame image
3. Parses Claude's response into classification instances using our data model
4. Adds the classifications to the label row and saves it
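Step 3 assumes the reply is plain JSON. Language models sometimes wrap JSON in a markdown code fence or add a sentence around it, in which case parsing fails and the agent falls into the except branch. One possible mitigation, sketched below with the standard library only, is to strip such wrapping first. The `extract_json` helper is hypothetical and not part of encord-agents.

```python
import json
import re

FENCE = "`" * 3  # built this way to avoid a literal fence inside this example
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, flags=re.DOTALL)


def extract_json(text: str) -> str:
    """Hypothetical helper: return the JSON part of a model reply.

    Handles replies wrapped in a markdown code fence; otherwise falls back
    to the outermost pair of curly braces.
    """
    fenced = FENCED_BLOCK.search(text)
    if fenced:
        return fenced.group(1).strip()
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    return text[start : end + 1]


reply = "Here you go:\n" + FENCE + 'json\n{"scene_summary": {"value": "A street."}}\n' + FENCE
cleaned = extract_json(reply)
print(json.loads(cleaned))
```

In the agent, the cleaned string could then be passed to the data model, for example `data_model(extract_json(message.content[0].text))`.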
Testing the Agent
STEP 1: Run the Agent
With the agent defined, we can run and test it.
In your current terminal, run:
This runs the agent in debug mode for you to test it.
STEP 2: Open a Frame in the Editor
Open your project within the Encord platform in your browser and navigate to a frame you want to classify. Copy the URL from your browser.
Hint
The URL should have roughly this format: "https://app.encord.com/label_editor/{project_hash}/{data_hash}/{frame}".
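If you want to double-check that a copied URL matches this format before triggering the agent, a small parser like the one below can help. It is a sketch under assumptions: `parse_editor_url` is a hypothetical helper, the hashes in the example are made up, and treating a missing frame segment as frame 0 is a guess rather than documented behavior.

```python
from urllib.parse import urlparse


def parse_editor_url(url: str) -> dict:
    """Hypothetical helper: split a Label Editor URL into its parts."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "label_editor":
        raise ValueError(f"unexpected editor URL: {url}")
    result = {"project_hash": parts[1], "data_hash": parts[2]}
    # Assumption: a missing frame segment means frame 0.
    result["frame"] = int(parts[3]) if len(parts) > 3 else 0
    return result


# Made-up hashes, for illustration only.
url = "https://app.encord.com/label_editor/1234-abcd/5678-efgh/3"
print(parse_editor_url(url))
```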
STEP 3: Trigger the Agent
In another shell operating from the same working directory, source your virtual environment and test the agent:
If the test is successful, you are able to refresh your browser and see the classifications that Claude generated.
You are now ready to deploy your agent. Visit the deployment documentation to learn more.
Nested object classification with Claude 3.5 Sonnet
The goals of this example are:
- Obtain an editor agent that can convert generic object annotations (class-less coordinates) into class-specific annotations with nested attributes like descriptions, radio buttons, and checklists.
- Show how you can use both the OntologyDataModel and the dep_object_crops dependency.
Setup
To get set up, you must:
- Create a virtual python environment
- Install necessary dependencies
- Get an Anthropic API key
- Set up Encord authentication
First, create the virtual environment. Before proceeding, ensure you can authenticate with Anthropic and with Encord (see links in list above).
python -m venv venv
source venv/bin/activate
python -m pip install encord-agents anthropic
export ANTHROPIC_API_KEY="<your_api_key>"
export ENCORD_SSH_KEY_FILE="/path/to/your/private/key"
Project setup
We're using a project with the following ontology:
See the ontology JSON
{
"objects": [
{
"id": "1",
"name": "person",
"color": "#D33115",
"shape": "bounding_box",
"featureNodeHash": "2xlDPPAG",
"required": false,
"attributes": [
{
"id": "1.1",
"featureNodeHash": "aFCN9MMm",
"type": "text",
"name": "activity",
"required": false,
"dynamic": false
}
]
},
{
"id": "2",
"name": "animal",
"color": "#E27300",
"shape": "bounding_box",
"featureNodeHash": "3y6JxTUX",
"required": false,
"attributes": [
{
"id": "2.1",
"featureNodeHash": "2P7LTUZA",
"type": "radio",
"name": "type",
"required": false,
"options": [
{
"id": "2.1.1",
"featureNodeHash": "gJvcEeLl",
"label": "dolphin",
"value": "dolphin",
"options": []
},
{
"id": "2.1.2",
"featureNodeHash": "CxrftGS4",
"label": "monkey",
"value": "monkey",
"options": []
},
{
"id": "2.1.3",
"featureNodeHash": "OQyWm7Sm",
"label": "dog",
"value": "dog",
"options": []
},
{
"id": "2.1.4",
"featureNodeHash": "CDKmYJK/",
"label": "cat",
"value": "cat",
"options": []
}
],
"dynamic": false
},
{
"id": "2.2",
"featureNodeHash": "5fFgrM+E",
"type": "text",
"name": "description",
"required": false,
"dynamic": false
}
]
},
{
"id": "3",
"name": "vehicle",
"color": "#16406C",
"shape": "bounding_box",
"featureNodeHash": "llw7qdWW",
"required": false,
"attributes": [
{
"id": "3.1",
"featureNodeHash": "79mo1G7Q",
"type": "text",
"name": "type - short and concise",
"required": false,
"dynamic": false
},
{
"id": "3.2",
"featureNodeHash": "OFrk07Ds",
"type": "checklist",
"name": "visible",
"required": false,
"options": [
{
"id": "3.2.1",
"featureNodeHash": "KmX/HjRT",
"label": "wheels",
"value": "wheels"
},
{
"id": "3.2.2",
"featureNodeHash": "H6qbEcdj",
"label": "frame",
"value": "frame"
},
{
"id": "3.2.3",
"featureNodeHash": "gZ9OucoQ",
"label": "chain",
"value": "chain"
},
{
"id": "3.2.4",
"featureNodeHash": "cit3aZSz",
"label": "head lights",
"value": "head_lights"
},
{
"id": "3.2.5",
"featureNodeHash": "qQ3PieJ/",
"label": "tail lights",
"value": "tail_lights"
}
],
"dynamic": false
}
]
},
{
"id": "4",
"name": "generic",
"color": "#FE9200",
"shape": "bounding_box",
"featureNodeHash": "jootTFfQ",
"required": false,
"attributes": []
}
],
"classifications": []
}
To construct the exact same ontology, you can do:
import json
from encord.objects.ontology_structure import OntologyStructure
from encord_agents.core.utils import get_user_client
encord_client = get_user_client()
structure = OntologyStructure.from_dict(json.loads("{the_json_above}"))
ontology = encord_client.create_ontology(
    title="Your ontology title",
    structure=structure
)
print(ontology.ontology_hash)
Your Ontology can be any Ontology containing objects, provided the object types are the same and there is one entry called "generic".
Attach that Ontology to a Project with visual content (images, image groups, or videos).
The goal is to trigger an agent that takes a labeling task from Figure A to Figure B, below (hint: you can click them and use the keyboard arrows to toggle between images).
The agent
Warning
Some of the code blocks in this section suffer from wrong indentation. If you intend to copy/paste, we strongly recommend that you do it from the full code below rather than from each sub-section👇
The full code for agent.py
Create a file called "agent.py".
Let's begin with some simple imports and reading the project ontology.
For this, you will need to have your <project_hash> ready.
import os
from anthropic import Anthropic
from encord.objects.ontology_labels_impl import LabelRowV2
from typing_extensions import Annotated
from encord_agents.core.ontology import OntologyDataModel
from encord_agents.core.utils import get_user_client
from encord_agents.gcp import Depends, editor_agent
from encord_agents.gcp.dependencies import FrameData, InstanceCrop, dep_object_crops
# User client
client = get_user_client()
project = client.get_project("<project_hash>")
Now that we have the project, we can extract the generic ontology object as well as the actual ontology objects that we care about.
generic_ont_obj, *other_objects = sorted(
    project.ontology_structure.objects,
    key=lambda o: o.title.lower() == "generic",
    reverse=True,
)
The code above sorts the Ontology objects based on whether they have the title "generic" or not.
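The sort-and-unpack idiom is compact, so here is a standalone sketch of what it does. The `SimpleNamespace` objects are stand-ins for the real ontology objects; only their `title` attribute matters for this illustration.

```python
from types import SimpleNamespace

# Stand-ins for the ontology objects; only the `title` attribute matters here.
objects = [SimpleNamespace(title=t) for t in ("person", "animal", "vehicle", "generic")]

# The key is True only for "generic"; reverse=True moves True keys to the front.
generic_ont_obj, *other_objects = sorted(
    objects,
    key=lambda o: o.title.lower() == "generic",
    reverse=True,
)

print(generic_ont_obj.title)
print([o.title for o in other_objects])
```

Because Python's sort is stable, the remaining objects keep their original order. Note that if no object is titled "generic", the first object in the list silently takes its place, so it may be worth checking `generic_ont_obj.title` before relying on it.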
We use the generic object to query image crops within the agent. Before doing so, we use other_objects to tell Claude which information we are interested in.
For that, there is a useful class called OntologyDataModel, which knows how to translate Encord ontology Objects into a pydantic model and JSON objects into Encord ObjectInstances.
Next, we prepare the system prompt that accompanies every object crop.
For that, we use a data_model built from other_objects to create the JSON schema.
Note that we pass in only other_objects, so that the model can only choose between the object types other than the generic one.
data_model = OntologyDataModel(other_objects)
system_prompt = f"""
You're a helpful assistant that's supposed to help fill in
json objects according to this schema:
`{data_model.model_json_schema_str}`
Please only respond with valid json.
"""
See the result of data_model.model_json_schema_str for the given example
{
"$defs": {
"ActivityTextModel": {
"properties": {
"feature_node_hash": {
"const": "aFCN9MMm",
"description": "UUID for discrimination. Must be included in json as is.",
"enum": [
"aFCN9MMm"
],
"title": "Feature Node Hash",
"type": "string"
},
"value": {
"description": "Please describe the image as accurate as possible focusing on 'activity'",
"maxLength": 1000,
"minLength": 0,
"title": "Value",
"type": "string"
}
},
"required": [
"feature_node_hash",
"value"
],
"title": "ActivityTextModel",
"type": "object"
},
"AnimalNestedModel": {
"properties": {
"feature_node_hash": {
"const": "3y6JxTUX",
"description": "UUID for discrimination. Must be included in json as is.",
"enum": [
"3y6JxTUX"
],
"title": "Feature Node Hash",
"type": "string"
},
"type": {
"$ref": "#/$defs/TypeRadioModel",
"description": "A mutually exclusive radio attribute to choose exactly one option that best matches to the give visual input."
},
"description": {
"$ref": "#/$defs/DescriptionTextModel",
"description": "A text attribute with carefully crafted text to describe the property."
}
},
"required": [
"feature_node_hash",
"type",
"description"
],
"title": "AnimalNestedModel",
"type": "object"
},
"DescriptionTextModel": {
"properties": {
"feature_node_hash": {
"const": "5fFgrM+E",
"description": "UUID for discrimination. Must be included in json as is.",
"enum": [
"5fFgrM+E"
],
"title": "Feature Node Hash",
"type": "string"
},
"value": {
"description": "Please describe the image as accurate as possible focusing on 'description'",
"maxLength": 1000,
"minLength": 0,
"title": "Value",
"type": "string"
}
},
"required": [
"feature_node_hash",
"value"
],
"title": "DescriptionTextModel",
"type": "object"
},
"PersonNestedModel": {
"properties": {
"feature_node_hash": {
"const": "2xlDPPAG",
"description": "UUID for discrimination. Must be included in json as is.",
"enum": [
"2xlDPPAG"
],
"title": "Feature Node Hash",
"type": "string"
},
"activity": {
"$ref": "#/$defs/ActivityTextModel",
"description": "A text attribute with carefully crafted text to describe the property."
}
},
"required": [
"feature_node_hash",
"activity"
],
"title": "PersonNestedModel",
"type": "object"
},
"TypeRadioEnum": {
"enum": [
"dolphin",
"monkey",
"dog",
"cat"
],
"title": "TypeRadioEnum",
"type": "string"
},
"TypeRadioModel": {
"properties": {
"feature_node_hash": {
"const": "2P7LTUZA",
"description": "UUID for discrimination. Must be included in json as is.",
"enum": [
"2P7LTUZA"
],
"title": "Feature Node Hash",
"type": "string"
},
"choice": {
"$ref": "#/$defs/TypeRadioEnum",
"description": "Choose exactly one answer from the given options."
}
},
"required": [
"feature_node_hash",
"choice"
],
"title": "TypeRadioModel",
"type": "object"
},
"TypeShortAndConciseTextModel": {
"properties": {
"feature_node_hash": {
"const": "79mo1G7Q",
"description": "UUID for discrimination. Must be included in json as is.",
"enum": [
"79mo1G7Q"
],
"title": "Feature Node Hash",
"type": "string"
},
"value": {
"description": "Please describe the image as accurate as possible focusing on 'type - short and concise'",
"maxLength": 1000,
"minLength": 0,
"title": "Value",
"type": "string"
}
},
"required": [
"feature_node_hash",
"value"
],
"title": "TypeShortAndConciseTextModel",
"type": "object"
},
"VehicleNestedModel": {
"properties": {
"feature_node_hash": {
"const": "llw7qdWW",
"description": "UUID for discrimination. Must be included in json as is.",
"enum": [
"llw7qdWW"
],
"title": "Feature Node Hash",
"type": "string"
},
"type__short_and_concise": {
"$ref": "#/$defs/TypeShortAndConciseTextModel",
"description": "A text attribute with carefully crafted text to describe the property."
},
"visible": {
"$ref": "#/$defs/VisibleChecklistModel",
"description": "A collection of boolean values indicating which concepts are applicable according to the image content."
}
},
"required": [
"feature_node_hash",
"type__short_and_concise",
"visible"
],
"title": "VehicleNestedModel",
"type": "object"
},
"VisibleChecklistModel": {
"properties": {
"feature_node_hash": {
"const": "OFrk07Ds",
"description": "UUID for discrimination. Must be included in json as is.",
"enum": [
"OFrk07Ds"
],
"title": "Feature Node Hash",
"type": "string"
},
"wheels": {
"description": "Is 'wheels' applicable or not?",
"title": "Wheels",
"type": "boolean"
},
"frame": {
"description": "Is 'frame' applicable or not?",
"title": "Frame",
"type": "boolean"
},
"chain": {
"description": "Is 'chain' applicable or not?",
"title": "Chain",
"type": "boolean"
},
"head_lights": {
"description": "Is 'head lights' applicable or not?",
"title": "Head Lights",
"type": "boolean"
},
"tail_lights": {
"description": "Is 'tail lights' applicable or not?",
"title": "Tail Lights",
"type": "boolean"
}
},
"required": [
"feature_node_hash",
"wheels",
"frame",
"chain",
"head_lights",
"tail_lights"
],
"title": "VisibleChecklistModel",
"type": "object"
}
},
"properties": {
"choice": {
"description": "Choose exactly one answer from the given options.",
"discriminator": {
"mapping": {
"2xlDPPAG": "#/$defs/PersonNestedModel",
"3y6JxTUX": "#/$defs/AnimalNestedModel",
"llw7qdWW": "#/$defs/VehicleNestedModel"
},
"propertyName": "feature_node_hash"
},
"oneOf": [
{
"$ref": "#/$defs/PersonNestedModel"
},
{
"$ref": "#/$defs/AnimalNestedModel"
},
{
"$ref": "#/$defs/VehicleNestedModel"
}
],
"title": "Choice"
}
},
"required": [
"choice"
],
"title": "ObjectsRadioModel",
"type": "object"
}
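As an illustration of the schema above, the following sketch shows a hand-written reply for a crop of a bicycle and how the `feature_node_hash` of the top-level `choice` selects one of the three object models. The reply values are invented, and `MODEL_BY_HASH` is simply the discriminator mapping copied from the schema into a Python dictionary.

```python
import json

# Hand-written example reply for a crop showing a bicycle.
example_response = """
{
  "choice": {
    "feature_node_hash": "llw7qdWW",
    "type__short_and_concise": {
      "feature_node_hash": "79mo1G7Q",
      "value": "bicycle"
    },
    "visible": {
      "feature_node_hash": "OFrk07Ds",
      "wheels": true,
      "frame": true,
      "chain": true,
      "head_lights": false,
      "tail_lights": false
    }
  }
}
"""

# Discriminator mapping copied from the schema above.
MODEL_BY_HASH = {
    "2xlDPPAG": "PersonNestedModel",
    "3y6JxTUX": "AnimalNestedModel",
    "llw7qdWW": "VehicleNestedModel",
}

choice = json.loads(example_response)["choice"]
selected_model = MODEL_BY_HASH[choice["feature_node_hash"]]
# Checklist options are plain booleans; keep the ones marked true.
visible_parts = [k for k, v in choice["visible"].items() if v is True]
print(selected_model, visible_parts)
```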
With the system prompt ready, we can instantiate an API client for Claude.
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
anthropic_client = Anthropic(api_key=ANTHROPIC_API_KEY)
Now, let's define the editor agent.
@editor_agent()
def agent(
    frame_data: FrameData,
    lr: LabelRowV2,
    crops: Annotated[
        list[InstanceCrop],
        Depends(dep_object_crops(filter_ontology_objects=[generic_ont_obj])),
    ],
):
In the code above, there are two main things to stress.
- All arguments are automatically injected when this agent is called. For more details on dependency injections, please see here.
- The dep_object_crops dependency is a little special in that you can provide it filtering arguments. In this case, we tell it to only include object crops when the object instances are of the "generic" type. We do this because we don't want to keep on working on those that have already been converted to "actual labels."
Now, we can call Claude given the image crops.
Notice how the crop variable has a convenient b64_encoding method to produce an input that Claude understands.
    # Query Claude
    changes = False
    for crop in crops:
        message = anthropic_client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            system=system_prompt,
            messages=[
                {
                    "role": "user",
                    "content": [crop.b64_encoding(output_format="anthropic")],
                }
            ],
        )
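For readers curious about what an image input for Claude looks like, the sketch below builds an image content block in the base64 format used by Anthropic's Messages API. The `to_anthropic_image_block` helper is hypothetical, and whether `b64_encoding(output_format="anthropic")` returns exactly this dictionary is an assumption made here for illustration.

```python
import base64


def to_anthropic_image_block(image_bytes: bytes, media_type: str = "image/jpeg") -> dict:
    """Hypothetical helper: wrap raw image bytes as an Anthropic image block."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
    }


# Placeholder bytes; a real call would pass encoded JPEG or PNG data.
block = to_anthropic_image_block(b"not-a-real-image")
print(block["source"]["media_type"], block["source"]["data"])
```

A block like this goes into the "content" list of a user message, which is where the agent places the output of `crop.b64_encoding(...)`.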
To parse the message from Claude, the data_model is again useful.
When called with a JSON string, it attempts to parse it with respect to the JSON schema we saw above to create an Encord object instance.
If successful, the old generic object can be removed and the newly classified object added.
        try:
            instance = data_model(message.content[0].text)
            coordinates = crop.instance.get_annotation(frame=frame_data.frame).coordinates
            instance.set_for_frames(
                coordinates=coordinates,
                frames=frame_data.frame,
                confidence=0.5,
                manual_annotation=False,
            )
            lr.remove_object(crop.instance)
            lr.add_object_instance(instance)
            changes = True
        except Exception:
            import traceback
            traceback.print_exc()
            print(f"Response from model: {message.content[0].text}")
Finally, we'll save the labels with Encord.
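The saving code for this step is not shown in this walkthrough. The FastAPI version of this agent, later on this page, ends with `if changes: lr.save()`, so the GCP agent presumably does the same. The sketch below demonstrates that pattern in isolation: `FakeLabelRow` and `process` are stand-ins invented for this example, not encord classes.

```python
class FakeLabelRow:
    """Stand-in for LabelRowV2 that only counts save() calls."""

    def __init__(self) -> None:
        self.save_calls = 0

    def save(self) -> None:
        self.save_calls += 1


def process(crops: list, lr: FakeLabelRow) -> bool:
    """Mimic the agent loop: flag a change per converted crop, save once."""
    changes = False
    for crop in crops:
        try:
            if crop is None:  # stands in for a reply that fails to parse
                raise ValueError("unparseable reply")
            changes = True
        except Exception:
            continue  # the real agent prints the traceback here
    if changes:
        lr.save()
    return changes


lr = FakeLabelRow()
print(process(["crop-a", None, "crop-b"], lr), lr.save_calls)
```

Saving once after the loop, and only when something changed, avoids a needless write when every crop fails to parse.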
Testing the Agent
STEP 1: Run the Agent
With the agent defined, we can run and test it.
In your current terminal, run the function:
This will run the agent in debug mode for you to test it.
STEP 2: Annotate Generic Objects
Open your Project within the Encord platform in your browser and annotate an image with some generic objects. Once done, copy the url from your browser.
Hint
The URL should have the following format: "https://app.encord.com/label_editor/{project_hash}/{data_hash}/{frame}".
STEP 3: Trigger the Agent
In another shell, source your virtual environment and test the agent.
If the test is successful, you should be able to refresh your browser and see the result of your agent's work.
Once you are ready to deploy your agent, visit the deployment documentation to learn more.
FastAPI Examples
Nested frame classification with Claude 3.5 Sonnet
The goals of this example are:
- Create an editor agent that can automatically fill in frame-level classifications in the Label Editor.
- Demonstrate how to use the OntologyDataModel for classifications.
- Demonstrate how to build an agent using FastAPI that can be self-hosted.
Setup
To get set up, you must:
- Create a virtual python environment
- Install necessary dependencies
- Get an Anthropic API key
- Set up Encord authentication
First, create the virtual environment. Before proceeding, ensure you can authenticate with Anthropic and with Encord (see links in list above).
python -m venv venv
source venv/bin/activate
python -m pip install "fastapi[standard]" encord-agents anthropic
export ANTHROPIC_API_KEY="<your_api_key>"
export ENCORD_SSH_KEY_FILE="/path/to/your/private/key"
Project setup
We are using a Project with the following Ontology (same as in the GCP example):
See the ontology JSON
[Same JSON as in GCP Frame Classification example]
The goal is to trigger an agent that takes a labeling task from Figure A to Figure B, below:
The FastAPI agent
The full code for main.py
Let us go through the code section by section.
First, we import dependencies and set up the FastAPI app with CORS middleware:
import os
import numpy as np
from anthropic import Anthropic
from encord.objects.ontology_labels_impl import LabelRowV2
from fastapi import Depends, FastAPI, Form
from numpy.typing import NDArray
from typing_extensions import Annotated
from encord_agents.core.data_model import Frame
from encord_agents.core.ontology import OntologyDataModel
from encord_agents.core.utils import get_user_client
from encord_agents.fastapi.cors import EncordCORSMiddleware
from encord_agents.fastapi.dependencies import (
FrameData,
dep_label_row,
dep_single_frame,
)
# Initialize FastAPI app
app = FastAPI()
app.add_middleware(EncordCORSMiddleware)
The CORS middleware is crucial as it allows the Encord platform to make requests to your API.
Next, we set up the Project and create a data model based on the Ontology:
client = get_user_client()
project = client.get_project("<your_project_hash>")
data_model = OntologyDataModel(project.ontology_structure.classifications)
We create the system prompt that tells Claude how to structure its response:
system_prompt = f"""
You're a helpful assistant that's supposed to help fill in json objects
according to this schema:
```json
{data_model.model_json_schema_str}
```
Please only respond with valid json.
"""
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
anthropic_client = Anthropic(api_key=ANTHROPIC_API_KEY)
Finally, we define the endpoint to handle the classification:
@app.post("/frame_classification")
async def classify_frame(
    frame_data: FrameData,
    lr: Annotated[LabelRowV2, Depends(dep_label_row)],
    content: Annotated[NDArray[np.uint8], Depends(dep_single_frame)],
):
    """Classify a frame using Claude."""
    frame = Frame(frame=frame_data.frame, content=content)
    message = anthropic_client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system=system_prompt,
        messages=[
            {
                "role": "user",
                "content": [frame.b64_encoding(output_format="anthropic")],
            }
        ],
    )
    try:
        classifications = data_model(message.content[0].text)
        for clf in classifications:
            clf.set_for_frames(frame_data.frame, confidence=0.5, manual_annotation=False)
            lr.add_classification_instance(clf)
    except Exception:
        import traceback
        traceback.print_exc()
        print(f"Response from model: {message.content[0].text}")
    lr.save()
The endpoint:
- Receives frame data from the request body
- Gets the label row and frame content via Encord Agents' dependencies
- Creates a Frame object with the content
- Queries Claude with the frame image
- Parses Claude's response into classification instances
- Adds the classifications to the label row and saves it
Testing the Agent
STEP 1: Run the FastAPI Server
With the agent defined, we can run and test it.
In your current terminal, run:
This runs the FastAPI server in development mode with auto-reload enabled.
STEP 2: Open a Frame in the Editor
Open your Project within the Encord platform in your browser and navigate to a frame you want to classify. Copy the URL from your browser.
Hint
The URL should have the following format: "https://app.encord.com/label_editor/{project_hash}/{data_hash}/{frame}".
STEP 3: Trigger the Agent
In another shell operating from the same working directory, source your virtual environment and test the agent:
If the test is successful, you are able to refresh your browser and see the classifications that Claude generated.
Nested object classification with Claude 3.5 Sonnet
The goals of this example are:
- Create an editor agent that can convert generic object annotations (class-less coordinates) into class-specific annotations with nested attributes in the Label Editor.
- Show how you can use both the OntologyDataModel and the dep_object_crops dependency.
- Demonstrate a more complex FastAPI endpoint handling object classification.
Setup
The setup is identical to the frame classification example above. You need the same environment and dependencies.
Project setup
We are using a Project with the following Ontology (same as in the GCP example):
See the ontology JSON
[Same JSON as in GCP Object Classification example]
The goal is to trigger an agent that takes a labeling task from Figure A to Figure B, below:
The FastAPI agent
The full code for main.py
Let's walk through the key components.
First, we set up the FastAPI app and CORS middleware:
import os
from anthropic import Anthropic
from encord.objects.ontology_labels_impl import LabelRowV2
from fastapi import Depends, FastAPI
from typing_extensions import Annotated
from encord_agents.core.data_model import InstanceCrop
from encord_agents.core.ontology import OntologyDataModel
from encord_agents.core.utils import get_user_client
from encord_agents.fastapi.cors import EncordCORSMiddleware
from encord_agents.fastapi.dependencies import (
FrameData,
dep_label_row,
dep_object_crops,
)
# Initialize FastAPI app
app = FastAPI()
app.add_middleware(EncordCORSMiddleware)
Then we set up the client and Project, and extract the generic Ontology object:
client = get_user_client()
project = client.get_project("<your_project_hash>")
generic_ont_obj, *other_objects = sorted(
    project.ontology_structure.objects,
    key=lambda o: o.title.lower() == "generic",
    reverse=True,
)
We create the data model and system prompt for Claude:
data_model = OntologyDataModel(other_objects)
system_prompt = f"""
You're a helpful assistant that's supposed to help fill in
json objects according to this schema:
`{data_model.model_json_schema_str}`
Please only respond with valid json.
"""
# Claude setup
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
anthropic_client = Anthropic(api_key=ANTHROPIC_API_KEY)
Finally, we define our object classification endpoint:
@app.post("/object_classification")
async def classify_objects(
    frame_data: FrameData,
    lr: Annotated[LabelRowV2, Depends(dep_label_row)],
    crops: Annotated[
        list[InstanceCrop],
        Depends(dep_object_crops(filter_ontology_objects=[generic_ont_obj])),
    ],
):
    """Classify generic objects using Claude."""
    # Query Claude for each crop
    changes = False
    for crop in crops:
        message = anthropic_client.messages.create(
            model="claude-3-haiku-20240307",
            max_tokens=1024,
            system=system_prompt,
            messages=[
                {
                    "role": "user",
                    "content": [crop.b64_encoding(output_format="anthropic")],
                }
            ],
        )
        # Parse result
        try:
            instance = data_model(message.content[0].text)
            coordinates = crop.instance.get_annotation(frame=frame_data.frame).coordinates
            instance.set_for_frames(
                coordinates=coordinates,
                frames=frame_data.frame,
                confidence=0.5,
                manual_annotation=False,
            )
            lr.remove_object(crop.instance)
            lr.add_object_instance(instance)
            changes = True
        except Exception:
            import traceback
            traceback.print_exc()
            print(f"Response from model: {message.content[0].text}")
    # Save changes
    if changes:
        lr.save()
The endpoint:
- Receives frame data from the request body
- Gets the label row via dep_label_row
- Gets object crops filtered to only include "generic" objects via dep_object_crops
- For each crop:
  - Queries Claude with the cropped image
  - Parses the response into an object instance
  - Replaces the generic object with the classified one
- Saves the label row if any object was replaced
Testing the Agent
STEP 1: Run the FastAPI Server
With the agent set up, we can run it and test it.
In your current terminal, run:
This runs the FastAPI server in development mode with auto-reload enabled.
STEP 2: Annotate Generic Objects
Open your Project in the Encord platform in your browser and try annotating an image with some generic objects. Once you have done that, copy the URL from your browser.
Hint
The URL should have roughly this format: "https://app.encord.com/label_editor/{project_hash}/{data_hash}/{frame}".
STEP 3: Trigger the Agent
In another shell operating from the same working directory, source your virtual environment and test the agent:
If the test is successful, you are able to refresh your browser and see the generic objects replaced with properly classified objects including all their nested attributes.
Agent examples in the making
- Tightening Bounding Boxes with SAM
- Extrapolating labels with DINOv
- Triggering internal notification system
- Label assertion