Hosting Editor Agents with Modal¶
Modal provides a serverless cloud for engineers and researchers who want to build compute-intensive applications without thinking about infrastructure. The cloud service is well suited for building and hosting editor agents. On this page, we will show you end-to-end how to set up an editor agent via Modal.
The example will put a bitmask "ball" in the center of the current image in the label editor. For more inspiration on what your editor agent could do, have a look at the Examples.
Info
Everything shown in the FastAPI examples is also possible with Modal, because Modal web endpoints are built on FastAPI.
This page does not try to explain everything that Modal does (it does a lot), but it should be self-contained and sufficient to get you to a working example. The page follows these steps:
- Setting up authentication
- Inspecting the Modal template
- Defining the agent logic
- Testing the agent
- Deploying the agent
👀 For a complete code example, please see here.
Setting up authentication¶
You need to authenticate with Encord first. Once you have a private SSH key (preferably corresponding to a service account), make sure that you have also signed up for Modal.
Now you can configure the secret:
- Go to https://modal.com/secrets
- Click "Create new secret"
- Choose the "Custom" option
- Name it encord-ssh-key (you can choose another name, but it needs to match the name in the code below)
- Add an environment variable named ENCORD_SSH_KEY with the content of your private SSH key file, similar to the figure below.
This setup will allow encord-agents to authenticate with Encord via the provided key.
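Modal exposes the values of an attached secret as environment variables inside the running function, which is how the key reaches encord-agents. The following is a minimal sketch of an early sanity check, assuming the variable name ENCORD_SSH_KEY from the steps above; `read_ssh_key` is a hypothetical helper for illustration and is not part of encord-agents or Modal.

```python
import os
from typing import Mapping

# The environment variable name configured in the Modal secret above.
SSH_KEY_ENV_VAR = "ENCORD_SSH_KEY"


def read_ssh_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the private key from the environment, or fail with a clear error.

    Hypothetical helper: encord-agents reads the variable on its own, so this
    is only useful for debugging a misconfigured secret.
    """
    key = env.get(SSH_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"{SSH_KEY_ENV_VAR} is not set; check that the secret is attached to the function."
        )
    return key
```

Calling such a check at the top of an endpoint turns a missing or misnamed secret into an explicit error message rather than an authentication failure later on.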
Inspecting the Modal template¶
Next, let's have a look at a template for creating an endpoint that can be used for defining an Editor Agent.
Make sure that you have installed encord-agents and modal, for example:
python -m pip install encord-agents modal
Under the hood, Modal uses FastAPI, which means that everything available in encord_agents.fastapi is also usable within Modal web_endpoints.
Let us have a look:
from typing_extensions import Annotated
from fastapi import Depends
from encord_agents.fastapi.dependencies import LabelRowV2, dep_label_row
from encord_agents import FrameData
import modal

# 1. Define the docker image that will run the code.
image = (
    modal.Image.debian_slim(python_version="3.12")
    .apt_install(
        "libgl1",
        "libglib2.0-0",
    )
    .pip_install(
        "fastapi[standard]",
        "encord-agents",
    )
)

# 2. Define the app that will hold the endpoint
app = modal.App(name="encord-agents-add-ball", image=image)

# 3. Define the endpoint with access to the secret that we set up.
@app.function(secrets=[modal.Secret.from_name("encord-ssh-key")])
@modal.web_endpoint(method="POST")
def put_ball_on_image(
    frame_data: FrameData,
    label_row: Annotated[LabelRowV2, Depends(dep_label_row)],
):
    # 4. The agent logic goes here
    # This is really a FastAPI endpoint ✨
    ...
In the template code above, we have:
- Defined a Docker image that contains the required dependencies (encord-agents and fastapi, plus libgl, which is required by opencv). Modal automatically sets up a container with these dependencies.
- Defined an app to contain the endpoints that we define.
- Defined an actual endpoint, which has access to the SSH key. encord-agents knows how to access that key (since it is named ENCORD_SSH_KEY) and uses it to authenticate the encord SDK.
- Marked a place to define our agent logic. There are multiple examples here on what could go in that function.
The magical part is that since this is a FastAPI route, we can, for example, use all the dependencies defined in encord_agents.fastapi.dependencies to make our lives easier.
Next, we'll define some concrete logic to go into the agent.
Defining the agent logic¶
In this example, we'll add a circular bitmask in the middle of the current frame. However, the agent could just as easily run your own model (even on a GPU).
In the function definition of our route, we'll add the following code:
# Additional imports needed by the agent logic
import cv2
import numpy as np
from encord.objects.common import Shape
from encord.objects.coordinates import BitmaskCoordinates

@app.function(secrets=[modal.Secret.from_name("encord-ssh-key")])
@modal.web_endpoint(method="POST")
def put_ball_on_image(
    frame_data: FrameData,
    label_row: Annotated[LabelRowV2, Depends(dep_label_row)],
):
    # 1. Find the first bitmask ontology object
    #    Exit if none is found
    bitmask_object = next(
        (o for o in label_row.ontology_structure.objects if o.shape == Shape.BITMASK),
        None,
    )
    if not bitmask_object:
        return
    # 2. Ensure we can read image width/height
    #    Otherwise, it's probably not a vision asset
    w, h = label_row.width, label_row.height
    if w is None or h is None:
        return
    # 3. Construct bitmask with ball in the center
    cw, ch = w // 2, h // 2
    radius = min(cw, ch) // 2
    mask = np.zeros((h, w), dtype=np.uint8)
    cv2.circle(mask, (cw, ch), radius, 255, -1)  # type: ignore
    # 4. Add it as a label
    ins = bitmask_object.create_instance()
    ins.set_for_frames(
        frames=frame_data.frame,
        confidence=0.8,
        coordinates=BitmaskCoordinates(mask > 0),
    )
    label_row.add_object_instance(ins)
    # 5. Save update
    label_row.save()
The code performs the following steps:
- It ensures that the project ontology has a bitmask object. If it does not, there's nothing to do here.
- It ensures that we're looking at a visual asset (an image, image group, or video).
- It constructs a bitmask with the appropriate shape and fills in a circle.
- It adds the bitmask to the label row.
- It saves the labels.
Upon completion, the encord_agents library will send a 200 response to the label editor, causing the editor to refresh its state.
The result should be a filled circle in the middle of the image.
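To make the geometry of the mask construction concrete, here is a minimal pure-Python sketch of the same center and radius computation, without numpy or cv2. `ball_mask` is a hypothetical helper for illustration only. It marks a pixel when its squared distance to the center is at most the squared radius, so the exact boundary pixels may differ slightly from what cv2.circle rasterizes.

```python
def ball_mask(width: int, height: int) -> list[list[bool]]:
    """Return a height x width boolean mask with a filled disc in the center.

    Hypothetical helper mirroring the center/radius arithmetic of the agent:
    the center is the image midpoint, and the radius is half of the shorter
    half-dimension.
    """
    cx, cy = width // 2, height // 2
    radius = min(cx, cy) // 2
    r2 = radius * radius
    return [
        [(x - cx) ** 2 + (y - cy) ** 2 <= r2 for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    # Print a small mask as ASCII art to eyeball the result.
    for row in ball_mask(16, 12):
        print("".join("#" if pixel else "." for pixel in row))
```

Rows index the vertical axis and columns the horizontal axis, matching the (h, w) shape used for the numpy mask in the agent.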
Testing the agent¶
To test the agent, there is a dedicated CLI tool.
You will have to first run the agent locally (with your python environment sourced).
Assuming that the python file you created is named example.py, you can run:
modal serve example.py
The command should print a url similar to this
└── 🔨 Created web function put_ball_on_image =>
https://<your_project>-put-ball-on-image-dev.modal.run
Let's call it <agent_url>.
Now, go to the Encord Label Editor in your browser, viewing an asset for which you would like to test your agent.
Copy the url; let's call it <editor_url>.
It should follow this pattern:
With the endpoint running, you can run the following from another terminal window:
Replace <agent_url> and <editor_url> with the relevant urls.
You should see a green box detailing the request and the response.
If the status code is 200, refresh your browser to see the new bitmask in the Label Editor.
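For intuition about what such a test request looks like, the sketch below builds a POST request to the agent url using only the standard library. It is not the CLI's implementation. The JSON field names (projectHash, dataHash, frame) are an assumption about how FrameData is serialized, and `build_agent_request` is a hypothetical helper; the dedicated CLI tool remains the supported way to test.

```python
import json
import urllib.request


def build_agent_request(
    agent_url: str, project_hash: str, data_hash: str, frame: int
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a frame payload.

    Assumption: the endpoint accepts a JSON body with the keys
    projectHash, dataHash, and frame.
    """
    payload = {"projectHash": project_hash, "dataHash": data_hash, "frame": frame}
    return urllib.request.Request(
        agent_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; a 200 status code would then correspond to the success case described above.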
Deploying the agent¶
Once you're done developing your agent, deploying it takes a single command:
modal deploy example.py
This will deploy the agent more permanently. Copy the displayed url and configure it in the Encord platform by following the documentation.
Afterwards, you should be able to right-click the frame in the Label Editor to trigger the agent.