{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Use model from Hugging Face 🤗\n",
"\n",
"This notebook demonstrates how to use a task agent to pre-label videos with predictions.\n",
"Here we'll use a bounding box prediction model.\n",
"\n",
"Before we start, let's get installations and authentication out of the way.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Set up environment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"Please ensure that you have the `encord-agents` library installed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!python -m pip install encord-agents"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Authentication\n",
"\n",
"The library authenticates via SSH keys. The code cell below sets the `ENCORD_SSH_KEY` environment variable, which should contain the raw content of your private SSH key file.\n",
"\n",
"If you have not yet set up an SSH key, please follow the [documentation](https://agents-docs.encord.com/authentication/).\n",
"\n",
"> 💡 **Colab users**: In Colab, you can set the key once in the secrets in the left sidebar and load it in new notebooks with\n",
"> ```python\n",
"> from google.colab import userdata\n",
"> key_content = userdata.get(\"ENCORD_SSH_KEY\")\n",
"> ```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import userdata\n",
"\n",
"key_content = userdata.get(\"ENCORD_SSH_KEY\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"ENCORD_SSH_KEY\"] = key_content\n",
"# or you can set a path to a file\n",
"# os.environ[\"ENCORD_SSH_KEY_FILE\"] = \"/path/to/your/private/key\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### [Alternative] Temporary Key\n",
"There's also the option of generating a temporary (fresh) ssh key pair via the code cell below.\n",
"Please follow the instructions printed when executing the code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ⚠️ Safe to skip if you have authenticated already\n",
"import os\n",
"\n",
"from encord_agents.utils.colab import generate_public_private_key_pair_with_instructions\n",
"\n",
"private_key_path, public_key_path = generate_public_private_key_pair_with_instructions()\n",
"os.environ[\"ENCORD_SSH_KEY_FILE\"] = private_key_path.as_posix()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Define a model for predictions\n",
"\n",
"We will define a model which predicts labels, bounding boxes, and confidences.\n",
"We'll use the model to predict objects on frames from videos below.\n",
"\n",
"> 💡 Hint: If you wish to use an alternative model or your own model from HF, this is the place to modify.\n",
"\n",
"We'll use the DETR model from Hugging Face as described in this article:\n",
"https://huggingface.co/docs/transformers/en/model_doc/detr\n",
"\n",
"Other models are available from: https://huggingface.co/models"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"import torch\n",
"from PIL import Image\n",
"from transformers import DetrForObjectDetection, DetrImageProcessor\n",
"\n",
"url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n",
"image = Image.open(requests.get(url, stream=True).raw)\n",
"\n",
"# you can specify the revision tag if you don't want the timm dependency\n",
"processor = DetrImageProcessor.from_pretrained(\"facebook/detr-resnet-50\", revision=\"no_timm\")\n",
"model = DetrForObjectDetection.from_pretrained(\"facebook/detr-resnet-50\", revision=\"no_timm\")\n",
"\n",
"inputs = processor(images=image, return_tensors=\"pt\")\n",
"outputs = model(**inputs)\n",
"\n",
"# convert outputs (bounding boxes and class logits) to COCO API\n",
"# let's only keep detections with score > 0.9\n",
"target_sizes = torch.tensor([image.size[::-1]])\n",
"results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0]\n",
"\n",
"for score, label, box in zip(results[\"scores\"], results[\"labels\"], results[\"boxes\"]):\n",
"    box = [round(i, 2) for i in box.tolist()]\n",
"    print(\n",
"        f\"Detected {model.config.id2label[label.item()]} with confidence \"\n",
"        f\"{round(score.item(), 3)} at location {box}\"\n",
"    )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Define the agent\n",
"\n",
"We've defined the model, and now we want to define the agent. Think of the agent as a long-lived mechanism for pre-labeling in this scenario: a function that takes a video frame and returns the model's predictions, which we can call on every task."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dataclasses import dataclass\n",
"\n",
"import numpy as np\n",
"from encord.objects.coordinates import BoundingBoxCoordinates\n",
"from numpy.typing import NDArray\n",
"\n",
"\n",
"# Data class to hold predictions from our model\n",
"@dataclass\n",
"class ModelPrediction:\n",
"    featureHash: str\n",
"    coords: BoundingBoxCoordinates\n",
"    conf: float\n",
"\n",
"\n",
"def HF_DETR_predict(image: NDArray[np.uint8]) -> list[ModelPrediction]:\n",
"    inputs = processor(images=image, return_tensors=\"pt\")\n",
"    outputs = model(**inputs)\n",
"\n",
"    height, width = image.shape[:2]\n",
"    target_sizes = torch.tensor([[height, width]])\n",
"    results = processor.post_process_object_detection(outputs, target_sizes=target_sizes)[0]\n",
"    model_predictions = []\n",
"    for score, label, box in zip(results[\"scores\"], results[\"labels\"], results[\"boxes\"]):\n",
"        # Boxes are returned as (x_min, y_min, x_max, y_max) in pixels\n",
"        box = [round(i, 2) for i in box.tolist()]\n",
"        # We'll skip predictions with confidence < 0.8\n",
"        # as this model makes a lot of predictions\n",
"        if score < 0.8:\n",
"            continue\n",
"        print(\n",
"            f\"Detected {model.config.id2label[label.item()]} with confidence \"\n",
"            f\"{round(score.item(), 3)} at location {box}\"\n",
"        )\n",
"        if ontology_equivalent := ontology_map.get(model.config.id2label[label.item()]):\n",
"            x_min, y_min, x_max, y_max = box\n",
"            model_predictions.append(\n",
"                ModelPrediction(\n",
"                    featureHash=ontology_equivalent,\n",
"                    # Encord expects a top-left corner plus width and height,\n",
"                    # all normalized to the range 0-1 relative to the image size\n",
"                    coords=BoundingBoxCoordinates(\n",
"                        top_left_x=x_min / width,\n",
"                        top_left_y=y_min / height,\n",
"                        width=(x_max - x_min) / width,\n",
"                        height=(y_max - y_min) / height,\n",
"                    ),\n",
"                    conf=score.item(),\n",
"                )\n",
"            )\n",
"    return model_predictions\n",
"\n",
"\n",
"agent = HF_DETR_predict"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Set up your Ontology\n",
"\n",
"Create an Ontology that matches the expected output of your pre-labeling agent.\n",
"For example, if your model predicts the classes `surfboard`, `person`, and `car`, the Ontology should look like the one in Figure 1.\n",
"Our DETR model predicts more object classes, but we'll focus on the `car` predictions in this example."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Figure 1: Project ontology."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[📖 here](https://docs.encord.com/platform-documentation/GettingStarted/gettingstarted-create-ontology) is the documentation for creating ontologies."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.1 Define an Ontology Map\n",
"We need to translate the model predictions so that each one is paired with the corresponding Encord Ontology item. The easiest way to do this is via the `featureNodeHash` of the target object, which you can find in the app through the Ontology preview JSON or by using the SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ontology_map = {\"car\": \"80fUMkkZ\"}"
]
},
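{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because predictions whose label is missing from `ontology_map` are skipped silently, a misspelled key would drop every prediction for that class. As a minimal sketch, the check below lists map keys that the model can never emit. The helper `unknown_labels` and the small `example_id2label` dictionary are hypothetical stand-ins introduced for illustration; with the DETR model loaded above, you could pass `model.config.id2label` instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def unknown_labels(label_to_hash: dict[str, str], id2label: dict[int, str]) -> list[str]:\n",
"    \"\"\"Return the keys of `label_to_hash` that are not among the model's class names.\"\"\"\n",
"    known = set(id2label.values())\n",
"    return sorted(label for label in label_to_hash if label not in known)\n",
"\n",
"\n",
"# Hypothetical stand-in for `model.config.id2label`\n",
"example_id2label = {0: \"person\", 1: \"car\", 2: \"surfboard\"}\n",
"\n",
"# \"Car\" is a deliberately misspelled key with a placeholder hash\n",
"print(unknown_labels({\"car\": \"80fUMkkZ\", \"Car\": \"<placeholder>\"}, example_id2label))  # ['Car']"
]
},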
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Create a Workflow with a pre-labeling agent node\n",
"\n",
"Create a project in the Encord platform that has a Workflow that includes a pre-labeling agent node before the annotation stage to automatically pre-label tasks with model predictions.\n",
"This node is where we'll hook in our custom code to pre-label the data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Figure 2: Project workflow."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice how the workflow has a purple Agent node called \"pre-label.\"\n",
"This node will allow our custom code to run inference over the data before passing it on to the annotation stage.\n",
"\n",
"[📖 here](https://docs.encord.com/platform-documentation/Annotate/annotate-projects/annotate-workflows-and-templates#creating-workflows) is the documentation for creating a workflow with Encord."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5: Define the pre-labeling agent\n",
"\n",
"The following code provides a template for defining an agent that does pre-labeling.\n",
"We assume that the project only contains videos and that we want to do pre-labeling on all frames in each video.\n",
"\n",
"If your agent node is named \"pre-label\" and the pathway to the annotation stage is named \"annotate,\" you only have to set the `project_hash` argument of the `Runner` to your actual project hash to make it work.\n",
"If your naming is different, update the `stage` parameter of the decorator and the returned string, respectively, to match your own setup.\n",
"\n",
"Note that this code uses the [`dep_video_iterator` dependency](../../reference/task_agents.md#encord_agents.tasks.dependencies.dep_video_iterator) to automatically load an iterator of frames as RGB numpy arrays from the video."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import Iterable\n",
"\n",
"from encord.objects.ontology_labels_impl import LabelRowV2\n",
"from encord.project import Project\n",
"from typing_extensions import Annotated\n",
"\n",
"from encord_agents.core.data_model import Frame\n",
"from encord_agents.tasks import Depends, Runner\n",
"from encord_agents.tasks.dependencies import dep_video_iterator\n",
"\n",
"# a. Define a runner that will execute the agent on every task in the agent stage.\n",
"#    Set project_hash to your own project hash.\n",
"runner = Runner(project_hash=\"\")\n",
"\n",
"\n",
"# b. Specify the logic that goes into the \"pre-label\" agent node.\n",
"@runner.stage(stage=\"pre-label\")\n",
"def pre_segment(\n",
"    lr: LabelRowV2,\n",
"    project: Project,\n",
"    frames: Annotated[Iterable[Frame], Depends(dep_video_iterator)],\n",
") -> str:\n",
"    ontology = project.ontology_structure\n",
"\n",
"    # c. Loop over the frames in the video\n",
"    for frame in frames:\n",
"        # d. Predict - we could do batching here to speed up the process\n",
"        outputs = agent(frame.content)\n",
"\n",
"        # e. Store the results\n",
"        for output in outputs:\n",
"            # `featureHash` is the field name defined on ModelPrediction\n",
"            ins = ontology.get_child_by_hash(output.featureHash).create_instance()\n",
"            ins.set_for_frames(frames=frame.frame, coordinates=output.coords, confidence=output.conf)\n",
"\n",
"            lr.add_object_instance(ins)\n",
"\n",
"    lr.save()\n",
"    return \"annotate\"  # Tell where the task should go"
]
},
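{
"cell_type": "markdown",
"metadata": {},
"source": [
"The comment in step d. notes that predictions could be batched. As a rough sketch of one way to group frames, the hypothetical helper `batched` below (not part of `encord-agents`) yields lists of at most `batch_size` items from any iterable. Running the model once per batch would additionally require `agent` to accept a list of frames, which the version defined above does not."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from itertools import islice\n",
"from typing import Iterable, Iterator, TypeVar\n",
"\n",
"T = TypeVar(\"T\")\n",
"\n",
"\n",
"def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:\n",
"    \"\"\"Yield consecutive lists of at most `batch_size` items (hypothetical helper).\"\"\"\n",
"    if batch_size < 1:\n",
"        raise ValueError(\"batch_size must be at least 1\")\n",
"    iterator = iter(items)\n",
"    # Keep pulling slices until the iterator is exhausted\n",
"    while batch := list(islice(iterator, batch_size)):\n",
"        yield batch\n",
"\n",
"\n",
"# Integers stand in for frames here\n",
"print(list(batched(range(5), 2)))  # [[0, 1], [2, 3], [4]]"
]
},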
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Running the agent\n",
"Now that we've defined the project, workflow, and the agent, it's time to try it out.\n",
"The `runner` object is callable, which means that you can just call it to pre-label your tasks."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Run the agent\n",
"runner()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Your agent now assigns labels to the videos and routes them through the Workflow to the annotation stage.\n",
"As a result, every annotation task should already include pre-existing labels (predictions).\n",
"\n",
"> 💡 *Hint:* If you execute this as a Python script, you can run it as a command line interface by putting the above code in an `agent.py` file and replacing\n",
"> ```python\n",
"> runner()\n",
"> ```\n",
"> with\n",
"> ```python\n",
"> if __name__ == \"__main__\":\n",
">     runner.run()\n",
"> ```\n",
"> This allows you to set, for example, the project hash from the command line:\n",
"> ```bash\n",
"> python agent.py --project-hash \"...\"\n",
"> ```\n"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "encord-agents-Cw_LL1Rx-py3.11",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}