{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Use a Hugging Face 🤗 Model\n",
"\n",
"This notebook demonstrates how to use a task agent to pre-label videos with predictions. A bounding box prediction model is used.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Requirements\n",
"\n",
"This notebook guides you through the Workflow template, Ontology and model selection required.\n",
"\n",
"For this notebook, you need: \n",
" \n",
"- A Dataset containing visual files (video, images, image groups, or image sequences)in Encord. \n",
"- Access to Hugging Face models."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Installation\n",
"\n",
"Ensure that you have the `encord-agents` library installed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!python -m pip install encord-agents[vision]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Encord Authentication\n",
"\n",
"Encord uses ssh-keys for authentication. The following is a code cell for setting the `ENCORD_SSH_KEY` environment variable. It contains the raw content of your private ssh key file.\n",
"\n",
"If you have not setup an ssh key, see our [documentation](https://agents-docs.encord.com/authentication/).\n",
"\n",
"> 💡 In colab, you can set the key once in the secrets in the left sidebar and load it in new notebooks. IF YOU ARE NOT RUNNING THE CODE IN THE COLLAB NOTEBOOK, you must set the environment variable directly.\n",
"> ```python\n",
"> os.environ[\"ENCORD_SSH_KEY\"] = \"\"\"paste-private-key-here\"\"\"\n",
"> ```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import userdata\n",
"\n",
"key_contet = userdata.get(\"ENCORD_SSH_KEY\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"ENCORD_SSH_KEY\"] = key_contet\n",
"# or you can set a path to a file\n",
"# os.environ[\"ENCORD_SSH_KEY_FILE\"] = \"/path/to/your/private/key\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### [Alternative] Temporary Key\n",
"There's also the option of generating a temporary (fresh) ssh key pair via the code cell below.\n",
"Please follow the instructions printed when executing the code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ⚠️ Safe to skip if you have authenticated already\n",
"import os\n",
"\n",
"from encord_agents.utils.colab import generate_public_private_key_pair_with_instructions\n",
"\n",
"private_key_path, public_key_path = generate_public_private_key_pair_with_instructions()\n",
"os.environ[\"ENCORD_SSH_KEY_FILE\"] = private_key_path.as_posix()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define a Model for Predictions\n",
"\n",
"Define a model to predict labels, bounding boxes, and confidence scores. \n",
"\n",
"This model will identify objects in video frames by predicting their classifications, locations, and associated confidence levels. We'll use the DETR model from Hugging Face, as outlined in the following article:\n",
"https://huggingface.co/docs/transformers/en/model_doc/detr\n",
"\n",
"Other models are available from: https://huggingface.co/models\n",
"\n",
"> 💡 If you want to use a different model, such as your own model from Hugging Face, the following code blocks should be modified.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"import torch\n",
"from PIL import Image\n",
"from transformers import DetrForObjectDetection, DetrImageProcessor\n",
"\n",
"url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n",
"image = Image.open(requests.get(url, stream=True).raw)\n",
"\n",
"# You can specify the revision tag if you don't want the timm dependency\n",
"processor = DetrImageProcessor.from_pretrained(\"facebook/detr-resnet-50\", revision=\"no_timm\")\n",
"model = DetrForObjectDetection.from_pretrained(\"facebook/detr-resnet-50\", revision=\"no_timm\")\n",
"\n",
"inputs = processor(images=image, return_tensors=\"pt\")\n",
"outputs = model(**inputs)\n",
"\n",
"# Convert outputs (bounding boxes and class logits) to COCO API\n",
"# Only keep detections with score > 0.9\n",
"target_sizes = torch.tensor([image.size[::-1]])\n",
"results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.9)[0]\n",
"\n",
"for score, label, box in zip(results[\"scores\"], results[\"labels\"], results[\"boxes\"]):\n",
" box = [round(i, 2) for i in box.tolist()]\n",
" print(\n",
" f\"Detected {model.config.id2label[label.item()]} with confidence \" f\"{round(score.item(), 3)} at location {box}\"\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Define the Agent\n",
"\n",
"Once the model is defined it is time to define the agent."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dataclasses import dataclass\n",
"\n",
"import numpy as np\n",
"from encord.objects.coordinates import BoundingBoxCoordinates\n",
"from numpy.typing import NDArray\n",
"\n",
"\n",
"# Data class to hold predictions from our model\n",
"@dataclass\n",
"class ModelPrediction:\n",
" featureHash: str\n",
" coords: BoundingBoxCoordinates\n",
" conf: float\n",
"\n",
"\n",
"def HF_DETR_predict(image: NDArray[np.uint8]) -> list[ModelPrediction]:\n",
" inputs = processor(images=image, return_tensors=\"pt\")\n",
" outputs = model(**inputs)\n",
"\n",
" target_sizes = torch.tensor([image.shape[:2]])\n",
" results = processor.post_process_object_detection(outputs, target_sizes=target_sizes)[0]\n",
" model_predictions = []\n",
" for score, label, box in zip(results[\"scores\"], results[\"labels\"], results[\"boxes\"]):\n",
" box = [round(i, 2) for i in box.tolist()]\n",
" # We'll skip predictions with confidence < 0.8\n",
" # As this model makes a lot of predictions\n",
" if score < 0.8:\n",
" continue\n",
" print(\n",
" f\"Detected {model.config.id2label[label.item()]} with confidence \"\n",
" f\"{round(score.item(), 3)} at location {box}\"\n",
" )\n",
" if ontology_equivalent := ontology_map.get(model.config.id2label[label.item()]):\n",
" model_predictions.append(\n",
" ModelPrediction(\n",
" featureHash=ontology_equivalent,\n",
" coords=BoundingBoxCoordinates(top_left_x=box[0], top_left_y=box[1], width=box[2], height=box[3]),\n",
" conf=score.item(),\n",
" )\n",
" )\n",
" return model_predictions\n",
"\n",
"\n",
"agent = HF_DETR_predict"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set Up Ontology\n",
"\n",
"Create an Ontology in Encord that matches the expected output of your pre-labeling agent. For example, if your model predicts classes `surfboard`, `person`, and `car`, then the Ontology should match the ONtology shown below. The DETR model we use can predicts more objects, but in this example our focus is on the car predictions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
" \n",
" Figure 1: Project ontology.\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[📖 Here](https://docs.encord.com/platform-documentation/GettingStarted/gettingstarted-create-ontology) is the documentation for creating Ontologies."
]
},
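{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you prefer to create the Ontology programmatically, the following cell is a minimal sketch using the Encord SDK. It assumes you have authenticated as above; the Ontology title is an arbitrary choice."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from encord import EncordUserClient\n",
"from encord.objects.common import Shape\n",
"from encord.objects.ontology_structure import OntologyStructure\n",
"\n",
"user_client = EncordUserClient.create_with_ssh_private_key(os.environ[\"ENCORD_SSH_KEY\"])\n",
"\n",
"# Build a bounding-box Ontology with the classes our model should map to\n",
"structure = OntologyStructure()\n",
"structure.add_object(name=\"surfboard\", shape=Shape.BOUNDING_BOX)\n",
"structure.add_object(name=\"person\", shape=Shape.BOUNDING_BOX)\n",
"structure.add_object(name=\"car\", shape=Shape.BOUNDING_BOX)\n",
"\n",
"ontology = user_client.create_ontology(\n",
"    title=\"DETR pre-labeling ontology\",  # arbitrary title\n",
"    structure=structure,\n",
")"
]
},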
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Define an Ontology Map\n",
"\n",
"We need to translate the model predictions so that they are paired against the respective Encord ontology item. This is easiest done via the featureNodeHash of the target. This can be found in the app either via the Ontology preview JSON or via using the SDK."
]
},
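{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cell is a minimal sketch of the SDK route: it prints every object in the Project's Ontology together with its `featureNodeHash`. It assumes you have authenticated as above; `<project_hash>` is a placeholder for your own Project hash."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from encord import EncordUserClient\n",
"\n",
"user_client = EncordUserClient.create_with_ssh_private_key(os.environ[\"ENCORD_SSH_KEY\"])\n",
"project = user_client.get_project(\"<project_hash>\")  # placeholder: your Project hash\n",
"\n",
"# Print each Ontology object's title next to its featureNodeHash\n",
"for obj in project.ontology_structure.objects:\n",
"    print(obj.title, obj.feature_node_hash)"
]
},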
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ontology_map = {\"car\": \"80fUMkkZ\"}\n",
"# Note the featureNodeHash as seen on the right hand side above"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a Workflow with a Pre-Labeling Agent Node\n",
"\n",
"Create a Project in the Encord platform with a workflow that includes a pre-labeling agent node before the annotation stage. This node, called **\"pre-label,\"** runs custom code to generate model predictions, automatically pre-labeling tasks before they are sent for annotation.\n",
"\n",
"[📖 Here](https://docs.encord.com/platform-documentation/Annotate/annotate-projects/annotate-workflows-and-templates#creating-workflows) is the documentation for creating Workflows in Encord."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
" \n",
" Figure 2: Project workflow.\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define the Pre-Labeling Agent\n",
"\n",
"The following code serves as a template for defining a pre-labeling agent. \n",
"\n",
"It assumes the project contains only videos and applies pre-labeling to all frames in each video. \n",
"\n",
"If the agent node is named **\"pre-label\"** and the pathway to the annotation stage is **\"annotate,\"** simply replace `` with your actual project hash to make it work. If using different names, update the `stage` parameter in the decorator and the returned string to match your setup. \n",
"\n",
"This code relies on the [`dep_video_iterator` dependency](../../reference/task_agents.md#encord_agents.tasks.dependencies.dep_video_iterator) to automatically load video frames as RGB numpy arrays."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import Iterable\n",
"\n",
"from encord.objects.ontology_labels_impl import LabelRowV2\n",
"from encord.project import Project\n",
"from typing_extensions import Annotated\n",
"\n",
"from encord_agents.core.data_model import Frame\n",
"from encord_agents.tasks import Depends, Runner\n",
"from encord_agents.tasks.dependencies import dep_video_iterator\n",
"\n",
"# a. Define a runner that will execute the agent on every task in the agent stage\n",
"runner = Runner(project_hash=\"\")\n",
"\n",
"\n",
"# b. Specify the logic that goes into the \"pre-label\" agent node.\n",
"@runner.stage(stage=\"pre-label\")\n",
"def pre_segment(\n",
" lr: LabelRowV2,\n",
" project: Project,\n",
" frames: Annotated[Iterable[Frame], Depends(dep_video_iterator)],\n",
") -> str:\n",
" ontology = project.ontology_structure\n",
"\n",
" # c. Loop over the frames in the video\n",
" for frame in frames: # For every frame in the video\n",
" # d. Predict - we could do batching here to speed up the process\n",
" outputs = agent(frame.content)\n",
"\n",
" # e. Store the results\n",
" for output in outputs:\n",
" ins = ontology.get_child_by_hash(output.feature_hash).create_instance()\n",
" ins.set_for_frames(frames=frame.frame, coordinates=output.coords, confidence=output.conf)\n",
"\n",
" lr.add_object_instance(ins)\n",
"\n",
" lr.save()\n",
" return \"annotate\" # Tell where the task should go"
]
},
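{
"cell_type": "markdown",
"metadata": {},
"source": [
"The comment in step d. mentions batching. The cell below is a minimal sketch (an illustration, not part of the template above) of how the frame loop could be batched: the DETR processor accepts a list of images and pads them into a single batch. `BATCH_SIZE` is a hypothetical value; tune it to your hardware."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from itertools import islice\n",
"from typing import Iterator, List\n",
"\n",
"BATCH_SIZE = 8  # hypothetical value; tune to your hardware\n",
"\n",
"\n",
"def batches(frames: Iterable[Frame], n: int) -> Iterator[List[Frame]]:\n",
"    # Yield lists of up to n frames at a time\n",
"    it = iter(frames)\n",
"    while batch := list(islice(it, n)):\n",
"        yield batch\n",
"\n",
"\n",
"def predict_batch(frame_batch: List[Frame]) -> List[List[ModelPrediction]]:\n",
"    # Run DETR once on the whole batch instead of once per frame\n",
"    images = [f.content for f in frame_batch]\n",
"    inputs = processor(images=images, return_tensors=\"pt\")\n",
"    with torch.inference_mode():\n",
"        outputs = model(**inputs)\n",
"    target_sizes = torch.tensor([img.shape[:2] for img in images])\n",
"    all_results = processor.post_process_object_detection(outputs, target_sizes=target_sizes)\n",
"    predictions: List[List[ModelPrediction]] = []\n",
"    for img, results in zip(images, all_results):\n",
"        h, w = img.shape[:2]\n",
"        frame_preds = []\n",
"        for score, label, box in zip(results[\"scores\"], results[\"labels\"], results[\"boxes\"]):\n",
"            name = model.config.id2label[label.item()]\n",
"            if score < 0.8 or name not in ontology_map:\n",
"                continue\n",
"            x_min, y_min, x_max, y_max = box.tolist()\n",
"            frame_preds.append(\n",
"                ModelPrediction(\n",
"                    feature_hash=ontology_map[name],\n",
"                    coords=BoundingBoxCoordinates(\n",
"                        top_left_x=x_min / w,\n",
"                        top_left_y=y_min / h,\n",
"                        width=(x_max - x_min) / w,\n",
"                        height=(y_max - y_min) / h,\n",
"                    ),\n",
"                    conf=score.item(),\n",
"                )\n",
"            )\n",
"        predictions.append(frame_preds)\n",
"    return predictions\n",
"\n",
"\n",
"# Inside pre_segment, the frame loop would then become:\n",
"# for frame_batch in batches(frames, BATCH_SIZE):\n",
"#     for frame, outputs in zip(frame_batch, predict_batch(frame_batch)):\n",
"#         ...store outputs for frame.frame as before..."
]
},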
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Running the Agent\n",
"\n",
"The `runner` object is callable which means that you can just call it to prioritize your tasks."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Run the agent\n",
"runner()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Outcome\n",
"\n",
"Your agent assigns labels to videos and routes them through the workflow to the annotation stage. As a result, each annotation task includes pre-labeled predictions. \n",
"\n",
"> 💡 To run this as a command-line interface, save the code in an `agents.py` file and replace: \n",
"> ```python\n",
"> runner()\n",
"> ``` \n",
"> with: \n",
"> ```python\n",
"> if __name__ == \"__main__\":\n",
"> runner.run()\n",
"> ``` \n",
"> This lets you set parameters like the project hash from the command line: \n",
"> ```bash\n",
"> python agent.py --project-hash \"...\"\n",
"> ```\n"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "encord-agents-Cw_LL1Rx-py3.11",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}