{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Pre-label video with Mask-RCNN\n", "\n", "This notebook demonstrates how to use a task agent to pre-label videos with predictions.\n", "We will use the off-the-shelf Mask-RCNN model in this case.\n", "\n", "Before we start, let's get installations and authentication out of the way.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Set up environment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Installation\n", "\n", "Please ensure that you have the `encord-agents` library installed:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!python -m pip install encord-agents\n", "# If you don't have torch installed (Colab does by default)\n", "# Please install it by following the guide here: https://pytorch.org/get-started/locally/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Authentication\n", "\n", "The library authenticates via ssh keys. Below is a code cell for setting the `ENCORD_SSH_KEY` environment variable. 
It should contain the raw content of your private ssh key file.\n", "\n", "If you have not yet set up an ssh key, please follow the [documentation](https://agents-docs.encord.com/authentication/).\n", "\n", "> 💡 **Colab users**: In Colab, you can set the key once in the secrets in the left sidebar and load it in new notebooks with\n", "> ```python\n", "> from google.colab import userdata\n", "> key_content = userdata.get(\"ENCORD_SSH_KEY\")\n", "> ```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "os.environ[\"ENCORD_SSH_KEY\"] = \"private_key_file_content\"\n", "# or you can set a path to a file\n", "# os.environ[\"ENCORD_SSH_KEY_FILE\"] = \"/path/to/your/private/key\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### [Alternative] Temporary Key\n", "There's also the option of generating a temporary (fresh) ssh key pair via the code cell below.\n", "Please follow the instructions printed when executing the code." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# ⚠️ Safe to skip if you have authenticated already\n", "import os\n", "\n", "from encord_agents.utils.colab import generate_public_private_key_pair_with_instructions\n", "\n", "private_key_path, public_key_path = generate_public_private_key_pair_with_instructions()\n", "os.environ[\"ENCORD_SSH_KEY_FILE\"] = private_key_path.as_posix()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Load Mask-RCNN\n", "\n", "Let's load the Mask-RCNN model and its image transform so that we can use them for predictions." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torchvision\n", "import torchvision.models.detection\n", "from torchvision.models.detection.faster_rcnn import FastRCNNPredictor\n", "from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor\n", "from torchvision.transforms import v2 as T\n", "\n", "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", "\n", "\n", "def get_transform():\n", " return T.Compose([T.ToImage(), T.ToDtype(torch.float, scale=True), T.ToPureTensor()])\n", "\n", "\n", "def get_model_instance_segmentation():\n", " model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights=\"DEFAULT\")\n", " model = model.eval().to(device)\n", " transform = get_transform()\n", " return model, transform\n", "\n", "\n", "model, transform = get_model_instance_segmentation()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's define some utility functions to\n", "\n", "1. Convert the raw tensors from Mask-RCNN to Encord bitmask coordinates\n", "2. Apply non-maximum suppression (to avoid having many overlapping predictions)\n", "3. Convert the raw tensors to Encord `ObjectInstance`s." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from encord.objects import Object as OntologyObject\n", "from encord.objects import ObjectInstance\n", "from encord.objects.bitmask import BitmaskCoordinates\n", "from encord.ontology import OntologyStructure\n", "from torchvision.ops import nms\n", "\n", "\n", "def to_mask_coordinates(torch_mask: torch.Tensor, threshold: float = 0.5) -> BitmaskCoordinates:\n", " \"\"\"\n", " Convert torch mask to bitmask coordinates.\n", "\n", " args:\n", " - threshold: threshold at which to cut the mask floating point values. 
Higher values will yield smaller masks.\n", "\n", " returns:\n", " Encord bitmask\n", " \"\"\"\n", " binary_mask = (torch_mask > threshold).detach().cpu().numpy().squeeze().astype(bool)\n", " return BitmaskCoordinates(binary_mask)\n", "\n", "\n", "def apply_nms(pred, nms_iou_threshold: float):\n", " \"\"\"\n", " Apply non-maximum suppression to the mask-rcnn predictions.\n", "\n", " The method retains the bounding boxes to make it easy to modify the code\n", " to also work for bounding boxes.\n", " \"\"\"\n", " indices = nms(pred[\"boxes\"], pred[\"scores\"], nms_iou_threshold)\n", " return {\n", " \"masks\": pred[\"masks\"][indices],\n", " \"boxes\": pred[\"boxes\"][indices],\n", " \"labels\": pred[\"labels\"][indices],\n", " \"scores\": pred[\"scores\"][indices],\n", " }\n", "\n", "\n", "def convert_predictions_to_encord(\n", " predictions: dict[str, torch.Tensor],\n", " ontology_map: dict[int, OntologyObject],\n", " frame_idx: int = 0,\n", " conf_threshold: float = 0.50,\n", " nms_iou_threshold: float = 0.3,\n", ") -> list[ObjectInstance]:\n", " \"\"\"\n", " Convert mask-rcnn prediction to Encord object instances.\n", "\n", " Intended use in pseudo code:\n", "\n", " ```\n", " preds = model(img)\n", " instances = convert_predictions_to_encord(preds, ontology_map)\n", " [label_row.add_object_instance(ins) for ins in instances]\n", " ```\n", "\n", " Args:\n", " - predictions: The output of mask-rcnn for one frame.\n", " - ontology_map: The map between predicted labels and the Encord ontology objects.\n", " - frame_idx: The frame number to associate the prediction with.\n", " This is particularly important for videos.\n", " - conf_threshold: The confidence score below which predictions are discarded.\n", " - nms_iou_threshold: The IoU threshold above which overlapping predictions are suppressed during nms.\n", " Set to 0 (or below) to skip nms.\n", "\n", " Returns:\n", " - The resulting object instances.\n", " \"\"\"\n", "\n", " # Apply non-maximum suppression\n", " if nms_iou_threshold > 0:\n", " predictions = apply_nms(predictions, 
nms_iou_threshold)\n", "\n", " out: list[ObjectInstance] = []\n", " for mask, label, conf in zip(predictions[\"masks\"], predictions[\"labels\"], predictions[\"scores\"]):\n", " if label.item() not in ontology_map or conf < conf_threshold:\n", " continue\n", "\n", " if ont_obj := ontology_map.get(label.item()):\n", " ins = ont_obj.create_instance()\n", " ins.set_for_frames(\n", " frames=frame_idx,\n", " coordinates=to_mask_coordinates(mask),\n", " confidence=conf.item(),\n", " )\n", " out.append(ins)\n", " return out" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, let us put this to use in an agent.\n", "In order to do so, we need i) a project ontology which has classes overlapping with the MaskRCNN classes and ii) a project workflow which allows hooking in a pre-labeling agent." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Set up your Ontology\n", "\n", "Create an ontology with __BITMASK__ objects named by some of the following classes (those from COCO).\n", "\n", "```\n", "coco_class_names = [\n", " 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',\n", " 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',\n", " 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',\n", " 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella',\n", " 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',\n", " 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',\n", " 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',\n", " 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',\n", " 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table',\n", " 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',\n", " 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book',\n", " 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'\n", 
"]\n", "```\n", "\n", "Below is an example:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", "
Figure 1: Project ontology.
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below will match these against the right COCO indices and use the pre-trained model to fill in labels according to this ontology.\n", "\n", "[📖 Here](https://docs.encord.com/platform-documentation/GettingStarted/gettingstarted-create-ontology) is the documentation for creating ontologies." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4: Create a Workflow with a pre-labeling agent node\n", "\n", "Create a project in the Encord platform that has a Workflow that includes a pre-labeling agent node before the annotation stage to automatically pre-label tasks with model predictions.\n", "This node is where we'll hook in Mask-RCNN to pre-label the data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", "
Figure 2: Project workflow.
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how the workflow has a purple Agent node called \"pre-label.\"\n", "This node will allow our custom code to run inference over the data before passing it on to the annotation stage.\n", "\n", "[📖 Here](https://docs.encord.com/platform-documentation/Annotate/annotate-projects/annotate-workflows-and-templates#creating-workflows) is the documentation for creating a workflow with Encord." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5: Define the pre-labeling agent\n", "\n", "The following code provides a template for defining an agent that does pre-labeling.\n", "We assume that the project only contains videos and that we want to do pre-labeling on all frames in each video.\n", "\n", "You will have to update the three identifiers: \n", "\n", "- ``: The project hash of the project that you wish to apply the agent to.\n", "- ``: The name (or uuid) of the workflow stage at which you want to run inference.\n", "- ``: The pathway that the task should follow once the predictions are stored.\n", "\n", "\n", "Note that this code uses the [`dep_video_iterator` dependency](../../reference/task_agents.md#encord_agents.tasks.dependencies.dep_video_iterator) to automatically load an iterator of frames as RGB numpy arrays from the video.\n", "\n", "> 💡 Hint: If you want to only predict, e.g., on the first frame, consider using `from encord_agents.tasks.dependencies import dep_single_frame` instead." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from typing import Iterable\n", "\n", "from encord.objects.ontology_labels_impl import LabelRowV2\n", "from encord.project import Project\n", "from typing_extensions import Annotated\n", "\n", "from encord_agents.core.data_model import Frame\n", "from encord_agents.tasks import Depends, Runner\n", "from encord_agents.tasks.dependencies import dep_video_iterator\n", "\n", "BATCH_SIZE = 10\n", "\n", "# a. 
Define a runner that will execute the agent on every task in the agent stage\n", "runner = Runner(project_hash=\"\")\n", "\n", "# b. Define ontology map and prepare prediction function\n", "coco_class_names = [\n", " \"__background__\",\n", " \"person\",\n", " \"bicycle\",\n", " \"car\",\n", " \"motorcycle\",\n", " \"airplane\",\n", " \"bus\",\n", " \"train\",\n", " \"truck\",\n", " \"boat\",\n", " \"traffic light\",\n", " \"fire hydrant\",\n", " \"N/A\",\n", " \"stop sign\",\n", " \"parking meter\",\n", " \"bench\",\n", " \"bird\",\n", " \"cat\",\n", " \"dog\",\n", " \"horse\",\n", " \"sheep\",\n", " \"cow\",\n", " \"elephant\",\n", " \"bear\",\n", " \"zebra\",\n", " \"giraffe\",\n", " \"N/A\",\n", " \"backpack\",\n", " \"umbrella\",\n", " \"N/A\",\n", " \"N/A\",\n", " \"handbag\",\n", " \"tie\",\n", " \"suitcase\",\n", " \"frisbee\",\n", " \"skis\",\n", " \"snowboard\",\n", " \"sports ball\",\n", " \"kite\",\n", " \"baseball bat\",\n", " \"baseball glove\",\n", " \"skateboard\",\n", " \"surfboard\",\n", " \"tennis racket\",\n", " \"bottle\",\n", " \"N/A\",\n", " \"wine glass\",\n", " \"cup\",\n", " \"fork\",\n", " \"knife\",\n", " \"spoon\",\n", " \"bowl\",\n", " \"banana\",\n", " \"apple\",\n", " \"sandwich\",\n", " \"orange\",\n", " \"broccoli\",\n", " \"carrot\",\n", " \"hot dog\",\n", " \"pizza\",\n", " \"donut\",\n", " \"cake\",\n", " \"chair\",\n", " \"couch\",\n", " \"potted plant\",\n", " \"bed\",\n", " \"N/A\",\n", " \"dining table\",\n", " \"N/A\",\n", " \"N/A\",\n", " \"toilet\",\n", " \"N/A\",\n", " \"tv\",\n", " \"laptop\",\n", " \"mouse\",\n", " \"remote\",\n", " \"keyboard\",\n", " \"cell phone\",\n", " \"microwave\",\n", " \"oven\",\n", " \"toaster\",\n", " \"sink\",\n", " \"refrigerator\",\n", " \"N/A\",\n", " \"book\",\n", " \"clock\",\n", " \"vase\",\n", " \"scissors\",\n", " \"teddy bear\",\n", " \"hair drier\",\n", " \"toothbrush\",\n", "]\n", "ont_map = {coco_class_names.index(o.name): o for o in 
runner.project.ontology_structure.objects}\n", "\n", "\n", "# c. Define batch predict function\n", "@torch.inference_mode()\n", "def predict_batch(label_row: LabelRowV2, batch: list[Frame]) -> None:\n", " \"\"\"\n", " Utility to predict across a batch and store predictions on label row.\n", " \"\"\"\n", " # Transform every frame to a tensor on the model's device\n", " inputs = [transform(frame.content).to(device) for frame in batch]\n", " predictions = model(inputs)\n", "\n", " for frame, pred in zip(batch, predictions):\n", " for ins in convert_predictions_to_encord(pred, ont_map, frame.frame):\n", " label_row.add_object_instance(ins)\n", "\n", "\n", "# d. Specify the logic that goes into the \"pre-label\" agent node.\n", "@runner.stage(stage=\"\")\n", "def run_something(\n", " lr: LabelRowV2,\n", " frames: Annotated[Iterable[Frame], Depends(dep_video_iterator)],\n", ") -> str:\n", " batch: list[Frame] = []\n", " for frame in frames:\n", " # Collect batch\n", " batch.append(frame)\n", "\n", " # Inference on full batch\n", " if len(batch) == BATCH_SIZE:\n", " predict_batch(lr, batch)\n", " batch = []\n", "\n", " # Inference on last \"half\" batch\n", " if batch:\n", " predict_batch(lr, batch)\n", "\n", " lr.save()\n", " return \"\" # Tell where the task should go" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Running the agent\n", "Now that we've defined the project, workflow, and the agent, it's time to try it out.\n", "The `runner` object is callable, which means that you can simply call it to run the agent on your tasks." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run the agent\n", "# After 5 label updates, tasks will be moved in workflow queue.\n", "runner(task_batch_size=5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Your agent now assigns labels to the videos and routes them appropriately through the Workflow to the annotation stage.\n", "As a result, every annotation task should already have pre-existing labels (predictions) included.\n", "\n", "> 💡*Hint:* If you execute this as a Python script, you can run it as a command line interface by putting the above code in an `agent.py` file and replacing\n", "> ```python\n", "> runner()\n", "> ```\n", "> with\n", "> ```python\n", "> if __name__ == \"__main__\":\n", "> runner.run()\n", "> ```\n", "> This allows you to set, for example, the project hash from the command line:\n", "> ```bash\n", "> python agent.py --project-hash \"...\"\n", "> ```\n" ] } ], "metadata": { "accelerator": "GPU", "colab": { "gpuType": "T4", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 0 }