{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Pre-label video with Mask-RCNN\n", "\n", "This notebook demonstrates how to use a task agent to pre-label videos with predictions.\n", "We will use the off-the-shelf Mask-RCNN model in this case.\n", "\n", "Before we start, let's get installations and authentication out of the way.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Set up environment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Installation\n", "\n", "Please ensure that you have the `encord-agents` library installed:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!python -m pip install encord-agents\n", "# If you don't have torch installed (Colab does by default),\n", "# please install it by following the guide here: https://pytorch.org/get-started/locally/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Authentication\n", "\n", "The library authenticates via SSH keys. Below is a code cell for setting the `ENCORD_SSH_KEY` environment variable. 
It should contain the raw content of your private SSH key file.\n", "\n", "If you have not yet set up an SSH key, please follow the [documentation](https://agents-docs.encord.com/authentication/).\n", "\n", "> 💡 **Colab users**: In Colab, you can set the key once in the secrets in the left sidebar and load it in new notebooks with\n", "> ```python\n", "> from google.colab import userdata\n", "> key_content = userdata.get(\"ENCORD_SSH_KEY\")\n", "> ```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "os.environ[\"ENCORD_SSH_KEY\"] = \"private_key_file_content\"\n", "# Or you can set the path to a key file\n", "# os.environ[\"ENCORD_SSH_KEY_FILE\"] = \"/path/to/your/private/key\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### [Alternative] Temporary Key\n", "There's also the option of generating a temporary (fresh) SSH key pair via the code cell below.\n", "Please follow the instructions printed when executing the code." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# ⚠️ Safe to skip if you have authenticated already\n", "import os\n", "\n", "from encord_agents.utils.colab import generate_public_private_key_pair_with_instructions\n", "\n", "private_key_path, public_key_path = generate_public_private_key_pair_with_instructions()\n", "os.environ[\"ENCORD_SSH_KEY_FILE\"] = private_key_path.as_posix()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Load Mask-RCNN\n", "\n", "Let's load the Mask-RCNN model and its image transform so that we can use them for predictions." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torchvision\n", "import torchvision.models.detection\n", "from torchvision.models.detection.faster_rcnn import FastRCNNPredictor\n", "from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor\n", "from torchvision.transforms import v2 as T\n", "\n", "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", "\n", "\n", "def get_transform():\n", "    return T.Compose([T.ToImage(), T.ToDtype(torch.float, scale=True), T.ToPureTensor()])\n", "\n", "\n", "def get_model_instance_segmentation():\n", "    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights=\"DEFAULT\")\n", "    model = model.eval().to(device)\n", "    transform = get_transform()\n", "    return model, transform\n", "\n", "\n", "model, transform = get_model_instance_segmentation()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's define some utility functions to\n", "\n", "1. Convert the raw tensors from Mask-RCNN to Encord bitmask coordinates\n", "2. Apply non-maximum suppression (to avoid having many overlapping predictions)\n", "3. Convert the raw tensors to Encord `ObjectInstance`s." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from encord.objects import Object as OntologyObject\n", "from encord.objects import ObjectInstance\n", "from encord.objects.bitmask import BitmaskCoordinates\n", "from encord.ontology import OntologyStructure\n", "from torchvision.ops import nms\n", "\n", "\n", "def to_mask_coordinates(torch_mask: torch.Tensor, threshold: float = 0.5) -> BitmaskCoordinates:\n", "    \"\"\"\n", "    Convert torch mask to bitmask coordinates.\n", "\n", "    args:\n", "    - threshold: threshold at which to cut the mask floating point values. Define a runner that will execute the agent on every task in the agent stage\n", "runner = Runner(project_hash=\"\")\n", "\n", "# b. 
Define ontology map and prepare prediction function\n", "coco_class_names = [\n", " \"__background__\",\n", " \"person\",\n", " \"bicycle\",\n", " \"car\",\n", " \"motorcycle\",\n", " \"airplane\",\n", " \"bus\",\n", " \"train\",\n", " \"truck\",\n", " \"boat\",\n", " \"traffic light\",\n", " \"fire hydrant\",\n", " \"N/A\",\n", " \"stop sign\",\n", " \"parking meter\",\n", " \"bench\",\n", " \"bird\",\n", " \"cat\",\n", " \"dog\",\n", " \"horse\",\n", " \"sheep\",\n", " \"cow\",\n", " \"elephant\",\n", " \"bear\",\n", " \"zebra\",\n", " \"giraffe\",\n", " \"N/A\",\n", " \"backpack\",\n", " \"umbrella\",\n", " \"N/A\",\n", " \"N/A\",\n", " \"handbag\",\n", " \"tie\",\n", " \"suitcase\",\n", " \"frisbee\",\n", " \"skis\",\n", " \"snowboard\",\n", " \"sports ball\",\n", " \"kite\",\n", " \"baseball bat\",\n", " \"baseball glove\",\n", " \"skateboard\",\n", " \"surfboard\",\n", " \"tennis racket\",\n", " \"bottle\",\n", " \"N/A\",\n", " \"wine glass\",\n", " \"cup\",\n", " \"fork\",\n", " \"knife\",\n", " \"spoon\",\n", " \"bowl\",\n", " \"banana\",\n", " \"apple\",\n", " \"sandwich\",\n", " \"orange\",\n", " \"broccoli\",\n", " \"carrot\",\n", " \"hot dog\",\n", " \"pizza\",\n", " \"donut\",\n", " \"cake\",\n", " \"chair\",\n", " \"couch\",\n", " \"potted plant\",\n", " \"bed\",\n", " \"N/A\",\n", " \"dining table\",\n", " \"N/A\",\n", " \"N/A\",\n", " \"toilet\",\n", " \"N/A\",\n", " \"tv\",\n", " \"laptop\",\n", " \"mouse\",\n", " \"remote\",\n", " \"keyboard\",\n", " \"cell phone\",\n", " \"microwave\",\n", " \"oven\",\n", " \"toaster\",\n", " \"sink\",\n", " \"refrigerator\",\n", " \"N/A\",\n", " \"book\",\n", " \"clock\",\n", " \"vase\",\n", " \"scissors\",\n", " \"teddy bear\",\n", " \"hair drier\",\n", " \"toothbrush\",\n", "]\n", "ont_map = {coco_class_names.index(o.name): o for o in runner.project.ontology_structure.objects}\n", "\n", "\n", "# c. 
Define batch predict function\n", "@torch.inference_mode()\n", "def predict_batch(label_row: LabelRowV2, batch: list[Frame]) -> None:\n", "    \"\"\"\n", "    Utility to predict across a batch and store the predictions on the label row.\n", "    \"\"\"\n", "    inputs = [transform(frame.content).to(device) for frame in batch]\n", "    predictions = model(inputs)\n", "\n", "    for frame, pred in zip(batch, predictions):\n", "        for ins in convert_predictions_to_encord(pred, ont_map, frame.frame):\n", "            label_row.add_object_instance(ins)\n", "\n", "\n", "# d. Specify the logic that goes into the \"pre-label\" agent node.\n", "@runner.stage(stage=\"\")\n", "def run_something(\n", "    lr: LabelRowV2,\n", "    frames: Annotated[Iterable[Frame], Depends(dep_video_iterator)],\n", ") -> str:\n", "    batch: list[Frame] = []\n", "    for frame in frames:\n", "        # Collect batch\n", "        batch.append(frame)\n", "\n", "        # Inference on full batch\n", "        if len(batch) == BATCH_SIZE:\n", "            predict_batch(lr, batch)\n", "            batch = []\n", "\n", "    # Inference on the last, possibly partial, batch\n", "    if batch:\n", "        predict_batch(lr, batch)\n", "\n", "    lr.save()\n", "    return \"\"  # Name of the pathway the task should follow next" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Running the agent\n", "Now that we've defined the project, workflow, and the agent, it's time to try it out.\n", "The `runner` object is callable, which means that you can call it directly to process your tasks." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run the agent\n", "# After every 5 label updates, the tasks are moved on in the workflow queue.\n", "runner(task_batch_size=5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Your agent now assigns labels to the videos and routes them appropriately through the Workflow to the annotation stage.\n", "As a result, every annotation task should already include labels (predictions).\n", "\n", "> 💡 *Hint:* If you execute this as a Python script, you can run it as a command line interface by putting the above code in an `agent.py` file and replacing\n", "> ```python\n", "> runner()\n", "> ```\n", "> with\n", "> ```python\n", "> if __name__ == \"__main__\":\n", ">     runner.run()\n", "> ```\n", "> This allows you to set, for example, the project hash from the command line:\n", "> ```bash\n", "> python agent.py --project-hash \"...\"\n", "> ```\n" ] } ], "metadata": { "accelerator": "GPU", "colab": { "gpuType": "T4", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 0 }