{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pre-label video with Mask-RCNN\n",
"\n",
"This notebook demonstrates how to use a task agent to pre-label videos with predictions.\n",
"We will use the off-the-shelf model MaskRNN in this case.\n",
"\n",
"Before we start, let's get installations and authentication out of the way.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Set up environment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"Please ensure that you have the `encord-agents` library installed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!python -m pip install encord-agents\n",
"# If you don't have torch installed (Colab does by default)\n",
"# Please install it by following the guide here: https://pytorch.org/get-started/locally/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Authentication\n",
"\n",
"The library authenticates via ssh-keys. Below, is a code cell for setting the `ENCORD_SSH_KEY` environment variable. It should contain the raw content of your private ssh key file.\n",
"\n",
"If you have not yet setup an ssh key, please follow the [documentation](https://agents-docs.encord.com/authentication/).\n",
"\n",
"> 💡 **Colab users**: In colab, you can set the key once in the secrets in the left sidebar and load it in new notebooks with\n",
"> ```python\n",
"> from google.colab import userdata\n",
"> key_content = userdata.get(\"ENCORD_SSH_KEY\")\n",
"> ```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"ENCORD_SSH_KEY\"] = \"private_key_file_content\"\n",
"# or you can set a path to a file\n",
"# os.environ[\"ENCORD_SSH_KEY_FILE\"] = \"/path/to/your/private/key\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### [Alternative] Temporary Key\n",
"There's also the option of generating a temporary (fresh) ssh key pair via the code cell below.\n",
"Please follow the instructions printed when executing the code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ⚠️ Safe to skip if you have authenticated already\n",
"import os\n",
"\n",
"from encord_agents.utils.colab import generate_public_private_key_pair_with_instructions\n",
"\n",
"private_key_path, public_key_path = generate_public_private_key_pair_with_instructions()\n",
"os.environ[\"ENCORD_SSH_KEY_FILE\"] = private_key_path.as_posix()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Load mask-RCNN\n",
"\n",
"Let's load the Mask-RCNN model and it's image transform such that we can use it for predictions.\n",
"\n",
"Below, we load the model and it's image transform."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torchvision\n",
"import torchvision.models.detection\n",
"from torchvision.models.detection.faster_rcnn import FastRCNNPredictor\n",
"from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor\n",
"from torchvision.transforms import v2 as T\n",
"\n",
"device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
"\n",
"\n",
"def get_transform():\n",
" return T.Compose([T.ToImage(), T.ToDtype(torch.float, scale=True), T.ToPureTensor()])\n",
"\n",
"\n",
"def get_model_instance_segmentation():\n",
" model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights=\"DEFAULT\")\n",
" model = model.eval().to(device)\n",
" transform = get_transform()\n",
" return model, transform\n",
"\n",
"\n",
"model, transform = get_model_instance_segmentation()"
]
},
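{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sanity check (not required for the rest of the notebook), you can run the freshly loaded model on a random image tensor and inspect its output. In eval mode, torchvision's Mask-RCNN returns one dictionary per input image with the keys `boxes`, `labels`, `scores`, and `masks`, which is exactly what the conversion utilities below consume:\n",
"\n",
"```python\n",
"# Optional sanity check: inspect the shapes of Mask-RCNN's outputs.\n",
"with torch.inference_mode():\n",
"    dummy = [torch.rand(3, 240, 320, device=device)]  # one random RGB image in [0, 1]\n",
"    output = model(dummy)[0]\n",
"\n",
"# Expect roughly: boxes (N, 4), labels (N,), scores (N,), masks (N, 1, 240, 320)\n",
"print({name: tuple(tensor.shape) for name, tensor in output.items()})\n",
"```"
]
},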
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's define some utility functions to\n",
"\n",
"1. Convert the raw tensors from Mask-RNN to the encord bitmask coordinates\n",
"2. Apply non-maximum suppression (to avoid having many overlapping predictions)\n",
"3. Convert the raw tensors to Encord `ObjectInstance`s."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from encord.objects import Object as OntologyObject\n",
"from encord.objects import ObjectInstance\n",
"from encord.objects.bitmask import BitmaskCoordinates\n",
"from encord.ontology import OntologyStructure\n",
"from torchvision.ops import nms\n",
"\n",
"\n",
"def to_mask_coordinates(torch_mask: torch.Tensor, threshold: float = 0.5) -> BitmaskCoordinates:\n",
" \"\"\"\n",
" Convert torch mask to bitmask coordinates.\n",
"\n",
" args:\n",
" - threshold: threshold at which to cut the mask floating point values. Higher values will yield smaller masks.\n",
"\n",
" returns:\n",
" Encord bitmask\n",
" \"\"\"\n",
" binary_mask = (torch_mask > threshold).detach().cpu().numpy().squeeze().astype(bool)\n",
" return BitmaskCoordinates(binary_mask)\n",
"\n",
"\n",
"def apply_nms(pred, nms_iou_threshold: float):\n",
" \"\"\"\n",
" Apply non-maximum suppression to the mask-rcnn predictions.\n",
"\n",
" The method retains the bounding boxes to make it easy to modify the code\n",
" to also work for bounding boxes.\n",
" \"\"\"\n",
" indices = nms(pred[\"boxes\"], pred[\"scores\"], nms_iou_threshold)\n",
" return {\n",
" \"masks\": pred[\"masks\"][indices],\n",
" \"boxes\": pred[\"boxes\"][indices],\n",
" \"labels\": pred[\"labels\"][indices],\n",
" \"scores\": pred[\"scores\"][indices],\n",
" }\n",
"\n",
"\n",
"def convert_predictions_to_encord(\n",
" predictions: dict[str, torch.Tensor],\n",
" ontology_map: dict[int, OntologyObject],\n",
" frame_idx: int = 0,\n",
" conf_threshold: float = 0.50,\n",
" nms_iou_threshold: float = 0.3,\n",
") -> list[ObjectInstance]:\n",
" \"\"\"\n",
" Convert mask-rcnn prediction to Encord object instances.\n",
"\n",
" Intended use in pseudo code:\n",
"\n",
" ```\n",
" preds = model(img)\n",
" instances = convert_predictions_to_encord(preds)\n",
" [label_row.add_object_instance(ins) for ins in instances]\n",
" ```\n",
"\n",
" Args:\n",
" - predictions: The output of mask-rcnn for one frame.\n",
" - ontology_map: The map between predicted labels and the Encord ontology objects.\n",
" - frame_idx: The frame number to associate the prediction with.\n",
" This is particularly important for videos.\n",
" - conf_threshold: The threshold at which we want to retain predictions.\n",
" - nms_iou_threshold: The threshold that we wich to select above during nms.\n",
"\n",
" Returns:\n",
" - The resulting object instanesl.\n",
" \"\"\"\n",
"\n",
" # Apply non-maximum suppression\n",
" if nms_iou_threshold > 0:\n",
" predictions = apply_nms(predictions, nms_iou_threshold)\n",
"\n",
" out: list[ObjectInstance] = []\n",
" for mask, label, conf in zip(predictions[\"masks\"], predictions[\"labels\"], predictions[\"scores\"]):\n",
" if label.item() not in ontology_map or conf < conf_threshold:\n",
" continue\n",
"\n",
" if ont_obj := ontology_map.get(label.item()):\n",
" ins = ont_obj.create_instance()\n",
" ins.set_for_frames(\n",
" frames=frame_idx,\n",
" coordinates=to_mask_coordinates(mask),\n",
" confidence=conf.item(),\n",
" )\n",
" out.append(ins)\n",
" return out"
]
},
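{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what the non-maximum suppression step does, here is a small illustrative example with hand-made predictions (the tensor values are made up for demonstration). The two heavily overlapping boxes collapse to the higher-scoring one, while the separate box is kept:\n",
"\n",
"```python\n",
"# Illustrative only: two overlapping boxes (IoU ≈ 0.68) and one separate box.\n",
"dummy_pred = {\n",
"    \"boxes\": torch.tensor([[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0], [50.0, 50.0, 60.0, 60.0]]),\n",
"    \"scores\": torch.tensor([0.9, 0.8, 0.7]),\n",
"    \"labels\": torch.tensor([1, 1, 2]),\n",
"    \"masks\": torch.zeros(3, 1, 64, 64),\n",
"}\n",
"kept = apply_nms(dummy_pred, nms_iou_threshold=0.3)\n",
"print(kept[\"boxes\"])  # the box with score 0.8 has been suppressed\n",
"```"
]
},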
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, let us put this to use in an agent.\n",
"In order to do so, we need i) a project ontology which has classes overlapping with the MaskRCNN classes and ii) a project workflow which allows hooking in a pre-labeling agent."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Set up your Ontology\n",
"\n",
"Create an ontology with __BITMASK__ objects named by some of the following classes (those from COCO).\n",
"\n",
"```\n",
"coco_class_names = [\n",
" 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',\n",
" 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',\n",
" 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',\n",
" 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella',\n",
" 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',\n",
" 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',\n",
" 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',\n",
" 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',\n",
" 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table',\n",
" 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',\n",
" 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book',\n",
" 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'\n",
"]\n",
"```\n",
"\n",
"Below is an example:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
" \n",
" Figure 1: Project ontology.\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The code below will match these against the right coco indices and use the pre-trained model to fill in labels according to this ontology.\n",
"\n",
"[📖 Here](https://docs.encord.com/platform-documentation/GettingStarted/gettingstarted-create-ontology) is the documentation for creating ontologies."
]
},
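{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you prefer to set up the ontology programmatically rather than through the UI, a minimal sketch along the following lines should work. It assumes you are authenticated with the SSH key from above and uses the Encord SDK's `OntologyStructure` and `create_ontology`; the class subset and title are just examples:\n",
"\n",
"```python\n",
"# Sketch: create a BITMASK ontology with a subset of the COCO class names.\n",
"import os\n",
"from pathlib import Path\n",
"\n",
"from encord.objects.common import Shape\n",
"from encord.ontology import OntologyStructure\n",
"from encord.user_client import EncordUserClient\n",
"\n",
"key_content = os.environ.get(\"ENCORD_SSH_KEY\") or Path(os.environ[\"ENCORD_SSH_KEY_FILE\"]).read_text()\n",
"user_client = EncordUserClient.create_with_ssh_private_key(key_content)\n",
"\n",
"structure = OntologyStructure()\n",
"for name in [\"person\", \"car\", \"dog\"]:  # any subset of the COCO names above\n",
"    structure.add_object(name=name, shape=Shape.BITMASK)\n",
"\n",
"ontology = user_client.create_ontology(title=\"Mask-RCNN pre-labeling\", structure=structure)\n",
"print(ontology.title)\n",
"```"
]
},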
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Create a Workflow with a pre-labeling agent node\n",
"\n",
"Create a project in the Encord platform that has a Workflow that includes a pre-labeling agent node before the annotation stage to automatically pre-label tasks with model predictions.\n",
"This node is where we'll hook in Mask-RCNN e to pre-label the data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
" \n",
" Figure 2: Project workflow.\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice how the workflow has a purple Agent node called \"pre-label.\"\n",
"This node will allow our custom code to run inference over the data before passing it on to the annotation stage.\n",
"\n",
"[📖 Here](https://docs.encord.com/platform-documentation/Annotate/annotate-projects/annotate-workflows-and-templates#creating-workflows) is the documentation for creating a workflow with Encord."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5: Define the pre-labelling agent\n",
"\n",
"The following code provides a template for defining an agent that does pre-labeling.\n",
"We assume that the project only contains videos and the we want to do pre-labeling on all frames in each video.\n",
"\n",
"You will have to update the three identifiers: \n",
"\n",
"- ``: The project hash of the project that you wish to apply the agent to.\n",
"- ``: The workflow stage name (or uuid) that you want to run inference via.\n",
"- ``: The pathway the the task should follow upon prediction.\n",
"\n",
"\n",
"Note that this code uses the [`dep_video_iterator` dependency](../../reference/task_agents.md#encord_agents.tasks.dependencies.dep_video_iterator) to automatically load an iterator of frames as RGB numpy arrays from the video.\n",
"\n",
"> 💡 Hint: If you want to only predict, e.g., on the first frame, concider using `from encord_agents.tasks.depencencies import dep_single_frame` instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import Iterable\n",
"\n",
"from encord.objects.ontology_labels_impl import LabelRowV2\n",
"from encord.project import Project\n",
"from typing_extensions import Annotated\n",
"\n",
"from encord_agents.core.data_model import Frame\n",
"from encord_agents.tasks import Depends, Runner\n",
"from encord_agents.tasks.dependencies import dep_video_iterator\n",
"\n",
"BATCH_SIZE = 10\n",
"\n",
"# a. Define a runner that will execute the agent on every task in the agent stage\n",
"runner = Runner(project_hash=\"\")\n",
"\n",
"# b. Define ontology map and prepare prediction function\n",
"coco_class_names = [\n",
" \"__background__\",\n",
" \"person\",\n",
" \"bicycle\",\n",
" \"car\",\n",
" \"motorcycle\",\n",
" \"airplane\",\n",
" \"bus\",\n",
" \"train\",\n",
" \"truck\",\n",
" \"boat\",\n",
" \"traffic light\",\n",
" \"fire hydrant\",\n",
" \"N/A\",\n",
" \"stop sign\",\n",
" \"parking meter\",\n",
" \"bench\",\n",
" \"bird\",\n",
" \"cat\",\n",
" \"dog\",\n",
" \"horse\",\n",
" \"sheep\",\n",
" \"cow\",\n",
" \"elephant\",\n",
" \"bear\",\n",
" \"zebra\",\n",
" \"giraffe\",\n",
" \"N/A\",\n",
" \"backpack\",\n",
" \"umbrella\",\n",
" \"N/A\",\n",
" \"N/A\",\n",
" \"handbag\",\n",
" \"tie\",\n",
" \"suitcase\",\n",
" \"frisbee\",\n",
" \"skis\",\n",
" \"snowboard\",\n",
" \"sports ball\",\n",
" \"kite\",\n",
" \"baseball bat\",\n",
" \"baseball glove\",\n",
" \"skateboard\",\n",
" \"surfboard\",\n",
" \"tennis racket\",\n",
" \"bottle\",\n",
" \"N/A\",\n",
" \"wine glass\",\n",
" \"cup\",\n",
" \"fork\",\n",
" \"knife\",\n",
" \"spoon\",\n",
" \"bowl\",\n",
" \"banana\",\n",
" \"apple\",\n",
" \"sandwich\",\n",
" \"orange\",\n",
" \"broccoli\",\n",
" \"carrot\",\n",
" \"hot dog\",\n",
" \"pizza\",\n",
" \"donut\",\n",
" \"cake\",\n",
" \"chair\",\n",
" \"couch\",\n",
" \"potted plant\",\n",
" \"bed\",\n",
" \"N/A\",\n",
" \"dining table\",\n",
" \"N/A\",\n",
" \"N/A\",\n",
" \"toilet\",\n",
" \"N/A\",\n",
" \"tv\",\n",
" \"laptop\",\n",
" \"mouse\",\n",
" \"remote\",\n",
" \"keyboard\",\n",
" \"cell phone\",\n",
" \"microwave\",\n",
" \"oven\",\n",
" \"toaster\",\n",
" \"sink\",\n",
" \"refrigerator\",\n",
" \"N/A\",\n",
" \"book\",\n",
" \"clock\",\n",
" \"vase\",\n",
" \"scissors\",\n",
" \"teddy bear\",\n",
" \"hair drier\",\n",
" \"toothbrush\",\n",
"]\n",
"ont_map = {coco_class_names.index(o.name): o for o in runner.project.ontology_structure.objects}\n",
"\n",
"\n",
"# c. Define batch predict function\n",
"@torch.inference_mode()\n",
"def predict_batch(label_row: LabelRowV2, batch: list[Frame]) -> None:\n",
" \"\"\"\n",
" Utility to predict across a batch and store predictions on label row.\n",
" \"\"\"\n",
" input = list(map(lambda i: transform(i.content).to(device), batch))\n",
" predictions = model(input)\n",
"\n",
" for frame, pred in zip(batch, predictions):\n",
" for ins in convert_predictions_to_encord(pred, ont_map, frame.frame):\n",
" label_row.add_object_instance(ins)\n",
"\n",
"\n",
"# d. Specify the logic that goes into the \"pre-label\" agent node.\n",
"@runner.stage(stage=\"\")\n",
"def run_something(\n",
" lr: LabelRowV2,\n",
" frames: Annotated[Iterable[Frame], Depends(dep_video_iterator)],\n",
") -> str:\n",
" batch: list[Frame] = []\n",
" for frame in frames:\n",
" # Collect batch\n",
" batch.append(frame)\n",
"\n",
" # Inference on full batch\n",
" if len(batch) == BATCH_SIZE:\n",
" predict_batch(lr, batch)\n",
" batch = []\n",
"\n",
" # Inference on last \"half\" batch\n",
" if batch:\n",
" predict_batch(lr, batch)\n",
"\n",
" lr.save()\n",
" return \"\" # Tell where the task should go"
]
},
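{
"cell_type": "markdown",
"metadata": {},
"source": [
"As mentioned in the hint above, if you only want predictions on the first frame of each video, a variant of the agent based on `dep_single_frame` could look roughly like the sketch below. It replaces the handler above (register only one handler per stage) and assumes that `dep_single_frame` provides the frame as an RGB numpy array; the stage and pathway names remain placeholders:\n",
"\n",
"```python\n",
"# Sketch: single-frame variant of the pre-labeling agent (use instead of, not\n",
"# in addition to, the handler above). Relies on model, transform, ont_map and\n",
"# convert_predictions_to_encord defined earlier in the notebook.\n",
"import numpy as np\n",
"from numpy.typing import NDArray\n",
"from typing_extensions import Annotated\n",
"\n",
"from encord.objects.ontology_labels_impl import LabelRowV2\n",
"from encord_agents.tasks import Depends\n",
"from encord_agents.tasks.dependencies import dep_single_frame\n",
"\n",
"\n",
"@runner.stage(stage=\"<agent_stage>\")\n",
"def pre_label_first_frame(\n",
"    lr: LabelRowV2,\n",
"    frame: Annotated[NDArray[np.uint8], Depends(dep_single_frame)],\n",
") -> str:\n",
"    with torch.inference_mode():\n",
"        pred = model([transform(frame).to(device)])[0]\n",
"    for ins in convert_predictions_to_encord(pred, ont_map, frame_idx=0):\n",
"        lr.add_object_instance(ins)\n",
"    lr.save()\n",
"    return \"<pathway_name>\"\n",
"```"
]
},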
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Running the agent\n",
"Now that we've defined the project, workflow, and the agent, it's time to try it out.\n",
"The `runner` object is callable which means that you can just call it to prioritize your tasks."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Run the agent\n",
"# After 5 label updates, tasks will be moved in workflow queue.\n",
"runner(task_batch_size=5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Your agent now assigns labels to the videos and routes them appropriately through the Workflow to the annotation stage.\n",
"As a result, every annotation task should already have pre-existing labels (predictions) included.\n",
"\n",
"> 💡*Hint:* If you execute this as a Python script, you can run it as a command line interface by putting the above code in an `agents.py` file and replacing\n",
"> ```python\n",
"> runner()\n",
"> ```\n",
"> with\n",
"> ```python\n",
"> if __name__ == \"__main__\":\n",
"> runner.run()\n",
"> ```\n",
"> Which allows you to set, for example the Project hash using the command line:\n",
"> ```bash\n",
"> python agent.py --project-hash \"...\"\n",
"> ```\n"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}