Sentiment Analysis¶
This notebook walks you through using an AI-powered agent to analyze the "high confidence" transcriptions of audio files produced by the Diarization agent.
Example Workflow¶
The following workflow illustrates how audio files can be reviewed after pre-labelling by the Diarization agent.
If you haven't already, please see the Diarization notebook for more details on how to first transcribe the audio.
The code in this notebook is for the Sentiment analysis agent.
Installation¶
Install the encord-agents and transformers libraries:
!python -m pip install -q encord-agents
!python -m pip install -q transformers
Pipeline¶
This code uses the transformers library from Hugging Face to create a sentiment analysis pipeline. It initializes a sentiment analysis model and applies it to the given text: "Covid cases are increasing fast!". The model returns a sentiment label ("POSITIVE" or "NEGATIVE") along with a confidence score.
from transformers import pipeline
sentiment_task = pipeline("sentiment-analysis")
# Example inference
sentiment_task("Covid cases are increasing fast!")
Workflow creation¶
Verify that the Workflow used in your Project matches the one outlined in the earlier section of this notebook. The important thing is that it begins with an agent node that routes to the annotation node; the names of the nodes do not matter. An optional way to check the stages is sketched below.
📖 Here is the documentation for creating a workflow with Encord.
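The following is a minimal sketch for listing the stages of your Project's workflow with the Encord SDK, so you can confirm an agent stage routes into an annotation stage. The project ID is a placeholder, and it assumes the ENCORD_SSH_KEY environment variable is already set (see the Authentication section below).
import os

from encord.user_client import EncordUserClient

# Assumes ENCORD_SSH_KEY is already set (see the Authentication section below).
user_client = EncordUserClient.create_with_ssh_private_key(os.environ["ENCORD_SSH_KEY"])
project = user_client.get_project("<Your project ID>")  # placeholder project hash

# Print each workflow stage; expect an agent stage (e.g. "Sentiment analysis")
# that routes into an annotation stage.
for stage in project.workflow.stages:
    print(type(stage).__name__, stage.title)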
Ontology¶
For this Project, the Ontology needs an Audio Object for each speaker ("speaker" should be in the name of the object), with two nested classifications:
- "utterance" (TextAttribute), which holds the transcription to read and do sentiment analysis on.
- "sentiment" (RadioAttribute), which stores the sentiment of the transcription with three options: "positive", "neutral", and "negative".
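As an optional check, the sketch below uses the Encord SDK to print the speaker objects and their nested attribute titles, so you can confirm that "utterance" and "sentiment" are present. The project ID is a placeholder, and it assumes ENCORD_SSH_KEY is already set (see the Authentication section below).
import os

from encord.user_client import EncordUserClient

user_client = EncordUserClient.create_with_ssh_private_key(os.environ["ENCORD_SSH_KEY"])
project = user_client.get_project("<Your project ID>")  # placeholder project hash

# List each speaker object together with its nested attribute titles.
for obj in project.ontology_structure.objects:
    if "speaker" in obj.title.lower():
        print(obj.title, [attr.title for attr in obj.attributes])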
Authentication¶
The library authenticates via SSH keys. Below is a code cell for setting the ENCORD_SSH_KEY environment variable. It should contain the raw content of your private SSH key file.
If you have not yet set up an SSH key, please follow the documentation.
💡 Colab users: In Colab, you can set the key once under Secrets in the left sidebar and load it in new notebooks with:
from google.colab import userdata
key_content = userdata.get("ENCORD_SSH_KEY")
import os
from google.colab import userdata
os.environ["ENCORD_SSH_KEY"] = "YOUR_KEY_GOES_HERE"
# or this if you want to reuse
# os.environ["ENCORD_SSH_KEY"] = userdata.get("ENCORD_SSH_KEY")
Running the Agent¶
This script integrates Hugging Face's sentiment analysis model with Encord's annotation pipeline using encord_agents.
- A Runner instance is initialized with a specific project ID.
- A processing stage, "Sentiment analysis", is defined.
- The function do_analysis extracts textual data (utterances) from objects in a LabelRowV2.
- The text is analyzed for sentiment using transformers' sentiment-analysis pipeline.
- Based on the sentiment (positive, negative, or neutral), the corresponding label is assigned to the object.
- The updated labels are saved back to the annotation row.
- The function returns "high confidence", the name of the pathway along which the completed task is routed to the next Workflow stage.
This setup automates sentiment labeling in Encord Annotate by leveraging pre-trained NLP models.
from encord.objects.ontology_labels_impl import LabelRowV2
from encord_agents.tasks import Runner

runner = Runner("<Your project ID>")


@runner.stage("Sentiment analysis")
def do_analysis(label_row: LabelRowV2):
    # Collect the transcribed "utterance" text from every object instance.
    texts = []
    object_instances = label_row.get_object_instances()
    for o in object_instances:
        attr = o.ontology_item.get_child_by_title("utterance")
        answer = o.get_answer(attribute=attr)
        texts.append(answer)

    # Classify all utterances in one batch.
    sentiments = sentiment_task(texts)

    # Write the predicted sentiment to the nested "sentiment" radio attribute.
    for o, s in zip(object_instances, sentiments):
        idx = 0 if s["label"] == "POSITIVE" else 2 if s["label"] == "NEGATIVE" else 1
        opt = o.ontology_item.get_child_by_title("sentiment").options[idx]
        o.set_answer(opt)

    label_row.save()
    return "high confidence"
To execute the agent, run the following command:
runner()
💡 Hint: If you execute this as a Python script, you can run it as a command line interface by putting the above code in an agent.py file and replacing runner() with:
if __name__ == "__main__":
    runner.run()
This allows you to set, for example, the Project hash from the command line:
python agent.py --project-hash "..."
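For reference, here is a minimal sketch of what such an agent.py could contain, combining the pipeline setup and the stage implementation from above; the project ID is a placeholder.
from encord.objects.ontology_labels_impl import LabelRowV2
from encord_agents.tasks import Runner
from transformers import pipeline

# Sentiment model and task runner; the project ID is a placeholder.
sentiment_task = pipeline("sentiment-analysis")
runner = Runner("<Your project ID>")


@runner.stage("Sentiment analysis")
def do_analysis(label_row: LabelRowV2):
    # Read the "utterance" text from every object instance.
    object_instances = label_row.get_object_instances()
    texts = [
        o.get_answer(attribute=o.ontology_item.get_child_by_title("utterance"))
        for o in object_instances
    ]

    # Classify each utterance and store it in the "sentiment" radio attribute.
    for o, s in zip(object_instances, sentiment_task(texts)):
        idx = 0 if s["label"] == "POSITIVE" else 2 if s["label"] == "NEGATIVE" else 1
        o.set_answer(o.ontology_item.get_child_by_title("sentiment").options[idx])

    label_row.save()
    return "high confidence"


if __name__ == "__main__":
    runner.run()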