Measuring engagement

Accuracy is the entry bar: an investigation has to be right often enough to be worth a responder’s time. But accuracy and engagement answer different questions. Accuracy asks “was the investigation right?” Engagement asks “did responders use it?” A perfectly accurate investigation that nobody read created no value; a partly right one that prompted a responder to check the right thing created a lot. We track both, because the return only shows up when accuracy and engagement are both there. See Why accuracy comes first.

How we measure engagement

We give every investigation a single engagement score, computed once its incident closes. It’s deterministic: there’s no separate AI judgment involved. We tally the interactions we already record between responders and the investigation, weight them by how much they signal real use, and normalize for the number of responders.

What we count

Not every interaction says the same thing. Skimming a link is weaker evidence than merging the code change an investigation proposed. So we group signals into three strengths, from strongest to weakest:

Acted: a responder changed what they did because of it. They acted on a heads-up, merged a code change it proposed, merged a duplicate incident it flagged, steered the investigation, or held a sustained conversation with @incident.
Responded: a responder worked with it. They responded to a heads-up, rated a message useful, or asked @incident a question.
Noticed: a responder took it in. They clicked through to something it surfaced, or rated a message somewhat useful.

These are the individual signals we count. On the homepage they’re aggregated into a single score and shown as one of three engagement bands. A few details shape the tally:

Acted signals count for the most. These are the moments where an investigation changed what a responder did, which is the clearest evidence it earned its place.
A conversation with @incident counts for more the longer it runs: a sustained exchange weighs more than a single question.
We only count real engagement. A heads-up that was ignored, or a message nobody reacted to, contributes nothing.
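To make the tally concrete, here is a minimal sketch of a weighted count over the three signal strengths. The weights, the conversation bonus, and the names `Signal`, `signal_weight`, and `raw_tally` are illustrative assumptions; the actual values and implementation are not described here.

```python
from dataclasses import dataclass

# Hypothetical weights, chosen only so that acted > responded > noticed.
WEIGHTS = {"acted": 3.0, "responded": 2.0, "noticed": 1.0}


@dataclass(frozen=True)
class Signal:
    strength: str         # "acted", "responded", or "noticed"
    responder: str        # who produced the signal
    engaged: bool = True  # False for an ignored heads-up or an unreacted message
    turns: int = 1        # conversation length, relevant for @incident threads


def signal_weight(signal: Signal) -> float:
    """Weight of one signal. Ignored interactions contribute nothing."""
    if not signal.engaged:
        return 0.0
    base = WEIGHTS[signal.strength]
    # Assumed bonus: each extra turn adds a little, capped at four extra
    # turns so one very long thread cannot dominate the tally.
    bonus = 0.5 * min(max(signal.turns - 1, 0), 4)
    return base + bonus


def raw_tally(signals: list[Signal]) -> float:
    """Sum of signal weights, before any normalization for incident size."""
    return sum(signal_weight(s) for s in signals)


signals = [
    Signal("acted", "alice"),                   # merged a proposed code change
    Signal("acted", "bob", turns=3),            # sustained @incident conversation
    Signal("responded", "carol"),               # asked a single question
    Signal("noticed", "dan", engaged=False),    # interaction that drew no engagement
]
print(raw_tally(signals))  # 9.0
```

Because the tally is a plain sum over recorded interactions, the same inputs always give the same result, which is what makes the score deterministic.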

Normalizing for incident size

A raw tally would make every large incident look more engaged than every small one, simply because more people were in the channel. That’s not what we want to measure. So we scale the score by the number of responders on the incident, but gently, so the adjustment grows more slowly than headcount. A small, tightly focused incident where two responders both acted on the investigation can score as high as a large one where only a handful of a much bigger crowd did. This keeps engagement comparable across incidents of very different sizes, so a trend in the score reflects how responders are using investigations rather than how big their incidents happened to be.
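One way to picture a gentle, slower-than-headcount adjustment is to divide the raw tally by a fractional power of the responder count. The sketch below assumes a square root (exponent 0.5); that exponent and the function name `normalized_score` are assumptions for illustration, not the published formula.

```python
def normalized_score(raw_tally: float, responders: int, exponent: float = 0.5) -> float:
    """Scale a raw tally by responder count, growing slower than headcount.

    The exponent must sit strictly between 0 and 1: 0 would ignore incident
    size entirely, 1 would divide by the full headcount.
    """
    if responders < 1:
        raise ValueError("an incident needs at least one responder")
    if not 0 < exponent < 1:
        raise ValueError("exponent must be between 0 and 1 (exclusive)")
    return raw_tally / responders ** exponent


# Two responders who both acted (hypothetical weight 3.0 each).
small = normalized_score(6.0, responders=2)
# Eighteen responders, six of whom acted.
large = normalized_score(18.0, responders=18)
print(round(small, 3), round(large, 3))  # 4.243 4.243
```

Under these assumed numbers the raw tallies differ by a factor of three, yet the normalized scores match, which is the comparability across incident sizes that the paragraph above describes.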

Engaging responders

Alongside the score, we track how many of an incident’s responders engaged: the distinct people behind an attributable signal, as a share of everyone who responded. This separates two very different situations that can produce the same score: one responder leaning on the investigation heavily, versus the whole team each using it a little. Broad engagement across a team is a stronger sign that investigations have become part of how people respond, rather than something one person relies on.
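As a rough sketch, the share of engaging responders is a count of distinct people divided by everyone who responded. The function name `engaged_share` and the choice to ignore signals from people outside the responder set are assumptions made for this example.

```python
def engaged_share(signal_authors: list[str], responders: set[str]) -> float:
    """Fraction of an incident's responders with at least one attributable signal."""
    if not responders:
        return 0.0
    # Distinct people only: six signals from one person still count once.
    # Assumption: signals from anyone not listed as a responder are ignored.
    engaged = set(signal_authors) & responders
    return len(engaged) / len(responders)


team = {"alice", "bob", "carol", "dan"}

# One responder leaning on the investigation heavily.
print(engaged_share(["alice"] * 6, team))                     # 0.25
# The whole team each using it a little.
print(engaged_share(["alice", "bob", "carol", "dan"], team))  # 1.0
```

The two cases could produce similar scores, but the share tells them apart: 0.25 in the first and 1.0 in the second.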

Seeing engagement in your dashboard

You can view engagement metrics for your account on the Investigations homepage in your dashboard, alongside accuracy. Each investigation’s score is shown as one of three bands:

Band	What it means
None	Responders made no use of the investigation. We couldn’t attribute any interaction: no reactions, feedback, questions, or actions.
Light	Responders made light use of the investigation. Some engagement, usually weaker signals like reading what it surfaced or asking a single question.
Strong	Responders made heavy use of the investigation. At least one Acted signal, such as acting on a heads-up, merging a code change it proposed, or steering it. A consistent run of lighter signals can also reach this band.

The band comes from the same score described above, so it already accounts for both how strong each signal was and how many responders were on the incident. To dig into specific incidents, use the investigations list there to find investigations with particularly high or low engagement. It’s a quick way to see what’s landing with your responders and what isn’t.
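Mapping a score to a band can be pictured as a simple threshold check. The cut-off value below is invented for illustration, as are the names `Band` and `engagement_band`; the real thresholds, and exactly how an Acted signal guarantees the Strong band, are not specified here.

```python
from enum import Enum


class Band(Enum):
    NONE = "None"
    LIGHT = "Light"
    STRONG = "Strong"


def engagement_band(score: float, strong_threshold: float = 2.0) -> Band:
    """Map a normalized engagement score to a display band.

    strong_threshold is a hypothetical cut-off, not a published value.
    """
    if score < 0:
        raise ValueError("engagement scores cannot be negative")
    if score == 0:
        return Band.NONE    # no attributable interaction at all
    if score < strong_threshold:
        return Band.LIGHT   # some engagement, mostly weaker signals
    return Band.STRONG      # heavy use, or a consistent run of lighter signals


for score in (0.0, 0.75, 4.24):
    print(score, engagement_band(score).value)
```

Since lighter signals add up in the underlying score, a long enough run of them can cross the same cut-off that a single stronger signal would.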

How we use engagement scores

Every closed incident’s investigation is scored this way, so we have a continuous picture of how investigations are being used across your account rather than a one-off sample. We use these scores in a few ways:

Spotting where value is or isn’t landing. Accuracy tells us whether investigations are right; engagement tells us whether that’s translating into use. High accuracy with low engagement points at a delivery problem (the right answer surfaced in a way that didn’t land) rather than a reasoning one.
Driving improvements. Engagement points at how investigations show up for responders: the timing and clarity of heads-up messages, how findings are surfaced in the channel, and how actionable their suggestions are.
Specific to your teams. Because it’s built from your responders’ real interactions, engagement reflects how your teams work with investigations, not a generic average.

FAQs

How is engagement different from accuracy?

Accuracy measures whether an investigation was right, graded against the cause your responders established. Engagement measures whether responders used it. An investigation can be accurate but ignored, or partly right but genuinely useful. The two are measured separately and both matter. See Measuring accuracy.

Does engagement use AI to judge interactions?

No. The engagement score is deterministic: it’s a weighted tally of interactions we already record (heads-up reactions, feedback, steers, chatbot threads, link clicks, merged code changes), normalized for incident size. There’s no separate model judgment involved.

Why normalize by the number of responders?

Without it, big incidents would always look more engaged than small ones, purely because more people were present. Scaling by responder count (gently, so the adjustment grows more slowly than headcount) keeps a small, high-engagement incident comparable to a large one, so the score reflects how investigations are used rather than how big the incident was.

Does a low engagement score mean the investigation was bad?

Not on its own. A low score can mean the investigation wasn’t useful, but paired with high accuracy, it more often points at how findings were surfaced, or an incident that resolved before responders needed to lean on it. It’s a signal to look at, not a verdict.

Related

Measuring accuracy

How we grade investigations against what really caused your incidents, and why accuracy comes first.

Incident channel experience

Where responders meet investigations, from heads-up messages to asking questions and steering.

How investigations work

The process behind a result, and how an investigation builds conviction in real time.
