Measuring autonomy

Accuracy tells you whether an investigation’s diagnosis was right. Autonomy answers a different question: who surfaced the clues that led to the answer? For each incident, we work out how much of the diagnosis Investigations drove on its own and how much came from your responders. Alongside that, we measure time to diagnose, so you can see how much faster incidents reach an answer when Investigations leads. Both appear on the Investigations page in your dashboard, under Autonomy.

How we work out who diagnosed it

Once responders establish an incident’s cause, we look back over everything that happened before the cause was understood (channel messages, incident calls, and the investigation’s own findings) and reconstruct the diagnostic chain: the small number of steps that were genuinely key to reaching the answer. Exploration that didn’t pan out doesn’t count, and neither does anything after the diagnosis. Confirmation and remediation are important work, but they aren’t diagnosis.

Each step is attributed to whoever first surfaced it. Attribution comes from the evidence behind the step: the message, call moment, or investigation finding that delivered the result. If a step’s results arrived in an investigation finding, the step is the investigation’s; if they arrived in a responder’s message, it’s the responders’; a genuine collaboration counts toward both.

The balance of steps places each incident in one of four bands:

Band	What it means
Autonomous	Investigations reached the correct diagnosis independent of responder input.
Mostly autonomous	The incident was diagnosed mainly by Investigations, with a small amount of assistance from responders.
Mostly manual	The incident was diagnosed mainly by responders, with a small amount of assistance from Investigations.
Manual	Responders reached the correct diagnosis independent of Investigations input.

Between the two extremes, the band goes to whichever side drove the majority of the steps. On a dead-even split, it goes to whoever surfaced the first load-bearing clue.
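The banding rule can be sketched in a few lines. This is an illustrative sketch, not the product's implementation: the `Side` enum and `band` function are hypothetical names, a collaborative step is assumed to add one to each side's count, the chain is assumed to be in chronological order, and skipping collaborative steps when breaking a tie is an assumption the text above does not spell out.

```python
from enum import Enum


class Side(Enum):
    INVESTIGATION = "investigation"
    RESPONDERS = "responders"
    BOTH = "both"  # a genuine collaboration counts toward both sides


def band(chain: list) -> str:
    """Place an incident in a band from its key diagnostic steps.

    `chain` holds the attribution of each key step, in chronological order.
    """
    if not chain:
        raise ValueError("no diagnostic chain to attribute")

    # Assumption: a collaborative step adds one to each side's tally.
    inv = sum(s in (Side.INVESTIGATION, Side.BOTH) for s in chain)
    resp = sum(s in (Side.RESPONDERS, Side.BOTH) for s in chain)

    if resp == 0:
        return "Autonomous"
    if inv == 0:
        return "Manual"
    if inv != resp:
        return "Mostly autonomous" if inv > resp else "Mostly manual"

    # Dead-even split: whoever surfaced the first load-bearing clue wins.
    # Assumption: collaborative steps are skipped when looking for it.
    first = next((s for s in chain if s is not Side.BOTH), None)
    if first is None:
        raise ValueError("tie with only collaborative steps: rule not specified")
    return "Mostly autonomous" if first is Side.INVESTIGATION else "Mostly manual"
```

For instance, `band([Side.RESPONDERS, Side.INVESTIGATION])` is an even split whose first clue came from responders, so under these assumptions it returns "Mostly manual".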

An example

Here’s how an example incident breaks down:

The investigation isolates the failing dependency

Checkout starts returning 500s. Within a minute, the investigation traces the errors to a single downstream (the payments service) and posts a heads-up. First diagnostic step, attributed to Investigations.

The investigation recovers the underlying error

It pulls the payments service’s logs and surfaces a spike of database connection-pool timeouts that lines up with the error rate. Second step, again the investigation’s: it recovered the evidence that mattered.

Diagnosed: a responder connects it to a config change

A responder recognizes that the timeouts began right after a deploy that cut the connection-pool size, and names that as the cause. This is the moment the cause is understood, so it ends the diagnostic chain and stops the time-to-diagnose clock. As the step that named the cause, it’s the responder’s step.

Fixed

Responders restore the pool size. The fix comes after the diagnosis, so it doesn’t affect which band the incident lands in, or its time to diagnose.

Three steps in all, two of them the investigation’s, so this incident lands in Mostly autonomous: the band follows who did the majority of the diagnostic work, not who spoke the final diagnosis.

Credit goes to whoever surfaced each piece of the answer first, regardless of what happened next. If an investigation names the root cause in its hypothesis but responders investigate independently and arrive at the same answer themselves, the credit is still the investigation’s: it found the answer first, even though its version wasn’t the one responders acted on. That’s why we frame this around who diagnosed the incident rather than who fixed it: the breakdown measures who found the answer, not whose work resolved the incident.
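The first-surfaced rule can be illustrated with a small sketch. The `Evidence` type, its field names, and the timestamps are hypothetical, invented only for this example; the point is just that, when the same result shows up more than once, the earliest appearance takes the credit.

```python
from datetime import datetime
from typing import NamedTuple


class Evidence(NamedTuple):
    surfaced_at: datetime
    source: str  # "investigation" or "responders" (hypothetical labels)


def credit(evidence: list) -> str:
    """Return the side that surfaced a step's result first."""
    return min(evidence, key=lambda e: e.surfaced_at).source


# The investigation names the cause in a finding; responders
# independently reach the same answer eighteen minutes later.
same_result = [
    Evidence(datetime(2024, 1, 1, 10, 20), "responders"),
    Evidence(datetime(2024, 1, 1, 10, 2), "investigation"),
]
who = credit(same_result)  # the earlier finding keeps the credit
```

Here `who` is "investigation", even though the responders' later version may be the one the team acted on.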

Because this breakdown credits whoever surfaced each finding first, it deliberately doesn’t tell you how much of an investigation’s content responders actually used. That’s what engagement measures: whether responders read, steered, and acted on what an investigation surfaced. Read the two together: an investigation can drive the diagnosis and still see little engagement, or see heavy engagement without leading the diagnosis.

Time to diagnose

For each incident that reached a diagnosis, we measure the time from the start of the investigation to the moment the cause was first understood: the specific message or call moment where the answer landed, located after the fact from the incident’s own record. The dashboard shows the median time to diagnose for each band, so you can compare how quickly incidents reach a diagnosis depending on who drove them there. We also calculate how much faster incidents are diagnosed when Investigations drove the diagnosis: the median time to diagnose for incidents in the Autonomous and Mostly autonomous bands, compared against the median for the rest (Mostly manual and Manual).
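As a rough sketch of the two calculations described above, assuming each diagnosed incident is reduced to a hypothetical `(band, minutes_to_diagnose)` pair. The function names and sample numbers are made up for illustration, and the sketch leaves out the minimum-sample rule covered under Not enough data. How the dashboard expresses "how much faster" (a ratio, a difference, a percentage) is not stated here, so the sketch simply returns both medians.

```python
from collections import defaultdict
from statistics import median

INVESTIGATIONS_LED = {"Autonomous", "Mostly autonomous"}


def medians_by_band(incidents):
    """Median time to diagnose for each band present in the data."""
    by_band = defaultdict(list)
    for band, minutes in incidents:
        by_band[band].append(minutes)
    return {band: median(times) for band, times in by_band.items()}


def led_vs_rest(incidents):
    """Median for Investigations-led bands and median for the rest."""
    led = [m for band, m in incidents if band in INVESTIGATIONS_LED]
    rest = [m for band, m in incidents if band not in INVESTIGATIONS_LED]
    # statistics.median raises StatisticsError if either group is empty.
    return median(led), median(rest)


# Made-up sample: minutes from investigation start to diagnosis.
sample = [
    ("Autonomous", 4), ("Autonomous", 6), ("Mostly autonomous", 10),
    ("Mostly manual", 20), ("Mostly manual", 30),
    ("Manual", 50), ("Manual", 60),
]
led_median, rest_median = led_vs_rest(sample)
```

With this sample, the Investigations-led median is 6 minutes against 40 minutes for the rest. Using medians rather than means keeps one unusually long incident from dominating either side.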

Not enough data

A median is only trustworthy with enough incidents behind it. Any band with fewer than five incidents doesn’t get one: rather than read too much into a handful of incidents, the time card shows Not enough data for that band instead of a number.
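A minimal sketch of that guard, assuming times are held in minutes. The function name, the constant name, and the output formatting are hypothetical; only the threshold of five and the Not enough data label come from the text above.

```python
from statistics import median

MIN_INCIDENTS = 5  # bands with fewer incidents get no median


def time_card_value(minutes: list) -> str:
    """Return the text a band's time card would show in this sketch."""
    if len(minutes) < MIN_INCIDENTS:
        return "Not enough data"
    return f"{median(minutes):g} min"
```

Four incidents, such as `[3, 5, 8, 9]`, fall under the threshold and produce the label; adding a fifth produces a median.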

Where you see it

The Autonomy section appears on the Investigations page in your dashboard, once around ten investigations in the selected period have reached a diagnosis we can attribute. The percentages are shares of those diagnosed, attributable incidents. An incident whose cause was never established, or whose diagnostic steps couldn’t be attributed to either side, isn’t part of the split.
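The denominator matters here, so a short sketch may help. It assumes each investigation in the period is represented by its band name, or `None` when no cause was established or the steps couldn't be attributed. The function name is hypothetical, and the exact cut-off of 10 is an assumption standing in for "around ten".

```python
from collections import Counter
from typing import Optional

BANDS = ["Autonomous", "Mostly autonomous", "Mostly manual", "Manual"]


def band_shares(bands: list, min_diagnosed: int = 10) -> Optional[dict]:
    """Percentage of diagnosed, attributable incidents in each band.

    Returns None when too few incidents qualify to show the section.
    """
    # Undiagnosed or unattributable incidents are not part of the split.
    attributable = [b for b in bands if b is not None]
    if len(attributable) < min_diagnosed:
        return None
    counts = Counter(attributable)
    return {b: 100 * counts[b] / len(attributable) for b in BANDS}


period = (
    ["Autonomous"] * 6 + ["Mostly autonomous"] * 3
    + ["Mostly manual"] * 2 + ["Manual"] * 1 + [None] * 3
)
result = band_shares(period)
```

Of the 15 investigations in this made-up period, only the 12 attributable ones form the denominator, so Autonomous comes out at 50%, not 40%.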

FAQs

Why isn't every incident shown in the breakdown?

The breakdown needs a diagnosis to work back from. If an incident’s cause was never established, or we couldn’t locate the moment it was understood, there’s no diagnostic chain to attribute and the incident sits out of it.

What if responders re-did work the investigation had already done?

Credit stays with whoever surfaced it first. The breakdown measures who found the answer, not whose version of the answer responders acted on, so an investigation that named the cause early keeps the credit even if responders independently re-derived it later.

Does Autonomous mean nobody was involved?

It means all the diagnostic steps came from Investigations, independently of responders. Responders were still there, confirming the diagnosis and fixing the incident, but the work of finding the answer didn’t depend on them.

Does Manual mean Investigations didn't run?

No. Investigations still ran. Manual just means none of its work fed the diagnosis: every key step came from responders, so even where the investigation surfaced findings, none of them turned out to be part of the chain that reached the cause.

How are steps split when both sides worked together?

A step whose result was genuinely a collaboration counts toward both sides. And when an incident splits exactly evenly, the band goes to whoever surfaced the first load-bearing clue.

Does a faster time to diagnose for Investigations-led incidents prove they'd speed up every incident?

No. It’s a comparison across different incidents, not the same incident with and without an investigation. And the incidents Investigations can lead tend to be the more tractable ones. Treat the gap as a useful signal, not a controlled experiment.

Related

Measuring accuracy

How we grade investigations against what really caused your incidents.

How investigations work

The process behind a result, and how an investigation builds conviction in real time.

Incident channel experience

How investigations show up for responders while an incident is live.

Trust and safety

How investigations stay under your control, auditable, and honest about what they know.
