An investigation isn’t a single prompt to a model. It’s a structured process that gathers evidence, forms a hypothesis, then tests that hypothesis against your code and telemetry — refining its conclusion as new evidence arrives. This page explains how that process works, so you know what to expect from the results and why connecting more sources makes investigations better.

The shape of an investigation

Every investigation moves through three phases.
1. Fast diagnosis

The investigation starts from what it already knows: the alert that fired (including any error and stack trace) and the message that declared the incident. This gives it an initial read on what happened and where, in seconds.
2. Gather context in parallel

While that initial picture forms, the investigation fans out across every source you’ve connected, all at once — searching Slack for relevant discussion, finding similar past incidents and what resolved them, surfacing the runbooks and reference docs that apply, lining up recent deploys, feature flags, and config changes from your change events, pulling in the code changes that could be responsible, and querying your telemetry for anomalies around the time of the incident. It also checks whether any third-party providers you depend on were having an outage at the same moment. All of this runs in parallel, so the slow searches never hold up the rest.
3. Analyze and refine

The investigation pulls the gathered evidence into an initial hypothesis, then looks for what’s missing. It asks targeted follow-up questions — often by reading your code or running further telemetry queries — and feeds the answers back in. Each pass makes the hypothesis more specific and better grounded.
The first two phases run the same way for every incident — they’re the foundational legwork. The third phase is where each investigation becomes specific to your incident, shaped entirely by what the evidence reveals.
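To make the parallel fan-out in phase two concrete, here is a minimal sketch in Python. It is not how incident.io implements investigations. The `search_source` and `gather_context` functions, the source names, and the delays are all hypothetical stand-ins. It only illustrates the general idea that results are handled as each source finishes, so a slow source does not block a fast one.

```python
import asyncio


async def search_source(name: str, delay: float) -> tuple[str, str]:
    """Hypothetical stand-in for querying one connected source."""
    await asyncio.sleep(delay)  # simulates a network call of varying speed
    return name, f"evidence from {name}"


async def gather_context(sources: dict[str, float]) -> list[str]:
    """Start every search at once and record results in arrival order."""
    tasks = [
        asyncio.create_task(search_source(name, delay))
        for name, delay in sources.items()
    ]
    arrived = []
    # as_completed yields each search as it finishes, fastest first,
    # regardless of the order in which the searches were started.
    for finished in asyncio.as_completed(tasks):
        name, _evidence = await finished
        arrived.append(name)
    return arrived


if __name__ == "__main__":
    # Delays are illustrative assumptions: telemetry is the slow search here.
    order = asyncio.run(
        gather_context({"telemetry": 0.15, "slack": 0.01, "past_incidents": 0.05})
    )
    print(order)
```

In this sketch the slow telemetry search is started first but reported last, while the faster searches are available as soon as they complete.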

Investigations run throughout the incident

An investigation doesn’t stop after its first report. It keeps running for as long as the incident is live, re-assessing as the situation changes. New activity in the channel, fresh alerts, a third-party provider changing state, or a responder steering it — any of these prompt the investigation to gather new evidence and reconsider its hypothesis. This means the investigation stays current with the incident rather than going stale the moment it posts. If the cause shifts, or new information rules out the original theory, the investigation follows along.
Investigations keep going until the incident is resolved or declined. You can also pause one if you’d rather it stopped, and pick it back up later.
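One way to picture this lifecycle is as a small state machine. The sketch below is an illustration only: the event names, the `Investigation` class, and the choice to ignore triggers while paused are assumptions, not the product's actual behavior or API.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    FINISHED = "finished"


# Hypothetical event names for the triggers described above.
REASSESS_TRIGGERS = {
    "channel_activity",
    "new_alert",
    "provider_status_change",
    "responder_steer",
}
END_EVENTS = {"incident_resolved", "incident_declined"}


class Investigation:
    def __init__(self) -> None:
        self.state = State.RUNNING
        self.reassessments = 0

    def handle(self, event: str) -> None:
        if self.state is State.FINISHED:
            return  # nothing happens once the incident is resolved or declined
        if event in END_EVENTS:
            self.state = State.FINISHED
        elif event == "pause":
            self.state = State.PAUSED
        elif event == "resume":
            self.state = State.RUNNING
        elif event in REASSESS_TRIGGERS and self.state is State.RUNNING:
            # Assumption: a paused investigation ignores triggers until resumed.
            self.reassessments += 1


if __name__ == "__main__":
    inv = Investigation()
    for event in ["new_alert", "pause", "channel_activity", "resume", "responder_steer"]:
        inv.handle(event)
    print(inv.state, inv.reassessments)
```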

From evidence to findings

The investigation reasons in terms of findings — concrete hypotheses about what happened, each backed by evidence.
  • A finding is a claim, like “a recent deploy introduced a query that locks the orders table under load.”
  • Evidence is what supports or contradicts it: a specific Slack message, a pull request diff, a metric spike, a line of code.
  • Each finding carries a confidence level, so you can see how sure the investigation is.
Findings evolve as the investigation progresses. A hypothesis that looks promising early on can be discounted when later evidence contradicts it, and the investigation keeps the discounted hypothesis in its audit trail rather than quietly dropping it. The final report surfaces only the findings still standing, each linked back to the evidence behind it.
This is why investigations improve as you connect more sources. A finding grounded in a real code diff and a matching metric spike is far stronger than one inferred from an error message alone.
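The relationship between findings, evidence, and confidence can be sketched as a small data model. This is an illustrative assumption about shape, not the real schema: the `Finding` and `Evidence` classes, the confidence labels, and the rule that a single contradicting item discounts a finding are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    source: str     # e.g. "slack", "pull_request", "metric", "code"
    summary: str
    supports: bool  # False when the evidence contradicts the claim


@dataclass
class Finding:
    claim: str
    confidence: str = "low"  # hypothetical levels: "low", "medium", "high"
    evidence: list[Evidence] = field(default_factory=list)
    discounted: bool = False

    def add(self, item: Evidence) -> None:
        self.evidence.append(item)
        if not item.supports:
            # Simplification: one contradiction discounts the finding.
            # The finding and its evidence stay in the audit trail.
            self.discounted = True


def final_report(findings: list[Finding]) -> list[Finding]:
    """Return only the findings still standing; discounted ones are kept elsewhere."""
    return [f for f in findings if not f.discounted]


if __name__ == "__main__":
    deploy = Finding(
        "a recent deploy introduced a query that locks the orders table under load",
        confidence="medium",
    )
    deploy.add(Evidence("pull_request", "diff adds a locking query", supports=True))
    deploy.add(Evidence("metric", "lock wait time spikes after the deploy", supports=True))

    provider = Finding("a third-party provider outage caused the errors")
    provider.add(Evidence("status_page", "no provider outage at that time", supports=False))

    for finding in final_report([deploy, provider]):
        print(finding.confidence, finding.claim, len(finding.evidence))
```

The discounted finding is filtered from the report but never deleted, which mirrors the audit trail described above.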

Reading your code

When an investigation needs to understand the code itself, it does more than search for keywords. It works out which repositories are relevant from the incident’s context, plans the specific questions worth answering — “where is this value set?”, “what changed here recently?” — and then reads the code to answer them. All code analysis runs inside isolated, sandboxed containers. Repositories are cloned into ephemeral workspaces and deleted after use. See Code repositories for how access and security work.

Querying your telemetry

Investigations query the same logs, metrics, traces, and dashboards your responders reach for. Rather than blindly running queries, the investigation learns the shape of each connected data source — its labels, common query patterns, and the dashboards your team actually uses — so its queries are relevant to your systems. See Telemetry for the providers you can connect.

Checking your dependencies

Not every incident is your fault. Alongside everything else, an investigation checks whether the third-party providers you depend on — AWS, GitHub, Stripe, and the like — were having an outage around the time of your incident, and surfaces any that could explain it. This needs no setup. See Third-party dependencies.
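The "around the time of your incident" check amounts to an interval-overlap test. The sketch below shows one way to express it. The `Outage` record, the `overlapping_outages` function, and the 30-minute window are assumptions for illustration; the actual window and data source are not specified here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Outage:
    provider: str
    started_at: datetime
    resolved_at: Optional[datetime]  # None while the outage is ongoing


def overlapping_outages(
    incident_at: datetime,
    outages: list[Outage],
    window: timedelta = timedelta(minutes=30),  # assumed window, not a documented value
) -> list[Outage]:
    """Return outages whose duration overlaps [incident_at - window, incident_at + window]."""
    lo, hi = incident_at - window, incident_at + window
    far_future = datetime.max.replace(tzinfo=timezone.utc)
    matches = []
    for outage in outages:
        end = outage.resolved_at or far_future  # ongoing outages have no end yet
        if outage.started_at <= hi and end >= lo:
            matches.append(outage)
    return matches


if __name__ == "__main__":
    incident_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    outages = [
        Outage("AWS", incident_at - timedelta(minutes=10), None),
        Outage("Stripe", incident_at - timedelta(hours=5), incident_at - timedelta(hours=4)),
    ]
    print([o.provider for o in overlapping_outages(incident_at, outages)])
```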

What you get

The investigation posts a summary into the incident channel and keeps it up to date as it runs, with the same detail available on the incident in the dashboard. It surfaces:
  • A summary — the headline conclusion in plain language.
  • Findings — the hypotheses that survived, each with its confidence and supporting evidence.
  • Evidence — links straight back to the source: the Slack message, the pull request, the dashboard, the log line.
As it learns more, it threads progress onto the summary and posts a heads-up when it discovers something important. You can ask it questions or steer it at any point by tagging @incident. See Incident channel experience for what this looks like and how to interact with it.

Investigate alongside the agent

The investigation runs centrally, but you don’t have to work apart from it. With the incident.io desktop app, you can pull a live investigation into a local coding agent such as Claude Code, Codex, or Cursor and investigate side by side — each of you informing the other. It works as a loop:
  • Pull the investigation in — your local agent downloads the full investigation: its findings, the checks it ran, the incident context, and the conversation so far. It can read all of this alongside your actual codebase.
  • Get live updates — as the central investigation learns more, your local agent keeps in sync, so you’re always working from its latest thinking.
  • Send what you find back — when you spot something the investigation hasn’t — the real cause, a misleading metric, a wrong turn it’s taking — you can steer it. Your local agent feeds that back, with evidence, and the central investigation re-assesses its hypothesis within a few minutes. Your input is attributed in the incident channel, so everyone sees where a change in direction came from.
The two work as a pair: the central agent does the broad, continuous legwork across all your sources, while you and your local agent go deep on the code in front of you — and neither of you loses what the other finds.
The desktop app connects incident.io to your local agent over the incident.io MCP. Mention an incident by reference (like INC-123) and a capable agent can pull it in and start working.

Tuning how deep investigations go

Investigations run more than one pass of analysis by default, refining the hypothesis each time. More passes mean a more thorough investigation but a slower one. If you’d like to adjust how deep investigations go for your organization, reach out to us.

Where to go next

Connect your data

The sources an investigation draws on, and how to set each one up.

Triggering investigations

Decide when investigations run.