Incident channel experience

Most of an investigation happens in your incident channel, in Slack or Microsoft Teams. This page covers what you’ll see there (the summary message, the progress updates, and the heads-ups) and how to ask questions or steer the investigation as it runs.

The walkthrough below describes the experience in Slack. Investigations work in Microsoft Teams too, with a few differences, covered under “In Microsoft Teams” later on this page.

The summary message

When an investigation starts, it posts a single summary message to the channel and keeps it up to date in place. It’s kept short and scannable, with headers for the things a responder most wants to know:

What’s going on?: the situation in plain language.
What caused it?: the current best hypothesis, with its confidence shown inline.
What can I do next?: concrete next steps, linked to the evidence behind them.

While the investigation is still working, the message shows its progress and what it’s still checking. As it learns more, the same message updates, so the channel always shows current thinking, never a stale first guess.

What’s going on?
The Redis instance in production is sustaining CPU above 50% (measured at 55.3%), which triggered an operational alert.

What caused it? (medium confidence)
The elevated Redis CPU is plausibly linked to increased worker queue load, particularly from the statuspage-worker process, a pattern seen in past incidents.

What can I do next?

Correlate the Redis CPU spike with statuspage-worker queue metrics in Grafana for the incident window

If they line up, temporarily gate the event subscriptions driving Redis load

Keep watching Redis CPU and workload metrics over the next 30 minutes

Confidence is shown right next to the hypothesis, for example “(medium confidence)” or “(medium confidence, still investigating)”, so you can weigh a suggestion before acting on it. It reflects how well the current theory is evidenced; see Building conviction for how it’s decided.
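To make the shape of the summary concrete, here is a minimal Python sketch of the three headers and the inline confidence label. It is an illustration only: the `Summary` class, its field names, and the `render` method are hypothetical, and they don't describe how the product builds or stores the message.

```python
from dataclasses import dataclass, field


@dataclass
class Summary:
    """Hypothetical model of the summary message; not a real API."""

    situation: str
    hypothesis: str
    confidence: str  # for example "medium"
    next_steps: list[str] = field(default_factory=list)
    still_investigating: bool = False

    def confidence_label(self) -> str:
        # Confidence sits right next to the hypothesis header.
        suffix = ", still investigating" if self.still_investigating else ""
        return f"({self.confidence} confidence{suffix})"

    def render(self) -> str:
        # One short block per header, then the next steps as a list.
        lines = [
            "What's going on?",
            self.situation,
            f"What caused it? {self.confidence_label()}",
            self.hypothesis,
            "What can I do next?",
        ]
        lines.extend(f"- {step}" for step in self.next_steps)
        return "\n".join(lines)
```

Rendering the same object again after a field changes mirrors the idea of one message updated in place, rather than a new message per update.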

Progress in the thread

As the investigation works, it posts updates into the thread beneath the summary, so you can watch what it’s doing as it does it. You don’t need to read these to follow along (the summary always reflects the latest), but they’re there when you want the detail. You’ll see things like:

Hypothesis updates when its thinking changes, labeled so you can see the shift at a glance, such as “New hypothesis”, “Hypothesis strengthened”, or “Hypothesis weakened”.
Check results as each piece of work completes: a short summary of what it found, with links to the source.

New hypothesis

I’m now looking at upstream rate limiting from the payments API, with sustained 429s and flat database metrics

Shifted away from the earlier database contention theory

Next: checking whether the payments API quota was changed recently

Querying telemetry

A core deploy shipped 11 minutes before the first error (PR #54586), and the build SHA matches

107 successful responses vs 4 server errors over 4 hours, with no latency spike

Links: Grafana dashboard, PR #54586

This thread is read-only: it’s where the investigation shows its work. To ask a question or steer the investigation, tag @incident in the main channel instead.

Heads-up messages

The summary and thread are there whenever you choose to look. But sometimes the investigation works out something important that you probably don’t know yet, and waiting for you to check back isn’t good enough. In that case it posts a heads-up message to the channel, with the detail in a thread.

Heads up: I think this could be database connection pool exhaustion. The auth-gateway connection pool hit saturation (50/50) at 14:23 UTC, exactly when errors started spiking.

Heads-ups are deliberately quiet. The investigation only posts one when there’s a genuine shift worth your attention (a code change that explains the error, a past incident with the same fingerprint, a third-party outage) so they read as progress, not noise. The thread carries the supporting evidence and links, including any similar past incidents.

Ask and steer with @incident

The investigation isn’t a one-way broadcast. At any point you can talk to it in the channel by tagging @incident.

Ask about the investigation

Ask questions about what it’s found or why it thinks what it thinks, and it answers from everything the investigation knows.

@incident why do you think this is a Redis problem and not Postgres?

@incident has anything like this happened before?

Steer it

If you know something the investigation doesn’t, whether the real cause, a misleading signal, or a wrong turn it’s taking, tell it what to focus on instead. It feeds that in and re-assesses its hypothesis within a few minutes, and your input is attributed in the channel so everyone can see where the change in direction came from.

@incident the investigation is wrong. It’s Redis, not Postgres. Connections have been at 100% since 14:32.

@incident focus on the 14:30 deploy of payment-service v2.3.1. The errors started right after it. The integration warnings are unrelated noise.

You can also steer from the investigation message directly using Add context, or from the incident in the dashboard. Engineers working in a local coding agent can steer it too. See Investigate alongside the agent.

In Microsoft Teams

Investigations work in Microsoft Teams too. The same summary and heads-up messages post to your incident channel, and the investigation reads the channel just as it does in Slack. A few things work differently.

The full investigation lives in an embedded tab. The messages posted to the channel are kept light; the complete investigation (every finding, the evidence behind it, and the checks it ran) opens in a Teams tab from View full investigation on the summary message.
Ask and steer from the tab, not with @incident. Tagging @incident in the channel isn’t supported in Teams. Instead, use Chat to incident about this on the investigation message: it opens the investigation in the embedded tab with an agent chat alongside it, where you can ask questions and steer just as you would in Slack. You can also Add context from the investigation message, or steer from the incident in the dashboard.
Progress updates depend on your channel layout. In channels that use the threads layout, progress is posted as replies beneath the summary, as in Slack. In channels that use the posts layout, the summary can’t be threaded, so the summary updates in place instead.

We recommend the threads layout for your incident channels. It keeps each investigation’s progress in a thread beneath its summary, closest to the Slack experience, rather than updating the summary in place.
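The layout rule described above can be summarized as a small decision function. This is a sketch for illustration: the `ChannelLayout` enum and the `progress_destination` function are hypothetical names, not part of any Microsoft Teams or product API.

```python
from enum import Enum


class ChannelLayout(Enum):
    """Hypothetical labels for the two Teams channel layouts."""

    THREADS = "threads"
    POSTS = "posts"


def progress_destination(layout: ChannelLayout) -> str:
    """Return where progress updates appear for a given channel layout."""
    if layout is ChannelLayout.THREADS:
        # Threads layout: progress is posted as replies, as in Slack.
        return "replies beneath the summary"
    # Posts layout: the summary can't be threaded.
    return "summary updated in place"
```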

Related

How investigations work

The process behind what you see in the channel.

What we can see

The context an investigation reads from inside your incident, including the channel and call transcripts.

Chatbot

Everything else you can ask @incident during an incident.