Prometheus is a time-series metric store. Investigations query it to see how your services behaved around the time of an incident — the error rates that climbed, the latency that crept up, the saturation that tipped a service over.
You can connect Prometheus directly, or have it discovered automatically when you connect Grafana. Either route gives investigations the same access — pick whichever fits how you run Prometheus.

What we support

Investigations query Prometheus with PromQL, its query language. They go beyond reading a metric’s raw value: PromQL turns counters and histograms into the rates and percentiles you actually reason about during an incident.
  • Rates from counters. Counters only ever climb, so the raw number means little on its own. Investigations wrap them in rate() to ask the real question: how fast are requests failing right now, and how does that compare with before the incident started?
  • Percentiles from histograms. Latency lives in histogram buckets, not a single number. Investigations use histogram_quantile() to pull out the p95 or p99 your responders care about, rather than an average that hides the tail.
  • Aggregations across dimensions. With sum by, avg by, and max by, investigations roll a metric up to the dimension that matters — per service, per route, per namespace — to find which slice of your fleet is misbehaving.
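
As a rough illustration of the three patterns above, the sketch below assembles PromQL strings in Python. The metric names (http_requests_total, http_request_duration_seconds_bucket), the status label, and the 5m window are assumptions chosen for the example. They are not names taken from your Prometheus, and this is not necessarily how investigations build their queries.

```python
def error_rate_query(metric: str, group_by: str, window: str = "5m") -> str:
    """Per-second rate of 5xx responses, rolled up to one label.

    Assumes the counter carries a `status` label holding the HTTP status code.
    """
    return f'sum by ({group_by}) (rate({metric}{{status=~"5.."}}[{window}]))'


def latency_quantile_query(bucket_metric: str, quantile: float,
                           group_by: str, window: str = "5m") -> str:
    """Quantile estimated from histogram buckets.

    The `le` label must survive the aggregation, because
    histogram_quantile() reads the bucket boundaries from it.
    """
    return (
        f"histogram_quantile({quantile}, "
        f"sum by (le, {group_by}) (rate({bucket_metric}[{window}])))"
    )


# Hypothetical metric names, used only to show the shape of the output.
print(error_rate_query("http_requests_total", "service"))
print(latency_quantile_query("http_request_duration_seconds_bucket", 0.95, "service"))
```

Note that rate() is applied before sum by in both queries. Aggregating the raw counters first would hide counter resets from rate().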

Discovering your metrics and labels

Prometheus exposes thousands of metrics and labels, and a query that names the wrong one returns nothing. Rather than guess, investigations read what your Prometheus actually holds: the metric names, their types, and the help text you’ve attached, plus every label and a sense of how many distinct values each one takes.

That last point shapes how queries are built. Grouping by a low-cardinality label like service or namespace gives a readable breakdown; grouping by a high-cardinality one like pod or instance produces noise. Investigations learn which labels are which, so they group on the ones that clarify and filter on the ones that would overwhelm.

This structure is learned automatically. How that works is covered in How telemetry works.
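
To make the cardinality idea concrete, here is a minimal sketch. It reads label values from the standard Prometheus HTTP API endpoint /api/v1/label/<name>/values, then splits labels into ones worth grouping by and ones better used as filters. The cutoff of 20 distinct values, the function names, and the sample data are all assumptions made for illustration; the document does not describe the rules investigations actually apply.

```python
import json
import urllib.parse
import urllib.request


def fetch_label_values(base_url: str, label: str) -> list:
    """Read one label's values from the Prometheus HTTP API (unauthenticated)."""
    path = f"/api/v1/label/{urllib.parse.quote(label)}/values"
    with urllib.request.urlopen(base_url.rstrip("/") + path, timeout=10) as resp:
        return json.load(resp)["data"]


def split_by_cardinality(label_values: dict, max_group_values: int = 20):
    """Partition labels into (group_on, filter_on) by distinct value count.

    max_group_values is an arbitrary cutoff chosen for this example.
    """
    group_on, filter_on = [], []
    for label, values in sorted(label_values.items()):
        distinct = len(set(values))
        (group_on if distinct <= max_group_values else filter_on).append(label)
    return group_on, filter_on


# Hypothetical sample data standing in for API responses.
sample = {
    "service": ["checkout", "cart", "search"],
    "namespace": ["prod", "staging"],
    "pod": [f"checkout-{i}" for i in range(500)],
}
group_on, filter_on = split_by_cardinality(sample)
print(group_on)   # ['namespace', 'service']
print(filter_on)  # ['pod']
```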

Connecting Prometheus

There are two ways to connect Prometheus. Both give investigations the same access — choose whichever matches how you run it.

Directly

Connect Prometheus on its own, with its endpoint and credentials:
  • The URL of your Prometheus instance, or any backend that speaks the Prometheus HTTP API — Thanos, Cortex, Mimir, and VictoriaMetrics all work the same way.
  • Any authentication it requires — for example basic auth or a bearer token.
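
For a sense of what those two pieces of information are used for, the sketch below prepares an instant query against the Prometheus HTTP API endpoint /api/v1/query with an optional bearer token. The host name, the token, and the helper name are placeholders. The sketch also assumes base_url already includes any path prefix your backend needs. It only builds the request and does not send it.

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_instant_query(base_url: str, promql: str,
                        bearer_token: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a GET to the instant-query endpoint."""
    params = urllib.parse.urlencode({"query": promql})
    headers = {}
    if bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"
    url = f"{base_url.rstrip('/')}/api/v1/query?{params}"
    return urllib.request.Request(url, headers=headers)


# Placeholder endpoint and token; replace with your own before sending.
req = build_instant_query("https://prometheus.example.internal", "up", "example-token")
print(req.full_url)
# To send it: urllib.request.urlopen(req, timeout=10)
```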
If you run more than one Prometheus — separate clusters, regions, or HA replicas — investigations can query them together so an aggregation like sum(rate(...)) returns one correct global figure rather than per-instance numbers stitched together after the fact.
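
The sketch below shows why stitching per-instance numbers together can go wrong. Separate clusters hold different series, so their values add up. HA replicas hold copies of the same series, so adding them double-counts. The code assumes each replica tags its series with a label named replica and that query results arrive as (labels, value) pairs. Both are assumptions for illustration, and this is not a description of how investigations combine results.

```python
def global_sum(results_per_instance, replica_label="replica"):
    """Sum series values across instances, counting HA-duplicated series once."""
    seen = {}
    for series_list in results_per_instance:
        for labels, value in series_list:
            # Identity of a series is its label set minus the replica label.
            key = frozenset((k, v) for k, v in labels.items() if k != replica_label)
            seen.setdefault(key, value)  # keep the first copy of a duplicate
    return sum(seen.values())


# Hypothetical results: cluster "a" is scraped by two HA replicas, "b" by one.
cluster_a_replica_0 = [({"cluster": "a", "service": "checkout", "replica": "0"}, 4.0)]
cluster_a_replica_1 = [({"cluster": "a", "service": "checkout", "replica": "1"}, 4.0)]
cluster_b = [({"cluster": "b", "service": "checkout", "replica": "0"}, 2.5)]

total = global_sum([cluster_a_replica_0, cluster_a_replica_1, cluster_b])
print(total)  # 6.5, whereas naively adding all three values would give 10.5
```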

Through Grafana

If your Prometheus already sits behind Grafana, connect Grafana and Prometheus is discovered automatically as one of the data sources behind it, using Grafana’s own credentials, with nothing separate to configure.

Whichever route you use, Prometheus is disabled by default. You opt in deliberately: once the Prometheus data sources your team uses are connected, enable them.

Best practice

  • Connect the Grafana dashboards that query Prometheus. Investigations learn your real query patterns from them — which metrics matter, how they’re filtered and grouped — which makes Prometheus queries more accurate.
  • Keep your metric metadata and help text populated. Investigations read it to tell counters from gauges and histograms, and to pick the right PromQL function for each.
  • Enable the Prometheus data sources your responders actually reach for during incidents, rather than every source available.

Grafana

The provider Prometheus can be connected through.

How telemetry works

How investigations query your metrics.