Google Cloud Monitoring

Google Cloud Monitoring collects metrics from your Google Cloud projects: your GKE clusters, Cloud SQL instances, load balancers, Pub/Sub subscriptions, and more. Investigations query it to see what your infrastructure was doing around the time of an incident: CPU climbing, a connection pool filling up, a request rate falling away.

Cloud Monitoring is connected through Google Cloud. Connect Google Cloud once, then enable the projects you want investigations to read. There’s nothing to set up for Cloud Monitoring on its own.

What we support

Investigations query Cloud Monitoring with PromQL, through Google Cloud’s Prometheus-compatible API. They use it to:

Graph a metric over the incident window: CPU utilization on a Cloud SQL instance, memory on a container, the backlog on a Pub/Sub subscription, scoped to a project.
Narrow by resource: filter on resource labels so a query reaches one GKE namespace, one database, or one backend service rather than everything in the project.
Aggregate and rank: sum across the containers in a namespace, average across instances, or pull the top few pods by memory to find the one that’s misbehaving.

This covers both Google’s built-in metrics and any custom metrics you send through Managed Service for Prometheus, so a single query can move between infrastructure and your own application metrics. A query can answer questions like:

Did the database’s CPU spike when the checkout errors started?

Which pods in the payments namespace were using the most memory during the outage?

Was the Pub/Sub backlog growing while messages went unprocessed?
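As a rough illustration, the three questions above might map to PromQL along the following lines. This is a sketch under stated assumptions, not the queries investigations actually run: the project ID `my-project`, the `payments` namespace, the time window, and the helper `query_range_url` are all hypothetical, and the metric and label names assume Cloud Monitoring's PromQL naming for Cloud SQL, GKE, and Pub/Sub metrics. The endpoint path is the one Google documents for its Prometheus-compatible API; verify it against your own setup.

```python
from urllib.parse import urlencode

# Hypothetical PromQL for the three example questions. Metric and label names
# are assumptions; confirm them against what your project actually emits.
QUERIES = {
    # Did the database's CPU spike?
    "database_cpu": (
        "avg by (database_id) ("
        "cloudsql_googleapis_com:database_cpu_utilization)"
    ),
    # Which pods in the payments namespace used the most memory?
    "top_pods_by_memory": (
        "topk(3, sum by (pod_name) ("
        'kubernetes_io:container_memory_used_bytes{namespace_name="payments"}))'
    ),
    # Was the Pub/Sub backlog growing?
    "pubsub_backlog": (
        "sum by (subscription_id) ("
        "pubsub_googleapis_com:subscription_num_undelivered_messages)"
    ),
}


def query_range_url(project_id: str, promql: str, start: str, end: str,
                    step: str = "60s") -> str:
    """Build a range-query URL for the Prometheus-compatible API (illustrative helper)."""
    base = (
        "https://monitoring.googleapis.com/v1/projects/"
        f"{project_id}/location/global/prometheus/api/v1/query_range"
    )
    params = {"query": promql, "start": start, "end": end, "step": step}
    return f"{base}?{urlencode(params)}"


url = query_range_url(
    "my-project",
    QUERIES["top_pods_by_memory"],
    "2024-05-01T10:00:00Z",
    "2024-05-01T11:00:00Z",
)
print(url)
```

Sending the request also needs an OAuth access token for an identity with read access to metrics, which is left out here.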

Checking a metric before trusting it

A metric that isn’t emitting tells you nothing, and a filter on a label that doesn’t exist returns an empty graph that can look like a problem when it isn’t. Before building a query, investigations check that the metric is actually producing data in the incident window and learn which labels it carries. That keeps them from graphing a silent metric or filtering on a label that was never there, and it tells them which resource labels are available to narrow on.
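To make the idea concrete, here is a minimal sketch of that kind of pre-check, run against a response in the standard Prometheus range-query JSON shape. The function name `summarize_series` and the sample payload are hypothetical, and this is not a description of how investigations implement the check internally.

```python
def summarize_series(response: dict) -> dict:
    """Report whether a range-query response has data and which labels it carries."""
    series = response.get("data", {}).get("result", [])
    # A series counts as emitting only if it has at least one sample in the window.
    emitting = [s for s in series if s.get("values")]
    labels = sorted(
        {name for s in emitting for name in s.get("metric", {}) if name != "__name__"}
    )
    return {"has_data": bool(emitting), "series_count": len(emitting), "labels": labels}


# Hand-written sample in the shape of a Prometheus range-query response.
sample = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {
                    "__name__": "cloudsql_googleapis_com:database_cpu_utilization",
                    "database_id": "my-project:orders-db",
                    "region": "us-central1",
                },
                "values": [[1714557600, "0.42"], [1714557660, "0.47"]],
            }
        ],
    },
}

summary = summarize_series(sample)
print(summary)

# An empty result means the metric was silent in the window, not that it read zero.
empty = summarize_series({"status": "success", "data": {"resultType": "matrix", "result": []}})
print(empty["has_data"])
```

If `has_data` is false, an empty graph says nothing about the system itself. If it is true, `labels` lists the names that are safe to filter or group on.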

Knowing what’s in each project

A project can run many kinds of workload, and a PromQL query only works if it names the right metric and labels. Cloud Monitoring’s naming differs from plain Prometheus, and resource labels vary by service. Investigations learn what each enabled project actually runs (its GKE clusters, Cloud SQL instances, load balancers, and the metric types they emit) so they query the resources that matter instead of guessing. How that works is covered in How telemetry works.
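One concrete naming difference: Cloud Monitoring metric types such as `cloudsql.googleapis.com/database/cpu/utilization` are not valid PromQL identifiers. Google's documented mapping replaces the first `/` with `:` and every other special character with `_`. The sketch below applies that rule; the helper name `to_promql_name` is hypothetical, and it covers only the basic mapping, not special cases that can add suffixes to some metric kinds.

```python
import re


def to_promql_name(metric_type: str) -> str:
    """Convert a Cloud Monitoring metric type to its PromQL metric name."""
    # Split at the first "/" only: the part before it is the service domain.
    domain, sep, path = metric_type.partition("/")

    def clean(part: str) -> str:
        # Any character that is not a letter, digit, or underscore becomes "_".
        return re.sub(r"[^A-Za-z0-9_]", "_", part)

    return clean(domain) + (":" + clean(path) if sep else "")


print(to_promql_name("cloudsql.googleapis.com/database/cpu/utilization"))
print(to_promql_name("kubernetes.io/container/memory/used_bytes"))
```

The two calls print `cloudsql_googleapis_com:database_cpu_utilization` and `kubernetes_io:container_memory_used_bytes`.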

Connecting Cloud Monitoring

Cloud Monitoring is connected through Google Cloud. Connect Google Cloud with a service account that can read metrics, then enable the projects your team runs production workloads in. Each project is disabled by default, so you opt in deliberately; enabling one turns on its Cloud Monitoring access.

Best practice

Enable the projects your responders actually investigate, rather than every project the service account can reach.
Grant the service account read-only monitoring access. Investigations only ever read from Cloud Monitoring.

Related

Google Cloud

The provider Cloud Monitoring is connected through.

How telemetry works

How investigations query your metrics.
