Kubernetes

Kubernetes runs your workloads, and when an incident starts the first question is often “what is the cluster doing right now?” Investigations read your cluster’s live state (its deployments, pods, services, and events) to answer that without anyone reaching for kubectl.

Kubernetes clusters are discovered through your cloud provider. Connect Google Cloud to surface your GKE clusters. Support for discovering EKS clusters through AWS is coming soon. Either route gives investigations the same read-only access to the cluster.

What we support

Investigations read your cluster the way a responder would with kubectl get and kubectl describe: listing resources and describing a single object in detail. They never write to the cluster; access is read-only.

List resources: get a kubectl get-shaped view of any kind, including pods, deployments, statefulsets, daemonsets, jobs, services, ingresses, nodes, events, and the custom resources your operators add. Scope a list to a namespace or a label selector to stay fast on busy clusters.
Describe a resource: get the kubectl describe-shaped detail for a single object, covering its spec and status, labels and annotations, the recent events attached to it, and the ownership chain that links a pod back to its replica set and deployment.
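
For orientation, the two operations above correspond to kubectl invocations a responder would otherwise type by hand. The sketch below only builds those argument lists; `build_get_args` and `build_describe_args` are hypothetical helper names introduced for illustration, and nothing here describes how investigations reach the cluster internally.

```python
from typing import List, Optional


def build_get_args(kind: str,
                   namespace: Optional[str] = None,
                   selector: Optional[str] = None) -> List[str]:
    """Build the argv for a read-only `kubectl get`, optionally scoped."""
    args = ["kubectl", "get", kind]
    if namespace:
        args += ["--namespace", namespace]
    else:
        # No namespace given: list across the whole cluster.
        args.append("--all-namespaces")
    if selector:
        # A label selector such as "app=checkout" narrows the list further.
        args += ["--selector", selector]
    return args


def build_describe_args(kind: str, name: str, namespace: str) -> List[str]:
    """Build the argv for `kubectl describe` on a single object."""
    return ["kubectl", "describe", kind, name, "--namespace", namespace]


if __name__ == "__main__":
    # Hypothetical namespace, label, and pod name.
    print(" ".join(build_get_args("pods", namespace="payments", selector="app=checkout")))
    print(" ".join(build_describe_args("pod", "checkout-7d9f-abcde", "payments")))
```

Scoping by namespace and selector keeps the list small, which is the same reason the text recommends it on busy clusters.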

Seeing what’s failing

The useful detail in an incident is rarely the healthy workload; it’s the one that isn’t. When investigations list pods they see the same signals you would: the ready container count, the pod phase, and the restart count. When they describe a failing pod they get its container statuses and the events behind them, so a crash shows up as what it actually is (CrashLoopBackOff, ImagePullBackOff, OOMKilled) rather than a pod that’s simply “not ready”. That lets an investigation walk a symptom to its cause: start at the deployment a responder named, check its rollout conditions, list the pods behind it, and describe the one that’s failing to read the events that explain why. The ownership chain ties it together, so a single failing pod can be traced back to the deployment that owns it.
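
As a rough sketch of the signals described above, the code below works on pod objects shaped like the JSON from `kubectl get pod -o json`. The helper names `summarize_pod` and `owner_chain` are hypothetical, and the choice to report OOMKilled ahead of CrashLoopBackOff is an assumption made for this example, not documented product behaviour.

```python
from typing import Dict, List, Tuple


def summarize_pod(pod: dict) -> dict:
    """Reduce a pod object to ready count, phase, restarts, and a failure reason."""
    status = pod.get("status", {})
    statuses = status.get("containerStatuses", [])
    reason = None
    for cs in statuses:
        waiting = cs.get("state", {}).get("waiting") or {}
        terminated = cs.get("lastState", {}).get("terminated") or {}
        if waiting.get("reason"):
            # e.g. CrashLoopBackOff or ImagePullBackOff
            reason = waiting["reason"]
        if terminated.get("reason") == "OOMKilled":
            # Assumption: the last termination is the more specific cause.
            reason = "OOMKilled"
        if reason:
            break
    ready_count = sum(1 for cs in statuses if cs.get("ready"))
    return {
        "ready": f"{ready_count}/{len(statuses)}",
        "phase": status.get("phase"),
        "restarts": sum(cs.get("restartCount", 0) for cs in statuses),
        "reason": reason,
    }


def owner_chain(obj: dict, index: Dict[Tuple[str, str], dict]) -> List[str]:
    """Follow metadata.ownerReferences upward, e.g. Pod -> ReplicaSet -> Deployment.

    `index` maps (kind, name) to already-fetched objects in the same namespace.
    """
    chain = [f"{obj['kind']}/{obj['metadata']['name']}"]
    current = obj
    while True:
        refs = current.get("metadata", {}).get("ownerReferences", [])
        if not refs:
            break
        key = (refs[0]["kind"], refs[0]["name"])
        label = f"{key[0]}/{key[1]}"
        if label in chain:
            break  # guard against a malformed, cyclic reference
        chain.append(label)
        owner = index.get(key)
        if owner is None:
            break  # owner not fetched; the chain stops at its name
        current = owner
    return chain
```

Given a pod, its replica set, and its deployment, `owner_chain` returns the names in order from the pod upward, which is the trace from a single failing pod back to the deployment that owns it.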

Logs and metrics live elsewhere

Kubernetes tells investigations the state of your workloads, not what they logged or how much CPU they burned. For the log lines a service emitted, connect a logging data source such as Loki; for resource usage over time, connect a metrics data source such as Prometheus. Investigations combine them: the cluster shows a pod restarting, and your logs and metrics show what led up to it.

Investigations learn the shape of each cluster automatically: its namespaces, the workloads that run in them, the label conventions your team uses, and the operators you’ve installed. That structure makes queries land on the right resource the first time. How that works is covered in How telemetry works.
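
One way to picture what “the shape of a cluster” means is a per-namespace summary of workloads and label keys. The sketch below is purely illustrative: `cluster_shape` is a hypothetical helper over objects shaped like `kubectl get -o json` items, and it is not a description of how investigations build or store this structure.

```python
from collections import defaultdict
from typing import Dict, Iterable


def cluster_shape(objects: Iterable[dict]) -> Dict[str, dict]:
    """Group workloads by namespace and collect the label keys in use."""
    shape = defaultdict(lambda: {"workloads": set(), "label_keys": set()})
    for obj in objects:
        meta = obj.get("metadata", {})
        # Objects without a namespace (nodes, for example) are grouped together.
        ns = meta.get("namespace", "(cluster-scoped)")
        entry = shape[ns]
        entry["workloads"].add(f"{obj.get('kind')}/{meta.get('name')}")
        entry["label_keys"].update(meta.get("labels", {}).keys())
    # Sort everything so the summary is stable and easy to read.
    return {
        ns: {
            "workloads": sorted(entry["workloads"]),
            "label_keys": sorted(entry["label_keys"]),
        }
        for ns, entry in sorted(shape.items())
    }
```

A summary like this shows which label keys a team actually uses in each namespace, which is the kind of structure that helps a query select the right resource on the first attempt.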

Connecting Kubernetes

You don’t connect a cluster on its own. Connect the cloud provider that hosts it, and investigations discover the clusters behind it using that provider’s credentials.

Through Google Cloud

Connect Google Cloud and your GKE clusters are discovered automatically. The service account you provide for the Google Cloud connection is exchanged for cluster access, so each discovered cluster inherits those credentials.

One provider connection can surface many clusters, so discovered Kubernetes clusters are left disabled by default. Review the clusters that appear and enable the ones your team runs incidents against.

Through AWS (coming soon)

Discovering your EKS clusters through AWS is coming soon. Once it’s available, connecting AWS will surface your EKS clusters automatically. Each cluster will be reached through that same connection, with nothing separate to configure per cluster.

Best practice

Grant the cloud provider read-only access. Investigations only ever read cluster state, so a read-only role keeps the blast radius small.
Enable the clusters your responders actually investigate (your production clusters) rather than every cluster the provider can see.
Connect a logging and a metrics data source alongside Kubernetes. Cluster state shows you what failed; logs and metrics show you why.
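
To make “read-only” concrete, the sketch below checks a set of Kubernetes RBAC-style rules for any verb beyond `get`, `list`, and `watch`. This is an illustration under assumptions: how access is actually granted depends on your cloud provider and is not specified here, and `write_verbs` is a hypothetical helper name.

```python
from typing import Iterable, List

# The verbs Kubernetes RBAC uses for reading state.
READ_ONLY_VERBS = {"get", "list", "watch"}


def write_verbs(rules: Iterable[dict]) -> List[str]:
    """Return every verb in the rules that goes beyond read-only access."""
    extra = set()
    for rule in rules:
        for verb in rule.get("verbs", []):
            if verb not in READ_ONLY_VERBS:
                # Includes the "*" wildcard, which grants every verb.
                extra.add(verb)
    return sorted(extra)


if __name__ == "__main__":
    rules = [
        {"apiGroups": [""], "resources": ["pods", "events"], "verbs": ["get", "list", "watch"]},
        {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["get", "list"]},
    ]
    # An empty result means the rules only allow reading.
    print(write_verbs(rules))
```

An empty list means the role cannot change anything in the cluster, which is what keeps the blast radius small.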

Related

AWS

Discover your EKS clusters (coming soon).

Google Cloud

Connect Google Cloud to discover your GKE clusters.

How telemetry works

How investigations learn and query your cluster.
