Prometheus Interview Guide

🟢 Easy (Basics)

1. What is Prometheus?

Open-source monitoring system with a built-in time-series database (TSDB); it scrapes targets over HTTP (pull model) and is queried with PromQL.

2. Metric types?

Counter, Gauge, Histogram, Summary.

3. Exporter?

Component that exposes another system's metrics on an HTTP /metrics endpoint for Prometheus to scrape (e.g. node_exporter, blackbox_exporter).
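As a rough illustration of what a /metrics endpoint returns, the sketch below renders samples in the Prometheus text exposition format. The function name render_metrics and the tuple layout are hypothetical, and label-value escaping and HELP/TYPE lines are left out for brevity; a real exporter would normally use an official client library instead.

```python
def render_metrics(samples):
    """Render samples as Prometheus text exposition lines.

    `samples` is a list of (metric_name, labels_dict, value) tuples.
    Simplification: label values are not escaped and no HELP/TYPE
    comment lines are emitted.
    """
    lines = []
    for name, labels, value in samples:
        if labels:
            # Sort labels so the output is deterministic.
            pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

Calling render_metrics([("up", {}, 1)]) yields the single line "up 1", which is the shape Prometheus parses on each scrape.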

4. PromQL example?

rate(http_requests_total[5m]) gives the per-second rate of increase of the counter, averaged over the last 5 minutes.
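To make the idea behind rate() concrete, here is a simplified sketch of a per-second rate over counter samples with counter-reset handling. It is not Prometheus's exact algorithm (the real rate() also extrapolates to the window boundaries); simple_rate and its input layout are hypothetical.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter.

    `samples` is a list of (timestamp_seconds, counter_value), oldest first.
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop in value means the counter was reset (e.g. a restart),
        # so the new value is counted as an increase from zero.
        increase += cur if cur < prev else cur - prev
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None
```

The reset handling is why rate() should only be applied to counters, never to gauges.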

5. Alerting?

Prometheus evaluates alerting rules and sends firing alerts to Alertmanager, which deduplicates, groups, and routes them to receivers.

🟡 Medium (Hands-on)

1. Recording rules?

Precompute expensive or frequently used queries and store the results as new series, which speeds up dashboards and alerts.

2. HA?

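Run two identical Prometheus servers scraping the same targets and dedupe downstream (Alertmanager dedupes identical alerts); use Mimir or Thanos when you also need scale.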

3. Remote write/read?

Remote write streams samples to an external long-term backend; remote read lets Prometheus query data back from it.

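4. Histogram vs Summary?

Histograms expose bucket counts, so quantiles are computed server-side and can be aggregated across instances; summaries compute quantiles client-side, and those cannot be meaningfully aggregated.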
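The aggregation point can be sketched in code: cumulative bucket counts from several instances are summed, and a quantile is then estimated by linear interpolation inside the matching bucket, similar in spirit to PromQL's histogram_quantile(). The helper names and the dict layout are assumptions for illustration, and edge cases of the real function are simplified.

```python
import math


def merge_buckets(instances):
    """Sum cumulative bucket counts ({upper_bound: count}) across instances."""
    merged = {}
    for buckets in instances:
        for le, count in buckets.items():
            merged[le] = merged.get(le, 0) + count
    return merged


def estimate_quantile(q, buckets):
    """Estimate quantile q from cumulative buckets that include math.inf."""
    total = buckets[math.inf]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for le in sorted(buckets):
        count = buckets[le]
        if count >= rank:
            in_bucket = count - lower_count
            if le == math.inf or in_bucket == 0:
                # Quantile lies beyond the last finite bound (or the
                # bucket is empty): fall back to the lower bound.
                return lower_bound
            # Assume observations are spread evenly within the bucket.
            return lower_bound + (le - lower_bound) * (rank - lower_count) / in_bucket
        lower_bound, lower_count = le, count
    return math.nan
```

Summary quantiles offer no equivalent merge step, because averaging two precomputed p95 values does not give the p95 of the combined traffic.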

5. Service discovery?

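Prometheus discovers scrape targets automatically from sources such as Kubernetes, EC2, and Consul instead of relying on static target lists.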

🔴 Hard (Advanced)

1. Cardinality issues?

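Avoid unbounded label values (user IDs, request IDs); use relabeling (metric_relabel_configs) to drop high-cardinality labels or whole series at scrape time.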
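One way to find the offending label is to count distinct values per label name across the series set. The sketch below does this over an in-memory list of label dicts; label_cardinality is a hypothetical helper, and how the label sets are obtained from a server is left as an assumption.

```python
from collections import Counter, defaultdict


def label_cardinality(series):
    """Count distinct values per label name.

    `series` is an iterable of label dicts, one dict per time series.
    Returns a Counter mapping label name -> number of distinct values.
    """
    values = defaultdict(set)
    for labels in series:
        for name, value in labels.items():
            values[name].add(value)
    return Counter({name: len(vals) for name, vals in values.items()})
```

Sorting the result with most_common() puts labels such as a per-user ID at the top, which are the usual candidates for dropping.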

2. SLO burn‑rate?

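Compare the error-budget burn rate over a long and a short window and alert only when both exceed the threshold; this catches fast burns quickly and slow burns without flapping.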
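The arithmetic behind a multi-window burn-rate check can be sketched as follows. Burn rate is the observed error ratio divided by the error budget (1 minus the SLO target). The function names are hypothetical, and the error ratios are assumed to come from queries over the two windows.

```python
def burn_rate(error_ratio, slo_target):
    """How many times faster than 'exactly on budget' errors are accruing."""
    budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return error_ratio / budget


def should_page(long_window_ratio, short_window_ratio, slo_target, threshold):
    """Fire only if both windows burn faster than the threshold.

    The long window shows the burn is significant; the short window
    shows it is still happening, so the alert resolves quickly.
    """
    return (
        burn_rate(long_window_ratio, slo_target) > threshold
        and burn_rate(short_window_ratio, slo_target) > threshold
    )
```

A commonly cited setting from Google's SRE Workbook is a threshold of 14.4 on a 1h/5m window pair for a 30-day SLO, which corresponds to spending 2% of the budget in one hour.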

3. Global view?

Use Thanos or Mimir to query across many Prometheus servers, with long-term data in object storage; Thanos can also downsample old data.

🟣 Scenarios

1. Alert floods.

Tune thresholds and "for" durations, use Alertmanager inhibition rules and grouping, and silence alerts during maintenance.
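Grouping is the part that most directly cuts notification volume, so here is a minimal sketch of the idea: alerts sharing the same values for the group-by labels collapse into one notification. This only mimics the concept behind Alertmanager's group_by setting; group_alerts and the alert dict layout are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the `group_by` labels.

    `alerts` is a list of label dicts; a missing label counts as "".
    Returns {tuple_of_label_values: [alerts...]}, one entry per
    notification that would be sent.
    """
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)
```

With group_by set to ("alertname", "cluster"), fifty instances failing in one cluster produce one notification rather than fifty.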

2. High p95 latency in one zone.

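Break latency down by labels (zone, instance, handler) to localize it; shift traffic away from the affected zone; check its nodes for CPU, memory, or disk pressure.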

3. Disk usage explosion.

Shorten retention (by time or size), confirm compaction is running, and offload older data to a remote-write backend.
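For sizing, the Prometheus storage documentation gives a rough formula: disk space is about retention time in seconds times ingested samples per second times bytes per sample, with roughly 1 to 2 bytes per sample on average. The sketch below applies it; estimate_disk_bytes is a hypothetical helper and the default of 2 bytes per sample is an assumed upper-end figure.

```python
def estimate_disk_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Rough TSDB disk estimate: retention * ingestion rate * sample size."""
    retention_seconds = retention_days * 86400
    return retention_seconds * samples_per_second * bytes_per_sample
```

For example, 15 days of retention at 100,000 samples per second comes to about 259 GB under this assumption, so halving either retention or ingestion rate halves the footprint.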

Generated for quick interview revision — basics, hands-on, advanced, and scenarios.