// LESSON 36

Monitoring & Observability

B2

The Three Pillars of Observability

Pillar    What it captures                                      Tools
Metrics   Numeric measurements over time (CPU, RPS, latency)    Prometheus, Datadog
Logs      Timestamped text records of events                    ELK Stack, Loki
Traces    End-to-end path of a request through services         Jaeger, Zipkin

SLI / SLO / SLA

  • SLI (Service Level Indicator) — a metric you measure: "p99 latency"
  • SLO (Service Level Objective) — your internal target: "p99 latency < 200ms"
  • SLA (Service Level Agreement) — contractual commitment with customers: "99.9% uptime"
  • Error budget — how much downtime or how many errors you can afford before missing your SLO (and, eventually, breaching the SLA)
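
The arithmetic behind an error budget can be sketched in a few lines. This is a minimal illustration, assuming an availability-style SLO expressed as a fraction (0.999 for "99.9% uptime") and a fixed measurement window; the function name `error_budget_minutes` and the 30-day window are hypothetical choices for the example, not part of the lesson.

```python
def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Return the downtime (in minutes) allowed by an availability SLO.

    slo_target is a fraction, e.g. 0.999 for "99.9% uptime".
    window_days is the length of the measurement window (assumed here).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    total_minutes = window_days * 24 * 60
    # The budget is whatever share of the window the SLO does NOT promise.
    return total_minutes * (1.0 - slo_target)


if __name__ == "__main__":
    budget = error_budget_minutes(0.999, 30)
    print(f"Error budget: {budget:.1f} minutes")
```

With a 99.9% target over 30 days, the budget comes to roughly 43.2 minutes of downtime.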

Useful Phrases

  • "We alert on p99 latency exceeding 500ms."
  • "Distributed tracing helped us find which service was adding 200ms to the request path."
  • "Our error budget for this quarter is 43 minutes of downtime."
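
To make the phrase "alert on p99 latency exceeding 500ms" concrete, here is a small sketch of how a p99 value could be computed from raw samples and compared with a threshold. It assumes the nearest-rank percentile method and a list of latencies in milliseconds; real monitoring systems usually estimate percentiles from histograms instead, and the names `percentile` and `should_alert` are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def should_alert(latencies_ms, threshold_ms=500.0):
    """Return True when p99 latency exceeds the alert threshold."""
    return percentile(latencies_ms, 99) > threshold_ms


if __name__ == "__main__":
    # 98 fast requests and 2 slow ones (illustrative data, not measurements).
    latencies = [100.0] * 98 + [600.0, 900.0]
    print("p99:", percentile(latencies, 99))
    print("alert:", should_alert(latencies))
```

In this sample, 99 of the 100 requests finish in 600 ms or less, so the p99 is 600 ms and the 500 ms alert would fire even though most requests are fast.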
// TERMINAL CHALLENGE

Test Yourself

Q1. What is the difference between an SLO and an SLA?
Q2. What does 'distributed tracing' help you find?
Q3. What is an 'error budget'?
Q4. Complete: 'We ___ on p99 latency exceeding 500ms to catch degradation early.'
Q5. What does p99 latency mean?