// LESSON 36
Monitoring & Observability
B2
The Three Pillars of Observability
Pillar  | What it captures                                   | Tools
--------+----------------------------------------------------+--------------------
Metrics | Numeric measurements over time (CPU, RPS, latency) | Prometheus, Datadog
Logs    | Timestamped text records of events                 | ELK Stack, Loki
Traces  | End-to-end path of a request through services      | Jaeger, Zipkin
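
To make the three pillars concrete, here is a minimal sketch in plain Python. It does not use any of the tools named above: the in-memory `metrics`, `logs`, and `spans` containers are stand-ins for real backends, and the service names and durations are invented for illustration.

```python
import json
import time
import uuid
from collections import Counter

# Hypothetical in-memory stand-ins for a metrics store, a log store,
# and a tracing backend.
metrics = Counter()   # metric name -> running count
logs = []             # timestamped, structured log records
spans = []            # timed steps of a request, linked by trace_id


def record_work(trace_id, service, duration_ms):
    """Emit one metric, one log record, and one trace span for a unit of work."""
    # Metric: a number aggregated over time.
    metrics[f"{service}_requests_total"] += 1
    # Log: a timestamped text record of a single event.
    logs.append(json.dumps({
        "ts": time.time(),
        "service": service,
        "msg": "handled request",
        "trace_id": trace_id,
    }))
    # Trace span: one hop on the request's end-to-end path.
    spans.append({
        "trace_id": trace_id,
        "service": service,
        "duration_ms": duration_ms,
    })


# One request passing through three made-up services.
trace_id = uuid.uuid4().hex
for service, duration_ms in [("gateway", 15), ("auth", 20), ("orders", 200)]:
    record_work(trace_id, service, duration_ms)

# The trace shows which service added the most latency to this request.
request_spans = [s for s in spans if s["trace_id"] == trace_id]
slowest = max(request_spans, key=lambda s: s["duration_ms"])
print(slowest["service"], slowest["duration_ms"])
```

The shared `trace_id` is what lets you jump from a log line to the trace it belongs to.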
SLI / SLO / SLA
SLI (Service Level Indicator): a metric you measure, for example "p99 latency"
SLO (Service Level Objective): your internal target, for example "p99 latency < 200ms"
SLA (Service Level Agreement): a contractual commitment to customers, for example "99.9% uptime"
Error budget: how much downtime or how many errors you can afford before breaching the SLA (in practice, teams usually track it against the stricter internal SLO)
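
The definitions above can be sketched as a small calculation. This is an illustration under stated assumptions, not a standard implementation: the function names are invented, the latency samples are made up, and p99 is computed with the simple nearest-rank method (monitoring tools may use other estimators).

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def error_budget_minutes(availability_target, window_days):
    """Downtime allowed by an availability target over a time window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - availability_target)


# SLI: the measured value (made-up latency samples, in milliseconds).
latencies_ms = [120] * 196 + [900] * 4
sli_p99_ms = percentile(latencies_ms, 99)

# SLO: the internal target the SLI is compared against.
slo_p99_ms = 200
slo_met = sli_p99_ms < slo_p99_ms

# Error budget for a 99.9% availability target over an assumed 30-day window.
budget = error_budget_minutes(0.999, 30)

print(sli_p99_ms, slo_met)
print(round(budget, 1))  # 43.2
```

With these samples 2% of requests take 900 ms, so the p99 lands on 900 ms and the 200 ms objective is missed, even though the average still looks healthy.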
Useful Phrases
"We alert on p99 latency exceeding 500ms."
"Distributed tracing helped us find which service was adding 200ms to the request path."
"Our error budget for this quarter is 43 minutes of downtime."
// TERMINAL CHALLENGE
Test yourself
Q1. What is the difference between an SLO and an SLA?
An SLO is an internal target; an SLA is a contractual commitment to customers.
They are the same thing with different names.
An SLA is internal; an SLO is external.
An SLO is for infrastructure; an SLA is for application code.
Q2. What does 'distributed tracing' help you find?
Which service in a chain of microservices is adding latency to a request.
Memory leaks in individual services.
Security vulnerabilities in the API.
Slow database queries.
Q3. What is an 'error budget'?
The allowed amount of downtime or errors before breaching the SLA.
The number of bugs allowed per release.
A budget for buying monitoring tools.
The maximum number of retries allowed.
Q4. Complete: 'We ___ on p99 latency exceeding 500ms to catch degradation early.'
alert
warn
notify
trigger
Q5. What does p99 latency mean?
99% of requests complete faster than this value — it represents the worst-case experience for most users.
The average latency across all requests.
The latency of the 99th server.
99 milliseconds of latency.