// LESSON 43

AI System Design

B2

Designing production AI systems requires balancing quality, latency, cost, and reliability.

Key Design Decisions

Decision            | Trade-off
Model choice        | Larger model = better quality but higher cost and latency
Streaming vs. batch | Streaming = better UX; batch = higher throughput
Caching responses   | Faster + cheaper but may return stale answers
Prompt caching      | Reduces cost for repeated long system prompts
Fallback model      | Use cheaper model if primary is unavailable or too slow
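
The "caching responses" trade-off can be made concrete with a small sketch. One common way to limit stale answers is to give each cached response a time-to-live (TTL), so old entries expire and the model is called again. The names below (`TTLCache`, `answer`, `call_model`) are hypothetical and chosen for illustration; `call_model` stands in for whatever function actually calls the model.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory cache: responses expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the cache can be tested without waiting
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, prompt: str) -> Optional[str]:
        entry = self._store.get(prompt)
        if entry is None:
            return None
        stored_at, response = entry
        if self.clock() - stored_at > self.ttl_seconds:
            # Entry is too old: drop it so the caller asks the model again.
            del self._store[prompt]
            return None
        return response

    def put(self, prompt: str, response: str) -> None:
        self._store[prompt] = (self.clock(), response)


def answer(prompt: str, cache: TTLCache,
           call_model: Callable[[str], str]) -> str:
    """Return a cached response if it is still fresh, else call the model."""
    cached = cache.get(prompt)
    if cached is not None:
        return cached  # fast and cheap, but up to ttl_seconds old
    response = call_model(prompt)
    cache.put(prompt, response)
    return response
```

A shorter TTL means fresher answers but more model calls; a longer TTL means the opposite. This sketch matches prompts exactly, which is an assumption: real systems may normalize or hash prompts before lookup.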

Reliability Patterns

  • retry with backoff — retry failed API calls with increasing delays
  • timeout — don't wait more than N seconds for a response
  • fallback — switch to a different model or cached response on failure
  • rate limiting — respect API rate limits to avoid 429 errors
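
Two of the patterns above, retry with backoff and fallback, can be sketched together. This is a minimal illustration, not a production implementation: `call_with_retry` and `call_with_fallback` are hypothetical names, and `primary` and `fallback` stand in for functions that call a model. The sketch assumes a timeout is configured in the underlying client and surfaces as an exception, so it is retried like any other failure.

```python
import time
from typing import Callable


def call_with_retry(
    call: Callable[[], str],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry `call`, waiting base_delay, 2*base_delay, 4*base_delay, ..."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            # Exponential backoff: the delay doubles after each failure.
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


def call_with_fallback(
    primary: Callable[[], str],
    fallback: Callable[[], str],
    **retry_kwargs,
) -> str:
    """Try the primary model with retries; on final failure use the fallback."""
    try:
        return call_with_retry(primary, **retry_kwargs)
    except Exception:
        # The fallback could be a cheaper model or a cached response.
        return fallback()
```

Catching every `Exception` keeps the sketch short. A real system would usually retry only transient errors (timeouts, 429 responses, connection failures) and often adds random jitter to the delay so that many clients do not retry at the same moment.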
// TERMINAL CHALLENGE

Test yourself

Q1. What is the main trade-off when choosing between a large and a small LLM?
Q2. What is 'prompt caching' and why is it useful?
Q3. What does 'retry with exponential backoff' mean?
Q4. Complete: 'We set a ___ of 10 seconds so slow responses do not block the user experience.'
Q5. What is a 'fallback model' in an AI system?