Decision ledger
A one-page table for architecture trade-offs.
When the meeting needs a clean answer, use this ledger. Each row gives a pattern (the Route), the pressure that calls for it, and the toll you pay.
| Route | Pressure | Toll |
|---|---|---|
| 01 Primary-Replica (Leader-Follower) | Scaling read traffic without sending every query to the write leader | Replication lag can make replica reads stale; synchronous replication reduces lag but slows writes |
| 02 Sharding (Horizontal Partitioning) | A single database growing beyond one machine’s write or storage limits | Cross-shard queries, hot keys, and resharding are operationally painful |
| 03 Consistent Hashing | Redistributing too much data when nodes join or leave a cluster | More complex than modulo hashing and still needs virtual nodes to balance load |
| 04 Write-Ahead Log (WAL) | Recovering durable writes after a crash without corrupting storage | Extra write amplification and log compaction/retention work |
| 05 Event Sourcing | Needing a perfect audit trail and the ability to reconstruct old states | Schema evolution, replay cost, and unbounded event growth require discipline |
| 06 CQRS (Command Query Responsibility Segregation) | Read needs and write invariants fighting over the same model | Two models, projection lag, and more moving pieces |
| 07 Cache-Aside (Lazy Loading) | Avoiding repeated database reads for frequently requested data | First request is slow and cache invalidation must be handled carefully |
| 08 Write-Through | Keeping cache and storage fresh immediately after writes | Writes are slower and unused values may occupy cache |
| 09 Write-Behind (Write-Back) | Absorbing high write volume with low perceived latency | A crash before flush can lose data and the database intentionally lags |
| 10 Read-Through | Keeping application code free of cache-loading logic | Cache infrastructure becomes coupled to the data source and query semantics |
| 11 Cache Stampede Prevention | Preventing one expired hot key from stampeding the database | Locks, jitter, and stale-while-revalidate logic add operational complexity |
| 12 Request-Response (Synchronous) | Asking another component for an immediate answer | Latency and failures propagate directly through synchronous call chains |
| 13 Message Queue (Asynchronous) | Decoupling producers from slower or unreliable consumers | Results are delayed and delivery semantics/order must be designed |
| 14 Publish-Subscribe (Pub/Sub) | Letting many services react to the same business event | Duplicates and ordering differences require idempotent subscribers |
| 15 Event-Driven Architecture | Reducing direct service coupling across a workflow | Debugging and latency become distributed across logs, queues, and consumers |
| 16 Webhooks | Notifying an external system exactly when something happens | The receiver must be reachable, verify signatures, retry safely, and handle duplicates |
| 17 Server-Sent Events (SSE) | Pushing server updates to browsers without full-duplex complexity | Server-to-client only and browser connection limits apply |
| 18 Bidirectional Streaming (WebSockets / gRPC Streaming) | Supporting continuous two-way real-time interaction | Millions of open connections need specialized routing, backpressure, and reconnect logic |
| 19 Circuit Breaker | Stopping a failing dependency from exhausting callers | Fallbacks may be degraded and thresholds must fit real failure modes |
| 20 Retry with Exponential Backoff | Handling transient failures without giving up immediately | Poorly bounded retries amplify outages and increase tail latency |
| 21 Bulkhead | Keeping one workload or tenant from sinking the whole service | Reserved pools can reduce utilization when traffic is uneven |
| 22 Timeout | Preventing slow dependencies from tying up resources forever | Too short causes false failures; too long delays recovery |
| 23 Idempotency | Making retries safe when clients cannot know if a write succeeded | Requires storing keys/results and checking duplicates on the write path |
| 24 Dead Letter Queue (DLQ) | Stopping poison messages from blocking the main queue forever | DLQs need ownership, alerts, replay tooling, and cleanup |
| 25 Graceful Degradation | Serving something useful when a noncritical subsystem breaks | Degraded modes must be designed and tested before outages happen |
| 26 Horizontal Scaling | Handling more stateless request volume by adding machines | Needs load balancing and externalized session/state storage |
| 27 Vertical Scaling | Getting more capacity quickly from a single-node component | Hard physical ceiling, larger blast radius, and diminishing returns |
| 28 Load Balancing | Spreading incoming requests across healthy backends | Health checks, uneven workloads, stickiness, and overload handling matter |
| 29 Auto-Scaling | Matching capacity to variable traffic without manual intervention | Scaling reacts with delay and can hide inefficient code or create cost surprises |
| 30 Database Connection Pooling | Avoiding expensive database connection setup per request | Too few connections queue requests; too many overload the database |
| 31 MapReduce | Processing huge datasets that cannot fit on one machine | High latency and operational overhead compared with streaming for real-time needs |
| 32 Stream Processing | Reacting to data continuously instead of waiting for batch jobs | Ordering, replay, watermarks, and exactly-once semantics are hard |
| 33 Lambda Architecture | Combining accurate batch views with low-latency real-time views | Duplicated logic and reconciliation complexity |
| 34 Change Data Capture (CDC) | Letting other systems react to database changes reliably | Schema changes, ordering, replay, and backfills need care |
| 35 API Gateway | Giving clients one stable doorway into many backend services | Gateway misconfiguration can become a bottleneck or single failure point |
| 36 Backend for Frontend (BFF) | Serving different client experiences without one bloated API | More API surfaces and potential duplicated business logic |
| 37 Rate Limiting | Protecting services from abusive or accidental request floods | Legitimate bursts can be throttled if limits are too blunt |
| 38 Pagination (Cursor-Based) | Returning large changing lists without skips or duplicate surprises | Harder than offset paging and requires stable ordering |
| 39 API Versioning | Evolving an API without breaking existing clients | Old versions create maintenance burden and migration planning |
| 40 CDN (Content Delivery Network) | Serving static content with low latency worldwide | Cache invalidation, stale content, and dynamic personalization boundaries |
| 41 Reverse Proxy | Putting common web concerns in front of application servers | Incorrect headers/routing can hide client identity or create difficult bugs |
| 42 Service Mesh | Standardizing service-to-service behavior across many teams | Operational complexity and another layer to debug |
| 43 Sidecar Pattern | Adding cross-cutting behavior without modifying the main app | Resource overhead and lifecycle coupling with the main service |
| 44 Two-Phase Commit (2PC) | Committing one transaction atomically across multiple participants | Blocking behavior, coordinator failure modes, and poor fit for long workflows |
| 45 Saga Pattern | Coordinating distributed work without one global transaction | Compensation is business-specific and final state is eventually consistent |
| 46 Quorum | Tuning consistency and availability in replicated systems | Higher quorum counts increase latency and reduce availability during failures |
| 47 Vector Clocks | Detecting causal ordering without a global clock | Metadata grows with node count and conflicts still need resolution policy |
| 48 Health Check Endpoint | Letting infrastructure know whether a service should receive traffic | Shallow checks miss real failures; deep checks can overload dependencies |
| 49 Distributed Tracing | Seeing where time and failures go across a distributed request | Sampling, context propagation, and cardinality must be managed |
| 50 Canary Deployment | Reducing deployment risk by exposing new code gradually | Requires traffic splitting, compatible versions, and strong metrics |
| 51 Outbox Pattern | Reliably publishing events after a database write | Requires a relay, dedupe, monitoring, and cleanup of old outbox rows |
| 52 Inbox Pattern | Processing incoming messages exactly once from the consumer perspective | Consumer storage and idempotency logic become part of the contract |
| 53 Transactional Messaging | Coordinating local state changes with external messages | Eventual consistency and operational repair paths must be explicit |
| 54 Compensating Transaction | Undoing a multi-step workflow when one step fails | Compensation may be partial, delayed, or business-specific rather than a true undo |
| 55 Materialized View | Serving expensive read shapes without recomputing them on every request | Views lag source-of-truth data and need rebuild/replay procedures |
| 56 CQRS Projection | Keeping write models clean while serving many read models | Projection drift, replay cost, and schema evolution need careful operations |
| 57 Read Repair | Healing stale replicas during normal reads | Reads become slightly more complex and stale data can still leak briefly |
| 58 Hinted Handoff | Handling writes when a replica is temporarily unavailable | Hint buildup can create recovery storms and needs retention limits |
| 59 Anti-Entropy Repair | Converging replicas after missed writes or partitions | Repair jobs consume I/O and must be paced to avoid user impact |
| 60 Leader Election | Choosing one active coordinator without split brain | Clock assumptions, lease expiry, and failover behavior must be designed carefully |
| 61 Distributed Lock | Serializing access to shared work across nodes | Locks can expire mid-work; correctness needs fencing tokens or idempotency |
| 62 Lease | Granting temporary ownership without permanent locks | Clock skew and renewal pauses can cause overlapping owners |
| 63 Fencing Token | Preventing an old owner from writing after a newer owner appears | Every protected resource must validate tokens for the guarantee to hold |
| 64 Work Queue | Distributing background work across many workers | Requires backpressure, poison-message handling, and idempotent jobs |
| 65 Priority Queue | Letting urgent work bypass routine backlog | Low-priority starvation and priority inflation need controls |
| 66 Fan-Out / Fan-In | Parallelizing many independent subtasks and aggregating results | Tail latency, partial failure, and result ordering become explicit concerns |
| 67 Scatter-Gather | Querying multiple providers or shards at once | Slow or failed branches need deadlines, fallbacks, and partial response semantics |
| 68 Hedged Requests | Reducing tail latency from straggler instances | Extra load can amplify incidents if hedging is not capped |
| 69 Request Coalescing | Preventing many identical requests from doing duplicate work | The coalescer becomes a hot path and must isolate failures |
| 70 Singleflight | Collapsing identical work inside one process | Only helps per process unless paired with distributed coordination |
| 71 Token Bucket | Allowing bursts while enforcing an average rate | Burst size and refill rate must match real capacity |
| 72 Leaky Bucket | Smoothing bursty input into steady output | Queues add latency and overflow policy matters |
| 73 Adaptive Concurrency Limit | Finding safe concurrency without static limits | Bad feedback signals can oscillate or over-throttle |
| 74 Backpressure | Preventing fast producers from overwhelming slow consumers | Requires a policy for what gets delayed, dropped, or degraded |
| 75 Load Shedding | Protecting core service during overload | User-visible errors are intentional; prioritization must be defensible |
| 76 Brownout | Reducing optional work before the whole service fails | Requires knowing which work is optional and testing degraded paths |
| 77 Fail-Fast | Avoiding wasted work when success is unlikely | Can be too aggressive without good health signals |
| 78 Fallback Cache | Serving acceptable stale data when live dependencies fail | Staleness must be visible and bounded |
| 79 Multi-Region Active-Active | Serving writes and reads from more than one region | Conflict resolution, data residency, and operational complexity rise sharply |
| 80 Active-Passive Failover | Recovering from a primary site failure | RPO/RTO depend on replication and rehearsed runbooks |
| 81 Cell-Based Architecture | Containing blast radius as a platform grows | Capacity balancing and cross-cell operations get harder |
| 82 Shuffle Sharding | Reducing how many tenants share the same failure domain | Routing and capacity math are more complex |
| 83 Static Stability | Surviving dependency failure without immediate scaling or coordination | Costs more up front and requires discipline not to rely on emergency scaling |
| 84 Strangler Fig | Replacing legacy systems incrementally | Routing, data synchronization, and cutover criteria must be explicit |
| 85 Branch by Abstraction | Changing implementations without long-lived feature branches | The abstraction can leak or become permanent if not retired |
| 86 Parallel Run | Validating a new system against an old one | Double-running increases cost and comparison logic must handle legitimate differences |
| 87 Shadow Traffic | Testing a new service with production-shaped input safely | Privacy, side effects, and amplified load must be controlled |
| 88 Feature Flag | Changing behavior without redeploying | Flag debt and inconsistent states need lifecycle management |
| 89 Blue-Green Deployment | Cutting over between two complete environments | Requires duplicate capacity and careful database compatibility |
| 90 Rolling Deployment | Updating a fleet gradually without full downtime | Mixed versions must be compatible during the rollout |
| 91 Schema Versioning | Changing data contracts without breaking old readers | Old versions and migration states must be actively retired |
| 92 Expand-Contract Migration | Changing schemas while old and new code overlap | Takes multiple deployments and careful observability |
| 93 Tombstone | Representing deletes safely in replicated/evented systems | Tombstones consume storage and retention must exceed replication lag |
| 94 Soft Delete | Allowing recovery and audit after deletion | Queries must consistently filter deleted data; privacy rules may require hard delete |
| 95 Data Retention Window | Bounding storage, privacy, and replay obligations | Retention must reconcile legal, product, and operational needs |
| 96 Audit Log | Explaining who changed what and when | Audit data is sensitive and must be protected from tampering |
| 97 Policy Decision Point / Policy Enforcement Point | Separating authorization decisions from enforcement locations | Latency and availability of policy checks become critical |
| 98 Secret Rotation | Changing credentials without downtime | Every consumer must be discoverable and rotation must be rehearsed |
| 99 Envelope Encryption | Protecting data with manageable key rotation | Key hierarchy, access control, and recovery procedures add complexity |
| 100 Control Plane / Data Plane Split | Keeping management operations separate from request serving | Control-plane outages must not immediately stop stable data-plane traffic |
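
To make one row concrete, here is a minimal sketch of the Token Bucket entry (bursts allowed, average rate enforced). It is an illustration under stated assumptions, not a production limiter: the names `TokenBucket`, `try_acquire`, and `FakeClock` are hypothetical, the bucket is single-threaded, and the clock is injected so the behavior can be checked without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical single-threaded token bucket.

    Permits bursts of up to `capacity` requests and refills at
    `refill_rate` tokens per second, which caps the long-run average rate.
    """

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if capacity <= 0 or refill_rate <= 0:
            raise ValueError("capacity and refill_rate must be positive")
        self.capacity = capacity
        self.refill_rate = refill_rate
        self._clock = clock
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last call, capped at capacity.
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return False instead of blocking."""
        self._refill()
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class FakeClock:
    """Manually advanced clock (an assumption for deterministic demos and tests)."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


if __name__ == "__main__":
    clock = FakeClock()
    bucket = TokenBucket(capacity=3, refill_rate=1.0, clock=clock)
    # A burst of three passes, the fourth is rejected until tokens refill.
    print([bucket.try_acquire() for _ in range(4)])  # [True, True, True, False]
    clock.advance(2.0)  # two seconds at 1 token/s restores two tokens
    print([bucket.try_acquire() for _ in range(3)])  # [True, True, False]
```

The toll in the table shows up directly in the two constructor arguments: `capacity` sets how large a burst the backend must absorb, and `refill_rate` sets the sustained load, so both should be sized from measured capacity rather than guessed.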