Pattern 31 / FLOW

MapReduce

Use this when processing huge datasets that cannot fit on one machine.

Pressure: Processing huge datasets that cannot fit on one machine
Mechanism: Map each partition independently, shuffle and group the intermediate key-value pairs by key, then reduce each group to a result
Toll: High latency and operational overhead compared with streaming, which makes it a poor fit for real-time needs

[Diagram: MapReduce architecture, plate 31]

Executive brief

MapReduce fits when a dataset is too large to fit on one machine. Mechanism: map each partition independently, shuffle and group the intermediate key-value pairs by key, then reduce each group to a result. Use it for batch analytics, indexing, log processing, and offline aggregation. The toll: high latency and operational overhead compared with streaming, which makes it a poor fit for real-time needs.
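The three phases can be sketched in a single process to make the data flow concrete. This is a minimal illustration, not any framework's API: the names map_reduce, mapper, and reducer are hypothetical, and a real deployment would run the map and reduce phases on many machines with the shuffle moving data over the network.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List, Tuple

# Hypothetical type aliases for this sketch only.
Pair = Tuple[Hashable, Any]
Mapper = Callable[[Any], Iterable[Pair]]
Reducer = Callable[[Hashable, List[Any]], Any]


def map_reduce(
    partitions: Iterable[Iterable[Any]],
    mapper: Mapper,
    reducer: Reducer,
) -> Dict[Hashable, Any]:
    """Run map, shuffle, and reduce sequentially over in-memory partitions."""
    # Map: each partition is processed without looking at any other,
    # which is what would let a cluster assign partitions to separate workers.
    intermediate: List[Pair] = []
    for partition in partitions:
        for record in partition:
            intermediate.extend(mapper(record))

    # Shuffle: group every intermediate value under its key.
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce: collapse each key's values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


if __name__ == "__main__":
    # Two tiny partitions standing in for files on different machines.
    parts = [["a b", "b"], ["a a"]]
    counts = map_reduce(
        parts,
        mapper=lambda line: [(word, 1) for word in line.split()],
        reducer=lambda key, values: sum(values),
    )
    print(counts)  # {'a': 3, 'b': 2}
```

The mapper and reducer are passed in as plain functions so the same driver can serve counting, indexing, or any other aggregation that can be expressed per key.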

Use when

Batch analytics, indexing, log processing, and offline aggregation.

Example

Counting events by customer across terabytes of logs.
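A small sketch of that example, under stated assumptions: the log line format "<timestamp> <customer_id> <event_type>" is invented for illustration, and the function names are hypothetical. Each partition is counted locally first, so only one small table per partition has to be merged, instead of one record per event.

```python
from collections import Counter
from functools import reduce
from typing import Iterable


def parse_customer(line: str) -> str:
    # Assumed format (not from the source): "<timestamp> <customer_id> <event_type>"
    return line.split()[1]


def count_partition(lines: Iterable[str]) -> Counter:
    """Map step with local pre-aggregation: count events per customer in one partition."""
    return Counter(parse_customer(line) for line in lines if line.strip())


def merge_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: Counter addition sums the counts key by key."""
    return left + right


def count_events_by_customer(partitions: Iterable[Iterable[str]]) -> Counter:
    # Each count_partition call depends only on its own partition,
    # so on a cluster these calls could run on different workers.
    partials = [count_partition(partition) for partition in partitions]
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    partitions = [
        ["t1 c1 click", "t2 c2 view"],
        ["t3 c1 view"],
    ]
    totals = count_events_by_customer(partitions)
    print(totals["c1"], totals["c2"])  # 2 1
```

Counting within each partition before merging keeps the amount of shuffled data proportional to the number of distinct customers per partition, not the number of log lines.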

Review framing

Describe the pressure first, then the mechanism, then the cost. That keeps the design grounded.

Same pressure family

Data Processing Patterns

32 Stream Processing
33 Lambda Architecture
34 Change Data Capture (CDC)