Pattern 31 / FLOW

MapReduce

Use this when processing huge datasets that cannot fit on one machine.

Pressure
Processing huge datasets that cannot fit on one machine
Mechanism
Map each partition independently, shuffle and group the intermediate key-value pairs by key, then reduce each group to a result
Toll
High latency and operational overhead compared with stream processing when results are needed in real time
Architecture plate 31
[Diagram: MapReduce]
Executive brief

MapReduce fits when a dataset is too large to fit on one machine. Mechanism: map each partition independently, shuffle and group the intermediate key-value pairs by key, then reduce each group to a result. Use it for batch analytics, indexing, log processing, and offline aggregation. The toll: high latency and operational overhead compared with stream processing when results are needed in real time.
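The three phases can be sketched in a single process. This is a minimal illustration, not a distributed implementation: in a real cluster each partition would be mapped on a different machine and the shuffle would move data over the network. The names `map_reduce`, `index_mapper`, and `index_reducer` are hypothetical, and the indexing job is chosen only because indexing is one of the listed use cases.

```python
from collections import defaultdict


def map_reduce(partitions, mapper, reducer):
    """Run map, shuffle, and reduce sequentially over in-memory partitions."""
    # Map: each record is processed independently of every other record,
    # which is what allows partitions to be handled on separate machines.
    intermediate = []
    for partition in partitions:
        for record in partition:
            intermediate.extend(mapper(record))

    # Shuffle: group every intermediate value under its key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce: collapse each key's values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def index_mapper(record):
    """Emit one (word, doc_id) pair per word in the document."""
    doc_id, text = record
    return [(word, doc_id) for word in text.lower().split()]


def index_reducer(word, doc_ids):
    """Deduplicate and sort the documents that contain the word."""
    return sorted(set(doc_ids))


# Two partitions of (doc_id, text) records, made up for illustration.
partitions = [
    [("doc1", "map then reduce"), ("doc2", "shuffle then reduce")],
    [("doc3", "map and shuffle")],
]

index = map_reduce(partitions, index_mapper, index_reducer)
print(index["reduce"])  # documents containing the word "reduce"
```

The driver knows nothing about indexing; swapping in a different mapper and reducer gives a different batch job over the same partitions.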

Use when

Batch analytics, indexing, log processing, and offline aggregation.

Example

Counting events by customer across terabytes of logs.
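A small sketch of this example follows, run on a few in-memory lines rather than terabytes. The log format (`<timestamp> <customer_id> <event_name>`), the sample data, and all function names are assumptions made for illustration. The map step pre-aggregates counts within each partition, a common optimization that reduces how much data the shuffle has to move.

```python
from collections import Counter, defaultdict


def map_partition(lines):
    """Map one partition to (customer_id, partial_count) pairs.

    Assumes the hypothetical format "<timestamp> <customer_id> <event_name>".
    """
    partial = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed lines instead of failing the whole job
        partial[fields[1]] += 1
    return list(partial.items())


def shuffle(mapped_outputs):
    """Group the partial counts from every partition by customer_id."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for customer_id, count in pairs:
            groups[customer_id].append(count)
    return groups


def reduce_counts(groups):
    """Sum the partial counts for each customer."""
    return {customer_id: sum(counts) for customer_id, counts in groups.items()}


def count_events_by_customer(partitions):
    mapped = [map_partition(lines) for lines in partitions]  # parallel in a real cluster
    return reduce_counts(shuffle(mapped))


# Two made-up log partitions standing in for files on different machines.
partitions = [
    [
        "2024-01-01T00:00:00Z cust-a login",
        "2024-01-01T00:00:05Z cust-b login",
        "2024-01-01T00:00:09Z cust-a purchase",
    ],
    [
        "2024-01-01T00:01:00Z cust-b logout",
        "malformed",
        "2024-01-01T00:01:30Z cust-a logout",
    ],
]

totals = count_events_by_customer(partitions)
print(totals)  # {'cust-a': 3, 'cust-b': 2}
```

Because addition is associative and commutative, summing partial counts per partition and then summing again in the reducer gives the same totals as counting every event in one place.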

Review framing

Describe the pressure first, then the mechanism, then the cost. That keeps the design grounded.

Same pressure family

Data Processing Patterns

32 Stream Processing
33 Lambda Architecture
34 Change Data Capture (CDC)
โ† 3032 โ†’