Dead Letter Notes
Notes on data pipelines, message queues, and the failure modes in between.

Idempotent consumers, or how to stop fearing retries

Most message brokers give you at-least-once delivery. That means duplicates are not an edge case; they are part of the contract. Sooner or later the same message will be delivered twice: a consumer crashes after doing its work but before committing the offset, a rebalance re-assigns a partition, a network blip triggers a redelivery. If processing the same message twice corrupts your state, you don't have a delivery problem, you have a design problem.
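To make the crash-before-commit case concrete, here is a toy simulation. It is an in-memory model I made up for illustration, not a real broker client: `log`, `committed_offset`, and `consume` are hypothetical names, and the "crash" is just an exception.

```python
# Toy in-memory model of at-least-once delivery; not a real broker client.
log = ["order-1", "order-2", "order-3"]  # messages in one partition
committed_offset = 0                      # position the broker knows is done
processed = []                            # side effects the consumer performed


def consume(crash_before_commit_at=None):
    """Process from the committed offset; optionally crash before one commit."""
    global committed_offset
    for offset in range(committed_offset, len(log)):
        processed.append(log[offset])      # do the work first
        if offset == crash_before_commit_at:
            raise RuntimeError("consumer crashed before committing")
        committed_offset = offset + 1      # commit only after the work


try:
    consume(crash_before_commit_at=1)  # work for order-2 is done, commit is lost
except RuntimeError:
    pass

consume()  # restart: resumes from the last committed offset

print(processed)  # ['order-1', 'order-2', 'order-2', 'order-3']
```

Nothing was lost, which is the at-least-once promise, but order-2 was handled twice.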

Idempotency is cheaper than exactly-once

People chase exactly-once semantics when what they actually want is "processing this twice has the same effect as processing it once." That's idempotency, and it's usually far simpler to achieve at the consumer than to enforce end-to-end exactly-once across the whole pipeline.
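A small sketch of the difference, with a made-up message shape (`account`, `amount`, and `new_balance` are assumptions for the example): a handler that adds a delta is not idempotent, while one that assigns the resulting value is.

```python
def apply_increment(balances, msg):
    """Not idempotent: every delivery adds the amount again."""
    balances[msg["account"]] = balances.get(msg["account"], 0) + msg["amount"]


def apply_set(balances, msg):
    """Idempotent: the message carries the resulting value, so a replay changes nothing."""
    balances[msg["account"]] = msg["new_balance"]


msg = {"account": "acct-7", "amount": 50, "new_balance": 150}

incremented = {"acct-7": 100}
apply_increment(incremented, msg)
apply_increment(incremented, msg)  # duplicate delivery

assigned = {"acct-7": 100}
apply_set(assigned, msg)
apply_set(assigned, msg)           # duplicate delivery

print(incremented)  # {'acct-7': 200}, the duplicate was applied twice
print(assigned)     # {'acct-7': 150}, same result as a single delivery
```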

Give every message a stable key

The whole trick rests on a deduplication key that is stable across retries. Sometimes it's a natural business key (an order id), sometimes you have to generate one upstream and carry it through. What you must not do is derive it from anything that changes between attempts — a processing timestamp, a retry counter, a random UUID minted in the consumer.
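As a rough illustration, here is one way to derive such a key when no single business id is enough. The field names (`source`, `order_id`, `event_type`, `attempt`) are hypothetical; the point is only that the key is built from fields that are identical on every redelivery.

```python
import hashlib
import uuid


def stable_key(msg):
    """Derive the key only from fields that are identical on every redelivery."""
    raw = f'{msg["source"]}:{msg["order_id"]}:{msg["event_type"]}'
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def unstable_key(msg):
    """Anti-pattern: minted in the consumer, so each attempt gets a new key."""
    return str(uuid.uuid4())


first_attempt = {"source": "checkout", "order_id": "A-1001",
                 "event_type": "paid", "attempt": 1}
redelivery = dict(first_attempt, attempt=2)  # same message, delivered again

print(stable_key(first_attempt) == stable_key(redelivery))      # True
print(unstable_key(first_attempt) == unstable_key(redelivery))  # False
```

Note that the retry counter is deliberately left out of the hashed fields; including it would give each attempt its own key and defeat the dedup.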

The dedup table pattern

For a database sink, the pattern I reach for most is an insert guarded by the message key, done in the same transaction as the actual work:

BEGIN;
  INSERT INTO processed_messages (message_key)
  VALUES ($1)
  ON CONFLICT (message_key) DO NOTHING;

  -- if the insert affected 0 rows, we've seen this key: skip the side effects
  -- otherwise, do the real work here, in the same transaction
COMMIT;

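Because the dedup insert and the side effect commit atomically, a crash anywhere leaves you consistent: either both happened or neither did. That holds only when the side effect is itself a write to the same database; anything done outside the transaction is not rolled back with it. The processed_messages table does grow, so prune it once rows are safely past your maximum redelivery window. Partitioning by day and dropping old partitions makes that cheap, with one caveat: PostgreSQL requires the partition key to be part of any unique constraint on a partitioned table, so a unique index on message_key alone will not span partitions.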
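Here is a minimal runnable sketch of the pattern from application code. To keep it self-contained it uses Python's built-in sqlite3 module as a stand-in for the real sink, so the placeholder is `?` rather than `$1`. The `counters` table and the `handle` function are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_messages (message_key TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER NOT NULL)")
conn.execute("INSERT INTO counters (name, value) VALUES ('orders_seen', 0)")
conn.commit()


def handle(conn, message_key):
    """Return True if the work ran, False if the key was already processed."""
    with conn:  # one transaction: commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT INTO processed_messages (message_key) VALUES (?) "
            "ON CONFLICT (message_key) DO NOTHING",
            (message_key,),
        )
        if cur.rowcount == 0:
            return False  # key already recorded: skip the side effect
        # The real work, in the same transaction as the dedup insert.
        conn.execute(
            "UPDATE counters SET value = value + 1 WHERE name = 'orders_seen'"
        )
        return True


results = [handle(conn, key) for key in ["order-1", "order-1", "order-2"]]
value = conn.execute(
    "SELECT value FROM counters WHERE name = 'orders_seen'"
).fetchone()[0]

print(results)  # [True, False, True]
print(value)    # 2
```

The duplicate delivery of order-1 is detected by the zero row count and the counter is only bumped once per distinct key.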

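Natural upserts are even better when they apply. If your sink write is a MERGE (or, in PostgreSQL, an INSERT ... ON CONFLICT DO UPDATE) keyed on the business id, the operation is already idempotent and you may not need a separate dedup table at all. Reach for the table when the side effect isn't naturally an upsert, such as incrementing a counter. Effects that live outside the database, like sending an email or calling a payment API, need more care: the table still tells you a key has been seen, but the external call cannot join the transaction, so a crash between the call and the commit can still repeat it. If the API accepts an idempotency key, pass the message key along as that too.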
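A quick sketch of a naturally idempotent sink write, again using sqlite3 as a stand-in. SQLite has no MERGE statement, so this uses INSERT ... ON CONFLICT DO UPDATE instead; the `orders` table and `upsert_order` function are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT NOT NULL)")


def upsert_order(conn, order_id, status):
    """Write the order keyed on its business id; replays overwrite with the same values."""
    with conn:  # commit on success, roll back on exception
        conn.execute(
            "INSERT INTO orders (order_id, status) VALUES (?, ?) "
            "ON CONFLICT (order_id) DO UPDATE SET status = excluded.status",
            (order_id, status),
        )


upsert_order(conn, "A-1001", "paid")
upsert_order(conn, "A-1001", "paid")  # duplicate delivery

rows = conn.execute("SELECT order_id, status FROM orders").fetchall()
print(rows)  # [('A-1001', 'paid')]
```

No dedup table is involved: the primary key on the business id does the work.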

Tags: kafka, idempotency, postgres