Idempotent consumers, or how to stop fearing retries
Most message brokers give you at-least-once delivery. That means duplicates are not an edge case — they're a guarantee. Sooner or later the same message will be delivered twice: a consumer crashes after doing its work but before committing the offset, a rebalance re-assigns a partition, a network blip triggers a redelivery. If processing the same message twice corrupts your state, you don't have a delivery problem, you have a design problem.
Idempotency is cheaper than exactly-once
People chase exactly-once semantics when what they actually want is "processing this twice has the same effect as processing it once." That's idempotency, and it's usually far simpler to achieve at the consumer than to enforce end-to-end exactly-once across the whole pipeline.
Give every message a stable key
The whole trick rests on a deduplication key that is stable across retries. Sometimes it's a natural business key (an order id), sometimes you have to generate one upstream and carry it through. What you must not do is derive it from anything that changes between attempts — a processing timestamp, a retry counter, a random UUID minted in the consumer.
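To make the difference concrete, here is a minimal sketch of a key derived only from fields that are identical on every delivery. The function name dedup_key and the fields source, order_id and event_type are hypothetical; use whatever uniquely identifies the business event in your system.

```python
import hashlib
import uuid


def dedup_key(source: str, order_id: str, event_type: str) -> str:
    """Derive a key from fields that are the same on every delivery attempt."""
    # Join with a separator that should not occur in the fields themselves.
    raw = "\x1f".join([source, order_id, event_type])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def unstable_key() -> str:
    """Anti-pattern: a fresh UUID per attempt never matches an earlier attempt."""
    return str(uuid.uuid4())


first_attempt = dedup_key("orders", "order-1842", "order_paid")
redelivery = dedup_key("orders", "order-1842", "order_paid")
print(first_attempt == redelivery)       # same message, same key
print(unstable_key() == unstable_key())  # consumer-minted UUIDs differ per attempt
```

If the producer already stamps each message with an id, carrying that id through unchanged does the same job without any hashing.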
The dedup table pattern
For a database sink, the pattern I reach for most is an insert guarded by the message key, done in the same transaction as the actual work:
BEGIN;
INSERT INTO processed_messages (message_key)
VALUES ($1)
ON CONFLICT (message_key) DO NOTHING;
-- if the insert affected 0 rows, we've seen this key: skip the side effects
-- otherwise, do the real work here, in the same transaction
COMMIT;
Because the dedup insert and the side effect commit atomically, a crash anywhere leaves you consistent: either both happened or neither did. This holds as long as the side effect is itself a write in the same database transaction. The processed_messages table does grow, so partition it by day and drop old partitions once they're safely past your maximum redelivery window.
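Here is a runnable sketch of the same pattern, under some assumptions: it uses Python's built-in sqlite3 module with an in-memory database instead of the Postgres-style SQL above (so the placeholder is ? rather than $1), and the balances table and the handle function are made up for illustration. The side effect here is a counter-style increment, which is exactly the kind of write that is not safe to repeat.

```python
import sqlite3


def handle(conn: sqlite3.Connection, message_key: str, account: str, amount: int) -> bool:
    """Apply a credit once per message_key; return False for a duplicate."""
    with conn:  # commits on success, rolls back if anything below raises
        cur = conn.execute(
            "INSERT INTO processed_messages (message_key) VALUES (?) "
            "ON CONFLICT (message_key) DO NOTHING",
            (message_key,),
        )
        if cur.rowcount == 0:
            return False  # key already recorded: skip the side effect
        # The real work, in the same transaction as the dedup insert.
        conn.execute(
            "UPDATE balances SET amount = amount + ? WHERE account = ?",
            (amount, account),
        )
        return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_messages (message_key TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER NOT NULL)")
conn.execute("INSERT INTO balances VALUES ('acct-1', 0)")
conn.commit()

print(handle(conn, "order-1842:paid", "acct-1", 50))  # first delivery: True
print(handle(conn, "order-1842:paid", "acct-1", 50))  # redelivery: False
print(conn.execute("SELECT amount FROM balances").fetchone()[0])  # 50, not 100
```

The primary key on message_key is what makes the conflict clause work; without a unique constraint on that column the guard has nothing to conflict with.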
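Natural upserts are even better when they apply. If your sink write is a MERGE keyed on the business id, and it sets values rather than accumulating them, the operation is already idempotent and you may not need a separate dedup table at all. Reach for the table when the side effect isn't naturally an upsert: sending an email, calling a payment API, incrementing a counter. For effects that leave the database, such as the email or the payment call, the transaction can't undo them, so the table only narrows the duplicate window; where the downstream API accepts an idempotency key, pass it the same message key.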
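A small sketch of the natural-upsert case, again with assumptions stated up front: SQLite has no MERGE statement, so this swaps in its INSERT ... ON CONFLICT DO UPDATE form, and the orders table and upsert_order function are hypothetical. The point is only that a write which sets absolute values, keyed on the business id, leaves the same row no matter how many times it runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id TEXT PRIMARY KEY, status TEXT NOT NULL, total_cents INTEGER NOT NULL)"
)


def upsert_order(conn: sqlite3.Connection, order_id: str, status: str, total_cents: int) -> None:
    """Write the order's current state; replaying the same message changes nothing."""
    with conn:
        conn.execute(
            "INSERT INTO orders (order_id, status, total_cents) VALUES (?, ?, ?) "
            "ON CONFLICT (order_id) DO UPDATE SET "
            "status = excluded.status, total_cents = excluded.total_cents",
            (order_id, status, total_cents),
        )


upsert_order(conn, "order-1842", "paid", 4999)
upsert_order(conn, "order-1842", "paid", 4999)  # duplicate delivery
print(conn.execute("SELECT order_id, status, total_cents FROM orders").fetchall())
```

Had the update been written as total_cents = total_cents + excluded.total_cents, the duplicate would have doubled the total, which is the case where the dedup table earns its keep.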