Backfilling a partitioned table without taking down production
Backfills are where careful pipelines go to die. You changed a transformation,
or fixed a bug, and now three years of history need to be recomputed. The naive
approach — one giant INSERT ... SELECT over the whole range — locks
things up, blows out your temp space, and if it fails at hour six you get to
start over. Here's the boring, reliable way I do it now.
Partition by time, iterate by partition
If the table is partitioned by day or month (and a table big enough to need a careful backfill almost always should be), treat each partition as an independent unit of work. Recompute one partition at a time. Each one is small enough to fit in memory, fast enough to retry, and isolated enough that a failure in April 2024 doesn't touch March.
Write to a shadow, then swap
Rather than mutating the live partition in place, I build the corrected data in a staging table and swap it in atomically. In Postgres, partition exchange makes this nearly instant and keeps readers happy:
-- build the corrected partition off to the side
CREATE TABLE events_2024_04_new (LIKE events INCLUDING ALL);
INSERT INTO events_2024_04_new
SELECT * FROM recompute_source WHERE day >= '2024-04-01' AND day < '2024-05-01';
-- swap: detach the old partition, attach the new one
ALTER TABLE events DETACH PARTITION events_2024_04;
ALTER TABLE events ATTACH PARTITION events_2024_04_new
FOR VALUES FROM ('2024-04-01') TO ('2024-05-01');
Readers see the old data until the instant of the swap, then the new data. No half-written partitions, no long-held locks over millions of rows.
Make it resumable and rate-limited
Drive the loop from a small control table that records which partitions are
done, running, or pending. If the job dies, it picks
up where it left off instead of restarting. And put a deliberate sleep between
partitions — a backfill that saturates your I/O will page someone at 3am even if
it technically "isn't touching" production reads. Slower and invisible beats fast
and disruptive every time.