Lately I had to migrate pretty fat (300GB+ data and 150GB+ of index data) postgres table - application grew, we had to get rid of one pretty big column that was redundant now, change main composite index, change one column type etc. etc. The problem was - do it without significant amount of downtime.
You can google up tons of solutions, but let me walk through approach I chose - hopefully you will find it somewhat useful.
I decided to create another table on the side, using
CREATE TABLE ... LIKE INCLUDING ALL - then I dropped all the indexes on new table (you don’t needed extra overhead for the process) and just started copying the data I really needed to new table using dead simple loop in ruby, something more or less like:
insert into new_table (columns...) select columns... from old_table where id between <batch of ~10_000_000 rows> -- do this until insert results 0 rows
The whole process was actually pretty quick (you gotta love SSDs) and took just few hours - without indexes on new table I ended up with 185 GB of fresh & clean data. So far so good.
Then I needed to apply new indexes, the problem was I wanted to use unique index and I knew the current dataset had duplicates, fortunately postgres wiki have a very clever short article how to deal with such problem. And that took another 3.5 hours.
Adding unique index took another 2 hours, then after removing all duplicates and adding index I did full vacuum & analyze on the table and that took yet another hour.
The result was brand new table - with 185GB data & 61GB index - pretty sweet. I prepared the application up-front for upcoming structural changes (that obviously sometimes cannot be done and you have to do some workarounds) and simply renamed both tables within single transaction.
And that’s it. Sounds kinda unimpressive, but I was pretty happy with the result, also performance-wise it was a huge win - by simplifying just few things we lowered our response times by almost twice.