I will give you some of my thoughts about migrating a legacy database to the Ruby on Rails framework. Welcome to part one.
It’s 32°C outside (and apparently inside), and I’m trying to focus and finish an old project that I started after hours. Being a programmer ain’t easy :P. But let’s get back on track. So it happens that I have this legacy, really old database, not maintained for years. It comes from some old custom PHP CMS that is, you might say, far from perfect. It doesn’t have any logical constraints (no data integrity), and validation is a mess.
I started with a rough PHP script that tries to fix what can be fixed in one way or another, purges the database of spam (hello there, web bots), applies some ‘static’ fixes (like rewriting some custom prepared data), and leaves the rest as-is. I won’t post the code here, because it’s just written Rambo style: I didn’t pay attention to optimization or anything else, as I had to run it basically once.
After that it was time to cook up rake tasks that would migrate this data, table by table, into my new app. I set up the legacy database in database.yml and created a bunch of legacy classes in an external file, like this:
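The original listing didn’t survive, so here is a hedged sketch of what it likely looked like. The class names, table names, and the `legacy` connection key are all my assumptions, not the original code:

```ruby
# Sketch of the legacy models file (names are assumptions, not the original).
# database.yml gets an extra entry for the old database, e.g.:
#
#   legacy:
#     adapter:  mysql2
#     database: old_cms
#     username: root
#
# Every legacy class then connects to it through one shared abstract base.
module Legacy
  class Base < ActiveRecord::Base
    self.abstract_class = true
    establish_connection :legacy
  end

  class User < Base
    self.table_name = 'old_users'
  end

  class Post < Base
    self.table_name = 'old_posts'
  end
end
```

The abstract base keeps the legacy connection in one place, so adding another legacy table is a two-line class.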
Then it was time to start working on the rake task; by trial and error I ended up with a function that could be applied to most of my cases.
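The listing for this function is gone as well, so here is a hedged reconstruction based on the description that follows. The name migrate_table, the batch size, and the logging details are my assumptions:

```ruby
require 'set'

# Hedged reconstruction of the migration helper (all names are assumptions).
# old_model / new_model are ActiveRecord-like classes; mapping is a hash of
# new attribute name => lambda that extracts/fixes the value from the old row.
def migrate_table(old_model, new_model, mapping, skip_ids: [])
  # Resume support: fetch IDs already imported in a previous run, so we
  # don't re-import 200k records just because two were broken.
  imported = new_model.pluck(:id).to_set
  skipped  = skip_ids.to_set

  new_model.transaction do                      # one transaction for speed
    old_model.find_in_batches(batch_size: 1000) do |batch|
      batch.each do |old_record|
        next if imported.include?(old_record.id) || skipped.include?(old_record.id)

        record = new_model.new
        mapping.each do |attribute, extractor|
          record.public_send("#{attribute}=", extractor.call(old_record))
        end

        begin
          unless record.save
            puts "#{old_model}##{old_record.id}: #{record.errors.full_messages.join(', ')}"
          end
        rescue StandardError => e
          # Anything the PHP cleanup script couldn't fix is logged and skipped.
          puts "#{old_model}##{old_record.id}: #{e.message}"
        end
      end
    end
  end
end
```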
What this function does is take the new and old model, plus a hash whose keys correspond to the new model’s attributes and whose values are raw code producing each value; optionally I’m passing an array of IDs of totally broken objects, or objects that I decided not to import for some reason. I’m checking whether each object was properly saved (logging the validation message) and rescuing any exceptions, as I just let a record fail if I couldn’t fix it in the PHP script. A few notices:
if I’m rewriting old IDs (and most of the time I am), I first fetch the stuff I have already imported (maybe in a previous run, before some additional tweaks): I don’t want to re-import 200k records just because two were broken
find_in_batches is quite useful here, apparently
all worked fine until I got to a table that had half a million rows and then ran out of memory, my bad :P
wrap it inside a transaction for some speed improvement; you may want to disable validation or go with raw SQL generation instead, it all depends on your needs
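For the “disable validation” route, a minimal sketch of what I mean (the helper name is my own; this assumes plain ActiveRecord, where save(validate: false) skips validations):

```ruby
# Sketch: trade safety for speed by skipping validations entirely
# (assumption: plain ActiveRecord models; helper name is my own).
def import_without_validation(model, rows)
  model.transaction do
    rows.each { |attrs| model.new(attrs).save(validate: false) }
  end
end
```

Only do this for data you have already cleaned up; otherwise the garbage from the old database lands straight in the new one.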
So I can call it like that:
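The call itself is lost too; it was presumably something along these lines. Legacy::User, the column names, and the skip IDs below are all made up for illustration:

```ruby
# Hypothetical mapping for one table; every lambda receives the old record.
USER_MAPPING = {
  id:    ->(old) { old.id },                  # keep the old IDs
  login: ->(old) { old.login.to_s.strip },
  email: ->(old) { old.email.to_s.downcase },
}

# Example call (models and IDs are assumptions):
# migrate_table(Legacy::User, User, USER_MAPPING, skip_ids: [666, 1337])
```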
Maybe it’s not the perfect solution, not the smartest or quickest, but in my opinion migrations are supposed to be dirty, doable in reasonable time, and they should just work. You won’t maintain this code in the future anyway, so it’s up to you how much effort you are willing to put in. As it’s my side project, I decided to be cheap ;).
In the next chapter I will give you some hints on how to convert pretty messed-up HTML to BBCode (don’t give me that look!) without using regular expressions (which don’t cut it in the end anyway). Cheers!
After some investigation of my migration I realized that using include? on a huge Array is not the way to go in this case. Instead, convert the array to a Set: its include? is a hash lookup, roughly constant time instead of a linear scan. Here you can find some useful information. A simple benchmark is below.
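The benchmark listing is gone as well; here is a rough equivalent (the collection sizes and exact shape are my own choice) contrasting Array#include? with Set#include?:

```ruby
require 'benchmark'
require 'set'

# Array#include? scans the array linearly; Set#include? is a hash lookup.
ids   = (1..100_000).to_a
set   = ids.to_set
probe = Array.new(1_000) { rand(1..200_000) }

Benchmark.bm(14) do |x|
  x.report('Array#include?') { probe.each { |id| ids.include?(id) } }
  x.report('Set#include?')   { probe.each { |id| set.include?(id) } }
end
```

On my data the Set version was faster by a couple of orders of magnitude, which is exactly what you’d expect from O(n) versus O(1) membership checks.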