Optimize database for fun & profit
We got to a point where we ran out of int32 range on our biggest table. Remember when Basecamp, years later, published a big blog post after the same issue caused some downtime for them? We were overflowing int32 before it was cool!
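For a sense of scale, here is a minimal sketch of the arithmetic behind that limit. The helper name `days_until_overflow` and the sample numbers are made up for illustration; they are not figures from our table.

```python
# Largest value a signed 32-bit integer key can hold.
INT32_MAX = 2**31 - 1  # 2_147_483_647
# Largest value a signed 64-bit integer key can hold.
INT64_MAX = 2**63 - 1


def days_until_overflow(current_max_id, rows_per_day, limit=INT32_MAX):
    """Estimate how many full days of inserts fit before the key range runs out.

    Assumes ids are handed out sequentially at a steady rate, which is a
    simplification: sequences can skip values on rollbacks.
    """
    remaining = limit - current_max_id
    return remaining // rows_per_day


# Hypothetical example: 2 billion ids used, 1 million new rows per day.
print(days_until_overflow(2_000_000_000, 1_000_000))
```

The takeaway is that the last stretch goes quickly once a table is big enough to have a high insert rate.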
Once you get to that point, you have to ask whether you’re really using that datetime field and whether a date would do the job (4 bytes less for each row!), or whether a smallint is enough there. Or do you even need that primary int64 key?! It is funny what you notice when you stop for a while and look at your data from a distance. Sometimes it’s hard to leave your code-database bubble, especially when you have been in it for a while.
Once we paid attention to field types and dropped the primary key (we were relying on the unique compound key anyway), I think we cut the table size almost in half (it was around 500-600GB at the time).
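A back-of-the-envelope way to estimate this kind of saving, as a rough sketch only. The type sizes are the standard Postgres storage sizes, but the list of changes and the row count below are hypothetical, and the estimate ignores alignment padding, per-row headers, and index space (dropping a primary key also removes its index, which is not counted here).

```python
# Storage size in bytes of some fixed-width Postgres types.
TYPE_SIZES = {
    "smallint": 2,
    "integer": 4,
    "bigint": 8,
    "date": 4,
    "timestamp": 8,
}


def bytes_saved_per_row(changes):
    """Sum the bytes saved per row for a list of (old_type, new_type) changes.

    A new_type of None means the column is dropped entirely.
    """
    saved = 0
    for old_type, new_type in changes:
        new_size = TYPE_SIZES[new_type] if new_type is not None else 0
        saved += TYPE_SIZES[old_type] - new_size
    return saved


def total_saved_gib(changes, row_count):
    """Scale the per-row saving to the whole table, in GiB."""
    return bytes_saved_per_row(changes) * row_count / 2**30


# Hypothetical changes: timestamp -> date, integer -> smallint, drop a bigint key.
changes = [("timestamp", "date"), ("integer", "smallint"), ("bigint", None)]
print(bytes_saved_per_row(changes), "bytes per row")
print(round(total_saved_gib(changes, 3_000_000_000), 1), "GiB for 3 billion rows")
```

A few bytes per row look like nothing until you multiply them by billions of rows.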
Fun fact: when you run out of disk space, Postgres usually rolls back the transaction (or whatever you are doing) and releases the disk space back to the system. But sometimes it just crashes and exits. I had a small heart attack the one time that happened 😅, but restarting the process was enough to bring it back online.
Oh yeah, running on bare-metal servers on OVH’s vRack to keep everything as cost-efficient as possible was fun! Thankfully our whole team was backed by a truly experienced devops/admin person!
Now when I see a company throw something like $7k/mo at a single Heroku Postgres instance (holding less than 100GB of data), it just hurts. Vendor lock-in is for sure a thing, and it seems that current cloud platforms like GCP or AWS, which were supposed to offload work from developers, now require whole teams of devops to operate and maintain. Oh, the irony.
Infrastructure-wise, I think you can achieve a lot using well-tested and boring solutions. Frankly, I don’t think most companies need to scale that much. Sure, if you’re a big player handling tens of millions of requests per day, you deal with a completely different set of problems. But again, I don’t think most aspiring startups will ever get to that point, and they spend too much preparing for something that is simply not necessary.