Almost a year ago, I wrote about how to deploy a simple MySQL leader/follower setup on Kubernetes.

It’s been a while, and I have decided to look into ‘operator’ solutions, which should be fully compatible with native MySQL implementations (MariaDB was out of the question because of reasons).

If you want something fully MySQL-compatible, you have two major options (I’m not going to mention operators that seem to be abandonware at this point):

I tested MySQL-Operator on a small testing cluster and decided to move my small workload after a month of battle testing. Migration was supposed to be a quick 20-minute adventure.

Migration gotchas

I wanted to clone data from an existing, running MySQL instance. It turned out that ARM won’t be compatible with AMD. You have to run the same patch version of MySQL on both sides, and on top of that an architecture difference will crash the clone during init, forcing you to scrap your cluster and start over once you fix the problem. That really surprised me (prior to battle testing, all my nodes were running on ARM; I did some shuffling in between). What I had to do was reschedule my existing MySQL to an AMD node, as my operator was running on AMD CPUs.
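The two constraints above (same MySQL version, same architecture) are cheap to check before pointing the operator at a donor. Below is a minimal sketch in Python, assuming you have already read `@@version`, `@@version_compile_os` and `@@version_compile_machine` from both the donor and a server running on the target nodes. The `ServerInfo` class and the `clone_blockers` function are hypothetical names made up for this illustration; they are not part of the operator or of MySQL.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServerInfo:
    """Values as returned by SELECT @@version, @@version_compile_os, @@version_compile_machine."""
    version: str          # e.g. "8.0.36"
    compile_os: str       # e.g. "Linux"
    compile_machine: str  # e.g. "aarch64" or "x86_64"


def clone_blockers(donor: ServerInfo, recipient: ServerInfo) -> List[str]:
    """Return human-readable reasons why a clone between the two servers is likely to fail."""
    problems = []
    # Compare only the numeric part, so a build suffix such as "-log" is ignored.
    donor_version = donor.version.split("-")[0]
    recipient_version = recipient.version.split("-")[0]
    if donor_version != recipient_version:
        problems.append(f"version mismatch: {donor_version} vs {recipient_version}")
    if donor.compile_os != recipient.compile_os:
        problems.append(f"OS mismatch: {donor.compile_os} vs {recipient.compile_os}")
    if donor.compile_machine != recipient.compile_machine:
        problems.append(
            f"architecture mismatch: {donor.compile_machine} vs {recipient.compile_machine}"
        )
    return problems


if __name__ == "__main__":
    # Illustrative values only: same version, different CPU architecture.
    donor = ServerInfo("8.0.36", "Linux", "aarch64")
    recipient = ServerInfo("8.0.36", "Linux", "x86_64")
    for problem in clone_blockers(donor, recipient):
        print(problem)
```

An empty list means neither of the mismatches described above applies; it does not guarantee that the clone will succeed.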

The operator doesn’t handle a failure of that initial initDB.clone nicely - once you try to restart it, it will just ignore the clone/donor options. The best bet is to scrap the whole configuration (including volumes and any leftover configs - important) and just start over.

Forming a cluster also takes a while. I have a really tiny workload (~12GB of pure data), but it took a good few minutes for the cluster to establish and for the operator to finally spin up the mysql-router pods, because establishing group replication means duplicating that data 3 (or more, depending on your settings) times - once for each MySQL pod.

Backup gotchas

The MySQL Shell dump utility still suffers from a max_execution_time timeout problem during backups, and there still seems to be no obvious workaround for it.

You can:

General gotchas

Some stuff is simply broken - e.g. the unix socket for monitoring is broken, and monitorSpec is not respected at all. I worked around those problems by using Argo CD multiple sources and overriding the whole ServiceMonitor definition and its config. It seems like patches from outside contributors go to the Oracle internal bug tracker to die, which is a pity and makes me question the future of this whole official product.

Operational gotchas

The operator doesn’t observe changes in podSpec, which is quite frustrating: to change resource allocation, you have to patch your StatefulSet by hand, which seems like a half-baked way to operate a database instance.
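To make the "patch by hand" step a bit less error-prone, the patch body can be generated instead of typed. This is only a sketch: the `resources_patch` helper is a hypothetical name, and the container name `mysql` plus the resource values are assumptions you should check against your own StatefulSet.

```python
import json
from typing import Dict


def resources_patch(container: str, requests: Dict[str, str], limits: Dict[str, str]) -> str:
    """Build a strategic merge patch that changes the resources of a single container.

    In a strategic merge patch, entries of the containers list are matched by
    name, so the other containers in the pod template are left untouched.
    """
    patch = {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container,
                            "resources": {"requests": requests, "limits": limits},
                        }
                    ]
                }
            }
        }
    }
    return json.dumps(patch)


if __name__ == "__main__":
    # Container name and sizes are assumptions for illustration only.
    print(resources_patch("mysql", {"cpu": "500m", "memory": "2Gi"}, {"memory": "4Gi"}))
```

The printed JSON can be passed to `kubectl patch statefulset <name> -n <namespace> --type strategic -p '<json>'`, where the StatefulSet name and namespace are whatever your cluster uses.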

The good stuff

MySQL upgrades seem to be handled well - bumping serverVersion in the helm values does a nice rolling upgrade on the cluster. In all fairness, MySQL was always pretty good at in-place upgrades (I’m looking at you, PostgreSQL).

All the heavy lifting related to managing group replication, leader election, setting up mysql-router and S3 backups seems to just work - which was a major selling point for me despite the shortcomings mentioned above.