I wasted the last few weekends setting up a self-hosted runner set on a Kubernetes cluster just to… that’s actually a good question - I’ve already forgotten why.
So, I have a fairly simple CircleCI configuration for my Rails app. There are a few jobs running on different containers that prepare the dependencies cache, followed by some linters. Then, there are a few parallelized jobs that run tests, and a fan-in job that collects test coverage results and uploads them to Codecov. There’s also an optional step that builds a Docker image, pushes it to ECR, and optionally triggers ArgoCD deployments.
It’s not much, but at the same time, it mimics what you might call a “production” setup. There’s caching involved, an automated repository that maintains a CI-relevant Docker image used by CircleCI, and overall, the “waste” in that pipeline is really minimal. CircleCI can also cache those containers on its end (depending on which VM I hit), which is a nice addition. And Docker layer caching seems to mostly work.
I kind of like CircleCI as a product, but it has those periods where barely anything works: jobs hang and never finish. So, I decided to take a peek at the other side.
Getting GitHub Actions “production ready”
Let’s recap what I need from the CI:
- Aggressive caching of dependencies from different ecosystems
- Docker layer caching for the Docker build step
- The ability to parallelize a single job
- (Nice to have) Test timing, so I can distribute the tests evenly based on historical execution time
Spoiler alert
In the end, I abandoned this experiment. In the meantime, GitHub announced changes to the pricing model for self-hosted runners (only to postpone them after the backlash). I encountered numerous basic issues, such as the inability to provide container arguments. It also turned out that even if you just try to spin up two containers using the same image, they fail to start. Eventually, the entire setup stopped working because the actions-runner image I was using had become outdated and refused to accept any new jobs (TIL).
Caching
The biggest PITA was caching everything: dependencies and Docker layers. I had to hack together:
- actions-cache-server: this forces you to build your own runner image (but thankfully this whole thing even exists in the first place).
- Customization of Docker images: you’ll likely need to customize your existing Docker images to use the runner (1001) user and the /home/runner work directory. Otherwise, all hell breaks loose due to a myriad of assumptions along the way.
- Docker Layer Caching: to cache Docker layers, you’ll need to spin up your own Docker registry and make some crazy hacks within the action to force its usage. Here’s an example of how you can set up Docker Buildx:
# NOTE: This spins up a new deployment in Kubernetes and you might need to patch the RBAC provided by ARC.
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
  with:
    driver: kubernetes
    # my specific node selector and tolerations so pods land on CI nodes
    driver-opts: |
      "nodeselector=ci=true","tolerations=key=ci,value=true,effect=NoSchedule"
    # I spin up basic registry, no auth, access from within cluster only
    buildkitd-config-inline: |
      [registry."docker-registry.docker-registry.svc.cluster.local:5000"]
        http = true
# ....
- name: Build and push
  uses: docker/build-push-action@v6
  with:
    context: .
    file: Dockerfile
    push: true
    tags: ${{ steps.meta.outputs.tags }}
    cache-from: type=registry,ref=docker-registry.docker-registry.svc.cluster.local:5000/my-app:latest
    cache-to: type=registry,ref=docker-registry.docker-registry.svc.cluster.local:5000/my-app:latest,mode=max
    provenance: false
  env:
    DOCKER_BUILD_SUMMARY: false
    DOCKER_BUILD_RECORD_UPLOAD: false
Test Balancing
For splitting test execution based on timing, the best solution I could find was split-test. I just preinstalled it on my custom runner image. Assuming you keep timings only from the main branch, it should work pretty well.
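To make the idea of timing-based splitting concrete, here is a minimal sketch of one common approach: sort test files slowest-first and always hand the next file to the node with the least accumulated time. This is not taken from split-test's source and may differ from what it does internally. The function name `split_by_timing`, the `default_seconds` fallback for files without history, and the sample spec paths are all made up for illustration.

```python
import heapq


def split_by_timing(timings, node_total, default_seconds=1.0):
    """Greedy longest-first split of test files across CI nodes.

    timings maps a test file path to its historical runtime in seconds,
    or None when there is no history yet (hypothetical input format).
    Returns one list of paths per node.
    """
    if node_total < 1:
        raise ValueError("node_total must be at least 1")

    def cost(seconds):
        # Files without recorded timings get an assumed default cost.
        return default_seconds if seconds is None else seconds

    # Min-heap of (accumulated_seconds, node_index): the least loaded
    # node is always on top.
    heap = [(0.0, index) for index in range(node_total)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(node_total)]

    # Slowest files first; ties are broken by path so that every node
    # computes exactly the same split from the same timings.
    ordered = sorted(timings.items(), key=lambda kv: (-cost(kv[1]), kv[0]))
    for path, seconds in ordered:
        total, index = heapq.heappop(heap)
        buckets[index].append(path)
        heapq.heappush(heap, (total + cost(seconds), index))
    return buckets


# Made-up timings, as if recorded on the main branch.
timings = {
    "spec/models/user_spec.rb": 12.0,
    "spec/system/checkout_spec.rb": 40.0,
    "spec/requests/orders_spec.rb": 25.0,
    "spec/lib/new_helper_spec.rb": None,  # new file, no history yet
    "spec/models/order_spec.rb": 8.0,
}
buckets = split_by_timing(timings, node_total=2)
for index, files in enumerate(buckets):
    print(index, files)
```

Because the split is deterministic, each parallel job can run the same computation and pick the bucket matching its own index, so no coordination between jobs is needed.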
Parallelization
In general, GitHub Actions provides out-of-the-box support for job matrices, which works well. The ability to split the CI config into smaller chunks is also a nice addition (along with the reusable workflows syntax, which is pretty neat once you wrap your head around it).
Conclusion
As much as I like GitHub’s UI around Actions and the general dev experience when it comes to setting up simple workflows, self-hosting it, trying to understand the less-than-ideal runner source code, and dealing with all the quirks along the way was not fun. Next, I will look into Woodpecker because I really miss Drone CI, which I used a few years back (where all you had to do was set up two Docker containers and things just worked 😆).