…or rather: How I choose to back up databases when using Nomad.

When I was researching backup options after switching to Nomad, I considered using something like docker-db-backup. I quickly ran into one downside: you have to remember to keep the postgres-client in the backup container aligned with the version of the server in the database container. And since I was running five different databases (Postgres/MySQL) at the time, that was a deal-breaker for me.

After more reading, I decided to write a bash script that uses Nomad’s raw_exec driver and cron (periodic) capabilities.

Consul helps us obtain the allocation ID of the task we’re interested in. With that ID, we can run nomad alloc exec to call pg_dump inside the database Docker container.
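
To make that concrete, here is roughly what a one-off, manual dump looks like once you have an allocation ID (the ID below is a placeholder; depending on your pg_hba setup you may also need to pass PGPASSWORD, as the full script later does):

# placeholder allocation ID - in the final setup it will come from Consul
ALLOC_ID="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx"

# run pg_dump inside the database container and save the output locally;
# POSTGRES_USER/POSTGRES_DB are the env vars set on the container itself
nomad alloc exec -task db-task "$ALLOC_ID" \
  /bin/bash -c 'pg_dump -U "$POSTGRES_USER" "$POSTGRES_DB"' > manual-backup.sql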

What to do with that dump is then up to us - I decided to pipe the output into Docker again, using the s3cmd Docker image to put it on an S3 bucket (actually a Minio bucket). Note: as a good practice, I recommend using a backup location outside your data center; I was using Minio as a training exercise.
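
If you want to sanity-check the s3cmd/Minio leg on its own, the same container invocation used in the backup script further down can be tested by hand (the credentials, endpoint and bucket name below are placeholders):

# upload a test object to the bucket via the s3cmd image
echo "hello" | docker run -i --rm \
  -e AWS_ACCESS_KEY_ID=minio-access-key \
  -e AWS_SECRET_ACCESS_KEY=minio-secret-key \
  d3fk/s3cmd:stable \
  --host=minio.service.consul:9000 --no-ssl \
  --host-bucket=minio.service.consul:9000 \
  put - s3://my-bucket-name/hello.txt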

Preparing the ACL token for the script

You can skip this part if you’re not using Nomad’s ACL capabilities.

# nomad-exec-policy.hcl
namespace "default" {
  policy = "write"
  capabilities = ["alloc-exec"]
}

Create a new policy using the file above:

nomad acl policy apply -description "Nomad exec policy" nomad-exec nomad-exec-policy.hcl

Create a new token - the Secret ID is the one you will need:

nomad acl token create -global -name="Nomad exec token" -policy=nomad-exec -type=client

Accessor ID  = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx
Secret ID    = xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx
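
The backup script will call nomad alloc exec, so it needs that Secret ID available as NOMAD_TOKEN. A quick way to confirm the token has the alloc-exec capability (the allocation ID below is a placeholder for any running allocation of the database task):

# the Secret ID from the command above
export NOMAD_TOKEN="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx"

# should succeed if the policy is attached correctly
nomad alloc exec -task db-task xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx /bin/bash -c 'pg_dump --version'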

Nomad database job and backup job

Prerequisites: I’m using Consul for service discovery and Vault for fetching passwords, but your mileage may vary here.

After you’ve enabled the raw_exec driver on the client (it’s disabled by default), you should be good to create a new backup job.
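
If you want to double-check that the driver is actually picked up before submitting anything, you can inspect the client node (run on the client itself):

# raw_exec should show up as a detected, healthy driver
nomad node status -self -verbose | grep -i raw_exec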

I’m using Vault to obtain the database credentials for the backup and the S3 credentials for s3cmd. s3cmd will send the backup to Minio, exposed somewhere within the private network on port 9000.
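
The templates below expect two Vault secrets with specific keys. If you’re setting this up from scratch, seeding them could look like this (the kv-v1 mount and paths match the job files below; the values are placeholders):

# database credentials used by both the database job and the backup job
vault kv put kv-v1/nomad/db/postgres user=dbuser password=changeme db=mydb

# Minio/S3 credentials used by the backup job
vault kv put kv-v1/nomad/s3/backup access_key_id=minio-access-key secret_access_key=minio-secret-key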

To make it all work together, we need the database task to expose its allocation ID (the Nomad allocation ID). We can do that by registering a service in Consul and using its tags feature.

To give you a better picture, here is the database job (a slimmed-down version with irrelevant definitions removed):

job "db-task" {
  datacenters = ["dc1"]
  type        = "service"

  vault {
    policies = ["nomad-read"]
  }

  group "db-task" {
    network {
      port "db" {
        to = 5432
        # I'm using an internal network called 'private'
        host_network = "private"
      }
    }

    task "db-task" {
      driver = "docker"

      config {
        # omitting volume mount here for brevity
        image = "postgres:14.0-alpine3.14"
        ports = ["db"]
      }

      template {
        data = <<EOH
{{- with secret "kv-v1/nomad/db/postgres" -}}
POSTGRES_PASSWORD="{{ .Data.password }}"
POSTGRES_USER="{{ .Data.user }}"
POSTGRES_DB="{{ .Data.db }}"
{{- end -}}
        EOH
        destination = "secrets/file.env"
        env         = true
      }

      resources {
        cpu    = 200
        memory = 200
        memory_max = 300
      }

      service {
        name = "db-task"
        port = "db"

        # the backup job will rely on this particular 'alloc' tag
        tags = ["alloc=${NOMAD_ALLOC_ID}"]

        check {
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
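
Once that job is running, you can verify that the tag actually landed in Consul (assuming a local Consul agent and jq installed) - it should print a single alloc=<allocation-id> entry:

curl -s http://127.0.0.1:8500/v1/catalog/service/db-task | jq -r '.[].ServiceTags[]'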

And here is the backup job - completely separate from the database job:

job "db-backup" {
  datacenters = ["dc1"]
  type        = "batch"

  vault {
    policies = ["nomad-read"]
  }

  periodic {
    cron             = "0 22 * * * *"
    prohibit_overlap = true
  }

  group "db-backup" {  
    task "postgres-backup" {
      driver = "raw_exec"

      config {
        command = "/bin/bash"
        args    = ["local/script.sh"]
      }

      template {
        data        = <<EOH
        set -e
        
        nomad alloc exec -task db-task $DB_ALLOC_ID \
        /bin/bash -c "PGPASSWORD=$PGPASSWORD PGUSER=$PGUSER PGDATABASE=$PGDATABASE pg_dump --compress=4 -v" | \
        docker run -i --rm \
        -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
        -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
        d3fk/s3cmd:stable \
        --host=$S3_HOST_BASE \
        --no-ssl \
        --host-bucket=$S3_HOST_BASE -v \
        put - s3://$S3_BUCKET/$(date "+%Y-%m-%d---%H-%M-%S").dump.gz
        EOH
        destination = "local/script.sh"
      }

      template {
        data = <<EOH
{{- with secret "kv-v1/nomad/db/postgres" -}}
PGPASSWORD="{{ .Data.password }}"
PGUSER="{{ .Data.user }}"
PGDATABASE="{{ .Data.db }}"
{{ end }}

{{- with secret "kv-v1/nomad/s3/backup" -}}
AWS_ACCESS_KEY_ID="{{ .Data.access_key_id }}"
AWS_SECRET_ACCESS_KEY="{{ .Data.secret_access_key }}"

# here you might also want to set the NOMAD_TOKEN env var
# if you're using ACL capabilities
{{ end }}

# as the 'db-task' service is registered in Consul
# we want to grab its 'alloc' tag
{{- range $tag, $services := service "db-task" | byTag -}}
{{if $tag | contains "alloc"}}
{{$allocId := index ($tag | split "=") 1}}
DB_ALLOC_ID="{{ $allocId }}"
{{end}}
{{end}}

# relying on service DNS discovery provided by Consul
# to obtain Minio IP address
S3_HOST_BASE=minio.service.consul:9000
S3_BUCKET=my-bucket-name
        EOH
        destination = "secrets/file.env"
        env         = true

      }
      resources {
        cpu    = 200
        memory = 200
        memory_max = 300
      }
    }
  }
}
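
Rather than waiting for the cron schedule, you can force a run to verify the whole pipeline (the allocation ID in the last command is whatever the forced run spawns):

# kick off a run of the periodic job right away
nomad job periodic force db-backup

# then inspect the spawned allocation and the task output
nomad job status db-backup
nomad alloc logs <allocation-id> postgres-backup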

I like this approach because there isn’t much magic going on here - we’re simply calling a plain bash script and piping the output of one running Docker container into another Docker container. As long as there are no breaking changes in pg_dump, we can forget about the backup job - it should just work.