I have loved Sentry ever since I discovered it many years ago. Back in the day, self-hosting it was really easy - a single Docker image which you would use to spin up 2-3 separate services, plus Postgres, Redis, a few lines of config, and you’re done.
Nowadays (2024), self-hosting Sentry requires spinning up 50+ different services - and that is of course without any fancy HA setup. It’s still doable, especially with Kubernetes, but the learning curve is definitely steeper. Then again, the feature set of Sentry itself is much richer - it’s not just about catching errors anymore; you have full-fledged built-in performance monitoring, session recording, and tons of other observability-related goodies.
One thing that is kinda tricky is monitoring your Sentry instance itself. Here is how I do it.
Use built-in endpoints
/_health/ - to ensure your web is working
/api/relay/healthcheck/live/ - to ensure your Relay is alive
/api/relay/healthcheck/ready/ - to ensure your Relay is ready for event ingestion
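If your monitoring tool can't poll these endpoints directly, a small script can do it. This is only a minimal sketch: the three paths come from the list above, while the function name checkHealth, the injectable fetchImpl parameter, and the example base URL are my own assumptions for illustration.

```javascript
// Paths taken from the list above.
const HEALTH_PATHS = [
  '/_health/',
  '/api/relay/healthcheck/live/',
  '/api/relay/healthcheck/ready/',
]

// Probe every path and report which ones answered with a 2xx status.
// fetchImpl is injectable so the function can be exercised without a network.
async function checkHealth(baseUrl, fetchImpl = fetch) {
  const results = await Promise.all(
    HEALTH_PATHS.map(async (path) => {
      try {
        const response = await fetchImpl(new URL(path, baseUrl))
        return { path, ok: response.ok, status: response.status }
      } catch (error) {
        // Network failures (DNS, TLS, timeouts) count as unhealthy.
        return { path, ok: false, status: null, error: String(error) }
      }
    })
  )
  return { healthy: results.every((r) => r.ok), results }
}
```

Calling checkHealth('https://sentry.example.com') (a placeholder domain) resolves to an object whose healthy flag is true only if all three endpoints returned a successful status.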
…and it’s not enough
I ran into a situation where Relay went on a reconnecting spree (despite reporting live/ready). I don’t remember the exact root cause anymore, but it took me a while to realize that events were not being ingested properly. In the end, I configured a cronjob which sends a Sentry event (an exception) to one of my projects every ~hour. Then I muted that exception, as it was obviously not actionable.
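I haven't shown what my cronjob runs, and normally an official Sentry SDK's captureException call is the simplest way to send such an event. As a dependency-free illustration, here is a sketch that builds a Sentry envelope by hand and posts it to the ingestion endpoint. Treat the details as assumptions to verify against your Sentry version: the names parseDsn, buildHeartbeatEnvelope and sendHeartbeat are hypothetical, the DSN is assumed to have no path prefix, and the "sentry-heartbeat" fingerprint and "HeartbeatError" type are just placeholders.

```javascript
// Split a DSN of the form https://<public_key>@<host>/<project_id>
// into the pieces needed to address the envelope endpoint.
function parseDsn(dsn) {
  const url = new URL(dsn)
  const projectId = url.pathname.split('/').filter(Boolean).pop()
  return {
    publicKey: url.username,
    projectId,
    envelopeUrl: `${url.protocol}//${url.host}/api/${projectId}/envelope/`,
  }
}

// Event IDs are 32 lowercase hex characters; Math.random is enough for a heartbeat.
function randomEventId() {
  return Array.from({ length: 32 }, () =>
    Math.floor(Math.random() * 16).toString(16)
  ).join('')
}

// Build a newline-delimited envelope: envelope header, item header, event payload.
function buildHeartbeatEnvelope(dsn, now = new Date()) {
  const eventId = randomEventId()
  const event = {
    event_id: eventId,
    timestamp: now.getTime() / 1000,
    platform: 'other',
    level: 'error',
    // A fixed fingerprint keeps every heartbeat grouped under a single issue,
    // so the issue ID you monitor stays stable.
    fingerprint: ['sentry-heartbeat'],
    exception: {
      values: [{ type: 'HeartbeatError', value: 'Scheduled heartbeat event' }],
    },
  }
  return [
    JSON.stringify({ event_id: eventId, dsn, sent_at: now.toISOString() }),
    JSON.stringify({ type: 'event' }),
    JSON.stringify(event),
  ].join('\n') + '\n'
}

// POST the envelope; resolves to true when the ingestion endpoint accepted it.
async function sendHeartbeat(dsn, fetchImpl = fetch) {
  const { publicKey, envelopeUrl } = parseDsn(dsn)
  const response = await fetchImpl(envelopeUrl, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/x-sentry-envelope',
      'X-Sentry-Auth': `Sentry sentry_version=7, sentry_key=${publicKey}, sentry_client=heartbeat/1.0`,
    },
    body: buildHeartbeatEnvelope(dsn),
  })
  return response.ok
}
```

Note that an accepted request only means the event was received; whether it actually makes it through the pipeline is exactly what still has to be checked separately.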
Once you have that exception in Sentry, it’s time to check if it’s being registered every hour.
Sentry exposes an API endpoint where you can obtain issue details; one interesting part of that JSON response is lastSeen. You can query this endpoint every ~hour, parse the response, and check that lastSeen is no older than ~2h (so you have some overlap). If it isn’t, Sentry is nicely ingesting events.
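The freshness rule itself is plain date arithmetic. Here is a minimal sketch as a standalone function that can be tested locally; the name isFresh and its parameters are hypothetical, and treating a missing or malformed timestamp as stale is my own choice.

```javascript
// Returns true when lastSeen is a valid timestamp no older than maxAgeHours.
function isFresh(lastSeen, now = new Date(), maxAgeHours = 2) {
  const lastSeenMs = new Date(lastSeen).getTime()
  // A missing or malformed timestamp yields NaN and is treated as stale.
  if (Number.isNaN(lastSeenMs)) return false
  const ageHours = (now.getTime() - lastSeenMs) / (1000 * 60 * 60)
  return ageHours <= maxAgeHours
}

const now = new Date('2024-05-01T12:00:00Z')
console.log(isFresh('2024-05-01T10:30:00Z', now)) // true: 1.5 hours old
console.log(isFresh('2024-05-01T09:00:00Z', now)) // false: 3 hours old
console.log(isFresh(undefined, now)) // false: no timestamp at all
```

With an hourly event and a 2h threshold, a single delayed or missed run doesn't trigger an alert right away, but two in a row will.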
You can do that using Cloudflare Workers; here is part of the Terraform code I used to configure it:
resource "cloudflare_worker_script" "sentry_monitoring" {
  account_id = var.cloudflare_account_id
  name       = "sentry-monitoring"
  content    = file("sentry.js")

  plain_text_binding {
    name = "ISSUE_ID"
    text = "<your issue id>"
  }

  plain_text_binding {
    name = "SENTRY_DOMAIN"
    text = cloudflare_record.sentry.name // assuming you have this configured
  }
}

resource "cloudflare_worker_secret" "sentry_auth_token" {
  account_id  = var.cloudflare_account_id
  script_name = cloudflare_worker_script.sentry_monitoring.name
  name        = "SENTRY_API_TOKEN" // issues read-only token
  secret_text = var.sentry_api_token // I'm using Terraform Cloud to set this variable
}

resource "cloudflare_worker_route" "sentry_monitoring" {
  zone_id     = cloudflare_zone.your_zone_id.id // assuming you have this configured
  script_name = cloudflare_worker_script.sentry_monitoring.name
  pattern     = "https://${cloudflare_record.sentry.name}/uptime*"
}
Here is the worker JS file:
addEventListener('fetch', event => {
  event.respondWith(handleRequest())
})

async function handleRequest() {
  // you might want to pass project id as argument here
  const apiUrl = `https://${SENTRY_DOMAIN}/api/0/issues/${ISSUE_ID}/`
  const token = SENTRY_API_TOKEN
  try {
    const response = await fetch(apiUrl, {
      headers: {
        'Authorization': `Bearer ${token}`
      }
    })
    // check the status before parsing, as an error response might not be JSON
    if (!response.ok) {
      // worker logs are currently in beta, you can enable them in Cloudflare dashboard,
      // seems like terraform provider is lagging behind
      console.log(`Error: ${response.status} - ${response.statusText}`)
      return new Response('API Error', { status: 500 })
    }
    const data = await response.json()
    // lastSeen is the timestamp of the most recent event for this issue
    const lastEventStr = data.lastSeen
    const lastEventDate = new Date(lastEventStr)
    const currentTime = new Date()
    const diffInHours = (currentTime - lastEventDate) / (1000 * 60 * 60)
    // a missing or invalid lastSeen gives NaN here, which also ends up as 500
    if (diffInHours <= 2) {
      return new Response(lastEventStr, { status: 200 })
    } else {
      return new Response(lastEventStr, { status: 500 })
    }
  } catch (error) {
    console.log('Error:', error)
    return new Response('Unhandled error', { status: 500 })
  }
}
With that in place, <your Sentry domain>/uptime is served by a Cloudflare worker which returns a 200/500 response - now it’s easy enough to add that endpoint as well to the monitoring software of your choice.
Now you should be monitoring 4 different endpoints, which should give you enough confidence that Sentry is indeed up & properly processing events.