BetterStack uptime & logs
Self-hosted Grafana is great for “what’s happening now”, but if the host itself falls over, you can’t open Grafana to find out why. This page wires TLJH up to BetterStack so that:
- A 5-minute heartbeat tells BetterStack “I am alive” — and pages you if it stops.
- Journald logs from JupyterHub, Traefik, Prometheus and our custom services are streamed to BetterStack via Vector, so you can search them after a crash.
- Prometheus mirrors its metrics to BetterStack via remote_write, so dashboards and alerts survive even if the host is gone.
You’ll need a BetterStack account. The free tier is enough for one TLJH instance.
1. Uptime heartbeat
This is the simplest piece: a tiny shell script curls JupyterHub’s /hub/health endpoint, and on success pings a BetterStack heartbeat URL. If the script ever fails to ping (or pings the /fail variant), BetterStack alerts you.
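That branch is easy to exercise without touching BetterStack. The sketch below assumes a POSIX-compatible shell. It shadows curl with a shell function (functions take precedence over binaries on $PATH) and then runs the same if/else as the heartbeat script in step 2. The heartbeat ID is a placeholder. run_heartbeat, FAKE_HEALTH_RC and PINGED are hypothetical names used only for this dry run.

```bash
#!/usr/bin/env bash
# Dry-run sketch of the heartbeat logic. A shell function named `curl`
# takes precedence over the real binary, so no request leaves the machine.
# The heartbeat ID below is a placeholder, not a real BetterStack ID.
HEARTBEAT='https://uptime.betterstack.com/api/v1/heartbeat/example-id'
HEALTH_URL='https://localhost/hub/health'

FAKE_HEALTH_RC=0   # exit status the fake health check returns (22 = curl -f HTTP error)
PINGED=''          # last BetterStack URL the fake curl was asked to hit

curl() {
  # In every call below, the URL is the last argument.
  for arg in "$@"; do url=$arg; done
  if [ "$url" = "$HEALTH_URL" ]; then
    return "$FAKE_HEALTH_RC"
  fi
  PINGED=$url
  return 0
}

# Same branching as jupyterhub-heartbeat.sh, wrapped in a function so it can run repeatedly.
run_heartbeat() {
  if curl -fsSk --max-time 10 "$HEALTH_URL" > /dev/null; then
    curl -fsS --retry 3 --max-time 10 "$HEARTBEAT" > /dev/null
  else
    curl -fsS --retry 3 --max-time 10 "$HEARTBEAT/fail" > /dev/null
    return 1
  fi
}

FAKE_HEALTH_RC=0;  run_heartbeat && echo "healthy:   pinged $PINGED"
FAKE_HEALTH_RC=22; run_heartbeat || echo "unhealthy: pinged $PINGED (exit status $?)"
```

A healthy hub leads to a ping of the plain heartbeat URL. Any failing health check leads to a ping of the /fail variant and a non-zero exit, which systemd then records as a failed run.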
- Create a heartbeat in BetterStack
BetterStack dashboard → Uptime → Heartbeats → Create heartbeat, with:
  - Name: jupyterhub
  - Period: 5 minutes
  - Grace period: 2 minutes
Copy the heartbeat URL. It looks like https://uptime.betterstack.com/api/v1/heartbeat/<unique-id>.
- Drop the script in place
```bash
sudo nano /usr/local/bin/jupyterhub-heartbeat.sh
```
/usr/local/bin/jupyterhub-heartbeat.sh:
```bash
#!/usr/bin/env bash
set -uo pipefail

HEARTBEAT='https://uptime.betterstack.com/api/v1/heartbeat/<your-heartbeat-id>'
HEALTH_URL='https://localhost/hub/health'

# Healthy hub: ping the heartbeat. Unhealthy: ping the /fail variant so BetterStack raises an alert.
if curl -fsSk --max-time 10 "$HEALTH_URL" > /dev/null; then
  curl -fsS --retry 3 --max-time 10 "$HEARTBEAT" > /dev/null
else
  curl -fsS --retry 3 --max-time 10 "$HEARTBEAT/fail" > /dev/null
  exit 1
fi
```
The -k flag on the health check is intentional. The certificate on https://localhost usually won’t verify, either because it chains to an internal CA or because it was issued for the public hostname rather than localhost. The request never leaves the machine, so skipping verification is safe.
```bash
sudo chmod +x /usr/local/bin/jupyterhub-heartbeat.sh
```
- Wrap it in a systemd timer
```bash
sudo nano /etc/systemd/system/jupyterhub-heartbeat.service
```
/etc/systemd/system/jupyterhub-heartbeat.service:
```ini
[Unit]
Description=JupyterHub health -> BetterStack heartbeat
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/jupyterhub-heartbeat.sh
```
```bash
sudo nano /etc/systemd/system/jupyterhub-heartbeat.timer
```
/etc/systemd/system/jupyterhub-heartbeat.timer:
```ini
[Unit]
Description=Run JupyterHub heartbeat every 5 minutes

[Timer]
OnBootSec=1min
OnUnitActiveSec=5min
AccuracySec=15s

[Install]
WantedBy=timers.target
```
No Persistent= line is needed. That setting only affects OnCalendar= timers, and OnBootSec= already schedules a run shortly after boot.
Enable:
```bash
sudo systemctl daemon-reload
sudo systemctl enable --now jupyterhub-heartbeat.timer
```
- Verify
```bash
sudo systemctl start jupyterhub-heartbeat.service
journalctl -u jupyterhub-heartbeat.service -n 5 --no-pager
```
Within a couple of minutes, the BetterStack heartbeat row should turn green.
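To check that the timer settings and the BetterStack period and grace values fit together, add up the worst-case gap between two pings. The sketch below works in seconds. The 60-second script budget is an assumption: it is a conservative allowance for the script’s own curl timeouts and retries, not a figure that systemd or BetterStack reports.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope check that the timer never goes quiet long enough
# to trip the BetterStack heartbeat. All values are in seconds.
timer_interval=300   # OnUnitActiveSec=5min
timer_accuracy=15    # AccuracySec=15s: systemd may fire up to this much late
script_budget=60     # assumption: upper bound on the script's own runtime
                     # (10 s health check + up to four 10 s ping attempts + retry waits)
bs_period=300        # BetterStack period: 5 minutes
bs_grace=120         # BetterStack grace period: 2 minutes

worst_gap=$((timer_interval + timer_accuracy + script_budget))
deadline=$((bs_period + bs_grace))
echo "worst-case gap between pings: ${worst_gap}s"
echo "BetterStack alerts after:     ${deadline}s"
echo "headroom:                     $((deadline - worst_gap))s"
```

With the values above, this prints a 375 s worst case against a 420 s deadline, leaving 45 s of headroom. If you lengthen OnUnitActiveSec or shorten the grace period, rerun the sum. Once worst_gap exceeds the deadline, a perfectly healthy host can page you.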
2. Logs via Vector
Vector tails the systemd journal and ships interesting units to BetterStack as structured JSON. We deliberately keep the include-list narrow so we don’t pay for noise.
- Create a “Logs” source in BetterStack
BetterStack dashboard → Telemetry → Sources → Connect source → “Vector”.
Note the ingesting host (it looks like s2399753.eu-fsn-3.betterstackdata.com) and the source token. You’ll paste both into the Vector config.
- Install Vector
```bash
curl -1sLf 'https://repositories.timber.io/public/vector/cfg/setup/bash.deb.sh' | sudo -E bash
sudo apt install -y vector
```
- Configure Vector
```bash
sudo nano /etc/vector/vector.yaml
```
Replace the contents with:
/etc/vector/vector.yaml:
```yaml
data_dir: /var/lib/vector

sources:
  journald:
    type: journald
    include_units:
      - jupyterhub.service
      - traefik.service
      - prometheus.service
      - prometheus-node-exporter.service
      - jupyterhub-heartbeat.service
      - vector.service
      - jupyter-poison-cleaner.service

transforms:
  better_stack_transform:
    type: remap
    inputs:
      - journald
    source: |
      .dt = del(.timestamp)
      .host = del(."_HOSTNAME") || .host
      .unit = del(."_SYSTEMD_UNIT") || .unit
      .message = .message || ""

sinks:
  better_stack:
    type: http
    method: post
    inputs:
      - better_stack_transform
    uri: https://<your-logs-host>.betterstackdata.com/
    encoding:
      codec: json
    compression: gzip
    auth:
      strategy: bearer
      token: <your-source-token>
```
The remap step renames fields to the ones BetterStack’s UI expects:
  - timestamp becomes dt.
  - journald’s _HOSTNAME becomes host.
  - _SYSTEMD_UNIT becomes unit.
For host and unit, each || falls back to the value Vector already set if the journald field is absent. The message field defaults to an empty string.
Add new units to include_units whenever you stand up another systemd service worth keeping logs for. The auto-cleaner (jupyter-poison-cleaner.service) is already in the list above, so its JUPYTER_CLEANUP … event lines flow straight into BetterStack.
- Allow Vector to read the journal
Vector’s package adds the vector user to the systemd-journal group automatically, but verify:
```bash
id vector
# expect: ... groups=...,systemd-journal
# if the group is missing: sudo usermod -aG systemd-journal vector
```
- Start Vector
```bash
sudo systemctl enable --now vector
sudo systemctl status vector --no-pager | head -10
```
BetterStack’s source page should start showing live log volume within about 30 seconds.
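Because include_units is an allow-list, a service missing from it never shows up in BetterStack, and nothing reports an error. You could run a check like the one sketched below after adding a unit. vector_includes_unit and check_units are hypothetical helpers. The text match assumes the one-entry-per-line layout of the vector.yaml above; it does not parse YAML.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if UNIT ($2) appears as a list entry in CONFIG ($1).
# This is a plain-text match, not a YAML parser. It strips a leading "- " (with any
# indentation) from every line, then requires an exact line match. Entries in other
# lists (e.g. "- journald" under inputs) would also match.
vector_includes_unit() {
  sed 's/^[[:space:]]*-[[:space:]]*//; s/[[:space:]]*$//' "$1" | grep -Fqx -- "$2"
}

# Print each unit that is not listed, and exit non-zero if any are missing.
check_units() {
  cfg=$1; shift
  missing=0
  for u in "$@"; do
    if ! vector_includes_unit "$cfg" "$u"; then
      echo "not in include_units: $u"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example, check_units /etc/vector/vector.yaml jupyterhub.service jupyter-poison-cleaner.service prints nothing and exits 0 against the config above. After editing the list, restart Vector (sudo systemctl restart vector) so it picks up the change.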
3. Metrics via Prometheus remote_write
If you’ve followed the monitoring guide, you’ve already got Prometheus scraping locally. With one block of config it can also stream every sample off-host to BetterStack. This is handy for dashboards that survive a host outage and for alerts that fire even when the box is down.
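Before turning this on, it’s worth estimating how much “every sample” is. Each active series produces one sample per scrape, plus whatever your recording rules add, so a rough daily volume is series × 86400 / scrape interval. The sketch below does that arithmetic. The series count and scrape interval are assumptions; replace them with your own values. Prometheus reports the series count as prometheus_tsdb_head_series.

```bash
#!/usr/bin/env bash
# Rough estimate of what remote_write will ship: every active series
# produces one sample per scrape. Both inputs are assumptions; substitute yours.
series=5000          # assumption: read the real figure from prometheus_tsdb_head_series
scrape_interval=15   # assumption: seconds, from global.scrape_interval in prometheus.yml

samples_per_second=$((series / scrape_interval))          # integer division, rounds down
samples_per_day=$((series * 86400 / scrape_interval))
echo "~${samples_per_second} samples/s, ~${samples_per_day} samples/day"
```

With 5000 series scraped every 15 s, this prints ~333 samples/s and ~28800000 samples/day. If that’s more than you want to ship, add a write_relabel_configs list to the remote_write entry to drop series before they leave the host.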
- Create a “Metrics” source in BetterStack
BetterStack → Telemetry → Sources → Connect source → “Prometheus remote_write”.
Note the ingest URL (https://s<id>.<region>.betterstackdata.com/metrics) and the bearer token.
- Add a remote_write block to /etc/prometheus/prometheus.yml
Append this block at the bottom of the file; don’t replace anything. remote_write is a top-level key, so if the file already has a remote_write: section, add this entry to that list instead of creating a second one.
/etc/prometheus/prometheus.yml:
```yaml
remote_write:
  - url: https://<your-metrics-host>.betterstackdata.com/metrics
    authorization:
      # type defaults to Bearer
      credentials: <your-source-token>
```
- Restart Prometheus
```bash
sudo systemctl restart prometheus
journalctl -u prometheus -n 20 --no-pager | grep -i remote
```
You should see a “Starting WAL watcher” line. Within a minute, BetterStack’s metrics source page should start showing a non-zero sample rate.
What to alert on
Once these three pieces are wired up, set up these alerts in BetterStack:
- Uptime → Heartbeat: alert immediately if the heartbeat goes red. This is your “is the host alive?” signal.
- Telemetry → Logs: build a query for unit:"jupyter-poison-cleaner.service" AND message:"JUPYTER_CLEANUP" and have it page you on bursts. A single student dumping a 50 MB blob into a notebook is fine; ten students in a minute probably means a lesson-wide problem.
- Telemetry → Metrics: alert on (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) < 0.05 to catch sustained periods with less than 5% of memory available. Also alert on per-user jupyter_user_memory_bytes > <your-limit> to catch users hitting their cgroup ceiling.
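You can sanity-check the memory rule on the host itself. node_exporter’s meminfo collector reads /proc/meminfo, so the same MemAvailable/MemTotal ratio can be computed from that file directly. In the sketch below, mem_available_ratio and below_threshold are hypothetical helper names, and 0.05 is the threshold from the rule above.

```bash
#!/usr/bin/env bash
# Local sanity check for the memory alert. /proc/meminfo values are in kB;
# the units cancel out in the ratio. Pass a file path to read something else.
mem_available_ratio() {
  awk '/^MemTotal:/ { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%.4f\n", avail / total; else exit 1 }' "${1:-/proc/meminfo}"
}

# Succeeds (exit 0) when ratio $1 is below threshold $2, mirroring "< 0.05".
below_threshold() {
  awk -v r="$1" -v t="$2" 'BEGIN { exit !(r < t) }'
}

ratio=$(mem_available_ratio) && echo "MemAvailable/MemTotal = $ratio"
if [ -n "$ratio" ] && below_threshold "$ratio" 0.05; then
  echo "would alert: less than 5% of memory available"
fi
```

This only shows the current value. The BetterStack rule is what catches the “sustained” part, because it evaluates the ratio over time.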