

We have a machine running some stuff on Docker, and little by little it has become important to keep an eye on it. However, everything I find on monitoring a Docker server seems to assume you’re running it in Swarm mode, which is not and WILL NOT be the case for this machine; Swarm adds a layer of complexity that isn’t needed here.
What do you recommend for this case? I for one would love it if the tool didn’t just give you a view of what’s running but also sent notifications when something goes wrong (like a container having to be restarted, or one suddenly eating all the CPU, or anything else unusual).
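To make those two conditions concrete, here is a rough Python sketch of what polling for them on a plain (non-Swarm) Docker host could look like. It is only an illustration, not what any particular tool does: it assumes the `docker` CLI is on the PATH, the 90% threshold and all function names are made up for the example, and the alert is just a returned string that you would hand to whatever notifier you prefer.

```python
import subprocess

CPU_LIMIT_PERCENT = 90.0  # hypothetical threshold; can exceed 100 on multi-core hosts


def parse_stats(output):
    """Parse lines like 'plex 153.20%' into {'plex': 153.2}."""
    usage = {}
    for line in output.splitlines():
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        name, cpu = parts
        try:
            usage[name] = float(cpu.rstrip("%"))
        except ValueError:
            continue  # CPU column was not a number
    return usage


def find_hogs(usage, limit=CPU_LIMIT_PERCENT):
    """Names of containers at or above the CPU limit."""
    return sorted(name for name, cpu in usage.items() if cpu >= limit)


def find_restarted(previous, current):
    """Names whose restart count grew since the previous poll."""
    return sorted(n for n, c in current.items() if c > previous.get(n, c))


def _docker(*args):
    """Run a docker CLI command and return its stdout."""
    result = subprocess.run(
        ["docker", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def docker_cpu_usage():
    """One snapshot of CPU usage per running container."""
    return parse_stats(
        _docker("stats", "--no-stream", "--format", "{{.Name}} {{.CPUPerc}}")
    )


def docker_restart_counts():
    """Restart count per running container, read via docker inspect."""
    counts = {}
    for name in _docker("ps", "--format", "{{.Names}}").split():
        out = _docker("inspect", "--format", "{{.RestartCount}}", name)
        counts[name] = int(out.strip())
    return counts


def poll_once(previous_counts):
    """Return (alerts, current_counts) for a single polling pass."""
    counts = docker_restart_counts()
    alerts = ["restarted: " + n for n in find_restarted(previous_counts, counts)]
    alerts += ["high CPU: " + n for n in find_hogs(docker_cpu_usage())]
    return alerts, counts
```

Calling `poll_once` from a loop or a cron job and forwarding the returned strings to mail, a webhook, or a chat bot would cover the notification part. A single CPU snapshot is noisy, so in practice you would probably want several consecutive readings over the limit before alerting.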
I will be keeping an eye on this thread to see what other people do, but what I have done in the past is to have a couple different health checking strategies.
Gatus sounds pretty cool, I’ll definitely give it a closer look later. Maybe it’s the push I needed to go ahead and look into proper observability as a whole, log ingestion and whatnot. My homelab setup is sorely lacking in that department if I’m being honest lol
Uptime Kuma for web monitoring.
I’m experimenting with both Zabbix and Netdata to see which one I want to keep for monitoring resources on my hosts.
I use healthchecks.io to monitor backup scripts and cronjobs.
I’m using Autoheal to restart containers that are in an unhealthy state. For some containers this means I need to write my own health check. I mostly did this to resolve a rare issue where Plex would lock up but it’s helped in other scenarios too.
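For anyone wondering what "write my own health check" can look like, here is a minimal Python sketch. It is an assumption-laden example rather than the check actually used above: the default URL is a placeholder, and it presumes the service exposes some HTTP endpoint and that Python is available inside the image. Docker treats exit code 0 from a health check command as healthy and 1 as unhealthy, which is all the script relies on.

```python
import http.client
import urllib.error
import urllib.request

# Placeholder endpoint; replace with whatever your service actually exposes.
DEFAULT_URL = "http://localhost:8080/health"


def is_healthy(url, timeout=5.0):
    """Return True only if the URL answers with a 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a malformed URL
        # all count as unhealthy.
        return False


def main(argv):
    """Map the check result to Docker's health check exit codes."""
    url = argv[0] if argv else DEFAULT_URL
    return 0 if is_healthy(url) else 1
```

To use it as a script, add `import sys` and `sys.exit(main(sys.argv[1:]))` at the bottom, then point a `HEALTHCHECK CMD` line in the Dockerfile (or the `healthcheck` section of a compose file) at it. Once the container reports an unhealthy state, a tool like Autoheal can restart it; check its README for how it selects which containers to watch.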
Have started experimenting with OpenTelemetry (https://opentelemetry.io/docs/what-is-opentelemetry/) to add observability to different parts of the stack running inside a Docker container.
I haven’t gotten far enough to recommend anything specific, but there is a big ecosystem of open source collectors and analytics tools out there.