how does the status monitoring website work under the hood?

alexdeathway ( @alexdeathway@programming.dev ) · edit-2 1 year ago

how does the status monitoring website work under the hood?

doeknius_gloek ( @doeknius_gloek@discuss.tchncs.de ) · 1 year ago

You should check out Uptime Kuma, which offers different monitor types. This should give you a good start for your own implementations. Or maybe you’ll find that Uptime Kuma already covers your use case.

SteveTech ( @SteveTech@programming.dev ) · 1 year ago

A lot of external status services just send an HTTP request to a certain URL: if it succeeds, the site is up; if it errors or times out, it’s down. If you aren’t using HTTP, they usually also let you check whether a TCP port completes the usual handshake.
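To make that concrete, here is a minimal sketch of both checks using only the Python standard library. The function names `http_is_up` and `tcp_is_up` and the 5-second default timeout are my own assumptions for illustration, not how any particular status service implements it.

```python
import http.client
import socket
import urllib.error
import urllib.request


def http_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to `url` completes with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # HTTPError (4xx/5xx) is a subclass of URLError; timeouts and refused
        # connections surface as URLError or OSError; ValueError covers a
        # malformed URL.
        return False


def tcp_is_up(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection (the three-way handshake) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A monitor would call these on a schedule, for example `http_is_up("https://example.com/")` once a minute, and record each result.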

The response time can also show whether a site is running slower than usual, and if you have a use for it, you can usually specify which response code counts as success.
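As a rough sketch of how a probe result could be turned into a status, assuming a hypothetical `ProbeResult` record and thresholds I picked arbitrarily (expected code 200, "slow" after one second):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    status_code: Optional[int]  # None if the request errored or timed out
    elapsed_seconds: float      # how long the request took


def classify(result: ProbeResult,
             expected_status: int = 200,
             slow_after_seconds: float = 1.0) -> str:
    """Map one probe result to "up", "slow", or "down"."""
    if result.status_code != expected_status:
        # Covers both a missing response (None) and an unexpected code.
        return "down"
    if result.elapsed_seconds > slow_after_seconds:
        return "slow"
    return "up"
```

For example, `classify(ProbeResult(200, 2.5))` gives `"slow"`, while a check that expects a redirect could pass `expected_status=301`.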

I wouldn’t be surprised if GitHub also has some per-server analytics they use to estimate load, but Instatus would work as described above.

Sometimes these sorts of things are referred to as health checks, if you’re looking for search terms. For example, using the HEALTHCHECK instruction in the Dockerfile, Docker can be set up to poll a container’s web server every few minutes and mark the container as unhealthy if it stops replying.

towerful ( @towerful@programming.dev ) · 1 year ago

A webservice can be passively monitored.
The status system checks DNS records, pings IP addresses, and sends a GET request to confirm it gets a 200 response. Further metrics like ping and response times can be tracked and reported when they are too high, which indicates heavy load.
Uptime Kuma is a FOSS project that is popular among self-hosters.
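The DNS part of a passive check can be sketched with the standard library alone. The helper name `resolve_host` is hypothetical, and this only asks the system resolver; it does not compare the answer against expected records. An ICMP ping is left out because it usually needs raw-socket privileges.

```python
import socket
from typing import List


def resolve_host(hostname: str) -> List[str]:
    """Return the sorted IP addresses a hostname resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except OSError:
        # socket.gaierror (name not found, resolver unreachable) is an OSError.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

A monitor could treat an empty list as "DNS is broken" before it even tries the GET request.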

A webservice can actively report for monitoring. The webservice monitors its own CPU/RAM/network usage, database connections, cache misses, stuff like that. If you are load balancing, then an additional service is needed to aggregate the results from all the nodes and decide when performance counts as degraded because too many nodes are offline or overloaded.
Tools like Prometheus and Netdata can collect the metrics.
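The aggregation step could look something like the sketch below. Everything here is an assumption made for illustration: the `NodeReport` fields, the 90% CPU limit, and the rule that fewer than half the nodes being healthy means an outage. Real setups define these rules however suits them.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class NodeReport:
    name: str
    reachable: bool      # did the node report in at all?
    cpu_percent: float   # self-reported CPU usage


def node_is_healthy(report: NodeReport, cpu_limit: float = 90.0) -> bool:
    """A node counts as healthy if it is reachable and not overloaded."""
    return report.reachable and report.cpu_percent < cpu_limit


def overall_status(reports: Iterable[NodeReport],
                   cpu_limit: float = 90.0,
                   outage_below: float = 0.5) -> str:
    """Combine per-node reports into one status for the whole service."""
    reports = list(reports)
    if not reports:
        return "unknown"
    healthy = sum(node_is_healthy(r, cpu_limit) for r in reports)
    if healthy == len(reports):
        return "operational"
    if healthy / len(reports) >= outage_below:
        return "degraded"
    return "outage"
```

With three nodes, one overloaded node yields "degraded" and two unreachable nodes yield "outage".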

Or, like how I think a lot of these work, just report it manually. I’ve seen quite a few companies that report green status despite having fairly huge issues.

Haatveit ( @Haatveit@beehaw.org ) · edit-2 1 year ago

I can’t give an authoritative answer (not my domain), but I think there are two ways these types of things are done.

First is just observing the page or service as an external entity: basically requesting a page or hitting an endpoint and tracking whether you get a response (if not, it must be down), or, to measure load level in a very naive way, tracking the response time. This is easy in the sense that you need no special access to the target. But it’s also limited in its accuracy.

Second way, like what your GitHub example is doing, is having access to special API endpoints for (or direct access to) performance metrics. Since the GitHub status page is literally run by GitHub, they obviously have easy access to any metric they could want. They probably (certainly) run services whose entire job is to produce reliable data for their status page.
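To illustrate the idea of a service exposing its own metrics, here is a small standard-library sketch. The path `/internal/metrics`, the metric names, and the `serve` helper are all made up for this example; it says nothing about how GitHub actually does it.

```python
import json
import os
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START_TIME = time.monotonic()


def collect_metrics() -> dict:
    """Gather a few cheap metrics about the running process and host."""
    metrics = {"uptime_seconds": round(time.monotonic() - START_TIME, 1)}
    try:
        # os.getloadavg() only exists on Unix-like systems.
        metrics["load_average_1m"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        pass
    return metrics


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/internal/metrics":  # hypothetical path
            self.send_error(404)
            return
        body = json.dumps(collect_metrics()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve the metrics endpoint on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint, and a status page backend could then poll it and feed the numbers into whatever decides the displayed status.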

The minute details of each of these options are pretty open-ended; there are many ways to do it.

Just my 5¢ as a non-web developer.