Prometheus Alertmanager and Grafana (especially Grafana!) seem a bit too involved for monitoring my homelab. Prometheus itself is fine: it collects a lot of statistics I don’t care about, but it doesn’t require configuration, so it doesn’t bother me.
Do you know of simpler alternatives?
My goals are relatively simple:
- get a notification when any systemd service fails
- get a notification if there is not much space left on a disk
- get a notification if one of the above can’t be determined (e.g. server down, config error, …)
Seeing graphs with basic system metrics (e.g. CPU/RAM usage) would be nice, but it’s not super-important.
I am a dev so writing a script that checks for whatever I need is way simpler than learning/writing/testing yaml configuration (in fact, I was about to write a script to send heartbeats to something like Uptime Kuma or Tianji before I thought of asking you for a nicer solution).
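For reference, the heartbeat script mentioned above could look roughly like this. It is only a sketch: it assumes a push-style monitor (Uptime Kuma's "Push" monitor type, which as far as I know accepts `status` and `msg` query parameters), and the `HEARTBEAT_URL` environment variable and function names are made up for illustration. The idea is that a missing heartbeat covers the "can’t be determined" case: if the server is down or the script crashes, no ping arrives and the monitor raises the alarm by itself.

```python
import os
import urllib.parse
import urllib.request


def heartbeat_url(base_url: str, problems: list[str]) -> str:
    """Build the push URL: status=up when there are no problems, status=down otherwise."""
    status = "down" if problems else "up"
    msg = "; ".join(problems) if problems else "OK"
    query = urllib.parse.urlencode({"status": status, "msg": msg})
    return f"{base_url}?{query}"


def send_heartbeat(base_url: str, problems: list[str], timeout: float = 10.0) -> None:
    """Send the heartbeat. If this raises, no ping arrives and the monitor flags that too."""
    with urllib.request.urlopen(heartbeat_url(base_url, problems), timeout=timeout) as resp:
        resp.read()


if __name__ == "__main__":
    # HEARTBEAT_URL is a hypothetical variable name; the monitor shows the real push URL.
    url = os.environ.get("HEARTBEAT_URL")
    if url:
        problems: list[str] = []  # fill this from your own checks (disk, systemd, ...)
        send_heartbeat(url, problems)
```

Run from cron or a systemd timer at an interval shorter than the monitor's heartbeat timeout.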
Have you played around with Grafana? It really is quite simple if you already have Prometheus working.
For a home lab environment you don't even need to use Prometheus Alertmanager. Grafana can handle alerts as well.
Grafana also has hundreds of pre-made dashboards you can import. Node monitoring is quite straightforward.
Assuming you have Prometheus good to go, all you need to do is go to Grafana - Datasources, create a new datasource, and point it to your Prometheus instance.
Then you can import the dashboards you want.
Now you can set up your alerts - you can use SMTP, Telegram, and Slack, among others, for your notifications.
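If you would rather script the datasource step than click through the UI, something like the following should work. It is a hedged sketch: it assumes Grafana's HTTP API endpoint `POST /api/datasources` with a service account token, and the environment variable names (`GRAFANA_URL`, `GRAFANA_TOKEN`, `PROMETHEUS_URL`) and the datasource name are placeholders I picked, not anything Grafana requires.

```python
import json
import os
import urllib.request


def datasource_request(grafana_url: str, token: str, prometheus_url: str) -> urllib.request.Request:
    """Build the request that registers a Prometheus datasource in Grafana."""
    payload = {
        "name": "Prometheus",   # display name, arbitrary
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",      # Grafana's backend does the querying
        "isDefault": True,
    }
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # All three variable names are assumptions made for this example.
    grafana = os.environ.get("GRAFANA_URL")
    token = os.environ.get("GRAFANA_TOKEN")
    prom = os.environ.get("PROMETHEUS_URL", "http://localhost:9090")
    if grafana and token:
        with urllib.request.urlopen(datasource_request(grafana, token, prom), timeout=10) as resp:
            print(resp.status, resp.read().decode())
```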
The easiest solution I found and use is Beszel.
https://github.com/henrygd/beszel
Just a hub with the most important stats and some simple agents on the servers.
Icinga/Nagios? You can even feed them data already collected by Prometheus if you want.
I mean, you get a lot of advantages from fluffy pretty systems. But extracting data from df and systemctl and curling it into Telegram is going to be like a 10-line bash script called from a one-line cron job.
I pump a lot of complicated metrics through Prometheus/Grafana to get graphs and history.
Most of my critical stuff is still in Nagios, and instead of using the standardized Nagios plugins I just query the operating system directly in bash.
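To make the df/systemctl-to-Telegram idea concrete, here is a rough version, written in Python rather than bash since the OP is a dev. Treat it as a sketch under assumptions: the 90% threshold and the `TELEGRAM_TOKEN` / `TELEGRAM_CHAT_ID` variable names are made up, and it relies on `df -P`, `systemctl list-units --state=failed --no-legend --plain`, and the Telegram Bot API `sendMessage` method. A check that cannot run is reported as a problem too.

```python
import os
import subprocess
import urllib.parse
import urllib.request


def full_disks(df_output: str, limit_pct: int = 90) -> list[str]:
    """Parse `df -P` output and return mount points at or above limit_pct."""
    alerts = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header line
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[4].rstrip("%").isdigit():
            continue  # pseudo filesystems may report "-" as capacity
        used_pct = int(fields[4].rstrip("%"))
        if used_pct >= limit_pct:
            alerts.append(f"{fields[5]} at {used_pct}%")
    return alerts


def failed_units(systemctl_output: str) -> list[str]:
    """Return unit names from `systemctl list-units --state=failed --no-legend --plain`."""
    return [line.split()[0] for line in systemctl_output.splitlines() if line.strip()]


def run(cmd: list[str]) -> str:
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def collect_problems() -> list[str]:
    problems = []
    checks = [
        (["df", "-P"], full_disks),
        (["systemctl", "list-units", "--state=failed", "--no-legend", "--plain"], failed_units),
    ]
    for cmd, parse in checks:
        try:
            problems += parse(run(cmd))
        except (OSError, subprocess.CalledProcessError) as exc:
            problems.append(f"{cmd[0]} check failed: {exc}")
    return problems


def notify_telegram(token: str, chat_id: str, text: str) -> None:
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    with urllib.request.urlopen(url, data=data, timeout=10) as resp:
        resp.read()


if __name__ == "__main__":
    problems = collect_problems()
    token = os.environ.get("TELEGRAM_TOKEN")      # hypothetical variable names
    chat_id = os.environ.get("TELEGRAM_CHAT_ID")
    if problems and token and chat_id:
        notify_telegram(token, chat_id, "\n".join(problems))
    elif problems:
        print("\n".join(problems))
```

This only alerts when the script itself runs, so a dead server stays silent; pairing it with an external heartbeat check covers that gap.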
Is there a self hosted OpenTelemetry consumer?
Edit: found better resources
https://linuxhandbook.com/syslog-guide/
https://github.com/linuxserver/docker-syslog-ng
That should be a good place to start. Syslog will do what you want.
Syslog is considerable overkill for home lab monitoring.