Glad you figured it out already.
Regarding instrumentation, I found it very helpful once I had basic monitoring data available for all my servers. I'm still sticking with the solution I have known and used for many years now (check_mk, open source of course), as it is easy to handle and flexible to extend, with thousands of check plugins ready at hand and lots of features available if you need to do more. So you have the basic metrics of your equipment within reach.
Some examples of the many basic graphs you get preconfigured out of the box:
Network Interface Usage
Memory/Swap Usage
Filesystem growth and trending
So it's just a few clicks away to check, and you'll get informed about all the basic stuff (disk full, memory full, CPU overloaded, network errors, ...), so you do not have to go looking yourself in case of trouble, and often you'll notice anomalies before things get critical.
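To illustrate the kind of "disk full" check meant here, this is a minimal sketch in Python. It is not how check_mk implements its checks. The threshold values and the function names `classify` and `check_filesystem` are made up for this example; only `shutil.disk_usage` is a real standard library call.

```python
import shutil

# Hypothetical thresholds (percent used), chosen only for illustration.
WARN_PERCENT = 80.0
CRIT_PERCENT = 90.0


def classify(percent_used, warn=WARN_PERCENT, crit=CRIT_PERCENT):
    """Map a usage percentage to a Nagios-style state: 0 OK, 1 WARN, 2 CRIT."""
    if percent_used >= crit:
        return 2
    if percent_used >= warn:
        return 1
    return 0


def check_filesystem(path="/"):
    """Return (state, percent_used) for the filesystem that contains path."""
    usage = shutil.disk_usage(path)
    percent_used = 100.0 * usage.used / usage.total
    return classify(percent_used), percent_used


if __name__ == "__main__":
    state, percent = check_filesystem("/")
    label = {0: "OK", 1: "WARN", 2: "CRIT"}[state]
    print(f"{label} - / is {percent:.1f}% full")
```

A monitoring system runs checks like this on a schedule for every host, stores the values for graphing, and notifies you when the state changes, which is the part you do not want to build yourself.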
There may be a lot of hot stuff out there, like Prometheus, Netdata (demo), Grafana (demo), ... but that far exceeds my needs and costs me too much in time and energy to get acquainted with, which I would rather invest in other areas.