09-25-2013
Join Date: Jul 2013
Last Activity: 5 January 2020, 9:33 PM EST
Location: Dallas, Texas
Posts: 545
Thanks Given: 14
Thanked 114 Times in 111 Posts
This is a broad subject. Technology has never really been the obstacle to effectively monitoring an IT infrastructure. We've had the tools for over 20 years now, and the problem has always been the effective use and implementation of those tools. It should start from the top with four things: a plan, a team with defined roles, the toolset, and processes to manage the infrastructure.
You raise the issue of non-trivial methods, which suggests you're more interested in technical mechanisms. In that case it's best to ask something more specific. The best area I can point you to is an emerging concept that is arguably steeped in virtualization: Reliability, Availability, and Serviceability (RAS). Computation is becoming non-stop, which means you can still compute and service the machine at the same time. Hardware reliability is well defined, and there are predictive methods for handling it. In fact, every component, network, OS, and so on is well defined, so I don't really understand the "non-trivial" methods part. Whatever the specifics, monitoring in general should support the emerging concept of RAS. That term has mainly been associated with hardware, but I think the concept extends to the entire infrastructure. I would be interested to hear more about what you have been working on and what you're targeting.