At some point you may notice that the Health roll-up doesn't appear to be working correctly in Live Maps.
When looking at a Dashboard, you notice that a tile is either RED or GREY, but when you drill into the tile and look at the sub-map, you find that all the objects are GREEN. We have also seen this when a Service shows up as GREY even though one or more of its perspective views show health state. The point is that the health of a tile doesn't match the health of the underlying objects.
The first thing to understand is that Live Maps follows the health state that SCOM reports for the various objects. We don't alter the health state in any way; we follow all monitors and rules for any Object, Group, or Distributed Application. This means that if you look up the same object in SCOM, you should see the same health state that Live Maps is displaying.
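Because Live Maps mirrors SCOM, a tile's color is the result of a roll-up over its members. The Python below is a minimal sketch of that calculation, assuming a worst-of policy (SCOM dependency monitors can also be configured for best-of or percentage-based roll-up). The `Health` enum and function names are illustrative only, not a Live Maps or SCOM API. It also counts a GREY member as RED, as noted under Quick Double Check.

```python
from enum import IntEnum
from typing import Iterable


class Health(IntEnum):
    """Illustrative health states, ordered from best to worst."""
    GREEN = 0   # healthy
    YELLOW = 1  # warning
    RED = 2     # critical


def effective_state(state: str) -> Health:
    """Map a displayed state to the state used for roll-up.

    GREY (for example a grey check mark) is treated as RED, mirroring
    the Live Maps behavior described in this article.
    """
    if state == "GREY":
        return Health.RED
    return Health[state]


def worst_of(states: Iterable[str]) -> Health:
    """Worst-of roll-up: the parent takes the worst state of its members.

    Raises ValueError if there are no members.
    """
    return max(effective_state(s) for s in states)


if __name__ == "__main__":
    # A sub-map that looks all green, except one check mark is actually GREY.
    members = ["GREEN", "GREEN", "GREY", "GREEN"]
    print(worst_of(members).name)  # RED
```

Under this model a single stale check mark is enough to turn the whole tile RED, which is exactly the mismatch described above.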
The following steps, based on our experience with this issue, should help you troubleshoot it.
Quick Double Check
1. The first thing to do is double-check the health status. This may sound obvious, but remember that GREY check-marked objects roll up as RED in Live Maps. Always double-check this, because sometimes you will see a bunch of check marks and assume they are all GREEN.
2. In some cases, especially if SCOM is busy, a workflow will time out or drop. In these cases, the simplest thing to do is put the Dashboard that is not rolling up correctly into Maintenance Mode. Leave it in Maintenance Mode for a minimum of 5 minutes, and then let it come out of Maintenance Mode naturally.
You can do this from several places, including the Live Maps Portal. It doesn't matter where you do it from, but make sure you put the Live Maps dashboard that is not rolling up correctly into Maintenance Mode. To do this from the Live Maps Portal:
- Drill into the Dashboard that is not rolling up correctly.
- Use the Drop Down Arrow on the top menu bar and select "Start maintenance mode".
- Change the End Time. By default, it is set to 30 minutes later.
- Including all the contained objects is up to you; you only really need to put the group itself into Maintenance Mode.
- Click on Start.
- Click OK to acknowledge that the request has been sent. Keep in mind it may take a couple of minutes for the Dashboard to go into Maintenance Mode. You may also need to refresh your browser window to verify that the dashboard has gone into Maintenance Mode.
- After the Dashboard has come back out of Maintenance Mode, you may find that its health is now correct.
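If you plan this step from a script or runbook, the timing guidance above reduces to simple arithmetic. The helper below is a hypothetical Python sketch, not part of Live Maps or SCOM (it only computes a time and talks to neither): it uses the portal's 30-minute default when no End Time is given and never returns a window shorter than the recommended 5 minutes.

```python
from datetime import datetime, timedelta
from typing import Optional

# Values taken from the steps above: the End Time defaults to 30 minutes
# after the start, and we recommend staying in Maintenance Mode at least 5.
DEFAULT_WINDOW = timedelta(minutes=30)
MINIMUM_WINDOW = timedelta(minutes=5)


def maintenance_end_time(start: datetime,
                         requested_end: Optional[datetime] = None) -> datetime:
    """Pick an End Time for the "Start maintenance mode" dialog.

    With no requested end, use the default window; otherwise extend the
    requested end so the window is never shorter than the minimum.
    """
    if requested_end is None:
        return start + DEFAULT_WINDOW
    return max(requested_end, start + MINIMUM_WINDOW)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0)
    print(maintenance_end_time(start))                              # 2024-01-01 09:30:00
    print(maintenance_end_time(start, datetime(2024, 1, 1, 9, 2)))  # 2024-01-01 09:05:00
```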
Flush Server Health State and Cache
3. Assuming that didn't work, your next option is to flush the Health State and Cache on each of your management servers. Keep in mind that this will put a load on your servers for a brief period: after the cache is cleared, your management servers will ask all the agents for the current status of their objects. How long the load lasts depends on the size of your environment, but I generally find that it lasts around 10-30 minutes.
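One common manual way to flush the health state and cache on a management server is to stop the Health Service (service name `HealthService`; its display name varies by SCOM version), move its `Health Service State` folder out of the way, and start the service again. The Python below is a minimal sketch of that sequence for a single server. The install path is an assumption based on the default SCOM 2012 location, so check yours first, and run it from an elevated session on each management server in turn.

```python
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Optional

SERVICE_NAME = "HealthService"
# Assumption: default SCOM 2012 management server install path. Adjust for
# your version (for example 2012 R2) and drive before running.
DEFAULT_SERVER_DIR = r"C:\Program Files\Microsoft System Center 2012\Operations Manager\Server"


def flush_health_state(server_dir: str = DEFAULT_SERVER_DIR,
                       run=subprocess.run,
                       stamp: Optional[str] = None) -> Optional[Path]:
    """Stop the Health Service, set its cache folder aside, then restart it.

    `run` is injectable so the sequence can be exercised without touching
    real services. Returns where the old cache folder was moved, or None
    if there was no cache folder to move.
    """
    state_dir = Path(server_dir) / "Health Service State"
    stamp = stamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    backup: Optional[Path] = state_dir.with_name(f"{state_dir.name}.old-{stamp}")

    # If stopping fails, check=True raises and nothing else is attempted.
    run(["net", "stop", SERVICE_NAME], check=True)
    try:
        if state_dir.exists():
            # Rename rather than delete so the old cache can be inspected later.
            state_dir.rename(backup)
        else:
            backup = None
    finally:
        # Always try to bring the service back, even if the rename failed.
        run(["net", "start", SERVICE_NAME], check=True)
    return backup
```

On the server itself you would call `flush_health_state()` (or pass your own install directory). Renaming instead of deleting keeps the old cache around; once the server is healthy again you can remove the `.old-<timestamp>` folder.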
Agent Health Checks
4. There are a couple of different items to check here. How an agent reports can, and does, have a direct effect on the overall health roll-up calculations.
- From the SCOM console, go to Monitoring > Operations Manager > Agent Details.
- Look at the Agent State from Health Service Watchers and compare it to the Agent State.
- Does anything seem out of order? For example, does the Watcher show RED while the Agent shows GREEN for any of the agents? ESPECIALLY check any agents on computers that are in the Live Maps Dashboard that is not rolling up correctly.
- If so, you might need to stop and restart the agent on the affected computer.
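With many agents, comparing the two views by eye gets tedious. Assuming you can get each agent's two states into a CSV (the column names `Name`, `WatcherState`, and `AgentState` here are hypothetical, not an actual SCOM export format), a short Python sketch like this one lists the agents whose states disagree, with computers from the problem dashboard first.

```python
import csv
import io
from typing import Iterable, List, Set, TextIO


def load_rows(f: TextIO) -> List[dict]:
    """Read agent rows from a CSV with Name, WatcherState, AgentState columns."""
    return list(csv.DictReader(f))


def find_mismatches(rows: Iterable[dict], dashboard_computers: Set[str]) -> List[dict]:
    """Return agents whose watcher state and agent state disagree.

    States are compared case-insensitively (GREEN/YELLOW/RED/GREY).
    Agents on computers in the misbehaving dashboard are listed first.
    """
    dashboard = {c.lower() for c in dashboard_computers}
    suspects = []
    for row in rows:
        if row["WatcherState"].upper() != row["AgentState"].upper():
            suspects.append({**row, "InDashboard": row["Name"].lower() in dashboard})
    # False sorts before True, so negate InDashboard to put dashboard agents first.
    suspects.sort(key=lambda r: (not r["InDashboard"], r["Name"].lower()))
    return suspects


if __name__ == "__main__":
    # Made-up sample data for illustration.
    sample = io.StringIO(
        "Name,WatcherState,AgentState\n"
        "web01.contoso.local,RED,GREEN\n"
        "sql01.contoso.local,GREEN,GREEN\n"
        "app02.contoso.local,GREY,GREEN\n"
    )
    for r in find_mismatches(load_rows(sample), {"web01.contoso.local"}):
        print(r["Name"], r["WatcherState"], r["AgentState"], r["InDashboard"])
    # web01.contoso.local RED GREEN True
    # app02.contoso.local GREY GREEN False
```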
Next Agent check
- From the same location, find the agents for the computers in the affected Live Maps Dashboard.
- Select them (individually or several at a time), then from the Task pane on the right, under Health Service Tasks, run Flush Health Service State and Cache.
The above usually solves the problem, but it should be treated as a one-time or infrequent troubleshooting technique to clear a single occurrence. If you constantly need to do the above, you need to start looking into the health of your SCOM environment.
- Things to look at include proper sizing (including the databases and the number of management servers) and even the number of agents reporting to each management server.
- If you have a large SCOM environment, I recommend Kevin Holman's blog article "Tweaking SCOM 2012 Management Servers for large environments".
- Third-party management packs: certain management packs, if left untuned, can hit your SCOM environment hard and prevent workflows from finishing in the database.
One last thing
If the above doesn't solve the problem, feel free to contact Savision Support. In the end, it might take a call to Microsoft to solve it: at that point you have a group health calculation/roll-up issue, and they can help isolate the root cause.