Present Current Useful Telemetry Metrics and Gap Analysis
Description
Activity

Jose April 20, 2022 at 8:17 PM
Questions we need to answer with the Telemetry:
Are people using ALEC on a daily basis?
Total Users using Situations vs. Total Active OpenNMS Installs with ALEC
Are we improving Mean Time to Repair (MTTR) and Mean Time Between Failures (MTBF)?
Active Situations vs Alarms Managed
Average Time Open for both Active Alarms and Closed Alarms
Are we being accurate?
Accepted vs. Rejected vs. Modified Solutions (future functionality)
Are we reducing Noise?
New Situations vs. Closed Situations
Are we reducing the Sea of Red? Are we addressing the most important ones?
Alarm Count by Severity (Minor, Major, Critical, etc.)
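As a rough illustration of the MTTR/MTBF question above, here is a minimal sketch, assuming each failure can be reduced to a (raised, cleared) timestamp pair. The function names and the sample data are hypothetical and are not taken from OpenNMS or ALEC.

```python
from datetime import datetime, timedelta

def mttr(incidents):
    """Mean time to repair: average of (cleared - raised) per incident."""
    durations = [cleared - raised for raised, cleared in incidents]
    return sum(durations, timedelta()) / len(durations)

def mtbf(incidents):
    """Mean time between failures: average gap between one incident
    clearing and the next one being raised."""
    ordered = sorted(incidents)
    gaps = [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)

# Hypothetical sample: two incidents on the same day.
day = datetime(2022, 4, 20)
sample_incidents = [
    (day + timedelta(hours=0), day + timedelta(hours=1)),  # repaired in 1h
    (day + timedelta(hours=5), day + timedelta(hours=8)),  # repaired in 3h
]
```

Comparing these two values for periods before and after ALEC is enabled would be one way to approach the question, assuming such before/after data is available.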
*KPIs Needed:*
Unique Users Viewing Situations per Day (or even page loads of Situations)
OpenNMS Installs with ALEC enabled per day.
Active Alarms over Time
Active Situations Over Time
Alarm Elapsed Time (open and closed)
Situations Open/Closed over Time
Alarm Count by Severity over Time
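To make two of the KPIs above concrete (Alarm Elapsed Time and Alarm Count by Severity), here is a minimal sketch of how they might be derived from a snapshot of alarm records. The `Alarm` record and its field names are assumptions for illustration only and do not reflect the actual OpenNMS alarm schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Alarm:
    # Hypothetical record; field names are illustrative only.
    severity: str
    opened: datetime
    closed: Optional[datetime] = None  # None means the alarm is still active

def active_count_by_severity(alarms):
    """Count active (not yet closed) alarms per severity."""
    return Counter(a.severity for a in alarms if a.closed is None)

def average_elapsed(alarms, now):
    """Average elapsed time, reported separately for open and closed alarms."""
    def mean(durations):
        return sum(durations, timedelta()) / len(durations) if durations else None
    open_durations = [now - a.opened for a in alarms if a.closed is None]
    closed_durations = [a.closed - a.opened for a in alarms if a.closed is not None]
    return {"open": mean(open_durations), "closed": mean(closed_durations)}

# Hypothetical snapshot taken at noon.
now = datetime(2022, 4, 20, 12, 0)
sample_alarms = [
    Alarm("Major", datetime(2022, 4, 20, 10, 0)),
    Alarm("Critical", datetime(2022, 4, 20, 8, 0)),
    Alarm("Major", datetime(2022, 4, 20, 6, 0), datetime(2022, 4, 20, 7, 0)),
    Alarm("Minor", datetime(2022, 4, 20, 9, 0), datetime(2022, 4, 20, 12, 0)),
]
```

Sampling these values at a fixed interval (for example, daily) would give the "over Time" series listed above.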

Research telemetry requirements to determine which metrics we need and why. The output is a list of metrics, each with its purpose.