Threshold exceeded events deleted and re-created on OpenNMS restart
Activity

Antonio Russo March 21, 2012 at 5:26 AM
You can use the event translator to persist the critical threshold events you are interested in, so that the correct timestamp is saved. You can also avoid re-creating the translated event if one is already active in the database.

Andrea Russos March 21, 2012 at 4:07 AM
Hi Alejandro!
As far as notifications alone are concerned, I agree with you that an automation which prevents OpenNMS from sending a new notification would help.
But...
The real problem (at least IMO) is that if OpenNMS restarts, the thresholds are seen as NEW critical events, with a timestamp (I mean the time at which the critical event occurred) corresponding to the service restart. Also, as I've already written, if the old alarms were acknowledged (by helpdesk people, for example), the new ones related to the same problems are not.
I think this is a problem which may also affect SLAs (if they are tied to an OpenNMS deployment).
I think it would be better if threshold states (which produce critical events, in most cases) were persisted to the backend DB, don't you?
--Andrea
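To make the persistence idea above concrete, here is a minimal sketch of what keeping threshold state in a database could look like. It is not OpenNMS code: the `PersistentThresholdState` class, the `threshold_state` table, the `/var` resource name, and the timestamps are all hypothetical, and an in-memory SQLite connection stands in for the backend DB.

```python
import sqlite3


class PersistentThresholdState:
    """Hypothetical store that keeps threshold state in a database table."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS threshold_state ("
            "resource TEXT PRIMARY KEY, exceeded_since TEXT NOT NULL)"
        )

    def record(self, resource, exceeded, now):
        """Return the timestamp of the original violation, or None if cleared."""
        row = self.conn.execute(
            "SELECT exceeded_since FROM threshold_state WHERE resource = ?",
            (resource,),
        ).fetchone()
        if not exceeded:
            # Condition cleared: forget the stored violation.
            self.conn.execute(
                "DELETE FROM threshold_state WHERE resource = ?", (resource,)
            )
            self.conn.commit()
            return None
        if row is not None:
            # Violation already known: keep the original timestamp.
            return row[0]
        self.conn.execute(
            "INSERT INTO threshold_state VALUES (?, ?)", (resource, now)
        )
        self.conn.commit()
        return now


conn = sqlite3.connect(":memory:")  # stands in for the backend DB
store = PersistentThresholdState(conn)
original = store.record("/var", True, "2012-03-20T10:00:00")

# "Restart": a new store object on the same database still sees the old row,
# so the violation keeps its original timestamp instead of the restart time.
restarted = PersistentThresholdState(conn)
after_restart = restarted.record("/var", True, "2012-03-21T04:00:00")
```

With state stored this way, a restart would not change the time of the violation, and an acknowledgement attached to the original record could in principle be kept as well.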

Alejandro Galue March 20, 2012 at 1:25 PM
The states of the thresholds are stored in memory (i.e. they are not persisted to a file or the database). For this reason, when you stop OpenNMS, those states are lost, and if the threshold condition still exists after starting OpenNMS, you will receive new notifications.
You can avoid this by creating automations that prevent the second notification from being sent if an alarm has already been raised for a particular threshold violation.
Makes sense?
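The behaviour described in this comment can be illustrated with a small simulation. This is only a sketch under assumptions, not the OpenNMS implementation: `ThresholdEvaluator`, `should_notify`, the 90.0 trigger value, and the `/var` resource are hypothetical names and values chosen for illustration, and a plain Python set stands in for alarms that survive a restart.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdEvaluator:
    """Hypothetical high-threshold evaluator whose state lives only in memory."""

    trigger: float
    exceeded: dict = field(default_factory=dict)  # resource -> currently exceeded?

    def evaluate(self, resource: str, value: float) -> bool:
        """Return True when a new 'threshold exceeded' event should be emitted."""
        was_exceeded = self.exceeded.get(resource, False)
        now_exceeded = value >= self.trigger
        self.exceeded[resource] = now_exceeded
        return now_exceeded and not was_exceeded


def should_notify(new_event: bool, resource: str, open_alarms: set) -> bool:
    """Guard in the spirit of the suggested automation: skip the notification
    when an alarm for the same resource is already open."""
    if not new_event:
        return False
    if resource in open_alarms:
        return False
    open_alarms.add(resource)
    return True


open_alarms = set()  # stands in for alarms that survive a restart

before = ThresholdEvaluator(trigger=90.0)
first = should_notify(before.evaluate("/var", 95.0), "/var", open_alarms)
repeat = should_notify(before.evaluate("/var", 96.0), "/var", open_alarms)

# "Restart": a fresh evaluator has no memory of the earlier violation,
# so it reports the still-existing condition as a new event.
after = ThresholdEvaluator(trigger=90.0)
event_after_restart = after.evaluate("/var", 96.0)
notify_after_restart = should_notify(event_after_restart, "/var", open_alarms)
```

Here `event_after_restart` is true because the in-memory state was lost, while `notify_after_restart` is false because the guard finds the existing alarm. The guard only suppresses the duplicate notification; the new event itself is still created with the restart timestamp.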
Set up thresholds for filesystem usage on servers, and set up notifications for when a trigger value is met (I use datasourcetype=dskIndex in datacollection). When the trigger value is crossed, a new event is created and the corresponding notification is sent.
If you restart OpenNMS, the event related to the threshold is deleted from the DB and a new one is created. This results in a new notification being sent for the same threshold with a new timestamp (corresponding to the restart time and not to the real event time).
Also, if the event was acknowledged at the time of its occurrence, the new one is not acknowledged, because it is a new event.