Outages are not getting created

Description

Pollerd is failing with the following exception:

2015-02-27 09:02:10,431 WARN [OpenNMS.Poller.DefaultPollContext-Thread] o.o.n.e.EventIpcManagerDefaultImpl: run: an unexpected error occured during ListenerThread OpenNMS.Poller.DefaultPollContext
org.springframework.dao.DataIntegrityViolationException: could not insert: [org.opennms.netmgt.model.OnmsOutage]; SQL [insert into outages (ifLostService, ifRegainedService, ifserviceId, svcLostEventId, svcRegainedEventId, suppressTime, suppressedBy, outageId) values (?, ?, ?, ?, ?, ?, ?, ?)]; constraint [one_outstanding_outage_per_service_idx]; nested exception is org.hibernate.exception.ConstraintViolationException: could not insert: [org.opennms.netmgt.model.OnmsOutage]
at org.springframework.orm.hibernate3.SessionFactoryUtils.convertHibernateAccessException(SessionFactoryUtils.java:643) ~[org.apache.servicemix.bundles.spring-orm-3.2.9.RELEASE_1.jar:?]
at org.springframework.orm.hibernate3.HibernateAccessor.convertHibernateAccessException(HibernateAccessor.java:412) ~[org.apache.servicemix.bundles.spring-orm-3.2.9.RELEASE_1.jar:?]
at org.springframework.orm.hibernate3.HibernateTemplate.doExecute(HibernateTemplate.java:412) ~[org.apache.servicemix.bundles.spring-orm-3.2.9.RELEASE_1.jar:?]
at org.springframework.orm.hibernate3.HibernateTemplate.executeWithNativeSession(HibernateTemplate.java:375) ~[org.apache.servicemix.bundles.spring-orm-3.2.9.RELEASE_1.jar:?]
at org.springframework.orm.hibernate3.HibernateTemplate.saveOrUpdate(HibernateTemplate.java:738) ~[org.apache.servicemix.bundles.spring-orm-3.2.9.RELEASE_1.jar:?]
at org.opennms.netmgt.dao.hibernate.AbstractDaoHibernate.saveOrUpdate(AbstractDaoHibernate.java:410) ~[opennms-dao-15.0.1.jar:?]
at org.opennms.netmgt.poller.QueryManagerDaoImpl.openOutage(QueryManagerDaoImpl.java:119) ~[opennms-services-15.0.1.jar:?]
at org.opennms.netmgt.poller.QueryManagerDaoImpl.openOutage(QueryManagerDaoImpl.java:112) ~[opennms-services-15.0.1.jar:?]
at org.opennms.netmgt.poller.DefaultPollContext$1.run(DefaultPollContext.java:314) ~[opennms-services-15.0.1.jar:?]
at org.opennms.netmgt.poller.pollables.PendingPollEvent.processPending(PendingPollEvent.java:145) ~[opennms-services-15.0.1.jar:?]
at org.opennms.netmgt.poller.DefaultPollContext.onEvent(DefaultPollContext.java:407) ~[opennms-services-15.0.1.jar:?]
at org.opennms.netmgt.eventd.EventIpcManagerDefaultImpl$EventListenerExecutor$2.run(EventIpcManagerDefaultImpl.java:176) [org.opennms.features.events.daemon-15.0.1.jar:?]

Acceptance / Success Criteria

None

Activity

Jesse White March 11, 2015 at 2:26 PM
Edited

Fixed in 0369ce4cb38ee4d26ca508714511ccfcc2a6e430. The fix is currently in both the develop and release-15.0.2 branches.

Outages are now created while the poller's tree lock is held, and are updated with the event ID once the down event is received back from the event bus.
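
A minimal sketch of that approach, for illustration only. It is not the code from the commit: the class, field, and method names (OutageUnderLockSketch, treeLock, serviceDown, onDownEventReceived) are hypothetical, an in-memory map stands in for the outages table, and taking the lock again for the event ID update is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: open the outage under the lock, attach the event ID later.
public class OutageUnderLockSketch {
    static class Outage {
        final int serviceId;
        Integer lostEventId; // null until the down event comes back from the event bus

        Outage(int serviceId) {
            this.serviceId = serviceId;
        }
    }

    // Stand-in for the poller's tree lock (assumption: a single global lock).
    private final ReentrantLock treeLock = new ReentrantLock();
    // Stand-in for the outages table: at most one open outage per service.
    private final Map<Integer, Outage> openOutages = new HashMap<>();

    /** Called when a poll finds the service down; the outage is created before any event returns. */
    Outage serviceDown(int serviceId) {
        treeLock.lock();
        try {
            Outage existing = openOutages.get(serviceId);
            if (existing != null) {
                return existing; // never open a second outstanding outage
            }
            Outage outage = new Outage(serviceId);
            openOutages.put(serviceId, outage);
            return outage;
        } finally {
            treeLock.unlock();
        }
    }

    /** Called when the down event is received back; only fills in the event ID. */
    void onDownEventReceived(int serviceId, int eventId) {
        treeLock.lock();
        try {
            Outage outage = openOutages.get(serviceId);
            if (outage != null && outage.lostEventId == null) {
                outage.lostEventId = eventId;
            }
        } finally {
            treeLock.unlock();
        }
    }

    int openOutageCount() {
        return openOutages.size();
    }
}
```

Because the outage row exists before the event is delivered, the order in which events arrive no longer decides whether an outage is opened; a late or duplicate down event can only update the existing record.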

Jesse White March 10, 2015 at 4:15 PM

Similarly to . This behavior is triggered by a race condition involving the re-ordering of events.

Suppose the following sequence of events is sent for a particular service:
nodeLostService
nodeGainedService
nodeLostService

but the events are received in this order:
nodeGainedService
nodeLostService
nodeLostService

In that case, the existing poller daemon code will attempt to create a second outstanding outage for the service, which violates the one_outstanding_outage_per_service_idx constraint.
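
The effect of the re-ordering can be reproduced with a small simulation. This is a sketch under stated assumptions, not the poller daemon's code: OutageReorderSketch, OutageTable, and replay are hypothetical names, a set stands in for the outages table and its one_outstanding_outage_per_service_idx constraint, and a nodeGainedService with no open outage is assumed to be a no-op.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of event-driven outage creation.
public class OutageReorderSketch {
    /** Stand-in for the outages table with one open outage allowed per service. */
    static class OutageTable {
        private final Set<Integer> openOutages = new HashSet<>();

        void openOutage(int serviceId) {
            // Set.add returns false if the service already has an open outage.
            if (!openOutages.add(serviceId)) {
                throw new IllegalStateException(
                        "constraint [one_outstanding_outage_per_service_idx]");
            }
        }

        void resolveOutage(int serviceId) {
            // Assumption: resolving when nothing is open does nothing.
            openOutages.remove(serviceId);
        }
    }

    /** Applies the events in the order received and returns any constraint errors. */
    static List<String> replay(List<String> receivedEvents) {
        OutageTable table = new OutageTable();
        List<String> errors = new ArrayList<>();
        int serviceId = 1;
        for (String event : receivedEvents) {
            try {
                if (event.equals("nodeLostService")) {
                    table.openOutage(serviceId);
                } else if (event.equals("nodeGainedService")) {
                    table.resolveOutage(serviceId);
                }
            } catch (IllegalStateException e) {
                errors.add(e.getMessage());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Order in which the events were sent.
        System.out.println(replay(Arrays.asList(
                "nodeLostService", "nodeGainedService", "nodeLostService")));
        // Order in which the events were received.
        System.out.println(replay(Arrays.asList(
                "nodeGainedService", "nodeLostService", "nodeLostService")));
    }
}
```

Replaying the events in the sent order raises no error, while the received order triggers one violation on the second nodeLostService, matching the constraint named in the exception above.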

Jesse White March 2, 2015 at 3:29 PM

As of commit e30cfed3817cceaeed945d38dd383f6ab90c72f2, the exception is properly caught.

Waiting on further details to identify the root cause.

Fixed

Created March 2, 2015 at 3:16 PM
Updated May 11, 2015 at 2:49 PM
Resolved March 11, 2015 at 2:27 PM