Some notifications not persisted despite e-mail being sent - race condition?

Description

Created from support ticket https://mynms.opennms.com/Ticket/Display.html?id=2685

Received an e-mail notification for an OSPF LSA expire trap, but the notification never made it into the notifications table.

Specifically:

Notice #130 - Generated by 20126
Notice #131 - Generated by 20127

root@nms1:~# psql -A -U opennms opennms -c 'SELECT * FROM notifications WHERE notifyid=131;'
notifyid|textmsg|subject|numericmsg|pagetime|respondtime|answeredby|nodeid|interfaceid|serviceid|queueid|eventid|eventuei|notifconfigname
(0 rows)

root@nms1:~# psql -A -U opennms opennms -c 'SELECT * FROM notifications WHERE notifyid=130;'
notifyid|textmsg|subject|numericmsg|pagetime|respondtime|answeredby|nodeid|interfaceid|serviceid|queueid|eventid|eventuei|notifconfigname
(0 rows)

At almost exactly the time the trap came in, the log shows the following:

2013-12-18 05:28:20,634 ERROR [Thread-318849] NotificationTask: Could not insert notice info into database, aborting send notice
org.postgresql.util.PSQLException: ERROR: insert or update on table "usersnotified" violates foreign key constraint "fk_notifid2"
Detail: Key (notifyid)=(128) is not present in table "notifications".
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:334)
at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeUpdate(NewProxyPreparedStatement.java:105)
at org.opennms.netmgt.config.NotificationManager.updateNoticeWithUserInfo(NotificationManager.java:748)
at org.opennms.netmgt.notifd.NotificationTask.run(NotificationTask.java:242)

2013-12-18 05:28:20,661 ERROR [Thread-318855] NotificationTask: Could not insert notice info into database, aborting send notice
org.postgresql.util.PSQLException: ERROR: insert or update on table "usersnotified" violates foreign key constraint "fk_notifid2"
Detail: Key (notifyid)=(129) is not present in table "notifications".
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:334)
at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeUpdate(NewProxyPreparedStatement.java:105)
at org.opennms.netmgt.config.NotificationManager.updateNoticeWithUserInfo(NotificationManager.java:748)
at org.opennms.netmgt.notifd.NotificationTask.run(NotificationTask.java:242)
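
To line up these failures with the missing notices, the notifyid values can be pulled out of the "Detail:" lines of a saved log. This is a minimal sketch, assuming the log format shown above; LogNotifyIds and missingNotifyIds() are hypothetical triage helpers, not part of OpenNMS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical triage helper (not OpenNMS code): collects the notifyid from each
// fk_notifid2 "Detail:" line, assuming the PostgreSQL message format seen in the log.
class LogNotifyIds {
    // Matches e.g.: Key (notifyid)=(128) is not present in table "notifications".
    private static final Pattern MISSING_KEY = Pattern.compile(
            "Key \\(notifyid\\)=\\((\\d+)\\) is not present in table \"notifications\"");

    static List<Integer> missingNotifyIds(List<String> logLines) {
        List<Integer> ids = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = MISSING_KEY.matcher(line);
            if (m.find()) {
                ids.add(Integer.parseInt(m.group(1)));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "Detail: Key (notifyid)=(128) is not present in table \"notifications\".",
                "at org.opennms.netmgt.notifd.NotificationTask.run(NotificationTask.java:242)",
                "Detail: Key (notifyid)=(129) is not present in table \"notifications\".");
        System.out.println(missingNotifyIds(sample)); // prints [128, 129]
    }
}
```

Stack-trace lines and other noise are skipped because they do not match the pattern.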

Acceptance / Success Criteria

None

Activity

Jesse White April 12, 2017 at 1:51 PM

It looks like notification tasks are only processed after a successful insertion into the notifications table, so my guess is that the notifications are somehow being deleted afterwards. The notifications table does have foreign keys with ON DELETE CASCADE on both the nodeid and eventid columns, so deleting the referenced node or event could explain the removal of notifications in some cases.

I think we'll need a full set of logs to be able to isolate the problem here.
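
The cascade explanation can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the OpenNMS schema or DAO code: CascadeModel is hypothetical, it models only the eventid cascade and the fk_notifid2 check on usersnotified, and the IDs are made up. It shows how a notice row that was inserted successfully can disappear before the user information is written, producing the same "is not present in table" error.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; NOT the real OpenNMS schema or DAO code.
// It models only notifications.eventid ON DELETE CASCADE and the
// fk_notifid2 check (usersnotified.notifyid -> notifications.notifyid).
class CascadeModel {
    // notifications table: notifyid -> eventid
    final Map<Integer, Integer> notifications = new HashMap<>();
    // usersnotified table: one notifyid per row (cascades from notifications are not modelled)
    final List<Integer> usersNotified = new ArrayList<>();

    void insertNotice(int notifyId, int eventId) {
        notifications.put(notifyId, eventId);
    }

    // Deleting an event removes every notification that references it (ON DELETE CASCADE).
    void deleteEvent(int eventId) {
        notifications.values().removeIf(e -> e == eventId);
    }

    // Stand-in for the usersnotified insert done in updateNoticeWithUserInfo():
    // fails like fk_notifid2 when the parent notification row is gone.
    void insertUserNotified(int notifyId) {
        if (!notifications.containsKey(notifyId)) {
            throw new IllegalStateException("Key (notifyid)=(" + notifyId
                    + ") is not present in table \"notifications\".");
        }
        usersNotified.add(notifyId);
    }

    public static void main(String[] args) {
        CascadeModel db = new CascadeModel();
        db.insertNotice(1, 100); // notice insert succeeds, so the task carries on
        db.deleteEvent(100);     // meanwhile the event is deleted; notice 1 cascades away
        try {
            db.insertUserNotified(1);
        } catch (IllegalStateException e) {
            // prints: FK violation: Key (notifyid)=(1) is not present in table "notifications".
            System.out.println("FK violation: " + e.getMessage());
        }
    }
}
```

The same sequence with a node deletion instead of an event deletion would behave identically, since the comment above notes that nodeid also cascades.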

Seth Leger April 10, 2017 at 1:27 PM

The foreign key constraint messages are a symptom, but the real problem is that the row is missing from the notifications table. Either it was never inserted properly, or it was deleted somehow before the NotificationManager.updateNoticeWithUserInfo() method executed.
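
One way to tell these two cases apart during a reproduction attempt is to record every notifyid whose insert into notifications has committed, and consult that record when fk_notifid2 fires. A minimal sketch follows; NoticeAudit and its methods are hypothetical instrumentation, not existing OpenNMS code, and the classification is only as reliable as the point where recordInsert() is called.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical reproduction instrumentation (not OpenNMS code): distinguishes
// "never inserted" from "deleted after insert" for a missing notifications row.
class NoticeAudit {
    enum Cause { NEVER_INSERTED, DELETED_AFTER_INSERT }

    private final Set<Integer> inserted = new HashSet<>();

    // Call only after the notifications INSERT has committed;
    // calling it earlier would misclassify rolled-back inserts.
    synchronized void recordInsert(int notifyId) {
        inserted.add(notifyId);
    }

    // Call when the usersnotified insert fails because the parent row is missing.
    synchronized Cause classifyMissing(int notifyId) {
        return inserted.contains(notifyId) ? Cause.DELETED_AFTER_INSERT : Cause.NEVER_INSERTED;
    }

    public static void main(String[] args) {
        NoticeAudit audit = new NoticeAudit();
        audit.recordInsert(1);
        System.out.println(audit.classifyMissing(1)); // prints DELETED_AFTER_INSERT
        System.out.println(audit.classifyMissing(2)); // prints NEVER_INSERTED
    }
}
```

The methods are synchronized because the log shows NotificationTask failures on different threads, so the audit set may be touched concurrently.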

Seth Leger April 9, 2015 at 11:29 AM

We need to try to reproduce this issue before trying to fix it.

Details

Created March 27, 2014 at 4:30 PM
Updated September 21, 2021 at 6:22 PM