Add clock skew correction mechanism

Description

The clock skew correction mechanism corrects all timestamps related to a flow by assuming the receive time is more accurate than the exported timestamp. It only triggers if the receive time and the exported time differ by more than a configurable threshold.
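A minimal sketch of the correction described above, not the actual implementation. The `Flow` type, its field names other than `last_switched`, and the `threshold_ms` parameter name are assumptions made for illustration. It also assumes that "correcting" means shifting every flow timestamp by the observed difference between receive time and export time.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Flow:
    # Hypothetical field names; only "last_switched" appears in this ticket.
    exported_at_ms: int     # timestamp written by the exporter
    received_at_ms: int     # time at which the flow packet was received
    first_switched_ms: int
    last_switched_ms: int


def correct_clock_skew(flow: Flow, threshold_ms: int) -> Flow:
    """Shift all flow timestamps by the observed skew if it exceeds the threshold."""
    skew_ms = flow.received_at_ms - flow.exported_at_ms
    if abs(skew_ms) <= threshold_ms:
        # Difference is within tolerance: trust the exporter's clock.
        return flow
    # Treat the receive time as authoritative and shift every timestamp.
    return replace(
        flow,
        exported_at_ms=flow.exported_at_ms + skew_ms,
        first_switched_ms=flow.first_switched_ms + skew_ms,
        last_switched_ms=flow.last_switched_ms + skew_ms,
    )
```

With a threshold of, say, 5 minutes, a flow whose exporter clock is 10 minutes behind would have all of its timestamps moved forward by 10 minutes, while a flow that is only a few seconds off would pass through unchanged.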

Acceptance / Success Criteria

None

Activity

fooker November 30, 2020 at 10:35 AM

fooker November 26, 2020 at 11:05 PM

fooker November 18, 2020 at 1:18 PM

I think I've got some more insight into what is happening: the whole pipeline maintains a watermark that acts as a clock and therefore can only move forward. The value of the clock is taken from last_switched of the incoming flows. One can think of it as current_time being the latest last_switched seen so far across all incoming flows. This watermark is used to determine whether a window is still open or should be closed.

If one of the exporters has a skewed clock and believes it is in the future, the watermark jumps ahead, all current windows are closed, and flows associated with these windows are considered late and (depending on the allowedLatenessMs parameter) dropped.
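The failure mode can be reproduced with a toy model of a single, shared watermark. This is a simplified sketch and not the pipeline's real windowing code: the class name, the fixed window size, and the exact lateness comparison are assumptions. Only the idea that the clock is the largest last_switched seen so far, and that allowedLatenessMs decides whether a late flow is dropped, comes from the comment above.

```python
class GlobalWatermark:
    """Toy model: one forward-only clock shared by all exporters."""

    def __init__(self, window_size_ms: int, allowed_lateness_ms: int):
        self.window_size_ms = window_size_ms
        self.allowed_lateness_ms = allowed_lateness_ms
        self.current_time_ms = 0

    def accept(self, last_switched_ms: int) -> bool:
        """Return True if the flow still lands in an open window, False if it is dropped."""
        # The watermark only ever moves forward.
        self.current_time_ms = max(self.current_time_ms, last_switched_ms)
        # End of the fixed window this flow belongs to.
        window_end_ms = (last_switched_ms // self.window_size_ms + 1) * self.window_size_ms
        # The window stays open until the watermark passes its end plus the lateness.
        return self.current_time_ms <= window_end_ms + self.allowed_lateness_ms


wm = GlobalWatermark(window_size_ms=60_000, allowed_lateness_ms=0)
wm.accept(10_000)       # well-behaved exporter, accepted
wm.accept(3_600_000)    # exporter whose clock is an hour ahead, moves the watermark
wm.accept(20_000)       # well-behaved exporter again, now considered late
```

In this model a single flow from the skewed exporter is enough to make every later flow from correctly configured exporters look late.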

We were able to reproduce this behavior in a test and found a solution that builds windows by time frame and exporter node ID. This should keep the watermarks from interfering with each other. Things coming "from the future" are now held in open windows until the window gets closed based on the wall-clock time of the Flink cluster. Flows "from the past" are still affected by allowedLatenessMs.
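A rough sketch of the idea, under stated assumptions: windows are keyed by (window start, exporter node ID) and are closed by comparing the window end against a wall-clock time passed in by the caller. The class and method names are hypothetical, and the handling of flows "from the past" via allowedLatenessMs is left out to keep the example short. This is an illustration of the keying scheme, not the code that was merged.

```python
from collections import defaultdict


class PerExporterWindows:
    """Toy model: windows keyed by time frame and exporter, closed by wall-clock time."""

    def __init__(self, window_size_ms: int):
        self.window_size_ms = window_size_ms
        # (window_start_ms, exporter_node_id) -> list of flows
        self.open_windows = defaultdict(list)

    def add(self, exporter_node_id: str, last_switched_ms: int, flow) -> None:
        # Align the flow to the start of its fixed-size window.
        window_start_ms = last_switched_ms - last_switched_ms % self.window_size_ms
        self.open_windows[(window_start_ms, exporter_node_id)].append(flow)

    def close_expired(self, wall_clock_ms: int) -> dict:
        """Close and return every window whose end is not after the wall-clock time."""
        closed = {}
        # sorted() copies the keys, so popping inside the loop is safe.
        for key in sorted(self.open_windows):
            window_start_ms, _ = key
            if window_start_ms + self.window_size_ms <= wall_clock_ms:
                closed[key] = self.open_windows.pop(key)
        return closed
```

Because the exporter is part of the key, a flow stamped an hour ahead by one exporter opens its own window and waits there until wall-clock time catches up, and it no longer closes the windows of other exporters.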

fooker November 5, 2020 at 2:18 PM

Zoë Knox October 27, 2020 at 1:44 PM

yes.

Fixed

Created November 30, 2020 at 10:34 AM
Updated November 30, 2020 at 10:35 AM
Resolved November 30, 2020 at 10:35 AM