Enrichment of flows takes longer than persisting flows to elastic

Description

With the current statistics we have, enriching a flow takes roughly 3 ms on average.
Investigation showed that a large part of that time is spent classifying the flow.
In the worst case (no mapping exists), all rules have to be checked, and there are many of them by default. The algorithm that determines whether a matching rule exists should be improved, e.g. by pre-sorting or filtering the rules by port mapping.
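
One way to read the "filter by port mapping" idea is to index the rules by port once, so that a lookup only checks the few rules registered for the flow's port instead of the whole rule set. The sketch below is a minimal illustration under stated assumptions: `Rule` and `PortIndexedRules` are hypothetical names, each rule is assumed to match exactly one port and one protocol, and the real rule model (port ranges, lists, address filters) is not covered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: these types are illustrative and not taken from the project.
final class PortIndexedRules {

    /** Simplified rule (assumption): one port and one protocol per rule. */
    static final class Rule {
        final String name;
        final int port;
        final String protocol;

        Rule(String name, int port, String protocol) {
            this.name = name;
            this.port = port;
            this.protocol = protocol;
        }

        boolean matches(int port, String protocol) {
            return this.port == port && this.protocol.equalsIgnoreCase(protocol);
        }
    }

    // Built once; maps a port to the rules that can match on that port.
    private final Map<Integer, List<Rule>> rulesByPort = new HashMap<>();

    PortIndexedRules(List<Rule> rules) {
        for (Rule rule : rules) {
            rulesByPort.computeIfAbsent(rule.port, p -> new ArrayList<>()).add(rule);
        }
    }

    /** Checks only the rules registered for the given port and stops at the first match. */
    Optional<String> classify(int port, String protocol) {
        for (Rule rule : rulesByPort.getOrDefault(port, Collections.emptyList())) {
            if (rule.matches(port, protocol)) {
                return Optional.of(rule.name);
            }
        }
        return Optional.empty(); // worst case: no rule for this port, nothing to scan
    }

    public static void main(String[] args) {
        PortIndexedRules index = new PortIndexedRules(Arrays.asList(
                new Rule("http", 80, "tcp"),
                new Rule("dns", 53, "udp")));
        System.out.println(index.classify(53, "udp"));   // prints Optional[dns]
        System.out.println(index.classify(8080, "tcp")); // prints Optional.empty
    }
}
```

With such an index, the "no mapping exists" case becomes a single map lookup rather than a pass over every rule.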

Because it is not yet clear where the rest of the time is spent, some additional statistics should be gathered:

  • How long does it take to actually classify a flow?

  • How long does it take to load a node from the database?
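
The two measurements above could be collected with a small timing wrapper around each step. The following is only a sketch: `StepTimer` is a hypothetical helper built on `System.nanoTime()` and `LongAdder`, and a real deployment would more likely report through whatever metrics library the project already uses.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical helper, not part of the project: accumulates the time spent in one step.
final class StepTimer {
    private final LongAdder totalNanos = new LongAdder();
    private final LongAdder invocations = new LongAdder();

    /** Runs the step, records its duration (even if it throws), and returns its result. */
    <T> T time(Supplier<T> step) {
        long start = System.nanoTime();
        try {
            return step.get();
        } finally {
            totalNanos.add(System.nanoTime() - start);
            invocations.increment();
        }
    }

    long count() {
        return invocations.sum();
    }

    /** Average duration per invocation in milliseconds, or 0.0 if nothing was timed yet. */
    double averageMillis() {
        long n = invocations.sum();
        return n == 0 ? 0.0 : totalNanos.sum() / 1_000_000.0 / n;
    }
}
```

One timer per step, for example one wrapping the classification call and another wrapping the node lookup, would show how the roughly 3 ms per flow is split between the two.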

Acceptance / Success Criteria

None

Activity

Jesse White February 12, 2018 at 8:10 PM

Ignore my previous comment. Streams are lazy, so we don't actually perform any more work than is necessary.
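
The laziness mentioned here can be checked with a small experiment. The sketch below assumes the code in question has the shape map, filter, findFirst on a sequential stream; the ticket does not show the actual implementation, and `LazyStreamDemo` with its string "classifiers" is purely illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: strings stand in for classifiers, equality stands in for a match.
final class LazyStreamDemo {

    /** Returns how many "classify" calls ran before findFirst() ended the pipeline. */
    static int callsUntilFirstMatch(List<String> classifiers, String wanted) {
        AtomicInteger calls = new AtomicInteger();
        classifiers.stream()
                .map(c -> {
                    calls.incrementAndGet();           // counts each simulated classify call
                    return c.equals(wanted) ? c : null; // null means "no match"
                })
                .filter(Objects::nonNull)
                .findFirst();                          // short-circuits on the first match
        return calls.get();
    }

    public static void main(String[] args) {
        List<String> classifiers = Arrays.asList("a", "b", "c", "d");
        System.out.println(callsUntilFirstMatch(classifiers, "b")); // prints 2
    }
}
```

Elements are pulled through the pipeline one at a time, so the mapping step runs only until the first match is found; when nothing matches, every element is still visited.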

Jesse White February 12, 2018 at 8:02 PM

I found some additional places where we can improve the logic to gain better performance:
1. DefaultClassificationEngine#classify currently calls #classify on all of the filteredClassifiers, and then takes the first. We should rework this logic to avoid calling #classify after the first match is made.
2. Similarly, ProtocolMatcher#matches compares the protocol against all of the protocols (in the matcher) and only then checks whether a match was made. We should rework this to return immediately on the first match.

Markus von Rüden February 4, 2018 at 4:19 PM

Fixed

Details

Created February 2, 2018 at 7:53 AM
Updated June 3, 2019 at 8:05 AM
Resolved February 7, 2018 at 12:15 AM