Enrichment of flows takes longer than persisting flows to elastic
Description
With the current statistics we have, enriching a flow takes roughly 3 ms on average. Investigation showed that a large part of that time is spent classifying the flow. In the worst case (no mapping exists), every rule has to be checked, and there are many rules by default. The algorithm that determines whether a matching rule exists should be improved, e.g. by pre-sorting or filtering the rules by port mapping.
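As a rough illustration of the port-based pre-filtering idea, the sketch below indexes rules by port so that a lookup only scans the rules registered for that port. The names `PortIndexedRules` and `Rule` are hypothetical and not the actual OpenNMS classes, and the sketch assumes each rule matches exactly one protocol and one port, which is simpler than the real rule definitions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: these types are illustrative, not the OpenNMS classification engine.
public class PortIndexedRules {

    // Simplified rule (assumption): one application name, one protocol, one port.
    public static final class Rule {
        final String name;
        final String protocol;
        final int port;

        public Rule(String name, String protocol, int port) {
            this.name = name;
            this.protocol = protocol;
            this.port = port;
        }
    }

    // Rules grouped by port, built once when the rule set is loaded.
    private final Map<Integer, List<Rule>> rulesByPort = new HashMap<>();

    public PortIndexedRules(List<Rule> rules) {
        for (Rule rule : rules) {
            rulesByPort.computeIfAbsent(rule.port, p -> new ArrayList<>()).add(rule);
        }
    }

    // Only the rules registered for the given port are scanned.
    // A port without any rule returns immediately instead of checking every rule.
    public Optional<String> classify(String protocol, int port) {
        for (Rule rule : rulesByPort.getOrDefault(port, Collections.emptyList())) {
            if (rule.protocol.equalsIgnoreCase(protocol)) {
                return Optional.of(rule.name);
            }
        }
        return Optional.empty();
    }
}
```

With this layout the worst case (no mapping exists) costs a single map lookup rather than a scan over all rules. Rules that cover port ranges or match on addresses would need a different index.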
Since it is not yet clear where the rest of the time is spent, some additional statistics should be gathered:
How long does it take to actually classify a flow?
How long does it take to load a node from the database?
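One way to gather the two measurements listed above is to wrap each step in a small timer that accumulates call counts and elapsed time per step. This is a minimal sketch: `StepTimer` and the step names are hypothetical, and an existing metrics facility in the code base could serve the same purpose.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical helper: accumulates elapsed time and call counts per named step.
public class StepTimer {
    private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> calls = new ConcurrentHashMap<>();

    // Runs the given work, records how long it took, and returns its result.
    public <T> T time(String step, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            totalNanos.computeIfAbsent(step, s -> new LongAdder()).add(System.nanoTime() - start);
            calls.computeIfAbsent(step, s -> new LongAdder()).increment();
        }
    }

    // Number of recorded invocations for a step (0 if the step never ran).
    public long count(String step) {
        LongAdder c = calls.get(step);
        return c == null ? 0 : c.sum();
    }

    // Average duration of a step in milliseconds (0.0 if the step never ran).
    public double averageMillis(String step) {
        long n = count(step);
        return n == 0 ? 0.0 : totalNanos.get(step).sum() / 1_000_000.0 / n;
    }
}
```

A call site would then look like `timer.time("classify", () -> engine.classify(flow))` or `timer.time("loadNode", () -> nodeDao.get(nodeId))`, where `engine`, `flow`, `nodeDao` and `nodeId` are placeholders for whatever the enrichment code actually uses.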
Acceptance / Success Criteria
None
Activity
Jesse White February 12, 2018 at 8:10 PM
Ignore my comment above. Streams are lazy, so we don't actually perform any more work than is necessary.
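The laziness can be checked with a small experiment. The sketch below uses a hypothetical stand-in for the per-classifier call (plain strings instead of real classifiers) and counts how often it runs when a sequential stream is combined with `findFirst`.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: strings stand in for classifiers; this is not the OpenNMS code.
public class LazyStreamDemo {
    // Counts how often the per-classifier check is actually invoked.
    public static final AtomicInteger invocations = new AtomicInteger();

    // Stand-in for a classifier's classify call: returns a result on match, null otherwise.
    static String classifyWith(String classifier, String flow) {
        invocations.incrementAndGet();
        return classifier.equals(flow) ? classifier : null;
    }

    // map/filter are lazy and findFirst short-circuits, so elements after
    // the first match are never passed to classifyWith.
    public static Optional<String> firstMatch(List<String> classifiers, String flow) {
        return classifiers.stream()
                .map(c -> classifyWith(c, flow))
                .filter(Objects::nonNull)
                .findFirst();
    }
}
```

Calling `firstMatch` with the classifiers `ssh, http, dns, ntp` and the flow `http` invokes `classifyWith` twice, not four times. The same short-circuiting applies to terminal operations such as `anyMatch`.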
Jesse White February 12, 2018 at 8:02 PM
I found some additional places where we can improve the logic to gain better performance:
1. DefaultClassificationEngine#classify currently calls #classify on all of the filteredClassifiers and then takes the first result. We should rework this logic to avoid calling #classify after the first match is made.
2. Similarly, ProtocolMatcher#matches compares the protocol against all of the protocols in the matcher and only then checks whether a match was made. We should rework this to return immediately on the first match.