8. The order of different SequenceModels is unpredictable #10

ernstleierzopf · 2020-06-13T08:28:51Z

The order of different SequenceModels is unpredictable. There is potential to create a better model by queueing the more frequent SequenceModels first.

landauermax · 2020-06-29T08:07:59Z

Could you please provide an example? I assume the issue is about SequenceModelElements that follow a FirstMatchModelElement. The order should always be deterministic, so running the parsergenerator multiple times should always result in the same parser. The sorting should be based on lexicographic ordering of the elements of the models inside the SequenceModelElements.

ernstleierzopf · 2020-06-29T08:44:03Z

Maybe the comment was not exactly right. SequenceModels are always the same for the same set and the same ordering of the set. However, the SequenceModels are ordered by first-seen elements first; the ordering does not consider frequency of occurrence. Rare sequences should be placed last to get better performance.

This is an advanced problem, but it is worth considering because the set of logs is fixed and limited. Decisions can be based on the whole list of logs instead of FIFO.

Test 3 must set fixed ordering of the first elements to produce consistent models.
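
A minimal sketch of the frequency-based idea, assuming the full log set is available up front. The function `order_by_frequency` and the `branch_key` callback are hypothetical names for illustration only; they are not part of the parser generator or the AMiner.

```python
from collections import Counter


def order_by_frequency(log_lines, branch_key):
    """Return branch keys sorted by how often they occur in the full log set.

    Hypothetical helper: `branch_key` maps a log line to the value that
    decides which branch the line would take.
    """
    counts = Counter(branch_key(line) for line in log_lines)
    # Most frequent first; ties are broken lexicographically so that the
    # result does not depend on the order in which lines were seen.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [key for key, _ in ranked]


logs = [b"a aa aa", b"a a a", b"a aa aa", b"a aaa aaa", b"a aa aa", b"a a a"]
# Assumed branch key: the token that follows the shared prefix b"a ".
order = order_by_frequency(logs, lambda line: line.split(b" ")[1])
print(order)
```

Because the counts come from the whole list rather than from first-seen order, shuffling the input lines does not change the result.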

landauermax · 2020-06-30T09:19:07Z

This should not be the case. SequenceModels are always ordered in reverse lexicographic ordering. Example:

Input (note that I disabled the aggregation of fixed elements when creating the SequenceModels):
a a a
a aaa aaa
a aa aa

Parser:
model = SequenceModelElement('sequence0', [
    FixedDataModelElement('fixed1', b'a'),
    FixedDataModelElement('fixed2', b' '),
    FirstMatchModelElement('firstmatch3', [
        SequenceModelElement('sequence4', [
            FixedDataModelElement('fixed5', b'aaa'),
            FixedDataModelElement('fixed6', b' '),
            FixedDataModelElement('fixed7', b'aaa')]),
        SequenceModelElement('sequence8', [
            FixedDataModelElement('fixed9', b'aa'),
            FixedDataModelElement('fixed10', b' '),
            FixedDataModelElement('fixed11', b'aa')]),
        SequenceModelElement('sequence12', [
            FixedDataModelElement('fixed13', b'a'),
            FixedDataModelElement('fixed14', b' '),
            FixedDataModelElement('fixed15', b'a')])])])

It is visible in the parser that the SequenceModels are ordered by the content of the first element: aaa-aa-a. This is necessary to ensure that the AMiner attempts to enter more specific paths first. I agree that it would be good for performance to have the most frequent paths first, but this could create issues with the AMiner entering incorrect paths and lead to unparsed logs. Let us leave this issue open to find a solution that combines the advantages of both strategies in the future.
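
The risk of entering an incorrect path can be illustrated with a toy matcher. This is a simplified sketch, not the AMiner's implementation: `match_sequence`, `first_match`, and `fully_parsed` are hypothetical helpers, and the sketch assumes that the first matching branch is taken without backtracking.

```python
def match_sequence(parts, data):
    """Consume each fixed part in turn; return the remainder, or None on mismatch."""
    for part in parts:
        if not data.startswith(part):
            return None
        data = data[len(part):]
    return data


def first_match(branches, data):
    """Return the remainder left by the first matching branch (no backtracking)."""
    for parts in branches:
        rest = match_sequence(parts, data)
        if rest is not None:
            return rest
    return None


def fully_parsed(branches, data):
    """A line counts as parsed only if the chosen branch consumes all of it."""
    return first_match(branches, data) == b""


# Branches in an arbitrary order: the short prefix b"a" shadows b"aa".
branches = [[b"a"], [b"aa"], [b"aaa"]]
# Reverse lexicographic ordering puts the more specific branches first.
specific_first = sorted(branches, reverse=True)

print(fully_parsed(branches, b"aa"))        # b"a" matches first and leaves b"a" over
print(fully_parsed(specific_first, b"aa"))  # b"aa" is tried before b"a"
```

Under these assumptions, a purely frequency-based ordering would only be safe if no earlier branch can match a prefix of the input that a later branch would match completely.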

ernstleierzopf · 2020-06-30T13:24:54Z

I have added another unit test for the reverse lexicographic ordering. Given this rule, the other generated models are probably also working correctly. This should also be tested with subtrees, so please leave the issue open until the subtrees have been tested.
