
8. The order of different SequenceModels is unpredictable #10

Open
ernstleierzopf opened this issue Jun 13, 2020 · 4 comments

Comments

@ernstleierzopf

The order of different SequenceModels is unpredictable. A better model could be created by placing the more frequent SequenceModels first.

@landauermax
Contributor

Could you please provide an example? I assume the issue is about SequenceModelElements that follow a FirstMatchModelElement. The order should always be deterministic, so running the parsergenerator multiple times should always result in the same parser. The sorting should be based on lexicographic ordering of the elements of the models inside the SequenceModelElements.
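To illustrate the determinism point, here is a minimal sketch (not the parsergenerator's code) in which each alternative branch is represented only by the first space-separated token of a hypothetical log remainder. Keeping alternatives in first-seen (FIFO) order makes the result depend on the order of the input lines, while sorting them by content (reverse lexicographic here) yields the same order for every permutation. `first_tokens_fifo` and `first_tokens_sorted` are hypothetical names.

```python
import itertools


def first_tokens_fifo(lines):
    """Hypothetical: alternatives in first-seen (FIFO) order."""
    seen = []
    for line in lines:
        token = line.split(' ', 1)[0]  # stands in for the branch's first element
        if token not in seen:
            seen.append(token)
    return seen


def first_tokens_sorted(lines):
    """Hypothetical: alternatives sorted by content, reverse lexicographic."""
    return sorted({line.split(' ', 1)[0] for line in lines}, reverse=True)


lines = ['a a', 'aaa aaa', 'aa aa']
permutations = list(itertools.permutations(lines))

# FIFO order changes with the input order; sorted order does not.
fifo_results = {tuple(first_tokens_fifo(p)) for p in permutations}
sorted_results = {tuple(first_tokens_sorted(p)) for p in permutations}
print(len(fifo_results), sorted_results)
```

Running it prints `6 {('aaa', 'aa', 'a')}`: six different FIFO orders across the six input permutations, but only one sorted order.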

@ernstleierzopf
Author

ernstleierzopf commented Jun 29, 2020

Maybe my comment was not exactly right. The SequenceModels are always the same for the same set of logs in the same order. However, the SequenceModels are ordered so that the first-seen elements come first; the frequency of occurrences is not considered. Rare sequences should be placed last to get better performance.

This is an advanced problem, but it is worth implementing because the set of logs is fixed and finite: ordering decisions can be based on the whole list of logs instead of first-in, first-out (FIFO) order.
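Because the whole log list is known in advance, frequency-based ordering could look roughly like the sketch below. It is a hypothetical illustration, not the parsergenerator's behavior: `order_by_frequency` and the sample remainders are made up, the first space-separated token stands in for a SequenceModel's first element, and ties fall back to reverse lexicographic order so the result stays deterministic.

```python
from collections import Counter


def order_by_frequency(remainders):
    """Hypothetical helper: order branches by how often their first token occurs.

    `remainders` are the parts of each log line that the branching element
    would see; the first space-separated token stands in for the first
    element of each SequenceModel.
    """
    counts = Counter(r.split(' ', 1)[0] for r in remainders)
    # Deterministic tie-break: start from reverse lexicographic order ...
    ordered = sorted(counts, reverse=True)
    # ... then stable-sort by descending frequency (ties keep the order above).
    ordered.sort(key=lambda token: counts[token], reverse=True)
    return ordered


# Hypothetical sample: the whole (fixed, finite) list of logs is known upfront.
remainders = ['a a', 'aaa aaa', 'aa aa', 'aa aa', 'aa aa', 'a a']
print(order_by_frequency(remainders))
```

With this sample the result is `['aa', 'a', 'aaa']`. Note that `a` now precedes `aaa`, so a parser that takes the first matching alternative could choose the shorter branch for input starting with `aaa`.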

Test 3 must fix the ordering of the first elements to produce consistent models.

@landauermax
Contributor

This should not be the case. SequenceModels are always sorted in reverse lexicographic order. Example:

Input (note that I disabled the aggregation of fixed elements so that SequenceModels are created):
```
a a a
a aaa aaa
a aa aa
```

Parser:
```python
# Imports assume the aminer package layout (one module per model element).
from aminer.parsing.FirstMatchModelElement import FirstMatchModelElement
from aminer.parsing.FixedDataModelElement import FixedDataModelElement
from aminer.parsing.SequenceModelElement import SequenceModelElement

model = SequenceModelElement('sequence0', [
    FixedDataModelElement('fixed1', b'a'),
    FixedDataModelElement('fixed2', b' '),
    FirstMatchModelElement('firstmatch3', [
        SequenceModelElement('sequence4', [
            FixedDataModelElement('fixed5', b'aaa'),
            FixedDataModelElement('fixed6', b' '),
            FixedDataModelElement('fixed7', b'aaa')]),
        SequenceModelElement('sequence8', [
            FixedDataModelElement('fixed9', b'aa'),
            FixedDataModelElement('fixed10', b' '),
            FixedDataModelElement('fixed11', b'aa')]),
        SequenceModelElement('sequence12', [
            FixedDataModelElement('fixed13', b'a'),
            FixedDataModelElement('fixed14', b' '),
            FixedDataModelElement('fixed15', b'a')])])])
```

The parser shows that the SequenceModels are ordered by the content of their first element: aaa, aa, a. This is necessary to ensure that the AMiner attempts the more specific paths first. I agree that placing the most frequent paths first would be good for performance, but it could cause the AMiner to enter incorrect paths and leave logs unparsed. Let us leave this issue open to find a solution that combines the advantages of both strategies in the future.
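Why more specific paths should come first can be seen in a simplified model where each alternative is a bare fixed string and the first alternative that matches the start of the input wins. This is a hypothetical toy (`first_match` is not an AMiner function, and in AMiner a SequenceModelElement that fails partway may let the FirstMatchModelElement try its next child); it only shows why reverse lexicographic order helps: a string always sorts after its own proper prefixes, so reversing the order places every alternative before its prefixes.

```python
def first_match(alternatives, data):
    """Hypothetical stand-in for first-match selection over fixed strings.

    Returns the first alternative that `data` starts with, or None.
    """
    for alternative in alternatives:
        if data.startswith(alternative):
            return alternative
    return None


alternatives = ['a', 'aa', 'aaa']
data = 'aaa aaa'

# Ascending order: the shortest prefix wins and consumes only one character,
# leaving 'aa aaa' for whatever element has to parse the rest.
print(first_match(alternatives, data))

# Reverse lexicographic order: longer, more specific alternatives come first.
print(first_match(sorted(alternatives, reverse=True), data))
```

The first call prints `a` and the second prints `aaa`, matching the aaa, aa, a order in the generated parser.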

@ernstleierzopf
Author

I have added another unit test for the reverse lexicographic ordering. Given this rule, the other generated models probably work correctly as well. The ordering should also be tested with subtrees, so please leave this issue open for that.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants