Reports can be off with out-of-sync documents #35

Open
LFDM opened this issue Aug 14, 2014 · 0 comments

LFDM commented Aug 14, 2014

#34 added recovery from various errors, including problems with artificial (elliptic) word nodes/tokens.

Such occurrences aren't reflected well in the reports. In fact, they may not appear at all, and when they do, they are wrong.

I wasn't able to fix this quickly. The problems arise mainly from the way we compare documents with each other.

When the Reviewable file contains an additional node, the report marks it as completely correct, even when the Gold file doesn't contain it at all.

Error counts are written to the report of a Gold file, and we check for errors by iterating through all tokens of the Gold file and looking at the equivalent token in the Reviewable file. When the Gold file contains 8 tokens, we ask 8 tokens in the Reviewable file whether there are differences. If there is a 9th token in the Reviewable file, it simply never gets checked, and we have no way of knowing that it even exists.
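A minimal Python sketch of this Gold-driven loop, assuming the "equivalent" token is found by position; `Token` and `gold_driven_diff` are hypothetical names for illustration, not the project's actual code:

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Hypothetical token: an id plus annotation attributes (e.g. postag)."""
    id: int
    attrs: dict = field(default_factory=dict)


def gold_driven_diff(gold, reviewable):
    """Iterate over Gold tokens only and look up the Reviewable token by position.

    Returns (token_id, attribute, gold_value, reviewable_value) tuples.
    Reviewable tokens beyond len(gold) are never visited; a shorter
    Reviewable sentence would raise IndexError here.
    """
    diffs = []
    for i, g in enumerate(gold):
        r = reviewable[i]  # assumption: positional matching
        for key, g_val in g.attrs.items():
            r_val = r.attrs.get(key)
            if g_val != r_val:
                diffs.append((g.id, key, g_val, r_val))
    return diffs


gold = [Token(i, {"postag": "n"}) for i in range(1, 9)]        # 8 tokens
reviewable = [Token(i, {"postag": "n"}) for i in range(1, 10)]  # 9 tokens
print(gold_driven_diff(gold, reviewable))  # [] -> the 9th token is never checked
```

Because the loop is bounded by the Gold sentence, the extra Reviewable token produces no diff and no error count, which matches the symptom described above.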

It would probably be good to check quickly whether the token counts of a sentence match when we start the comparison, and if they don't, to run a special check that takes artificial tokens into account.
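One possible shape for this check, sketched in Python under stated assumptions: tokens are matched by id rather than position, `artificial` is a hypothetical flag on elliptic tokens, and the returned dict is only a placeholder report format. Besides the proposed count check, the sketch compares id sets unconditionally, because a count check alone would miss a sentence with one extra and one missing token.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    id: int
    attrs: dict = field(default_factory=dict)
    artificial: bool = False  # hypothetical flag for elliptic (artificial) tokens


def compare_sentence(gold, reviewable):
    """Check token counts up front, then compare tokens matched by id (assumption).

    The shape of the returned dict is a placeholder, not a settled report format.
    """
    gold_by_id = {t.id: t for t in gold}
    rev_by_id = {t.id: t for t in reviewable}
    result = {
        "counts_match": len(gold) == len(reviewable),
        # Tokens present in only one document, tagged with their artificial flag.
        "missing_in_reviewable": [(i, gold_by_id[i].artificial)
                                  for i in sorted(gold_by_id.keys() - rev_by_id.keys())],
        "extra_in_reviewable": [(i, rev_by_id[i].artificial)
                                for i in sorted(rev_by_id.keys() - gold_by_id.keys())],
        "diffs": [],
    }
    # Attribute comparison only for tokens that exist on both sides.
    for i in sorted(gold_by_id.keys() & rev_by_id.keys()):
        g, r = gold_by_id[i], rev_by_id[i]
        for key in sorted(g.attrs.keys() | r.attrs.keys()):
            if g.attrs.get(key) != r.attrs.get(key):
                result["diffs"].append((i, key, g.attrs.get(key), r.attrs.get(key)))
    return result


gold = [Token(i, {"postag": "n"}) for i in range(1, 9)]
reviewable = gold + [Token(9, {"postag": "v"}, artificial=True)]
report = compare_sentence(gold, reviewable)
print(report["counts_match"], report["extra_in_reviewable"])  # False [(9, True)]
```

Here the extra artificial token surfaces explicitly instead of silently passing; how such entries should be counted in the final report is still the open question.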

The check itself is simple; the problem is that I don't yet know what its output should be.

This also plays a role when artificial tokens occur the other way around (the Gold file has more tokens than the Reviewable file). In that case #34 inserts a dummy node into the Reviewable file to avoid exceptions during the comparison. Such empty nodes report empty values for every attribute they might have. It's now perfectly valid to leave specific parts of an elliptic node unannotated, so a nonexistent node in the Reviewable file would report back that those parts are the same (both unannotated), when in fact this should be an error.
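The false match can be reproduced with a small Python sketch. `Node`, `make_dummy` and the `placeholder` flag are hypothetical; the flag is one possible way to let the comparison tell a recovery dummy apart from a real node with unannotated attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    attrs: dict = field(default_factory=dict)
    placeholder: bool = False  # hypothetical flag marking a dummy node


def make_dummy(node_id, keys):
    """A dummy node that reports an empty value for every attribute."""
    return Node(node_id, {key: "" for key in keys}, placeholder=True)


def naive_diff(gold, rev):
    """Per-attribute comparison that ignores whether the node really exists."""
    return [key for key, val in gold.attrs.items() if rev.attrs.get(key, "") != val]


def checked_diff(gold, rev):
    """Same comparison, but a placeholder counts as one 'missing node' error."""
    if rev.placeholder:
        return ["<missing node>"]
    return naive_diff(gold, rev)


# Elliptic Gold node: lemma and postag deliberately left unannotated.
gold = Node(7, {"lemma": "", "postag": "", "relation": "ADV"})
dummy = make_dummy(7, gold.attrs.keys())

print(naive_diff(gold, dummy))    # ['relation']  -> lemma/postag look correct
print(checked_diff(gold, dummy))  # ['<missing node>']
```

With the naive comparison, the unannotated attributes of the missing node are scored as correct; checking the flag first turns the absence itself into the error.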

I need to think about this a little more before making another attempt at solving it.
