Reports can be off with out-of-sync documents #35

LFDM · 2014-08-14T11:01:06Z

#34 recovered from various errors, including problems with artificial (elliptic) word nodes/tokens.

Such occurrences aren't well reflected in the reports: they may not appear at all, and when they do appear, they are wrong.

I wasn't able to fix this quickly. Problems arise mainly from the way we are comparing documents with each other.

When the Reviewable file contains an additional node, the report treats it as entirely correct, even when the Gold file doesn't contain it at all.

Error counts are written to the report of a Gold file, and we check for errors by iterating through all tokens of the Gold file and looking at the equivalent token in the Reviewable file. When the Gold file contains 8 tokens, we ask 8 tokens in the Reviewable file whether there are differences. When there is a 9th token in the Reviewable file, it just doesn't get checked and we have no way of knowing that it even exists.
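To make the blind spot concrete, here is a minimal Python sketch of a Gold-driven comparison. It is not the project's actual code: the dict-based token shape and the name `count_differences` are assumptions made up for illustration.

```python
# Hypothetical token shape: a dict mapping attribute names to values.
def count_differences(gold_tokens, review_tokens):
    """Count mismatches, looking only at positions that exist in the Gold sentence."""
    errors = 0
    for index, gold in enumerate(gold_tokens):
        review = review_tokens[index]  # equivalent token in the Reviewable sentence
        if gold != review:
            errors += 1
    return errors


gold = [{"form": "veni"}, {"form": "vidi"}]
# The Reviewable sentence carries one additional artificial token.
review = [{"form": "veni"}, {"form": "vidi"}, {"form": "[0]"}]

print(count_differences(gold, review))  # prints 0: the third token is never inspected
```

Because the loop is bounded by the Gold sentence, nothing past its last index can ever contribute to the error count.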

It would probably be good to quickly check whether the token counts of a sentence are the same when we start the comparison and, if they are not, run a special check that takes artificial tokens into account.
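A rough Python sketch of such an up-front count check might look like the following. The function name `compare_sentence` and the shape of the returned dict are assumptions for illustration only. The sketch merely flags the mismatch and deliberately leaves open what the special check should report.

```python
def compare_sentence(gold_tokens, review_tokens):
    """Compare two sentences, but only position by position when they are in sync."""
    result = {
        "gold_count": len(gold_tokens),
        "review_count": len(review_tokens),
    }
    if len(gold_tokens) != len(review_tokens):
        # Out of sync: a special check for artificial tokens would go here.
        result["in_sync"] = False
        result["errors"] = None  # undecided, not "zero errors"
        return result

    result["in_sync"] = True
    result["errors"] = sum(1 for g, r in zip(gold_tokens, review_tokens) if g != r)
    return result


same_length = compare_sentence([{"form": "a"}], [{"form": "b"}])
out_of_sync = compare_sentence([{"form": "a"}], [{"form": "a"}, {"form": "[0]"}])
print(same_length["errors"], out_of_sync["in_sync"])
```

Returning `None` instead of a number for the out-of-sync case keeps a missing comparison from being mistaken for a clean one.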

This is of course quite simple; the problem is that I don't really know at the moment what the output of such a check should be. This also plays a role when artificial tokens are encountered the other way around (the Gold file has more tokens than the Reviewable file). In such a case #34 inserts a dummy node in the Reviewable file to avoid exceptions when doing comparisons. Empty nodes report empty values for every attribute they might have. It's now perfectly valid to leave specific parts of an elliptic node unannotated. In such a case a nonexistent node in the Reviewable file would report back that they are the same (both containing unannotated values), when in fact this should be an error.
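The false match can be reproduced in a few lines of Python. This is only a sketch under assumptions: the `Token` class, its `head` and `relation` attributes, and the `placeholder` flag are hypothetical and not taken from #34. Marking dummy nodes explicitly is one possible way out, not a decided solution.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    head: Optional[str] = None       # None stands for "unannotated"
    relation: Optional[str] = None
    placeholder: bool = False        # hypothetical marker for inserted dummy nodes


def attributes_match(gold, review):
    """Attribute-only comparison: empty values on both sides look identical."""
    return gold.head == review.head and gold.relation == review.relation


def tokens_match(gold, review):
    """Treat any comparison involving a dummy node as a mismatch."""
    if gold.placeholder or review.placeholder:
        return False
    return attributes_match(gold, review)


gold_elliptic = Token()               # elliptic node left unannotated in Gold
dummy = Token(placeholder=True)       # dummy node standing in for a missing token

print(attributes_match(gold_elliptic, dummy))  # prints True, the misleading result
print(tokens_match(gold_elliptic, dummy))      # prints False
```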

Need to think a little more about this before making another attempt at solving this.
