Feature: Improve the calculation of similarity scores between answers and correct solutions #2
Comments
I would rather let the user choose what is significant and what is not, as an option (see #25). This could include:
In my experience the order is completely off. Absurd sentences (without any noticeable similarity) have appeared at the top, while the alphabetical sort gave me much more similar answers.
@tobiornottobi Could you please send one or two screenshots with examples of such behavior? I'm only aware of this happening with missing or different diacritics, but I'll increase the priority of this issue if it turns out to be more widespread. Thanks!
@blmage Yes, I can. One thing I have to add: I wasn't sure whether the ".* sort ↓" button toggles to the other option or shows which option is currently active. The results weren't sorted alphabetically, so maybe it's actually the alphabetical sort that is broken for me. I'll try to remember to take a screenshot in the future.
@tobiornottobi Thanks for the screenshots! The UI reflects the current state, so when "Alphabetical sort ↓" is displayed, solutions are (or should be) sorted alphabetically, in descending order. The order on the first screenshot seems correct, apart from the two solutions at the top, but I couldn't reproduce the same result in isolation (when testing the comparison algorithm, "ä" comes before "b", as expected). Could you point me to a skill in the Norwegian tree that uses a lot of accented words? I'll try to reproduce it from there instead.
@blmage Thank you. :)
My bad! In the case of Swedish then, this seems to be the expected behavior:
Wikipedia
Currently, the similarity between answers and correct solutions is computed as-is, with only Unicode normalization being applied. Therefore, accented letters and their unaccented counterparts are considered completely different characters.
While this is desirable when the user enters a "perfect" answer with regard to accents, the results can get quite random otherwise.
One solution would be to compute two similarity scores, one with less normalization and one with more, and then average them in a consistent way.
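To make the idea of two averaged scores concrete, here is a minimal sketch in Python. It is not the extension's actual implementation: the function names, the `strict_weight` parameter, and the use of `difflib.SequenceMatcher` as the similarity metric are all assumptions made for illustration. The "strict" score applies only Unicode normalization (NFC), while the "loose" score also strips combining marks.

```python
import unicodedata
from difflib import SequenceMatcher


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop combining marks, e.g. "ä" -> "a".

    Letters without a canonical decomposition (such as "ø") are left as-is.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ratio(a: str, b: str) -> float:
    # Stand-in metric (assumption): any string similarity in [0, 1] would do.
    return SequenceMatcher(None, a, b).ratio()


def similarity(answer: str, solution: str, strict_weight: float = 0.5) -> float:
    """Weighted average of a strict and a loose similarity score.

    strict: Unicode-normalized only, so "ä" and "a" are different characters.
    loose:  diacritics stripped, so "ä" and "a" compare as equal.
    """
    strict = ratio(
        unicodedata.normalize("NFC", answer),
        unicodedata.normalize("NFC", solution),
    )
    loose = ratio(strip_diacritics(answer), strip_diacritics(solution))
    return strict_weight * strict + (1.0 - strict_weight) * loose


if __name__ == "__main__":
    for candidate in ("är", "ar", "br"):
        print(candidate, similarity(candidate, "är"))
```

With this scheme, an answer with exact accents still ranks highest, an answer that differs only by a missing accent ranks next, and an answer with a genuinely different letter ranks below both. The weight is a tuning knob: a higher `strict_weight` penalizes missing diacritics more heavily.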