perfect 5th detection in mir_eval.key is asymmetric #337

https://github.com/craffel/mir_eval/blob/576aad4e0b5931e7c697c078a1153c99b885c64f/mir_eval/key.py#L145

Is this intended to be asymmetric? E.g. if estimated = G major and reference = C major, the score is 0.5, but if estimated = C major and reference = G major, the score is zero.
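For readers reproducing this, a minimal check (assuming a mir_eval version with the behavior described above; `weighted_score` takes the reference key first):

```python
import mir_eval

# Estimated key a perfect fifth ABOVE the reference: counted as a fifth error.
print(mir_eval.key.weighted_score('C major', 'G major'))  # 0.5

# Estimated key a perfect fifth BELOW the reference: not counted.
print(mir_eval.key.weighted_score('G major', 'C major'))  # 0.0 in affected versions
```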
Comments
I do not know music theory :( @bmcfee?
From reading the comments, I think this is behaving to spec / consistent with the mirex implementation. I also think it's weird, but 🤷
To follow up on this a bit: yes, it's intended to be asymmetric, and the docstring is unambiguous about the behavior. I suppose this was originally intended to break the symmetry between 5th and 4th estimation errors by only considering movement in one direction from the root. I did try to dig into this more in the mirex wiki, but there's not much provenance around how this metric came about. Maybe @jpauwels has some insight, as I think he was the most recent task captain for key detection?
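Concretely, the check at the linked line boils down to a directional semitone test; a sketch of that logic (variable and function names are illustrative, not the exact source):

```python
# Sketch of the one-directional fifth test (illustrative, not the exact source).
# Tonics are pitch classes 0-11, e.g. C=0, G=7.
def is_ascending_fifth(reference_tonic, estimated_tonic):
    # Only an estimate a perfect fifth ABOVE the reference counts:
    # G against reference C gives (7 - 0) % 12 == 7 -> True,
    # C against reference G gives (0 - 7) % 12 == 5 -> False.
    return (estimated_tonic - reference_tonic) % 12 == 7
```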
Hi all, as far as I know, it was never intended to make a distinction between perfect fifth and perfect fourth errors (as long as the mode is the same). It certainly isn't the case in the current implementation used for MIREX. That doesn't mean it's impossible that there is some ambiguously worded description somewhere on the wiki. Are you referring to the following table: https://www.music-ir.org/mirex/wiki/2005:Audio_and_Symbolic_Key (copied over every year since)? In that case, it should be read as a perfect fifth either way, not just a perfect fifth above.

The canonical reference I use for this measure is Emilia Gomez's PhD thesis (section 4.3.1.2), but I suspect the exact weighting was determined collaboratively with other participants of MIREX 2005. Between us, I've never been a fan of this particular measure; too many magic numbers in there. I always suggest listing the error categories separately, and did so in my own papers, but I still include this combination measure in MIREX for historical comparisons.
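For reference, the combination measure assigns fixed credit per error category. These weights match mir_eval's docstring and the MIREX wiki table (the dict name here is just for illustration):

```python
# Per-category credit in the MIREX-style weighted key score.
KEY_SCORE_WEIGHTS = {
    'correct': 1.0,        # exact key match
    'perfect_fifth': 0.5,  # tonics a perfect fifth apart, same mode
    'relative': 0.3,       # relative major/minor
    'parallel': 0.2,       # same tonic, other mode
    'other': 0.0,          # anything else
}
```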
Seconding what @jpauwels said; he actually stated as much in the 2017 MIREX Key Detection results (https://www.music-ir.org/mirex/wiki/2017:Audio_Key_Detection_Results). So apparently the "official" behavior changed in 2017 from counting only ascending fifths to also counting descending ones.

It would be really useful if both approaches were implemented, maybe even adding a warning to the "old" approach that notifies users that they are using a metric that might not correspond to the one they're expecting. I personally just spent multiple days trying to find a bug on my end of the code while reproducing prior MIREX results, only to find that the cause was the one place I didn't expect it: who would expect the MIR algorithm evaluation library to output different results than the MIR algorithm evaluation campaign? Adding a warning to the "old" approach could be really useful for cases like mine without breaking prior code.

I'll happily submit a PR adding a second method, or an additional argument plus a warning in case the "old" method is used, if this is the direction the maintainers want to go with this.
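A sketch of what such an opt-in could look like; the `allow_descending_fifths` parameter name, the `fifth_credit` helper, and the warning text are all hypothetical here, not mir_eval's actual API:

```python
import warnings

# Pitch classes for key tonics (C=0 ... B=11).
SEMITONE = {'C': 0, 'C#': 1, 'Db': 1, 'D': 2, 'D#': 3, 'Eb': 3, 'E': 4,
            'F': 5, 'F#': 6, 'Gb': 6, 'G': 7, 'G#': 8, 'Ab': 8, 'A': 9,
            'A#': 10, 'Bb': 10, 'B': 11}

def fifth_credit(reference_key, estimated_key, allow_descending_fifths=False):
    """Hypothetical sketch: return the 0.5 perfect-fifth credit, or 0.0."""
    if not allow_descending_fifths:
        warnings.warn('Counting only ascending perfect fifths '
                      '(pre-2017 MIREX behavior).')
    ref_note, ref_mode = reference_key.split()
    est_note, est_mode = estimated_key.split()
    if ref_mode != est_mode:
        return 0.0
    diff = (SEMITONE[est_note] - SEMITONE[ref_note]) % 12
    if diff == 7:                                # fifth above: both conventions
        return 0.5
    if diff == 5 and allow_descending_fifths:    # fifth below (a fourth above)
        return 0.5
    return 0.0
```

With `allow_descending_fifths=True`, `fifth_credit('C major', 'G major')` and `fifth_credit('G major', 'C major')` both return 0.5; the default reproduces the old one-directional behavior and warns about it.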
I would support this change. AFAIK the mir_eval implementation was based on mirex in ~2014, so if mirex changed later, it's not surprising that we got out of sync.
🚀 please do!