-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PROPOSAL: Refactor EID class into a pseudo-UUID #25666
Conversation
src/engraving/rw/write/twrite.cpp
Outdated
xml.tag("eid", eid.toStdString()); | ||
} | ||
|
||
bool TWrite::itemNeedsWriteEid(const EngravingObject* item) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not write for all elements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to do some performance testing on a big file. I'll come back to you about this :)
src/engraving/infrastructure/eid.h
Outdated
namespace std { | ||
template<> | ||
struct hash<mu::engraving::EID> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this would better be written as
namespace std { | |
template<> | |
struct hash<mu::engraving::EID> | |
template<> | |
struct std::hash<mu::engraving::EID> |
because otherwise some IDEs start thinking that the std namespace is ours...
The method looks good, but do we really need 128 bit IDs?
IDs generated in series will never conflict, so it's only IDs generated in parallel that you need to worry about. That's an awful lot of pull requests that people would have to open on your score repository before you would see any conflicts in terms of IDs. In reality, you would run into ordinary merge conflicts long before you encounter any ID collisions. If a score does happen to contain two elements with the same ID, you could show an error message and rely on the user to fix it manually. Only developers store scores in Git repositories, so it's only developers who could encounter this problem. When we finally do multi-movement scores properly, depending on the implementation, IDs might only need to be unique for a single movement rather than an entire score, which would reduce the likelihood of collisions even more.
What about in the MSCX? A 128-bit number will be 39 decimal digits long, but only 22 characters in base64.
Since we know the result is supposed to be 128 bits, we can discard the trailing 96 bits requires 29 decimal digits, or 16 characters in base64
64 bits requires 20 decimal digits, or 11 characters in base64
|
Thank you all for the comments! I'll look into it and come back to you. Meanwhile, I've been wondering whether to import old EIDs into the new format, but it turns out we can't do that, because of this issue. In fact, the vtests were crashing because at least one file had a repeated EID. So I think I'll just discard the old EIDs. |
Ok, here are my findings.
Now, in terms of cost, long story short is that it's completely negligible, even with 128bit IDs. The large file I've used (Beethoven 9 with parts generated) required about 10^6 individual IDs.
|
@mike-spa The main cost I'm worried about is seeing these long IDs in files and in diffs, but separating it into pieces goes a long way towards addressing that. In practice, I would only have to glance at the first half of two IDs to see if they are the same.
Perhaps you could separate the pieces with underscores rather than hyphens? This would enable the entire ID to be selected with double click, for an easy Ctrl+F search in a large file. |
I've made some final changes! I've implemented a simple base-64 encoding, so the resulting string in the file is a few characters shorter, and I've used the underscore for separator, so now the EIDs in our files are written like this |
@mike-spa, that's a massive improvement, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I approve, though @igorkorsukov or @cbjeukendrup might want to give this a final look before it gets merged.
I notice you don't write IDs in test mode, probably because non-native formats don't have IDs (e.g. MIDI, MusicXML), so you'd get different random IDs each time, which would cause the tests to fail.
But it's a shame not to test whether IDs, spanner links, and parts links are preserved.
One idea might be to seed the PRNG to a specific value in test mode, so you get the same IDs generated every time. (Maybe something for a future PR?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all good for me :)
@shoogle Yes, we discussed about testing with @cbjeukendrup for exactly that reason. I should point out that even "legacy" EIDs weren't written in test mode (they were written only for the Score object, don't know why). At the moment EIDs only have one use case, that is SystemLocks. But as we start adopting them more, they should definitely be included in tests, and I think that seeding the RNG with a constant value may be indeed a great solution. In the meantime, I'll merge this :) |
Resolves: see previous discussion here.
The EID class has the purpose of giving a unique ID to every EngravingObject. It was initially conceived as a helper for debugging, but I can't say I have ever used it for that purpose. Instead, it has become clear to me that assigning unique IDs to items has extremely useful applications in establishing relationships between items in our files. Saving the start- and end measure of a SystemLock was an obvious first one, but the real big one to me will be part-score linking, as I've pointed out here.
The previous implementation of EID had a few shortcomings for this purpose:
So, here's a few directions I've proposed:
About the implementation details: