More than one curator? #197
Comments
Could you provide an instance where one annotation is provided by more than one curator, instead of multiple annotations provided by one curator each?
It's not that a single annotation is "provided by more than one curator" (the annotation is provided by an annotator, right?), but that a single annotation can be curated by one or more people. For example, the HEMAN dataset employs data that were curated (but not annotated) by me and that later @irisyupingren further reviewed, cleaned, published, and formatted (i.e., curated?). Maybe I'm employing the word "curator" in the wrong way? To be honest, it's a bit confusing from the jams specification docs:
…Removed Oriol until JAMS allows for more than one curator, following this: marl/jams#197
@urinieto your take is correct (imo): the annotator is whoever generated the specific annotation, while the curator is the person (or people) who did the work of collecting all the annotations into a dataset. Under this reasoning, it makes perfect sense to have more than one curator.
Glad we agree on this, Justin. I just reviewed the original paper, and it also feels a bit confusing to me (who wrote this paper?):
ah yes, you're right + I'm wrong – curator is the person(s) responsible for collecting the annotation, the annotator is the observer. I guess one thing that we punted on pretty hard was having an "agent" datatype in the schema; we weren't really sure what kinds of curators would crop up (people? teams? universities?), and so it got left as a single string. In hindsight, this is kind of a great problem to have, since it's lightyears ahead of unstructured text files. maybe I could rephrase my question better: would an array of open-ended strings be enough? or is there enough data to infer what a more structured also I definitely wrote that section of the paper, so that makes me 0/2 on this thread.
haha even if you wrote that section, we should've pointed out how ambiguous it was (i.e., don't be too hard on yourself, we The Authors are all to blame here 💃 ). Ok, back to your question, I would go with either a single open-ended string (e.g., "Eric Humphrey [email protected] & Justin Salamon [email protected]"), or an array of Also, ideally, I would allow either one
I see the In that light, I'm not sure having multiple points of contact makes much sense, but I agree that the current Maybe it's worth considering the proposal data management / revisioning that @mcartwright wrote up for our OSS-MIR paper in IEEE-SPL? Thinking about how the
generally if a field is to be repeated, it should always be an array, and maybe specify a minimum number of elements. afaik mixing data types (allowing Curator and Array) is poor form. to @bmcfee's point, I really like the idea of thinking about why it exists. If curator equals "who do I bother", then perhaps either URLs or email addresses are equally fine?
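
A minimal sketch of the array-only idea from the comment above, assuming each curator is an object with `name` and `email` keys. The schema fragment and the `check_curators` helper are hypothetical illustrations, not part of the JAMS schema; the helper only mimics the `type`, `minItems`, and `required` checks that a JSON Schema validator would perform.

```python
# Hypothetical JSON Schema fragment: "curator" is always an array with
# at least one element, never a bare object (no mixed data types).
CURATOR_SCHEMA = {
    "type": "array",
    "minItems": 1,
    "items": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "email": {"type": "string"},
        },
        "required": ["name"],
    },
}


def check_curators(value, schema=CURATOR_SCHEMA):
    """Hand-rolled check covering only the keywords used in the fragment above."""
    if not isinstance(value, list):
        return False  # a single object is rejected: arrays only
    if len(value) < schema["minItems"]:
        return False
    required = schema["items"]["required"]
    known = schema["items"]["properties"]
    for item in value:
        if not isinstance(item, dict):
            return False
        if any(key not in item for key in required):
            return False
        # Every declared property must be a string when present.
        if any(k in known and not isinstance(v, str) for k, v in item.items()):
            return False
    return True
```

With this shape, a file with one curator still stores a one-element list, so readers never have to branch on the field's type.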
My 2c as the person who put the "curator" field in there in the first place :) The intention was precisely for attribution. While a dataset can have many annotators (especially in a crowdsourcing scenario), it usually has a small set of curators who are in charge of putting the whole thing together, quality control, etc. Basically like an art exhibition that may consist of artworks by multiple artists (annotators in this analogy), it is usually curated by just one or two people, the curators. Personally I think it's important to have such a field, because annotator(s) != curator(s) != point of contact. The assumption was that the first curator is also the POC, and that people would infer that on their own. If you think it's worth adding an explicit "contact" field (e.g. with an email address) I'm totally fine with that, but not at the expense of the "curator" field, IMO.
p.s. forgot to add, in light of the above, I'd support @urinieto's proposal of making the curator field a list of
Coming back to this one, it seems to me that I'm thinking it might be better to lift that up a level; annotations can belong to collections, and collections can have curators, as well as other properties: home page, DOI, etc. For my typical use cases, a DOI pointing to a zenodo page for the dataset would be perfect. From there, I can get all the attribution and contact info I need, and the maintainers can worry about keeping things up to date there. For example, if a curator changes email address, there's currently no mechanism to propagate that information back to a bunch of jams files out on the internet. Relying on zenodo (or figshare, or whatever it happens to be) for this seems like a much better approach.
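
One way the "lift it up a level" idea could look, as a rough sketch only. The `Collection` class, its field names, the `"collection"` key on the annotation metadata, and the `attribution` helper are all assumptions made for illustration; nothing like this exists in JAMS today.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical collection-level record shared by many annotations."""
    name: str
    doi: str = ""        # e.g. a DOI resolving to the dataset's landing page
    homepage: str = ""
    curators: list = field(default_factory=list)


def attribution(annotation_metadata, collections):
    """Look up curators via the collection an annotation belongs to.

    `annotation_metadata` is a dict carrying a hypothetical "collection" key;
    `collections` maps collection identifiers to Collection records.
    """
    collection = collections.get(annotation_metadata.get("collection"))
    return collection.curators if collection else []
```

The point of the indirection is that contact details live in one place (the collection record or the page its DOI resolves to), so individual annotation files do not go stale when a curator's email changes.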
I agree with @bmcfee: My only concern is that changing this would potentially make pretty much all JAMS files to date incompatible with the new schema. Unless we do something smart about it, with deprecation warnings and so on.
Yup, that'll happen. The ideal fix here will be to 1) standardize the schema into a self-contained definition (i.e., without namespace runtime patching) as noted in #178, 2) put the schema under proper version control, and 3) put converters in place for migrating between versions. If we set this up properly, then migration should be pretty easy, since we're going from an "exactly one of" to a "zero or more of" type of field, though obviously the python object model will have to change to stay usable.
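
A converter for the "exactly one of" to "zero or more of" step could be as small as the sketch below. It assumes the old layout stores a single `curator` object with `name` and `email` under the annotation metadata, and that the new layout uses a list under a hypothetical `curators` key; the function name and the new key are illustrative, not an agreed design.

```python
import copy


def migrate_curator(annotation_metadata):
    """Return a copy with 'curator' (one object) replaced by 'curators' (a list)."""
    migrated = copy.deepcopy(annotation_metadata)
    if "curators" in migrated:
        # Already in the (hypothetical) new layout; leave it untouched.
        return migrated
    old = migrated.pop("curator", None)
    # A missing or blank curator maps to an empty list ("zero or more").
    if old and (old.get("name") or old.get("email")):
        migrated["curators"] = [old]
    else:
        migrated["curators"] = []
    return migrated
```

Because the function works on a copy and is a no-op on already-migrated metadata, it can be run repeatedly over a mixed set of old and new files.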
From this issue I realized that the current jams schema doesn't allow for a given annotation to be curated by more than one person. Is that true? If so, we should consider enhancing the schema to allow a list of curators.