Allow merge of many records into one canonical record #12
Comments
I'd suggest that an agent record should have one to many names. Reasons could be regular name changes (e.g. after marriage) or different ways to write the name (which @dshorthouse mentioned above).
Originally independent agent records (referring to the same person or organization, but written differently and thus treated as separate instances during import or data entry) should be associated via |
Indeed, this is definitely a requirement (plus a language attribute for those aliases). In practice, especially during workflows such as high-throughput digitization, we do not want the onus to be on the person doing the transcription to correctly interpret the identity of a collector. That may mean we need a dirty bucket of Agent strings that have yet to be reconciled with, and subsequently flagged as aliases of (= merged into), other Agent entries. However, this can quickly get out of hand, as is often the case in EMu: the dirt tends to persist indefinitely.

When Agents spill into other modules or components of modules (e.g. determination histories), we'll have to decide what happens at the moment an agent, whether a human or an organization (other types?), requires entry: does that necessitate a new entry when none exists, or a link to an existing entry in the Agent module? In EMu's case, there tend to be hundreds of entries of e.g. M. Smith because each is tied to different objects in the system. The

It might be useful for us to see what's happening on Wikidata. Here's Alexander von Humboldt: https://www.wikidata.org/wiki/Q6694. A canonical label at the very top, lists (flat?) of language-dependent aliases each with a single label (=

So... I think we can assume that there will be dirt in the Agent module, and that we'll need utilities to merge entries whereby an entry becomes an alias of another. But the mechanics of what merge means will undoubtedly be contextual and will evolve over time as modules become more interwoven and linked.
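To make the "entry becomes an alias of another" idea concrete, here is a minimal sketch of one possible shape for it. Everything in it is an assumption for illustration: the `Agent` and `Alias` classes, the `merged_into` pointer, and the `mark_as_alias` helper are hypothetical names, not part of any existing system, and the language tag defaults to `"und"` (undetermined) when the transcriber doesn't know it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Alias:
    """A single alternate label with a language attribute (hypothetical model)."""
    label: str
    language: str = "und"  # assumption: "und" stands for an undetermined language


@dataclass
class Agent:
    """An agent with one canonical label and a flat list of aliases."""
    agent_id: str
    canonical_label: str
    aliases: list[Alias] = field(default_factory=list)
    merged_into: Optional[str] = None  # set once this entry has become an alias


def mark_as_alias(source: Agent, target: Agent, language: str = "und") -> None:
    """Record `source` (and its own aliases) as aliases of `target`."""
    incoming = [Alias(source.canonical_label, language), *source.aliases]
    for alias in incoming:
        # Skip labels the target already carries, so repeated merges stay clean.
        if alias.label != target.canonical_label and alias not in target.aliases:
            target.aliases.append(alias)
    source.merged_into = target.agent_id  # keep the dirty entry, but point it home


humboldt = Agent("a1", "Alexander von Humboldt")
dirty = Agent("a2", "A. v. Humboldt")
mark_as_alias(dirty, humboldt, language="de")
```

Keeping the source entry with a `merged_into` pointer, rather than deleting it, is just one option; it leaves the dirty bucket inspectable while the meaning of merge is still being worked out.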
I agree.
But this is also true for much other transcribed data (e.g. locality). So, do we really want dirty buckets for each module? So people working in high-throughput digitization won't enter anything in the collection management system directly. This, at least, is the plan at MfN. Besides that, we could keep things simple and assume that MVP 1.0 is getting verified data only.
As a user, I will see many seemingly independent agent records, and among these I will see that some are alternate representations of the same entity (e.g. John R. Smith vs John Smith). I would like the capacity to merge instances like these into a single destination record of my choosing. This means that values in identical fields across the agent records to be merged are collapsed, and all incoming links are re-attributed to the destination record I chose. I would like multi-entry fields to be deduped upon concatenation. In the event of a conflict in the collapse of single-entry fields, I would like the merge to cease with an error telling me the reason, so that I can manually reconcile (i.e. make values identical) the differences that caused the error(s) to be thrown.
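The behaviour described above could be sketched roughly as follows. This is only an illustration under stated assumptions: records are plain dicts with an `id` key, multi-entry fields are lists and single-entry fields are scalars, incoming links are a simple mapping from a linking object to an agent id, and an empty single-entry value (`None` or `""`) is treated as "no value" rather than as a conflict. The names `merge_agents` and `MergeConflictError` are hypothetical.

```python
from __future__ import annotations

import copy


class MergeConflictError(Exception):
    """Raised when single-entry fields disagree between records."""


def merge_agents(destination: dict, sources: list[dict], links: dict[str, str]):
    """Return (merged_record, relinked_links) without mutating the inputs."""
    merged = copy.deepcopy(destination)
    conflicts = []
    for source in sources:
        for name, value in source.items():
            if name == "id":
                continue
            current = merged.get(name)
            if isinstance(value, list):
                # Multi-entry field: concatenate, dropping duplicates, keeping order.
                combined = list(current or [])
                for item in value:
                    if item not in combined:
                        combined.append(item)
                merged[name] = combined
            elif current in (None, ""):
                # Assumption: an empty destination value is filled from the source.
                merged[name] = value
            elif value not in (None, "") and value != current:
                conflicts.append(
                    f"{name}: {current!r} vs {value!r} (record {source['id']})"
                )
    if conflicts:
        # Cease before any link is touched, and report every reason at once.
        raise MergeConflictError("; ".join(conflicts))
    source_ids = {source["id"] for source in sources}
    relinked = {
        obj: (merged["id"] if agent_id in source_ids else agent_id)
        for obj, agent_id in links.items()
    }
    return merged, relinked


dest = {"id": "a1", "name": "John R. Smith", "birth_year": 1850,
        "aliases": ["J. R. Smith"]}
src = {"id": "a2", "name": "John R. Smith", "birth_year": None,
       "aliases": ["John Smith", "J. R. Smith"]}
merged, relinked = merge_agents(dest, [src], {"specimen-1": "a2", "specimen-2": "a3"})
```

Because the function works on a copy and raises before re-attributing links, a failed merge leaves everything as it was, which is what lets the user reconcile the conflicting values and simply try again.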