"Recorded by" not processed if it contains an apostrophe and/or >2 names #578

Open
sat01a opened this issue Aug 14, 2023 · 5 comments

@sat01a

sat01a commented Aug 14, 2023

A user has reported an issue with how the "Recorded by" field is processed. This record appears to be correct: https://biocache.ala.org.au/occurrences/6536f255-a49e-45b3-ba3c-83c62862102d
It shows Thomas Mesaglio as the original value and [Mesaglio, Thomas] as the processed value.
On this record, however: https://biocache.ala.org.au/occurrences/04e8e8dd-c8ff-497a-ad48-93a631b373f6
the original value is Louis Gerald O'Neill and the processed value is blank.

Data team investigation suggests the cause may be the apostrophe, or the name having more than two parts. The issue is also believed to be specific to pipelines.

3,876,171 records have a supplied recordedBy but no processed value, so this is a very visible issue.

Raised in https://support.ehelp.edu.au/a/tickets/182572

Reported by @timhicks-ala

@timhicks-ala

Another example has been shared in this record: https://biocache.ala.org.au/occurrences/f6ee114b-1d76-4c4e-93f1-2becfa1e5ef4
The Recorded by field is supplied as Petra Holland, but our processed value is Petra, Petra.

@adam-collins
Contributor

Because data providers use such a large variety of delimiters, abbreviations and name formats, parsing Recorded By is unnecessarily difficult. Putting this into the backlog for now. When there is time, it would be worth including a review of all records with unprocessed Recorded By.
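
To illustrate why this kind of parsing is fragile, here is a minimal sketch of a deliberately naive name parser. It is not the pipelines code and is not meant to suggest the actual cause; the function name `naive_parse_recorded_by` and the `SIMPLE_NAME` pattern are hypothetical. It only shows how a parser that expects "Given Family" can handle one of the names from this issue and silently drop another.

```python
import re

# Hypothetical, deliberately naive pattern: exactly two parts, plain ASCII letters only.
# This is an illustration, NOT the parser used in the ALA/GBIF pipelines.
SIMPLE_NAME = re.compile(r"^([A-Za-z]+) ([A-Za-z]+)$")


def naive_parse_recorded_by(raw):
    """Return 'Family, Given' for a simple two-part name, otherwise None."""
    match = SIMPLE_NAME.match(raw.strip())
    if match is None:
        # Apostrophes, middle names, delimiters, initials etc. all end up here.
        return None
    given, family = match.groups()
    return f"{family}, {given}"


if __name__ == "__main__":
    for raw in ["Thomas Mesaglio", "Louis Gerald O'Neill"]:
        print(repr(raw), "->", repr(naive_parse_recorded_by(raw)))
```

Under these assumptions, the two-part name is reformatted, while the three-part name with an apostrophe yields no processed value at all.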

@adam-collins
Contributor

My preference is to remove the processed version.

@adam-collins
Contributor

@peggynewman as discussed, the raw_recordedBy is now used as recordedBy. Pull request: gbif/pipelines#987
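
As a rough sketch of the pass-through idea (not the code in the pull request), the supplied value is simply carried over unchanged instead of being parsed. The function name `interpret_recorded_by` and the dict-based record shape are assumptions made for illustration only.

```python
def interpret_recorded_by(raw_record):
    """Return the supplied recordedBy unchanged, or None if it is missing or blank.

    `raw_record` is a hypothetical dict of raw field values; the real
    pipelines use their own record types.
    """
    raw = raw_record.get("recordedBy")
    if raw is None or not raw.strip():
        return None
    # No name parsing at all: apostrophes and multi-part names pass through as-is.
    return raw


if __name__ == "__main__":
    print(interpret_recorded_by({"recordedBy": "Louis Gerald O'Neill"}))
```

With this approach there is no separate processed form to get wrong, at the cost of losing any normalisation of names.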

@adam-collins
Contributor

To test that this has been applied, https://biocache-test.ala.org.au/fields?filter=recordedBy lists no raw_recordedBy.


3 participants