I found that when two authors share the same name, they are incorrectly grouped together as the same author. #101

Open
codetsang opened this issue Aug 26, 2024 · 0 comments

Comments

@codetsang

During data processing, I found that authors who share a name, for example Yang, Xia (University A) and Yang, Xia (University B), are assigned to the same author group even though they are at different universities. In this case they are distinct individuals who happen to have the same name. Is this considered a significant issue for the project?

Here is an example dataset. These rows are different individuals, but all of them were assigned the same groupID (4595):

| | authorID | groupID | author_name | author_order | address | university | department | postal_code | city | state | country | RP_address |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 429 | 429 | 4595 | Yang, Xia | 6 | Univ Malaya, Kuala Lumpur, Malaysia. | univ malaya | NA | NA | kuala lumpur | NA | malaysia | NA |
| 1211 | 1211 | 4595 | Yang, Xia | 1 | Cent South Univ, Xiangya Hosp 3, Dept Pediat, 138 Tongzipo Rd, Changsha 410013, Hunan, Peoples R China. | cent south univ | xiangya hosp 3 | 41001 | changsha | hunan | peoples r china | NA |
| 1294 | 1294 | 4595 | Yang, Xia | 5 | Air Force Med Ctr, Dept Anesthesiol, Beijing, Peoples R China. | air force med ctr | NA | NA | dept anesthesiol | beijing | peoples r china | NA |
| 1505 | 1505 | 4595 | Yang, Xia | 6 | Shenzhen Univ, Shenzhen Peoples Hosp 2, Affiliated Hosp 1, Dept Traumat Orthoped,Shenzhen Translat Med Inst, Shenzhen 518028, Peoples R China. | shenzhen univ | shenzhen peoples hosp 2 | 51802 | shenzhen | NA | peoples r china | NA |
| 1723 | 1723 | 4595 | Yang, Xia | 2 | Jiangsu Univ Sci & Technol, Sch Comp, Zhenjiang 212003, Jiangsu, Peoples R China. | jiangsu univ sci & technol | sch comp | 21200 | zhenjiang | jiangsu | peoples r china | NA |
| 3647 | 3647 | 4595 | Yang, Xia | 1 | Nanjing Univ Chinese Med, Affiliated Hosp Integrated Tradit Chinese & Wester, Dept Endocrinol, Nanjing, Peoples R China. | nanjing univ chinese med | affiliated hosp integrated tradit chinese & wester | NA | dept endocrinol | nanjing | peoples r china | NA |
| 4072 | 4072 | 4595 | Yang, Xia | 1 | Dalian Maritime Univ, Nav Coll, Dalian, Peoples R China. | dalian maritime univ | NA | NA | nav coll | dalian | peoples r china | NA |
| 4479 | 4479 | 4595 | Yang, Xia | 1 | Shandong Univ, Qilu Hosp, Cheeloo Coll Med, Dept Neurosurg, Jinan, Peoples R China. | shandong univ | qilu hosp | NA | dept neurosurg | jinan | peoples r china | NA |
| 4541 | 4541 | 4595 | Yang, Xia | 4 | Hunan Univ Chinese Med, Coll Chinese Med, Changsha 410208, Hunan Province, Peoples R China. | hunan univ chinese med | coll chinese med | 41020 | changsha | hunan province | peoples r china | NA |
| 4595 | 4595 | 4595 | Yang, Xia | 1 | Beijing Univ Chinese Med, Grad Sch, Beijing, Peoples R China. | beijing univ chinese med | NA | NA | grad sch | beijing | peoples r china | NA |
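
One way to find other groups like this is to flag every groupID whose rows span more than one university. The following is only a rough sketch in Python, not part of refsplitr; the function name `groups_spanning_universities` is made up for illustration, and the rows are the authorID, groupID and university values from the table above.

```python
from collections import defaultdict

# (authorID, groupID, university) values copied from the example table.
rows = [
    (429, 4595, "univ malaya"),
    (1211, 4595, "cent south univ"),
    (1294, 4595, "air force med ctr"),
    (1505, 4595, "shenzhen univ"),
    (1723, 4595, "jiangsu univ sci & technol"),
    (3647, 4595, "nanjing univ chinese med"),
    (4072, 4595, "dalian maritime univ"),
    (4479, 4595, "shandong univ"),
    (4541, 4595, "hunan univ chinese med"),
    (4595, 4595, "beijing univ chinese med"),
]


def groups_spanning_universities(rows):
    """Map each groupID to its universities, keeping only groups with more than one.

    Hypothetical helper for illustration; rows with a missing university
    (None) are ignored rather than counted as a separate affiliation.
    """
    universities = defaultdict(set)
    for _author_id, group_id, university in rows:
        if university is not None:
            universities[group_id].add(university)
    return {
        group_id: sorted(names)
        for group_id, names in universities.items()
        if len(names) > 1
    }


suspicious = groups_spanning_universities(rows)
for group_id, names in suspicious.items():
    print(group_id, len(names), "distinct universities")
```

A group that spans several universities is not necessarily wrong, since one person can move or hold several affiliations, so this only produces a list of candidates for manual review.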

https://docs.ropensci.org/refsplitr/articles/refsplitr.html#author-address-parsing-and-name-disambiguation

Once we have our subset of possible similar entries, we match the existing info of row 1 against the subset. The entry only needs to match one extra piece of information - either address, email, or middle name. If it matches we assume it is the same person, and change the groupID numbers to reflect this.
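
To make sure I am reading that paragraph correctly, here is how I understand the rule, written as a small Python sketch. This is not refsplitr's actual code: the field names `email` and `middle_name`, both function names, and the two sample records are assumptions made purely for illustration.

```python
def shares_extra_info(candidate, reference,
                      fields=("address", "email", "middle_name")):
    """Return True if at least one field is present in both records and equal.

    Missing values (None or empty string) never count as a match.
    """
    for field in fields:
        a = candidate.get(field)
        b = reference.get(field)
        if a and b and a == b:
            return True
    return False


def merge_into_group(reference, subset):
    """Give every matching candidate the reference record's groupID."""
    for candidate in subset:
        if shares_extra_info(candidate, reference):
            candidate["groupID"] = reference["groupID"]
    return subset


# Hypothetical records, not taken from any real dataset.
reference = {"groupID": 1, "address": "univ a", "email": "x@example.org"}
subset = [
    {"groupID": 2, "address": "univ b", "email": "x@example.org"},  # same email
    {"groupID": 3, "address": "univ c", "email": None},             # nothing shared
]

merge_into_group(reference, subset)
print([record["groupID"] for record in subset])
```

Under this reading a single shared field is enough to merge two records, so one coincidental match is all it takes for two people with the same name to end up in one group.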
