Bot development for new taxa #4

Open
myrmoteras opened this issue Jun 27, 2022 · 0 comments
Labels
documentation Improvements or additions to documentation

Comments

@myrmoteras
Contributor

myrmoteras commented Jun 27, 2022

Hi Donat,

It took me a while to get a grip on the data. I ended up writing an HTML scraper to get the JSON out of Zenodo. That is not ideal, but the Wikidata bot now works. I cherry-picked a few treatments from the main Plazi website and created the following Wikidata items: https://w.wiki/5A3h

More treatments have been added than appear in the query above. They were missing from it because of problems with the links to their associated publications. This query shows all treatments currently in Wikidata: https://w.wiki/5A3i .

I can now add more treatments to Wikidata given a set of Plazi UUIDs. The bot uses both the RDF and the JSON from Zenodo. I would be able to rely on the RDF alone if the following changes were made to the RDF:

  1. Add the DOI of the scientific publication associated with the treatment. In some cases the RDF contains intermediate Zenodo DOIs instead, which need to be resolved through the JSON.
  2. Add the locations to the RDF. The second item the bot takes from the JSON is the location coordinates.
  3. Use URIs and rdfs:label in the RDF. The taxonomic tree currently uses literals for the different clades, and for each clade the complete parent branch is repeated. Could this be simplified by changing the clades from strings to URIs? For example:

<http://taxon-concept.plazi.org/id/Animalia/Brighstoneus_simmondsi_Lockwood_2021> a dwcFP:TaxonConcept ;
    trt:hasTaxonName <http://taxon-name.plazi.org/id/Animalia/Brighstoneus_simmondsi> ;
    dwc:genus "Brighstoneus" ;
    dwc:kingdom "Animalia" ;
    dwc:order "Ornithischia" ;
    dwc:rank "species" ;
    dwc:scientificNameAuthorship "Lockwood & Martill & Maidment, 2021" ;
    dwc:species "simmondsi" .

Would become:

<http://taxon-concept.plazi.org/id/Animalia/Brighstoneus_simmondsi_Lockwood_2021> a dwcFP:TaxonConcept ;
    trt:hasTaxonName <http://taxon-name.plazi.org/id/Animalia/Brighstoneus_simmondsi> .
<http://taxon-name.plazi.org/id/Animalia/Brighstoneus_simmondsi> rdfs:label "Brighstoneus simmondsi" ;
    dwc:rank wd:Q7432 ; # Q7432 = species
    trt:hasParentName <http://taxon-name.plazi.org/id/Animalia/Brighstoneus> .
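
With parent links like these, a consumer would no longer need the full branch repeated on every clade; it could rebuild the branch by following the links. The following is only a minimal sketch of that idea, not the bot's actual code. It assumes the RDF has already been loaded into a plain dictionary (the `NAMES` structure and the `lineage` function are hypothetical), and the `Ornithischia` URI and the direct genus-to-order link are illustrative simplifications based on the ranks shown in the snippet above.

```python
# Hypothetical in-memory view of the proposed RDF: each taxon-name URI maps to
# its rdfs:label and the URI given by trt:hasParentName (None at the top).
BASE = "http://taxon-name.plazi.org/id/Animalia/"

NAMES = {
    BASE + "Brighstoneus_simmondsi": {
        "label": "Brighstoneus simmondsi",
        "parent": BASE + "Brighstoneus",
    },
    BASE + "Brighstoneus": {
        "label": "Brighstoneus",
        # Assumed for illustration: genus linked straight to the order.
        "parent": BASE + "Ornithischia",
    },
    BASE + "Ornithischia": {"label": "Ornithischia", "parent": None},
}


def lineage(uri, names):
    """Follow parent links from `uri` upwards; return the labels, child first."""
    labels = []
    seen = set()
    while uri is not None:
        if uri in seen:
            # Guard against malformed data where parent links form a loop.
            raise ValueError(f"cycle in parent links at {uri}")
        seen.add(uri)
        entry = names[uri]
        labels.append(entry["label"])
        uri = entry.get("parent")
    return labels


if __name__ == "__main__":
    print(" > ".join(lineage(BASE + "Brighstoneus_simmondsi", NAMES)))
```

Each clade is then stored once, and the complete branch is derived on demand instead of being copied into every taxon concept.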

The next step is to request a bot account and/or permission to do this at scale. But I propose that we first discuss the current schema on Wikidata and consider possible adaptations.
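
To illustrate the DOI point in the first list item, here is a rough sketch of the kind of check that separates a Zenodo DOI from the DOI of the publication itself. It is an assumption-laden illustration rather than the bot's code: the function names are hypothetical, the example DOIs in the comments are made up, and it relies only on the fact that Zenodo-minted DOIs start with `10.5281/zenodo.`. A publication that is legitimately hosted on Zenodo would need separate handling.

```python
ZENODO_DOI_PREFIX = "10.5281/zenodo."

# Common ways a DOI is written as a link or with a scheme prefix.
_RESOLVER_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(value):
    """Strip a resolver prefix and lower-case the DOI (DOIs are case-insensitive)."""
    doi = value.strip()
    for prefix in _RESOLVER_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def is_zenodo_doi(value):
    """True if the DOI was minted by Zenodo, e.g. 10.5281/zenodo.1234567 (made up)."""
    return normalize_doi(value).startswith(ZENODO_DOI_PREFIX)


def pick_publication_doi(rdf_doi, json_doi=None):
    """Prefer the RDF value; fall back to the JSON value if the RDF one is a Zenodo DOI.

    Returns None when neither source yields a non-Zenodo DOI.
    """
    for candidate in (rdf_doi, json_doi):
        if candidate and not is_zenodo_doi(candidate):
            return normalize_doi(candidate)
    return None
```

If the RDF always carried the publication DOI, the fallback branch and the JSON lookup behind it would no longer be needed.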

Cheers,

Andra

@myrmoteras myrmoteras added the documentation Improvements or additions to documentation label Jun 27, 2022