-
Notifications
You must be signed in to change notification settings - Fork 0
Caroline-Vandyck/InformationScienceProjectCarolineVandyck
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This is the repository for the exam project of the course Information Science, part of the MA Digital Text Analysis of the University of Antwerp. This repository contains the code (VandyckCarolineInformationScience.ipynb) for data reconciliation between a sample of the Anet SQLite export file and the CERL Thesaurus SRU/CQL service. For the code to work, the notebook and the sample need to be located in the same directory. The purpose of the code is to retrieve the tags/identifiers (one from Anet, one from the CERL Thesaurus) that correspond for the same person. The program will query CERL on the basis of the surname obtained from Anet and will select the first 50 records. For those 50 (or less) records, it will be checked whether the first name and year of birth match. In the end, every record obtained from CERL will be returned with a statement on which metadata it matches Anet. It only uses Python's standard library. At the end of the notebook, some observations and possible improvements are listed. However, most of these 'possible improvements' boil down to personal choices/preferences (relating to what you find important and how you want to standardize names). There are always many options to choose from and choices to make. I attempted to make the optimal ones and explained why, yet opinions can differ.
About
This repository contains the project for the course Information Science, which is part of the MA DTA of the University of Antwerp.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published