A KNIME Analytics Platform workflow for conservatively matching DFD (Digital Dictionary of Surnames in Germany) entries with Wikidata family name items. It outputs a list of matches ready for the QuickStatements bulk editing tool, which then adds the DFD-ID (property P6597) to the matched Wikidata items.
The workflow fetches a full list of currently published DFD articles and a list of family name items from Wikidata’s SPARQL endpoint. Family name items are first filtered in the SPARQL query to:
- exclude items that already have a “DFD-ID” statement
- keep only items with the statement “instance of: family name”
- keep only items with the statement “writing system: Latin script”
- exclude items with any additional value for “writing system”
- keep only items with a value for “native label”
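The criteria above could be expressed roughly as in the following sketch. It is not the query shipped in the workflow: the function name `build_family_name_query`, the variable names, and the overall query shape are assumptions made for illustration. The identifiers are the usual Wikidata ones (P31 instance of, Q101352 family name, P282 writing system, Q8229 Latin script, P1705 native label, P6597 DFD-ID).

```python
def build_family_name_query(limit=None):
    """Return a SPARQL query string for candidate family name items.

    Illustrative only: the query used inside the KNIME workflow may differ.
    """
    query = """
SELECT ?item ?nativeLabel WHERE {
  ?item wdt:P31 wd:Q101352 ;        # instance of: family name
        wdt:P282 wd:Q8229 ;         # writing system: Latin script
        wdt:P1705 ?nativeLabel .    # has a value for native label
  FILTER NOT EXISTS { ?item wdt:P6597 ?dfdId . }   # no DFD-ID yet
  FILTER NOT EXISTS {                              # no other writing system
    ?item wdt:P282 ?other .
    FILTER(?other != wd:Q8229)
  }
}
""".strip()
    if limit is not None:
        # Optional cap, handy when trying the query out by hand.
        query += f"\nLIMIT {int(limit)}"
    return query
```

The returned string can be pasted into the Wikidata Query Service for inspection; adding a small `limit` keeps test runs cheap.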
Further filtering is done in KNIME, because performing it in the query would be too costly for the Wikidata SPARQL endpoint:
- exclude items whose value of “native label” occurs on more than one family name item
- exclude items with more than one value of “native label”
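Outside KNIME, the two exclusion rules above can be sketched in a few lines. This is an illustration of the logic, not the workflow's node configuration; the function name `unambiguous_items` and the `(item_id, native_label)` row shape are assumptions.

```python
from collections import defaultdict


def unambiguous_items(rows):
    """Keep items with exactly one native label that no other item shares.

    rows: iterable of (item_id, native_label) pairs, one per label statement.
    Returns a dict mapping item_id to its single native label.
    """
    labels_per_item = defaultdict(set)
    items_per_label = defaultdict(set)
    for item, label in set(rows):  # set() drops duplicate result rows
        labels_per_item[item].add(label)
        items_per_label[label].add(item)

    result = {}
    for item, labels in labels_per_item.items():
        if len(labels) != 1:
            continue  # item has more than one native label
        (label,) = labels
        if len(items_per_label[label]) != 1:
            continue  # label occurs on more than one item
        result[item] = label
    return result
```

Label sharing is counted across all items, including those that are themselves dropped for having several labels, which keeps the result on the conservative side.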
This results in a list of items where unequivocal 1:1 matches between DFD and Wikidata are possible and a new statement with “DFD-ID” is unlikely to be erroneous or problematic.
The name forms from DFD are then joined to the native labels from Wikidata, using exact (and case-sensitive) string matching.
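A minimal sketch of such an exact join, assuming the DFD list is available as a mapping from name form to DFD-ID and the filtered Wikidata items as a mapping from item ID to native label. The function name `match_exact` and the sample values in the usage note are hypothetical.

```python
def match_exact(dfd_entries, wikidata_labels):
    """Join DFD name forms to Wikidata native labels by exact string equality.

    dfd_entries: dict mapping name form -> DFD-ID
    wikidata_labels: dict mapping item ID -> native label
    Returns a list of (item ID, DFD-ID) pairs, sorted by item ID string.
    """
    matches = []
    for item, label in sorted(wikidata_labels.items()):
        # Dict lookup compares strings exactly, so it is case-sensitive
        # and applies no trimming or Unicode normalisation.
        dfd_id = dfd_entries.get(label)
        if dfd_id is not None:
            matches.append((item, dfd_id))
    return matches
```

With `{"Schmidt": "101"}` as DFD entries and `{"Q3": "Schmidt", "Q7": "schmidt"}` as Wikidata labels, only `Q3` is matched, because the lowercase label on `Q7` is not an exact match.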
The resulting list of matches is transformed into the CSV format required by QuickStatements. The final result can be copied and pasted into a new batch on QuickStatements, which then performs a batch of edits on Wikidata to add the statements.
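The CSV step might look like the sketch below. It assumes the QuickStatements CSV convention of a `qid` column followed by one column per property, with string values wrapped in double quotes (which CSV escaping turns into triple quotes). The function name `to_quickstatements_csv` and the sample ID are hypothetical, and the workflow's actual output may contain further columns.

```python
import csv
import io


def to_quickstatements_csv(matches):
    """Render (item ID, DFD-ID) pairs as QuickStatements-style CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["qid", "P6597"])  # header: item column, then property
    for item, dfd_id in matches:
        # Wrap the ID in literal double quotes to mark it as a string value;
        # csv.writer then escapes them, producing """...""" in the output.
        writer.writerow([item, f'"{dfd_id}"'])
    return buffer.getvalue()
```

For the single hypothetical match `("Q3", "101")`, this yields the header line `qid,P6597` followed by `Q3,"""101"""`.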
Each release can be downloaded as a .knwf file from the releases page of this repository.
The .knwf file can then be imported into KNIME version 4.2.3 or higher.
The KNIME Semantic Web extension is required. When the workflow is imported, KNIME automatically prompts you to install this extension.
The software is published under the terms of the MIT license.
Copyright 2019 Julian Jarosch
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.