Wiktionary Matcher

This matcher participated in the OAEI campaigns 2019 and 2020. The matcher is implemented with the Matching and EvaLuation Toolkit (MELT) and can be packaged for SEALS.

How to Cite?

 Portisch, Jan; Hladik, Michael; Paulheim, Heiko. Wiktionary Matcher. CEUR Workshop Proceedings OM 2019 - Proceedings of the 14th International Workshop on Ontology Matching co-located with the 18th International Semantic Web Conference (ISWC 2019). Auckland, New Zealand. October 26, 2019. Pages 181 - 188.

An open-access version of the paper can be found here.

Installation / Setup

(1) Download Wiktionary Files

Download the core file for each of the following languages from http://kaiko.getalp.org/about-dbnary/download/ (for English, this is en_dbnary_ontolex.ttl.bz2):

  • en
  • de
  • es
  • pt
  • ru
  • nl
  • fr

You only need the core files, not the disambiguated translations!
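To verify that all downloads are in place before loading, a small check along the following lines may help. It is only a sketch: the function name `missing_dbnary_files` is made up for illustration, and it assumes that every language follows the `<lang>_dbnary_ontolex.ttl.bz2` naming pattern of the English file mentioned above.

```shell
# Print the core files that are still missing from a download directory.
# Assumption: all seven languages use the <lang>_dbnary_ontolex.ttl.bz2 pattern.
missing_dbnary_files() {
  download_dir="$1"
  for lang in en de es pt ru nl fr; do
    file="${lang}_dbnary_ontolex.ttl.bz2"
    if [ ! -f "$download_dir/$file" ]; then
      printf '%s\n' "$file"
    fi
  done
}
```

Calling `missing_dbnary_files ./downloads` prints one file name per missing language and prints nothing once all seven files are present.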

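(2) Download TDB

Download Apache Jena (which includes TDB) from https://jena.apache.org/download/index.cgi and add its bin directory to your PATH so that the tdbloader commands are available.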

(3) Load Data with tdbloader

tdbloader2 --loc ./ <path to en_dbnary_ontolex.ttl.bz2> <path to pt_dbnary_ontolex.ttl.bz2> ...

Note that tdbloader2 can only create a new database. If you do not load all files in a single tdbloader2 run, any further files can only be added with tdbloader.
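The choice between the two loaders can be sketched as a small helper. The function names `choose_loader` and `print_load_command` are hypothetical, and the rule "a non-empty directory already holds a database" is a simplifying assumption that only makes sense for a dedicated database directory. The helper prints the command instead of running it, so it can be inspected first.

```shell
# Pick the loader for a database directory.
# Assumption: a non-empty directory already contains a TDB database.
choose_loader() {
  db_dir="$1"
  if [ -d "$db_dir" ] && [ -n "$(ls -A "$db_dir" 2>/dev/null)" ]; then
    echo tdbloader    # existing database: files can only be added
  else
    echo tdbloader2   # new database: bulk load everything at once
  fi
}

# Print (do not execute) the load command for the given directory and files.
print_load_command() {
  db_dir="$1"
  shift
  printf '%s --loc %s' "$(choose_loader "$db_dir")" "$db_dir"
  for f in "$@"; do
    printf ' %s' "$f"
  done
  printf '\n'
}
```

For example, `print_load_command ./wiktionary-tdb en_dbnary_ontolex.ttl.bz2 pt_dbnary_ontolex.ttl.bz2` prints a tdbloader2 command if ./wiktionary-tdb is missing or empty, and a tdbloader command otherwise.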

(4) Create oaei-resources

  • In the project create /oaei-resources/wiktionary-tdb/ and place the database files obtained in (3) there.
  • In the project create /oaei-resources/stopwords/ and place a file named english_stopwords.txt there. It should contain one stopword per line (e.g. a, the).
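The two steps above could be scripted roughly as follows. This is a sketch under assumptions: `setup_oaei_resources` is a hypothetical helper, the project root and the directory with the database files from (3) are passed as arguments, and the two-word stopword file is only a placeholder taken from the example above; a real stopword list has to be supplied separately.

```shell
# Create the oaei-resources layout inside a project directory.
# $1: project root, $2: directory holding the TDB files from step (3).
setup_oaei_resources() {
  project_dir="$1"
  tdb_dir="$2"
  resources="$project_dir/oaei-resources"

  mkdir -p "$resources/wiktionary-tdb" "$resources/stopwords"

  # Copy the database files into the expected location.
  cp -R "$tdb_dir"/. "$resources/wiktionary-tdb/"

  # Write a placeholder stopword list (one word per line) if none exists yet.
  stopwords="$resources/stopwords/english_stopwords.txt"
  if [ ! -f "$stopwords" ]; then
    printf '%s\n' a the > "$stopwords"
  fi
}
```

An existing english_stopwords.txt is left untouched, so the function can be rerun after the placeholder has been replaced with a full list.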

Future Improvements

This matcher uses DBnary as a general background knowledge source. Due to restrictions of the extraction framework, the following relations are not extracted, although they are present on Wiktionary and would be helpful for matching:

  • derived terms
  • alternative forms

They will be added to this matcher once they are available (two enhancement requests have been submitted on the DBnary Bitbucket).