KB Relations
When we are querying a structured knowledge base, whether based on raw question representation or a logical form, we need to map question terminology to the actual graph relations in the KB.
This involves two specific problems: first, mapping natural language vocabulary to relations; second, finding template subgraphs that capture constraints (such as co-occurrence with another entity, yielding only the "first" entity by a particular ordering, or the entity count).
This is a big TODO.
Currently, we use two approaches at once.
First, we produce answers from all the immediate relations of a concept. Some vocabulary mapping is done by assigning each answer the LAT based on the relation name. This is an "emergency" solution.
Second, we produce answers from specific (fixed-label) relation paths. The vocabulary is therefore fixed, and the template subgraph is just a 1- or 2-entity path. The paths are predicted by a multi-label classifier based on logistic regression over (a few) lexical question features. This is based on Yao: Lean question answering over Freebase from scratch (2015).
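As a rough illustration of the classifier idea, here is a minimal sketch of one-vs-rest logistic regression over bag-of-words question features, using only the Python standard library. The unigram feature set, the toy training questions, the relation labels and the hyperparameters are all assumptions made for this example; they are not taken from the YodaQA code or from Yao's paper.

```python
import math
from collections import defaultdict

def features(question):
    # Assumed lexical features: lowercase unigrams only.
    return set(question.lower().replace("?", "").split())

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(w, feats):
    # Linear score plus a bias term stored under a reserved key.
    return sigmoid(w["<bias>"] + sum(w[f] for f in feats))

def train(examples, epochs=200, lr=0.5):
    """One independent logistic regression per relation-path label."""
    labels = sorted({label for _, gold in examples for label in gold})
    weights = {label: defaultdict(float) for label in labels}
    for _ in range(epochs):
        for question, gold in examples:
            feats = features(question)
            for label in labels:
                w = weights[label]
                # Gradient of the log-likelihood: (target - prediction).
                grad = (1.0 if label in gold else 0.0) - score(w, feats)
                w["<bias>"] += lr * grad
                for f in feats:
                    w[f] += lr * grad
    return weights

def predict(weights, question):
    feats = features(question)
    return {label: score(w, feats) for label, w in weights.items()}

# Toy training data (hypothetical questions and labels).
TRAIN = [
    ("who wrote hamlet", {"/book/written_work/author"}),
    ("who wrote dune", {"/book/written_work/author"}),
    ("where was einstein born", {"/people/person/place_of_birth"}),
    ("where was mozart born", {"/people/person/place_of_birth"}),
]

weights = train(TRAIN)
scores = predict(weights, "who wrote ulysses")
```

Because every label has its own sigmoid, a question can receive several high-scoring relation paths at once, which is what makes the setup multi-label rather than multi-class.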
Compare the embeddings of the question and the property label (using a transformation matrix) to determine how likely the property is to be answer-producing. Work in progress by Silvestr in f/property-selection, based on his sentence selection work.
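One common way to compare two embeddings through a transformation matrix is a bilinear score passed through a sigmoid, sigmoid(q^T M p). The sketch below shows that form only; whether the f/property-selection branch uses exactly this formula is an assumption, and the vectors and the matrix are toy values rather than trained parameters.

```python
import math

def matvec(M, v):
    # Multiply matrix M (list of rows) by vector v.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def property_score(q_emb, M, p_emb):
    """sigmoid(q^T M p): a probability-like answer-producing score."""
    return 1.0 / (1.0 + math.exp(-dot(q_emb, matvec(M, p_emb))))

# Toy 2-d embeddings and an identity transformation (hypothetical values).
M = [[1.0, 0.0], [0.0, 1.0]]
question = [1.0, 0.0]
similar_property = [1.0, 0.0]
opposite_property = [-1.0, 0.0]

score_similar = property_score(question, M, similar_property)
score_opposite = property_score(question, M, opposite_property)
```

In a trained model M would be learned so that questions score highly against the labels of properties that actually yield their answers.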
Instead of a fixed-label relation path, consider a more complex subgraph template with other entity references. Our first iteration will keep using the fixed vocabulary and just add a "T-shaped" subgraph of three entities in addition to the path. Investigated by Honza P.
When done, this will yield (with regard to the subgraph problem) a baseline that is popular across systems, with three subgraph templates: direct relation, one-hop relation, and T-shaped relation with an extra fixed entity. This is apparently enough to get huge WebQuestions coverage.
We now call this extension "Branched fbpaths". Branched fbpaths try to cover questions that involve an additional relation between two concepts, besides the relation between the question entity and the answer. The two paths must share one common relation.
For example, one path is "/tv/tv_character/appeared_in_tv_program", "/tv/regular_tv_appearance/actor" and the second path is "/tv/tv_character/appeared_in_tv_program", "/tv/regular_tv_appearance/series".
This is typical for questions that look like: Who played character X in film Y.
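The branched path in the example can be sketched over a tiny in-memory triple list. The entity names (character_x, appearance_1, and so on) and the helper functions are hypothetical; only the three relation names come from the example above. A real implementation would issue the equivalent query against Freebase rather than scan a Python list.

```python
APPEARED = "/tv/tv_character/appeared_in_tv_program"
ACTOR = "/tv/regular_tv_appearance/actor"
SERIES = "/tv/regular_tv_appearance/series"

# Toy graph: one character played by different actors in two series.
TRIPLES = [
    ("character_x", APPEARED, "appearance_1"),
    ("appearance_1", ACTOR, "actor_a"),
    ("appearance_1", SERIES, "series_y"),
    ("character_x", APPEARED, "appearance_2"),
    ("appearance_2", ACTOR, "actor_b"),
    ("appearance_2", SERIES, "series_z"),
]

def objects(subject, relation):
    return [o for s, r, o in TRIPLES if s == subject and r == relation]

def one_hop(entity, rel1, rel2):
    """Plain two-relation path: entity -rel1-> node -rel2-> answer."""
    return [a for mid in objects(entity, rel1) for a in objects(mid, rel2)]

def branched(entity, shared_rel, answer_rel, witness_rel, witness_entity):
    """T-shaped template: keep only intermediate nodes that are also
    linked to the second question entity via witness_rel."""
    answers = []
    for mid in objects(entity, shared_rel):
        if witness_entity in objects(mid, witness_rel):
            answers.extend(objects(mid, answer_rel))
    return answers

unconstrained = one_hop("character_x", APPEARED, ACTOR)
constrained = branched("character_x", APPEARED, ACTOR, SERIES, "series_y")
```

The plain path returns every actor who ever played the character, while the branched variant uses the second entity to narrow the result to the actor in the requested series.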
The WebQuestions dataset (can be obtained from here) was used to train the classifier for branched (T-shaped) fbpath relations. You can get the file from the "d-freebase-rp" directory and then convert it to TSV format using "scripts/json2tsv.py". Finally, you can follow the README file in "data/ml/fbpath" in the yodaqa repository to create the classifier.
Because this type of fbpath corresponds mostly to one type of question among the questions about movies, it improves MRR on the movies dataset. Unfortunately, it makes MRR on the curated dataset worse.
moviesC-test u58e6f15 2015-09-18 Added sparql query f... 100/177/233 42.9%/76.0% mrr 0.506 avgtime 745.413
curated-test u88085fb 2015-09-18 Added sparql query f... 135/329/430 31.4%/76.5% mrr 0.408 avgtime 3921.207
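The two result lines above can be checked for internal consistency with a short script. In both lines the two percentages equal the first and second counts divided by the third count. Reading these as "questions answered correctly at the top / questions with the correct answer anywhere in the list / total questions" is an assumption here, as is the helper name parse_result.

```python
MOVIES = ("moviesC-test u58e6f15 2015-09-18 Added sparql query f... "
          "100/177/233 42.9%/76.0% mrr 0.506 avgtime 745.413")
CURATED = ("curated-test u88085fb 2015-09-18 Added sparql query f... "
           "135/329/430 31.4%/76.5% mrr 0.408 avgtime 3921.207")

def parse_result(line):
    fields = line.split()
    # The counts field is the only one shaped like "int/int/int".
    counts = next(f for f in fields
                  if f.count("/") == 2 and f.replace("/", "").isdigit())
    first, second, total = (int(x) for x in counts.split("/"))
    return {
        "dataset": fields[0],
        "first_pct": round(100.0 * first / total, 1),
        "second_pct": round(100.0 * second / total, 1),
        "mrr": float(fields[fields.index("mrr") + 1]),
    }

movies = parse_result(MOVIES)
curated = parse_result(CURATED)
```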
For further information see Benchmarks wiki page.
Many systems use semantic parsing first to produce a logical form, then learn rules that convert this logical form to a SPARQL query. Often, this SPARQL query is fixed to be essentially just a subgraph template like we do, e.g. the QALD5 winner Xser (the FBGraph subgraphs).
Another subgraph template matching approach is Bast, Haussmann: More accurate question answering on Freebase (2015). It matches the FBGraph subgraphs. Answers are produced aggressively, and for vocabulary, it measures the alignment of Freebase relation(s) with the question: the number of overlapping words, derived words, word vector embedding cosine similarities, and indicator words in the question trained by distant supervision.
Distant supervision is common in other systems too (TODO): Wikipedia sentences that contain two entities connected by such a relation in Freebase often have the indicator word on the path between the entities in the dependency parse. Some tools may be reused for this (TODO).
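Two of the alignment features mentioned above, word overlap and embedding cosine similarity, can be sketched as follows. The tokenization (lowercase alphabetic runs, with the relation name split on slashes and underscores), the sample question and the toy vectors are assumptions for illustration; this does not reproduce the feature extraction of Bast and Haussmann.

```python
import math
import re

def tokens(text):
    # Lowercase alphabetic runs; "/a/b_c" becomes {"a", "b", "c"}.
    return set(re.findall(r"[a-z]+", text.lower()))

def overlap(question, relation):
    """Number of distinct words shared by the question and relation name."""
    return len(tokens(question) & tokens(relation))

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

n_shared = overlap("what is the place of birth of mozart",
                   "/people/person/place_of_birth")

# Toy word vectors (hypothetical values, not real embeddings).
same_direction = cosine([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])
```

In a full system these values would be combined with the other features (derived words, indicator words) as inputs to a learned ranking model.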
Another way to map vocabulary to relations is using the PATTY resource (Xser). TODO link