-
Notifications
You must be signed in to change notification settings - Fork 205
KB Relations
When we are querying a structured knowledge base, whether based on raw question representation or a logical form, we need to map question terminology to the actual graph relations in the KB.
This concerns two specific problems - first, mapping from natural language vocabulary to relations; second, finding template subgraphs in order to capture constraints (like co-occurrence with another entity, or yielding only "first" entity by particular ordering or their count).
TODO: Dataset.
Currently, we use two approaches both at once.
First, we produce answers from all the immediate relations of a concept. Some vocabulary mapping is done by assigning each answer the LAT based on the relation name. This is an "emergency" solution.
Second, we produce answers from specific (fixed-label) relation paths. So, vocabulary is fixed; template subgraph is just a 1 or 2 entity path. Logistic regression based multi-label classifier based on (few) lexical question features. Based on Yao: Lean question answering over freebase from scratch (2015).
Compare embedding of question and property label (using transformation matrix) to determine how likely it is to be answer-producing. Work in progress by Silvestr in f/property-selection, based on his sentence selection work.
Instead of fixed-label relation path, consider a more complex subgraph template with other entity references. Our first iteration will keep using the fixed vocabulary, just add a "T-shaped" subgraph of three entities in addition to the path. Investigated by Honza P.
When done, this will yield (with regard to subgraph problem) a baseline that is popular across systems, with three subgraph templates - direct relation, one-hop relation, and T-shaped relation with an extra fixed entity. This is enough to get huge WebQuestions coverage, apparently.
Many systems use semantic parsing first to produce a logical form, then learn rules that convert this logical form to a SPARQL query. Often, this SPARQL query is fixed to be essentially just a subgraph template like we do, e.g. the QALD5 winner Xser (the FBGraph subgraphs).
Another subgraph template matching approach is Bast, Haussmann: More accurate question answering on freebase (2015). It matches the FBGraph subgraphs. Answers are produced aggressively, and for vocabulary, it measures Freebase relation(s) alignment with the question - number of overlaping words, derived words, word vector embedding cosine similarities and indicator words in question trained by distant supervision.
Distant supervision is common in other systems too (TODO) - Wikipedia sentences that contain two entities connected with such relation in Freebase would often have the indicator word on the path between the entities in dependency parse). Some tools may be reused for this (TODO).
Another way to map vocabulary to relations is using the PATTY resource (Xser). TODO link