Skip to content
pichljan edited this page Sep 23, 2015 · 9 revisions

Knowledge Base Relations

When we are querying a structured knowledge base, whether based on raw question representation or a logical form, we need to map question terminology to the actual graph relations in the KB.

This concerns two specific problems - first, mapping from natural language vocabulary to relations; second, finding template subgraphs in order to capture constraints (like co-occurrence with another entity, or yielding only "first" entity by particular ordering or their count).

TODO: Dataset.

Baseline

Currently, we use two approaches both at once.

Trivial Baseline

First, we produce answers from all the immediate relations of a concept. Some vocabulary mapping is done by assigning each answer the LAT based on the relation name. This is an "emergency" solution.

FBPath Baseline

Second, we produce answers from specific (fixed-label) relation paths. So, vocabulary is fixed; template subgraph is just a 1 or 2 entity path. Logistic regression based multi-label classifier based on (few) lexical question features. Based on Yao: Lean question answering over freebase from scratch (2015).

Extension of FBPath Baseline

We call this extension "Branched fbpaths". Branched fbpaths try to cover question which have additional relation between two concepts in addition to relation between question entity and answer. These paths have to have one common relation.

For example one path: tv/tv_character/appeared_in_tv_program", "/tv/regular_tv_appearance/actor" and second path: tv/tv_character/appeared_in_tv_program", "/tv/regular_tv_appearance/series

This is typical for question which looks like: Who played character X in film Y.

Investigated Alternatives

Embedding Vocabulary

Compare embedding of question and property label (using transformation matrix) to determine how likely it is to be answer-producing. Work in progress by Silvestr in f/property-selection, based on his sentence selection work.

FBGraph

Instead of fixed-label relation path, consider a more complex subgraph template with other entity references. Our first iteration will keep using the fixed vocabulary, just add a "T-shaped" subgraph of three entities in addition to the path. Investigated by Honza P.

When done, this will yield (with regard to subgraph problem) a baseline that is popular across systems, with three subgraph templates - direct relation, one-hop relation, and T-shaped relation with an extra fixed entity. This is enough to get huge WebQuestions coverage, apparently.

Other Ideas

Many systems use semantic parsing first to produce a logical form, then learn rules that convert this logical form to a SPARQL query. Often, this SPARQL query is fixed to be essentially just a subgraph template like we do, e.g. the QALD5 winner Xser (the FBGraph subgraphs).

Another subgraph template matching approach is Bast, Haussmann: More accurate question answering on freebase (2015). It matches the FBGraph subgraphs. Answers are produced aggressively, and for vocabulary, it measures Freebase relation(s) alignment with the question - number of overlaping words, derived words, word vector embedding cosine similarities and indicator words in question trained by distant supervision.

Distant supervision is common in other systems too (TODO) - Wikipedia sentences that contain two entities connected with such relation in Freebase would often have the indicator word on the path between the entities in dependency parse). Some tools may be reused for this (TODO).

Another way to map vocabulary to relations is using the PATTY resource (Xser). TODO link