PCFG: probabilistic CFG (Magerman 1995; Collins; Charniak)
- Head words serve as good estimators of a phrase's structure and meaning
- Draw a parse tree in which each local subtree (a parent node with its children) corresponds to one grammar rule
- We can compute and compare probabilities to decide, e.g., whether a given word should come before another in the sentence, and hence choose the structure (see the sketch below)
- Word-to-word affinities are useful for resolving certain ambiguities (e.g., PP attachment)
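As a concrete illustration, here is a minimal sketch of how a PCFG scores a parse: the tree's probability is the product of the probabilities of the rules at each local subtree. The grammar and the numbers are made up for illustration, not taken from any treebank.

```python
# Sketch: score a parse tree under a PCFG as the product of its rule
# probabilities. Grammar and probabilities are invented for illustration.
rule_prob = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.4,
    ("VP", ("VBD",)):     0.3,
    ("DT", ("the",)):     0.5,
    ("NN", ("dog",)):     0.1,
    ("VBD", ("barked",)): 0.2,
}

def tree_prob(tree):
    """tree = (label, child1, child2, ...); a preterminal is (tag, word)."""
    label, *children = tree
    if isinstance(children[0], str):          # preterminal -> word
        return rule_prob[(label, tuple(children))]
    rhs = tuple(child[0] for child in children)
    p = rule_prob[(label, rhs)]               # rule at this local subtree
    for child in children:
        p *= tree_prob(child)                 # recurse into the subtrees
    return p

t = ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBD", "barked")))
print(tree_prob(t))  # 1.0 * 0.4 * 0.5 * 0.1 * 0.3 * 0.2 = 0.0012
```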
Charniak (1997): a straightforward method for parsing with a PCFG
- Probabilities of verbal complement frames; bilexical probabilities
- Example estimates: P(prices | n-plural) = .013; P(prices | n-plural, NP) = .013
- Estimates P(h | ph, c, pc), smoothed by linear interpolation (shrinkage); see the sketch below
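A hedged sketch of the shrinkage idea: the estimate conditioned on the full context P(h | ph, c, pc) is linearly interpolated with estimates from coarser back-off contexts. The back-off chain, weights, and helper names here are illustrative assumptions, not Charniak's exact estimator.

```python
# Sketch of linear-interpolation (shrinkage) smoothing for P(h | ph, c, pc):
# h = head word, ph = parent's head, c = category, pc = parent category.
# The back-off levels and lambda weights are assumptions for illustration.
from collections import Counter

joint = Counter()    # counts of (h, ph, c, pc)
ctx3  = Counter()    # counts of (ph, c, pc)
pair  = Counter()    # counts of (h, c)
ctx1  = Counter()    # counts of (c,)

def observe(h, ph, c, pc):
    joint[(h, ph, c, pc)] += 1
    ctx3[(ph, c, pc)] += 1
    pair[(h, c)] += 1
    ctx1[c] += 1

def p_interp(h, ph, c, pc, lambdas=(0.6, 0.4)):
    # MLE at each level; 0 when that context was never seen
    p_full   = joint[(h, ph, c, pc)] / ctx3[(ph, c, pc)] if ctx3[(ph, c, pc)] else 0.0
    p_coarse = pair[(h, c)] / ctx1[c] if ctx1[c] else 0.0
    l1, l2 = lambdas
    return l1 * p_full + l2 * p_coarse

observe("prices", "fell", "NP", "S")
observe("prices", "rose", "NP", "S")
print(p_interp("prices", "soared", "NP", "S"))  # full context unseen; backs off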
- Problems:
  - Under maximum likelihood estimation, previously unseen events are common in new text, and they receive probability 0 (illustrated below)
  - The independence assumptions may be too strong, making the parses inaccurate
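A tiny illustration of the first problem: relative-frequency (MLE) rule estimates assign probability 0 to any production never observed in training. The counts are invented.

```python
# Sparsity under MLE: relative-frequency estimates give probability 0
# to any rule never seen in training. Toy counts for illustration.
from collections import Counter, defaultdict

rule_counts = Counter({
    ("VP", ("VBD", "NP")): 8,
    ("VP", ("VBD",)):      2,
})
lhs_counts = defaultdict(int)
for (lhs, rhs), n in rule_counts.items():
    lhs_counts[lhs] += n

def mle(lhs, rhs):
    return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

print(mle("VP", ("VBD", "NP")))        # 0.8
print(mle("VP", ("VBD", "NP", "PP")))  # 0.0 -- unseen event gets zero probability
```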
Johnson (1998)
- State splitting: encode more context into the symbols, e.g., splitting NP into NP^S (an NP whose parent is S)
- Parent annotation (sketched below)
- Marking possessive NPs (an English-specific split)
- A much simpler alternative to lexicalized PCFGs, with competitive accuracy
- what "unlexicalized" PCFG mean: Grammar rules are not systematically specified
- An experimental approach: add an annotation, measure accuracy, keep what helps
Merging states keeps the split grammar compact
Vertical Markovization: rewrites depend on the past k ancestor nodes in the parse tree (k = 1 is a plain PCFG; k = 2 reproduces the parent annotation sketched below)
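A minimal sketch of vertical Markovization on the tuple-encoded trees used above (my own helper, not Johnson's code): each nonterminal is annotated with its last k-1 ancestor labels, so k = 2 gives parent annotation (an NP under S becomes NP^S).

```python
# Sketch of vertical Markovization: annotate each nonterminal with its
# last k-1 ancestor labels (k = 2 is plain parent annotation).
# Trees are (label, child1, ...) tuples; preterminals are (tag, word).
def annotate(tree, k=2, ancestors=()):
    label, *children = tree
    if isinstance(children[0], str):       # preterminal: leave the tag alone
        return tree
    history = ancestors[-(k - 1):] if k > 1 else ()
    new_label = label + "".join("^" + a for a in history)
    return (new_label,) + tuple(
        annotate(c, k, ancestors + (label,)) for c in children
    )

t = ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBD", "barked")))
print(annotate(t))   # ('S', ('NP^S', ...), ('VP^S', ...))
```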
Petrov and Klein (2006, 2007): when the refined categories cannot be read directly off the treebank, treat them as latent variables, as with hidden states in an HMM.
Train with EM algorithms, analogous to Forward-Backward for HMMs, but with the computation constrained by the observed tree (sketched below).
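A compact sketch of the E-step only, under stated assumptions: binary tuple trees as above, every observed label split into N latent subsymbols, and randomly initialized parameters (as EM would start). The M-step, which re-estimates rule probabilities from these posteriors, is omitted; all names are mine, not Petrov & Klein's code. Inside and outside scores over the fixed tree play the role that Forward-Backward plays for an HMM chain.

```python
# Tree-constrained E-step sketch for latent-annotation PCFGs: compute
# inside scores bottom-up and outside scores top-down over the *observed*
# tree, yielding posteriors over latent subsymbols at every node.
import numpy as np

N = 2                                  # latent subsymbols per observed label
rng = np.random.default_rng(0)

binary_params = {}                     # (A, B, C) -> P(A_x -> B_y C_z), shape (N, N, N)
lex_params = {}                        # (tag, word) -> score per subsymbol, shape (N,)

def rand_binary():
    t = rng.random((N, N, N))
    return t / t.sum(axis=(1, 2), keepdims=True)   # normalize over (y, z) per x

def inside(tree):
    """Bottom-up inside scores; returns (inside vector, node record)."""
    label, *kids = tree
    if isinstance(kids[0], str):                        # preterminal -> word
        vec = lex_params.setdefault((label, kids[0]), rng.random(N))
        return vec, (label, vec, None, None)
    lvec, lnode = inside(kids[0])
    rvec, rnode = inside(kids[1])
    rule = binary_params.setdefault((label, kids[0][0], kids[1][0]), rand_binary())
    vec = np.einsum("xyz,y,z->x", rule, lvec, rvec)     # sum over child subsymbols
    return vec, (label, vec, (lnode, rnode), rule)

def outside(node, out_vec, posteriors):
    """Top-down outside scores; collect P(subsymbol | tree) at every node."""
    label, in_vec, kids, rule = node
    post = out_vec * in_vec
    posteriors.append((label, post / post.sum()))
    if kids is None:
        return
    lnode, rnode = kids
    l_out = np.einsum("x,xyz,z->y", out_vec, rule, rnode[1])
    r_out = np.einsum("x,xyz,y->z", out_vec, rule, lnode[1])
    outside(lnode, l_out, posteriors)
    outside(rnode, r_out, posteriors)

t = ("S", ("NP", ("DT", "the"), ("NN", "dog")),
          ("VP", ("VBD", "chased"), ("NP", ("DT", "a"), ("NN", "cat"))))
root_vec, root = inside(t)
posteriors = []
outside(root, np.ones(N), posteriors)  # uniform root weighting for the sketch
for label, p in posteriors:
    print(label, np.round(p, 3))
```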