Petr Baudis edited this page Mar 18, 2016 · 2 revisions

tf-idf Experiments

Answer Sentence Selection

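Word overlap scoring is much better than cosine distance. BM25 weighting is awesome (treating s0 as the query, i.e. weighting based only on s1 occurrences): with word overlap scoring, it beats plain TF-IDF (freq_mode='tf') on every dataset below.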
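To make the two scoring modes concrete, here is a minimal sketch, not the actual termfreq model code. The function names (`idf_table`, `bm25_overlap`, `tfidf_cosine`), the IDF smoothing, and the `k1`/`b` values are illustrative assumptions using the conventional BM25 formulation. The settings used in the experiments are not stated on this page.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Smoothed IDF for every token in a tokenized corpus (list of token lists)."""
    n = len(corpus)
    df = Counter(tok for sent in corpus for tok in set(sent))
    # log(1 + ...) keeps every weight positive, even for very frequent tokens.
    return {tok: math.log(1 + (n - d + 0.5) / (d + 0.5)) for tok, d in df.items()}


def bm25_overlap(s0, s1, idf, avg_len, k1=1.2, b=0.75):
    """BM25 score of s1 with s0 as the query.

    Only term counts in s1 enter the score; s0 just selects which terms count.
    k1 and b are conventional defaults (an assumption, not the page's settings).
    """
    tf = Counter(s1)
    norm = k1 * (1 - b + b * len(s1) / avg_len)
    score = 0.0
    for tok in set(s0):
        f = tf[tok]
        if f:
            score += idf.get(tok, 0.0) * f * (k1 + 1) / (f + norm)
    return score


def tfidf_cosine(s0, s1, idf):
    """Cosine similarity between TF-IDF vectors of s0 and s1."""
    v0 = {t: c * idf.get(t, 0.0) for t, c in Counter(s0).items()}
    v1 = {t: c * idf.get(t, 0.0) for t, c in Counter(s1).items()}
    dot = sum(w * v1.get(t, 0.0) for t, w in v0.items())
    n0 = math.sqrt(sum(w * w for w in v0.values()))
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    return dot / (n0 * n1) if n0 and n1 else 0.0
```

For answer sentence selection, s0 would be the question and s1 each candidate sentence; candidates are then ranked by score. The overlap score is asymmetric (swapping s0 and s1 changes it), whereas the cosine score is symmetric.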

wang:

| Model | trainAllMRR | devMRR | testMAP | testMRR | settings |
|---|---|---|---|---|---|
| termfreq | 0.813992 | 0.829004 | 0.630100 | 0.765363 | (defaults) termfreq-5e150127bfa12fab-00 |
| termfreq | 0.714169 | 0.725217 | 0.578200 | 0.708957 | freq_mode="tf" termfreq-2d3b759c31ae7a0c-00 |
| termfreq | 0.602093 | 0.684234 | 0.545400 | 0.641078 | score_mode='cos' termfreq-11d9aad0ee302e88-00 |
| termfreq | 0.601831 | 0.696384 | 0.549600 | 0.634582 | freq_mode="tf" score_mode='cos' termfreq-5121bb88a5922f9-00 |

curatedv2:

| Model | trainAllMRR | devMRR | testMAP | testMRR | settings |
|---|---|---|---|---|---|
| termfreq | 0.483538 | 0.452647 | 0.294300 | 0.484530 | (defaults) termfreq-7c2a88efab16d07d-00 |
| termfreq | 0.339544 | 0.324693 | 0.242700 | 0.337893 | freq_mode="tf" termfreq-26a946355b7ba20d-00 |
| termfreq | 0.254189 | 0.214607 | 0.201000 | 0.275696 | score_mode='cos' termfreq--4326af5eba873e89-00 |
| termfreq | 0.251412 | 0.238331 | 0.204800 | 0.278305 | freq_mode="tf" score_mode='cos' termfreq--4e5be392f5f78798-00 |

large2470:

| Model | trainAllMRR | devMRR | testMAP | testMRR | settings |
|---|---|---|---|---|---|
| termfreq | 0.441573 | 0.432115 | 0.313900 | 0.490822 | (defaults) termfreq-1c1547925afa2a69-00 |
| termfreq | 0.325390 | 0.328255 | 0.266800 | 0.362613 | freq_mode="tf" termfreq--1146821b4b0960cf-00 |
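The MRR and MAP columns above are standard ranking metrics computed per question and then averaged. The sketch below shows the textbook definitions. The helper names are hypothetical, and scoring a question with no correct candidate as 0 is an assumption (evaluation scripts differ on this, and the page does not say which one produced these numbers).

```python
def reciprocal_rank(scores, labels):
    """1 / rank of the highest-scored correct candidate (0.0 if none is correct)."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def average_precision(scores, labels):
    """Mean of precision@k taken at the rank k of each correct candidate."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_over_questions(metric, questions):
    """Average a per-question metric; questions is a list of (scores, labels) pairs."""
    return sum(metric(s, l) for s, l in questions) / len(questions)
```

MRR is `mean_over_questions(reciprocal_rank, questions)` and MAP is `mean_over_questions(average_precision, questions)`, where each question contributes the model scores and 0/1 correctness labels of its candidate sentences.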

STS

| Model | train | val | ans.for. | ans.stud | belief | headline | images | t. mean | settings |
|---|---|---|---|---|---|---|---|---|---|
| termfreq TF-IDF #w | 0.497085 | 0.651653 | 0.607226 | 0.676746 | 0.622920 | 0.725578 | 0.714331 | 0.669360 | freq_mode='tf' termfreq--3e24e018ccdd67cf |
| termfreq BM25 #w | 0.503736 | 0.656081 | 0.626950 | 0.690302 | 0.632223 | 0.725748 | 0.718185 | 0.678681 | (defaults) termfreq--61f19fd57ac70195 |
| termfreq | 0.529827 | 0.623189 | 0.614495 | 0.561347 | 0.489752 | 0.674801 | 0.681121 | 0.604303 | score_mode='cos' termfreq-48a0f78b96dd8c67 |
| termfreq | 0.516296 | 0.607707 | 0.615016 | 0.557084 | 0.491858 | 0.675379 | 0.683860 | 0.604639 | freq_mode='tf' score_mode='cos' termfreq-1408c2bf9274f4d8 |