1605MCTest

Petr Baudis edited this page Aug 10, 2016 · 6 revisions

1605 MCTest Experiments

Master Table

  • R_rm_2avgBV_EP100_L1e-5_mask
  • R_rm_2danBV_EP100_L1e-5_mask_W13
  • R_rm_2rnnBV_EP100_L1e-4_mask_i13d13 (16)
  • R_rm_2cnnBV_EP100_L1e-4_mask_i13d13 (16)
  • R_rm_2rnncnnBV_EP100_L1e-4_mask_i13d13 (8)
  • R_rm_2a51BV_EP100_L1e-5_mask_fasgmn_crelu (16)
  • R_urm_11299592rnnBV_EP100_mask_rmsprop_mlp

160 all

Model trn Acc val Acc val MRR tst Acc tst MRR settings
avg 0.576985 0.618229 0.764453 0.556250 0.725000 (defaults)
±0.008616 ±0.018391 ±0.011629 ±0.011982 ±0.008204
DAN 0.590034 0.628125 0.769965 0.576823 0.741298 inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu'
±0.008681 ±0.014717 ±0.007376 ±0.010082 ±0.006551
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
RNN 0.608488 0.551042 0.728038 0.533333 0.719444 inp_e_dropout=1/3 dropout=1/3
±0.029666 ±0.011416 ±0.007944 ±0.019546 ±0.011545
CNN 0.658066 0.560417 0.733811 0.578385 0.745616 inp_e_dropout=1/3 dropout=1/3
±0.021499 ±0.019324 ±0.012702 ±0.013669 ±0.008400
RNN-CNN 0.597128 0.584375 0.745486 0.551042 0.725651 inp_e_dropout=1/3 dropout=1/3
±0.038743 ±0.027304 ±0.015462 ±0.019763 ±0.015108
attn1511 0.686613 0.537500 0.719227 0.544010 0.721050 focus_act='sigmoid/maxnorm' cnnact='relu'
±0.061481 ±0.028172 ±0.018213 ±0.032517 ±0.020648
Ubu. RNN w/ MLP 0.678243 0.592222 0.750370 0.611944 0.759630 vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop'
±0.035476 ±0.020943 ±0.013777 ±0.022575 ±0.014356
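
The Acc and MRR columns in these tables are accuracy and mean reciprocal rank. As a reminder of how the two relate, here is a minimal sketch, assuming each question comes with a list of candidate-answer scores and the index of the single correct candidate (MCTest questions have four candidates). The function name `acc_and_mrr` and the tie handling are illustrative assumptions, not the evaluation code used for the numbers on this page.

```python
def acc_and_mrr(scores, gold):
    """Return (accuracy, mean reciprocal rank) over a set of questions.

    scores: list of per-question lists of candidate scores (hypothetical input format)
    gold:   list of indices of the correct candidate, one per question
    """
    acc = 0.0
    mrr = 0.0
    for s, g in zip(scores, gold):
        # Rank of the correct answer: 1 + number of candidates scored strictly higher.
        rank = 1 + sum(1 for x in s if x > s[g])
        acc += 1.0 if rank == 1 else 0.0
        mrr += 1.0 / rank
    n = len(gold)
    return acc / n, mrr / n


# Two toy questions with four candidates each.
toy_scores = [[0.1, 0.9, 0.3, 0.2], [0.8, 0.1, 0.5, 0.4]]
toy_gold = [1, 2]
print(acc_and_mrr(toy_scores, toy_gold))
```

With four candidates per question, a uniformly random ranking gives an accuracy of 0.25 and an MRR of (1 + 1/2 + 1/3 + 1/4) / 4, about 0.52, which is a useful floor when reading the weaker rows.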

no-relevance variants

Model trn Acc val Acc val MRR tst Acc tst MRR settings
avg 0.485743 0.517083 0.698958 0.435208 0.652899 rel_mode=None
±0.018741 ±0.027811 ±0.017296 ±0.017106 ±0.011999
avg 0.549747 0.560417 0.738628 0.561979 0.734896 prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] f_add_S1=['bm25'] rel_mode=None
±0.012311 ±0.015481 ±0.007239 ±0.027891 ±0.019026
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
cnn 0.465963 0.385417 0.616927 0.354687 0.600174 inp_e_dropout=1/3 dropout=1/3 rel_mode=None
±0.104367 ±0.056357 ±0.038451 ±0.069110 ±0.047296
cnn 0.550760 0.504167 0.705556 0.568229 0.744965 inp_e_dropout=1/3 dropout=1/3 prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] rel_mode=None f_add_S1=['bm25']
±0.010368 ±0.026982 ±0.016194 ±0.027091 ±0.020080
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
Ubu. RNN w/ MLP 0.595861 0.519792 0.709722 0.547917 0.724175 vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop' rel_mode=None
±0.033640 ±0.036690 ±0.023545 ±0.057450 ±0.035159
Ubu. RNN w/ MLP 0.607939 0.511458 0.709288 0.541146 0.721832 vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop' prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] rel_mode=None f_add_S1=['bm25']
±0.029949 ±0.034823 ±0.020107 ±0.029683 ±0.017930

160 one

Model trn Acc val Acc val MRR tst Acc tst MRR settings
avg 0.632732 0.744104 0.847976 0.653460 0.789156 mcqtypes=['one']
±0.012922 ±0.026593 ±0.017121 ±0.027497 ±0.015760
DAN 0.650221 0.771226 0.872642 0.680804 0.811477 mcqtypes=['one'] inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu'
±0.014159 ±0.025104 ±0.014396 ±0.016779 ±0.010076
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
RNN 0.643409 0.649764 0.795204 0.583147 0.755348 mcqtypes=['one'] inp_e_dropout=1/3 dropout=1/3
±0.029702 ±0.023028 ±0.013176 ±0.033079 ±0.019035
CNN 0.706646 0.708726 0.829599 0.655134 0.794968 mcqtypes=['one'] inp_e_dropout=1/3 dropout=1/3
±0.023560 ±0.021613 ±0.013824 ±0.020176 ±0.012053
RNN-CNN 0.643778 0.731132 0.840212 0.617188 0.771670 mcqtypes=['one'] inp_e_dropout=1/3 dropout=1/3
±0.047369 ±0.020491 ±0.013404 ±0.040959 ±0.026156
attn1511 0.724411 0.641509 0.793042 0.611049 0.764602 mcqtypes=['one'] focus_act='sigmoid/maxnorm' cnnact='relu'
±0.061083 ±0.053909 ±0.031519 ±0.051954 ±0.033187
Ubu. RNN w/ MLP 0.749239 0.701887 0.825052 0.736310 0.841071 mcqtypes=['one'] vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop'
±0.024649 ±0.031043 ±0.017206 ±0.033285 ±0.020334

no-relevance variants

Model trn Acc val Acc val MRR tst Acc tst MRR settings
avg 0.476215 0.559434 0.734513 0.422768 0.652790 mcqtypes=['one'] rel_mode=None
±0.023280 ±0.032598 ±0.021436 ±0.022397 ±0.014106
avg 0.549337 0.625000 0.788129 0.563616 0.740885 mcqtypes=['one'] prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] f_add_S1=['bm25'] rel_mode=None
±0.012957 ±0.016614 ±0.008663 ±0.030191 ±0.018246
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
cnn 0.467968 0.403302 0.634237 0.312500 0.575149 mcqtypes=['one'] inp_e_dropout=1/3 dropout=1/3 rel_mode=None
±0.097296 ±0.049215 ±0.032097 ±0.080221 ±0.051932
cnn 0.563328 0.570755 0.753734 0.561384 0.741071 mcqtypes=['one'] inp_e_dropout=1/3 dropout=1/3 prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] rel_mode=None f_add_S1=['bm25']
±0.018148 ±0.048458 ±0.029714 ±0.043754 ±0.030048
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
Ubu. RNN w/ MLP 0.629971 0.554245 0.736832 0.600446 0.759952 mcqtypes=['one'] vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop' rel_mode=None
±0.036849 ±0.032459 ±0.022015 ±0.058150 ±0.034819
Ubu. RNN w/ MLP 0.634573 0.537736 0.731329 0.590402 0.755487 mcqtypes=['one'] vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop' prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] rel_mode=None f_add_S1=['bm25']
±0.032776 ±0.056324 ±0.031742 ±0.037774 ±0.021738

160 multiple

Model trn Acc val Acc val MRR tst Acc tst MRR settings
avg 0.529728 0.518657 0.698383 0.471191 0.668864 mcqtypes=['multiple']
±0.010205 ±0.022407 ±0.012595 ±0.019731 ±0.014296
DAN 0.539014 0.514925 0.688744 0.485840 0.679891 mcqtypes=['multiple'] inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu'
±0.007369 ±0.011594 ±0.005723 ±0.010220 ±0.007904
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
RNN 0.578886 0.472948 0.674907 0.489746 0.688029 mcqtypes=['multiple'] inp_e_dropout=1/3 dropout=1/3
±0.031184 ±0.018592 ±0.012321 ±0.017889 ±0.010241
cnn 0.616885 0.443097 0.658038 0.511230 0.702433 mcqtypes=['multiple'] inp_e_dropout=1/3 dropout=1/3
±0.021803 ±0.030378 ±0.019382 ±0.012134 ±0.007949
rnncnn 0.557584 0.468284 0.670553 0.493164 0.685384 mcqtypes=['multiple'] inp_e_dropout=1/3 dropout=1/3
±0.033727 ±0.034702 ±0.019533 ±0.021274 ±0.014198
attn1511 0.654572 0.455224 0.660836 0.485352 0.682943 mcqtypes=['multiple'] focus_act='sigmoid/maxnorm' cnnact='relu'
±0.064150 ±0.024352 ±0.015673 ±0.025059 ±0.016047
Ubu. RNN w/ MLP 0.618061 0.505473 0.691294 0.503125 0.688368 mcqtypes=['multiple'] vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop'
±0.045654 ±0.035694 ±0.022683 ±0.016402 ±0.010690

no-relevance variants

Model trn Acc val Acc val MRR tst Acc tst MRR settings
avg 0.493820 0.483582 0.670833 0.446094 0.652995 mcqtypes=['multiple'] rel_mode=None
±0.015751 ±0.025030 ±0.014417 ±0.016142 ±0.012134
avg 0.550094 0.509328 0.699471 0.560547 0.729655 mcqtypes=['multiple'] prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] f_add_S1=['bm25'] rel_mode=None
±0.013905 ±0.028205 ±0.014314 ±0.029164 ±0.022051
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
cnn 0.464263 0.371269 0.603234 0.391602 0.622070 mcqtypes=['multiple'] inp_e_dropout=1/3 dropout=1/3 rel_mode=None
±0.112748 ±0.067033 ±0.045232 ±0.069617 ±0.050883
cnn 0.540106 0.451493 0.667444 0.574219 0.748372 mcqtypes=['multiple'] inp_e_dropout=1/3 dropout=1/3 prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] rel_mode=None f_add_S1=['bm25']
±0.009202 ±0.026285 ±0.014266 ±0.024656 ±0.017188
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
Ubu. RNN w/ MLP 0.566948 0.492537 0.688277 0.501953 0.692871 mcqtypes=['multiple'] vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop' rel_mode=None
±0.034849 ±0.051825 ±0.029364 ±0.060283 ±0.037658
Ubu. RNN w/ MLP 0.585362 0.490672 0.691853 0.498047 0.692383 mcqtypes=['multiple'] vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop' prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] rel_mode=None f_add_S1=['bm25']
±0.029574 ±0.027506 ±0.015395 ±0.028045 ±0.017263

500 all

Model trn Acc val Acc val MRR tst Acc tst MRR settings
avg 0.576985 0.556563 0.727448 0.542500 0.719635 (defaults)
±0.008616 ±0.011603 ±0.007554 ±0.010536 ±0.006081
DAN 0.590034 0.542813 0.718958 0.559583 0.728273 inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu'
±0.008681 ±0.008632 ±0.004980 ±0.007347 ±0.004740
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
rnn 0.608488 0.500625 0.689818 0.493750 0.686988 inp_e_dropout=1/3 dropout=1/3
±0.029666 ±0.014831 ±0.008939 ±0.012292 ±0.008004
cnn 0.658066 0.528750 0.711172 0.522500 0.708437 inp_e_dropout=1/3 dropout=1/3
±0.021499 ±0.013272 ±0.008906 ±0.008752 ±0.005107
rnncnn 0.597128 0.511875 0.697760 0.508125 0.695729 inp_e_dropout=1/3 dropout=1/3
±0.038743 ±0.019034 ±0.013000 ±0.014454 ±0.009477
attn1511 0.686613 0.485000 0.683255 0.506875 0.698012 focus_act='sigmoid/maxnorm' cnnact='relu'
±0.061481 ±0.026726 ±0.015612 ±0.020704 ±0.013460
Ubu. RNN w/ MLP 0.678243 0.570333 0.726528 0.538000 0.717074 vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop'
±0.035476 ±0.011504 ±0.008024 ±0.015005 ±0.010569

no-relevance variants

Model trn Acc val Acc val MRR tst Acc tst MRR settings
avg 0.485743 0.471000 0.668667 0.423000 0.643403 rel_mode=None
±0.018741 ±0.015864 ±0.010487 ±0.014483 ±0.010295
avg 0.549747 0.518750 0.704531 0.506250 0.699028 prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] f_add_S1=['bm25'] rel_mode=None
±0.012311 ±0.011400 ±0.006789 ±0.011941 ±0.008802
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
cnn 0.465963 0.380625 0.610521 0.373125 0.608177 inp_e_dropout=1/3 dropout=1/3 rel_mode=None
±0.104367 ±0.052186 ±0.036275 ±0.036130 ±0.026713
cnn 0.550760 0.510625 0.699531 0.508958 0.700868 inp_e_dropout=1/3 dropout=1/3 prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] rel_mode=None f_add_S1=['bm25']
±0.010368 ±0.015977 ±0.010053 ±0.026774 ±0.019235
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
Ubu. RNN w/ MLP 0.595861 0.491875 0.674219 0.516875 0.703681 vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop' rel_mode=None
±0.033640 ±0.026679 ±0.017617 ±0.031509 ±0.020828
Ubu. RNN w/ MLP 0.607939 0.491875 0.676042 0.508958 0.698889 vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop' prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] rel_mode=None f_add_S1=['bm25']
±0.029949 ±0.027246 ±0.015917 ±0.015310 ±0.009226

500 one

Model trn Acc val Acc val MRR tst Acc tst MRR settings
avg 0.632732 0.611192 0.768290 0.587086 0.748851 mcqtypes=['one']
±0.012922 ±0.017245 ±0.009873 ±0.018124 ±0.010366
DAN 0.650221 0.603924 0.761446 0.636259 0.776597 mcqtypes=['one'] inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu'
±0.014159 ±0.016462 ±0.008623 ±0.012985 ±0.007635
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
rnn 0.643409 0.545058 0.722626 0.539062 0.716912 mcqtypes=['one'] inp_e_dropout=1/3 dropout=1/3
±0.029702 ±0.021225 ±0.011171 ±0.016418 ±0.010984
cnn 0.706646 0.587936 0.752483 0.570542 0.738990 mcqtypes=['one'] inp_e_dropout=1/3 dropout=1/3
±0.023560 ±0.021122 ±0.013158 ±0.012510 ±0.006880
rnncnn 0.643778 0.575581 0.742127 0.553768 0.725107 mcqtypes=['one'] inp_e_dropout=1/3 dropout=1/3
±0.047369 ±0.031873 ±0.020092 ±0.023455 ±0.015103
attn1511 0.724411 0.523256 0.710938 0.571232 0.737956 mcqtypes=['one'] focus_act='sigmoid/maxnorm' cnnact='relu'
±0.061083 ±0.035323 ±0.021406 ±0.035533 ±0.023051
Ubu. RNN w/ MLP 0.749239 0.617054 0.768928 0.641422 0.783211 mcqtypes=['one'] vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop'
±0.024649 ±0.024375 ±0.016828 ±0.016967 ±0.010961

no-relevance variants

Model trn Acc val Acc val MRR tst Acc tst MRR settings
avg 0.476215 0.487791 0.684351 0.434559 0.649877 mcqtypes=['one'] rel_mode=None
±0.023280 ±0.015629 ±0.010567 ±0.020523 ±0.014827
avg 0.549337 0.524709 0.714026 0.505055 0.699257 mcqtypes=['one'] prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] f_add_S1=['bm25'] rel_mode=None
±0.012957 ±0.018469 ±0.010065 ±0.003745 ±0.002568
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
cnn 0.467968 0.375000 0.611192 0.364430 0.602711 mcqtypes=['one'] inp_e_dropout=1/3 dropout=1/3 rel_mode=None
±0.097296 ±0.055579 ±0.038218 ±0.023480 ±0.019553
cnn 0.563328 0.491279 0.692224 0.497243 0.695159 mcqtypes=['one'] inp_e_dropout=1/3 dropout=1/3 prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] rel_mode=None f_add_S1=['bm25']
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
Ubu. RNN w/ MLP 0.629971 0.488372 0.687258 0.551930 0.728248 mcqtypes=['one'] vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop' rel_mode=None
±0.036849 ±0.028756 ±0.019306 ±0.042435 ±0.027678
Ubu. RNN w/ MLP 0.634573 0.482558 0.681807 0.539982 0.722312 mcqtypes=['one'] vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop' prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] rel_mode=None f_add_S1=['bm25']
±0.032776 ±0.023311 ±0.014309 ±0.032325 ±0.018144

500 multiple

Model trn Acc val Acc val MRR tst Acc tst MRR settings
avg 0.529728 0.515351 0.696637 0.505526 0.695408 mcqtypes=['multiple']
±0.010205 ±0.015808 ±0.011206 ±0.010165 ±0.006319
DAN 0.539014 0.496711 0.686906 0.495998 0.688199 mcqtypes=['multiple'] inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu'
±0.007369 ±0.011787 ±0.007208 ±0.007236 ±0.005021
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
rnn 0.578886 0.467105 0.665068 0.456174 0.662173 mcqtypes=['multiple'] inp_e_dropout=1/3 dropout=1/3
±0.031184 ±0.019237 ±0.011411 ±0.013046 ±0.007820
cnn 0.616885 0.484101 0.680007 0.482660 0.683101 mcqtypes=['multiple'] inp_e_dropout=1/3 dropout=1/3
±0.021803 ±0.018044 ±0.010786 ±0.011948 ±0.007248
rnncnn 0.557584 0.463816 0.664291 0.470274 0.671367 mcqtypes=['multiple'] inp_e_dropout=1/3 dropout=1/3
±0.033727 ±0.015745 ±0.011134 ±0.016007 ±0.010931
attn1511 0.654572 0.456140 0.662372 0.453506 0.664888 mcqtypes=['multiple'] focus_act='sigmoid/maxnorm' cnnact='relu'
±0.064150 ±0.027405 ±0.015576 ±0.010830 ±0.006845
Ubu. RNN w/ MLP 0.618061 0.535088 0.694542 0.452236 0.662229 mcqtypes=['multiple'] vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop'
±0.045654 ±0.011077 ±0.006828 ±0.016914 ±0.012455

no-relevance variants

Model trn Acc val Acc val MRR tst Acc tst MRR settings
avg 0.550094 0.514254 0.697368 0.507241 0.698838 mcqtypes=['multiple'] prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] f_add_S1=['bm25'] rel_mode=None
±0.013905 ±0.011559 ±0.006596 ±0.023703 ±0.015608
avg 0.493820 0.458333 0.656835 0.413415 0.638034 mcqtypes=['multiple'] rel_mode=None
±0.015751 ±0.021049 ±0.012974 ±0.013104 ±0.008891
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
cnn 0.464263 0.384868 0.610015 0.380335 0.612710 mcqtypes=['multiple'] inp_e_dropout=1/3 dropout=1/3 rel_mode=None
±0.112748 ±0.056263 ±0.037811 ±0.054425 ±0.036644
cnn 0.540106 0.525219 0.705044 0.518674 0.705602 mcqtypes=['multiple'] inp_e_dropout=1/3 dropout=1/3 prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] rel_mode=None f_add_S1=['bm25']
±0.009202 ±0.024167 ±0.011976 ±0.030143 ±0.021190
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
Ubu. RNN w/ MLP 0.566948 0.494518 0.664382 0.487805 0.683308 mcqtypes=['multiple'] vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop' rel_mode=None
±0.034849 ±0.031100 ±0.018810 ±0.027629 ±0.018656
Ubu. RNN w/ MLP 0.585362 0.498904 0.671692 0.483232 0.679465 mcqtypes=['multiple'] vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop' prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] rel_mode=None f_add_S1=['bm25']
±0.029574 ±0.037292 ±0.020297 ±0.010663 ±0.006841

Warning

The evaluation below was run with broken data-loading routines, so the absolute numbers are severely outdated (though the relative comparisons hopefully still hold).

Baselines

rm1 (mc160) BV_EP100_L1e-4_mask:

Model trn Acc val Acc val MRR tst Acc tst MRR settings
avg 0.473810 0.450000 0.599317 0.341667 0.539138 (defaults)
±0.167931 ±0.091999 ±0.074355 ±0.074034 ±0.058038
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
rnn 0.888095 0.516667 0.669134 0.380556 0.573030 (defaults)
±0.117224 ±0.044017 ±0.047076 ±0.060096 ±0.037937
cnn 0.909524 0.461111 0.623281 0.297222 0.515611 (defaults)
±0.154028 ±0.113202 ±0.094241 ±0.082914 ±0.070081
rnncnn 0.861905 0.372222 0.556731 0.261111 0.477589 (defaults)
±0.199392 ±0.071167 ±0.060497 ±0.061976 ±0.051649
attn1511 0.690476 0.405556 0.572132 0.286111 0.499144 (defaults)
±0.301293 ±0.097732 ±0.072291 ±0.077839 ±0.053568

rm5 (mc500) BV_EP100_L1e-4_mask:

Model trn Acc val Acc val MRR tst Acc tst MRR settings
avg 0.475000 0.556667 0.700880 0.425556 0.607853 (defaults)
±0.096706 ±0.110676 ±0.075876 ±0.092588 ±0.066727
-------------------------- ---------- ---------- ---------- ---------- ----------- ----------
rnn 0.749444 0.553333 0.697184 0.468889 0.641769 (defaults)
±0.122174 ±0.069259 ±0.036753 ±0.031201 ±0.020402
cnn 0.828000 0.404000 0.583433 0.285333 0.500116 (defaults)
±0.256912 ±0.126040 ±0.085004 ±0.051668 ±0.049098
rnncnn 0.681111 0.380000 0.552492 0.340000 0.539139 (defaults)
±0.562313 ±0.371792 ±0.291447 ±0.190750 ±0.151157
attn1511 0.895000 0.555000 0.693143 0.438333 0.632456 (defaults)
±0.180124 ±0.072484 ±0.049883 ±0.048829 ±0.034950

More complete rm5 measurements:

16x R_rm5_2avgBV_EP100_mask_L1e-5 - 0.632500 (95% [0.622620, 0.642380]):

11299522.arien.ics.muni.cz.R_rm5_2avgBV_EP100_mask_L1e-5 etc.
[0.620000, 0.600000, 0.640000, 0.660000, 0.640000, 0.620000, 0.600000, 0.620000, 0.640000, 0.620000, 0.660000, 0.640000, 0.640000, 0.620000, 0.660000, 0.640000, ]
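
The mean and 95% interval in a summary line like the one above can be reproduced from the listed per-run accuracies. A minimal sketch follows, assuming a Student's t interval with n-1 degrees of freedom built from the population standard deviation divided by sqrt(n). That formula is inferred from the numbers on this page (it also yields `nan` for a single run), not taken from the original tooling. The names `mean_ci95` and `T_975` are hypothetical, and the critical values are hardcoded only to keep the example free of third-party dependencies.

```python
import math
import statistics

# Two-sided 95% Student's t critical values, keyed by degrees of freedom (n - 1).
T_975 = {1: 12.7062, 2: 4.3027, 3: 3.1824, 4: 2.7764, 5: 2.5706,
         6: 2.4469, 7: 2.3646, 8: 2.3060, 9: 2.2622, 10: 2.2281,
         11: 2.2010, 12: 2.1788, 13: 2.1604, 14: 2.1448, 15: 2.1314}


def mean_ci95(accs):
    """Return (mean, low, high) for a list of per-run accuracies."""
    n = len(accs)
    mean = sum(accs) / n
    if n - 1 not in T_975:
        # A single run has no interval; sizes beyond the small table are unsupported here.
        return mean, math.nan, math.nan
    # Assumed formula: t * (population std) / sqrt(n).
    half = T_975[n - 1] * statistics.pstdev(accs) / math.sqrt(n)
    return mean, mean - half, mean + half


# Per-run accuracies of the 16x R_rm5_2avgBV_EP100_mask_L1e-5 runs listed above.
runs = [0.62, 0.60, 0.64, 0.66, 0.64, 0.62, 0.60, 0.62,
        0.64, 0.62, 0.66, 0.64, 0.64, 0.62, 0.66, 0.64]
mean, low, high = mean_ci95(runs)
print("%dx - %f (95%% [%f, %f])" % (len(runs), mean, low, high))
```

Under this assumption the interval narrows with both lower variance and more runs, which is why the 3x and 4x entries further down have such wide ranges.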

16x R_rm5_2danBV_EP100_mask_L1e-5 - 0.613750 (95% [0.576124, 0.651376]):

11299524.arien.ics.muni.cz.R_rm5_2danBV_EP100_mask_L1e-5 etc.
[0.660000, 0.660000, 0.640000, 0.640000, 0.620000, 0.600000, 0.660000, 0.420000, 0.600000, 0.700000, 0.620000, 0.460000, 0.620000, 0.620000, 0.660000, 0.640000, ]

9x R_rm5_2rnnBV_EP100_L1e-4_mask - 0.551111 (95% [0.521133, 0.581089]):

11290472.arien.ics.muni.cz.R_rm5_2rnnBV_EP100_L1e-4_mask etc.
[0.540000, 0.600000, 0.600000, 0.560000, 0.540000, 0.600000, 0.500000, 0.520000, 0.500000, ]

8x R_rm5_2cnnBV_EP100_L1e-4_mask - 0.472500 (95% [0.362558, 0.582442]):

11290473.arien.ics.muni.cz.R_rm5_2cnnBV_EP100_L1e-4_mask etc.
[0.360000, 0.600000, 0.400000, 0.580000, 0.620000, 0.600000, 0.340000, 0.280000, ]

6x R_rm5_2rnncnnBV_EP100_L1e-4_mask - 0.440000 (95% [0.308367, 0.571633]):

11290474.arien.ics.muni.cz.R_rm5_2rnncnnBV_EP100_L1e-4_mask etc.
[0.580000, 0.340000, 0.300000, 0.380000, 0.640000, 0.400000, ]

3x R_rm5_2a51BV_EP100_L1e-4_mask - 0.540000 (95% [0.432673, 0.647327]):

11290475.arien.ics.muni.cz.R_rm5_2a51BV_EP100_L1e-4_mask etc.
[0.560000, 0.580000, 0.480000, ]

Joint dataset rm (mc660):

16x R_rm_2avgBV_EP100_L1e-5_mask - 0.592969 (95% [0.583858, 0.602080]):

11304888.arien.ics.muni.cz.R_rm_2avgBV_EP100_L1e-5_mask etc.
[0.600000, 0.600000, 0.612500, 0.587500, 0.587500, 0.600000, 0.575000, 0.587500, 0.600000, 0.625000, 0.562500, 0.600000, 0.612500, 0.575000, 0.600000, 0.562500, ]

16x R_rm_2danBV_EP100_L1e-5_mask_W13 - 0.598437 (95% [0.586229, 0.610646]):

11304890.arien.ics.muni.cz.R_rm_2danBV_EP100_L1e-5_mask_W13 etc.
[0.575000, 0.625000, 0.625000, 0.612500, 0.612500, 0.625000, 0.600000, 0.575000, 0.587500, 0.550000, 0.612500, 0.600000, 0.600000, 0.625000, 0.562500, 0.587500, ]

6x R_rm_2rnnBV_EP100_L1e-4_mask - 0.531250 (95% [0.486604, 0.575896]):

11290454.arien.ics.muni.cz.R_rm_2rnnBV_EP100_L1e-4_mask etc.
[0.575000, 0.450000, 0.562500, 0.562500, 0.525000, 0.512500, ]

3x R_rm_2cnnBV_EP100_L1e-4_mask - 0.512500 (95% [0.205104, 0.819896]):

11290455.arien.ics.muni.cz.R_rm_2cnnBV_EP100_L1e-4_mask etc.
[0.600000, 0.337500, 0.600000, ]

1x R_rm_2rnncnnBV_EP100_L1e-4_mask - 0.337500 (95% [nan, nan]):

11290456.arien.ics.muni.cz.R_rm_2rnncnnBV_EP100_L1e-4_mask etc.
[0.337500, ]

3x R_rm_2a51BV_EP100_L1e-4_mask - 0.495833 (95% [0.457105, 0.534562]):

11290457.arien.ics.muni.cz.R_rm_2a51BV_EP100_L1e-4_mask etc.
[0.512500, 0.500000, 0.475000, ]

3x R_urm_11299592rnnBV_EP100_mask_rmsprop_mlp - 0.562500 (95% [0.511793, 0.613207]):

11305164.arien.ics.muni.cz.R_urm_11299592rnnBV_EP100_mask_rmsprop_mlp etc.
[0.562500, 0.587500, 0.537500, ]

Only one-sentence questions

6x R_rm_2avgBV_EP100_L1e-4_mask_one - 0.393163 (95% [0.246874, 0.539451]):

11290483.arien.ics.muni.cz.R_rm_2avgBV_EP100_L1e-4_mask_one etc.
[0.346154, 0.269231, 0.564103, 0.333333, 0.602564, 0.243590, ]

6x R_rm_2danBV_EP100_L1e-4_mask_one - 0.480769 (95% [0.364836, 0.596703]):

11290484.arien.ics.muni.cz.R_rm_2danBV_EP100_L1e-4_mask_one etc.
[0.525641, 0.320513, 0.551282, 0.564103, 0.333333, 0.589744, ]

3x R_rm_2rnnBV_EP100_L1e-4_mask_one - 0.538462 (95% [0.493421, 0.583502]):

11290485.arien.ics.muni.cz.R_rm_2rnnBV_EP100_L1e-4_mask_one etc.
[0.564103, 0.525641, 0.525641, ]

4x R_rm_2cnnBV_EP100_L1e-4_mask_one - 0.522436 (95% [0.296013, 0.748858]):

11290486.arien.ics.muni.cz.R_rm_2cnnBV_EP100_L1e-4_mask_one etc.
[0.576923, 0.576923, 0.282051, 0.653846, ]

4x R_rm_2rnncnnBV_EP100_L1e-4_mask_one - 0.352564 (95% [0.156095, 0.549033]):

11290487.arien.ics.muni.cz.R_rm_2rnncnnBV_EP100_L1e-4_mask_one etc.
[0.564103, 0.282051, 0.307692, 0.256410, ]

6x R_rm_2a51BV_EP100_L1e-4_mask_one - 0.544872 (95% [0.509488, 0.580256]):

11290488.arien.ics.muni.cz.R_rm_2a51BV_EP100_L1e-4_mask_one etc.
[0.589744, 0.564103, 0.551282, 0.564103, 0.500000, 0.500000, ]

Model Exploration

CNN check:

7x R_rm5_2cnnBV_EP100_L1e-4_mask - 0.500000 (95% [0.391693, 0.608307]):

11290473.arien.ics.muni.cz.R_rm5_2cnnBV_EP100_L1e-4_mask etc.
[0.360000, 0.600000, 0.400000, 0.580000, 0.620000, 0.600000, 0.340000, ]

6x R_rm1_2cnnBV_EP100_L1e-4_mask_c121212 - 0.283333 (95% [0.239316, 0.327350]):

11293382.arien.ics.muni.cz.R_rm1_2cnnBV_EP100_L1e-4_mask_c121212 etc.
[0.266667, 0.333333, 0.200000, 0.300000, 0.300000, 0.300000, ]

6x non-Siamese R_rm1_2cnnSBV_EP100_L1e-4_mask_c121212 - 0.361111 (95% [0.257308, 0.464915]):

11293383.arien.ics.muni.cz.R_rm1_2cnnSBV_EP100_L1e-4_mask_c121212 etc.
[0.466667, 0.266667, 0.400000, 0.200000, 0.366667, 0.466667, ]

6x R_rm5_2cnnBV_EP100_L1e-4_mask_i13d13 - 0.626667 (95% [0.580259, 0.673074]):

11310482.arien.ics.muni.cz.R_rm5_2cnnBV_EP100_L1e-4_mask_i13d13 etc.
[0.680000, 0.660000, 0.620000, 0.580000, 0.660000, 0.560000, ]

Neat!

attn1511 check:

6x R_rm_2a51BV_EP100_L1e-4_mask_one - 0.544872 (95% [0.509488, 0.580256]):

6x R_rm_2a51BV_EP100_L1e-4_mask_one_1 - 0.348291 (95% [0.253339, 0.443242]):

11290497.arien.ics.muni.cz.R_rm_2a51BV_EP100_L1e-4_mask_one_1 etc.
[0.358974, 0.243590, 0.333333, 0.525641, 0.269231, 0.358974, ]

3x R_rm5_2a51BV_EP100_L1e-4_mask - 0.540000 (95% [0.432673, 0.647327]):

11290475.arien.ics.muni.cz.R_rm5_2a51BV_EP100_L1e-4_mask etc.
[0.560000, 0.580000, 0.480000, ]

4x R_rm5_2a51BV_EP100_L1e-4_mask_1 - 0.465000 (95% [0.373938, 0.556062]):

x.R_rm5_2a51BV_EP100_L1e-4_mask_1 etc.
[0.540000, 0.400000, 0.500000, 0.420000, ]

6x R_rm5_2a51BV_EP100_L1e-4_mask_fasgmn_crelu - 0.550000 (95% [0.500406, 0.599594]):

11304873.arien.ics.muni.cz.R_rm5_2a51BV_EP100_L1e-4_mask_fasgmn_crelu etc.
[0.580000, 0.600000, 0.600000, 0.540000, 0.480000, 0.500000, ]

6x R_rm5_2a51BV_EP100_L1e-5_mask_fasgmn_crelu - 0.576667 (95% [0.469359, 0.683974]):

11304877.arien.ics.muni.cz.R_rm5_2a51BV_EP100_L1e-5_mask_fasgmn_crelu etc.
[0.660000, 0.580000, 0.620000, 0.660000, 0.580000, 0.360000, ]

fasgmn ok.

8x R_rm5_2a51BV_EP100_L1e-4_mask_fasgmn_crelu_i13d13 - 0.552500 (95% [0.513343, 0.591657]):

11310469.arien.ics.muni.cz.R_rm5_2a51BV_EP100_L1e-4_mask_fasgmn_crelu_i13d13 etc.
[0.520000, 0.600000, 0.580000, 0.560000, 0.540000, 0.540000, 0.460000, 0.620000, ]

Dropout undecidable.

RNNCNN check

6x R_rm5_2rnncnnBV_EP100_L1e-4_mask - 0.440000 (95% [0.308367, 0.571633]):

11290474.arien.ics.muni.cz.R_rm5_2rnncnnBV_EP100_L1e-4_mask etc.
[0.580000, 0.340000, 0.300000, 0.380000, 0.640000, 0.400000, ]

3x R_rm5_2rnncnnBV_EP100_L1e-4_mask_c121212 - 0.460000 (95% [0.099443, 0.820557]):

11299582.arien.ics.muni.cz.R_rm5_2rnncnnBV_EP100_L1e-4_mask_c121212 etc.
[0.320000, 0.660000, 0.400000, ]

4x R_rm5_2rnncnnBV_EP100_L1e-4_mask_i13d13 - 0.615000 (95% [0.567931, 0.662069]):

11310491.arien.ics.muni.cz.R_rm5_2rnncnnBV_EP100_L1e-4_mask_i13d13 etc.
[0.600000, 0.580000, 0.620000, 0.660000, ]

RNN check

9x R_rm5_2rnnBV_EP100_L1e-4_mask - 0.551111 (95% [0.521133, 0.581089]):

11290472.arien.ics.muni.cz.R_rm5_2rnnBV_EP100_L1e-4_mask etc.
[0.540000, 0.600000, 0.600000, 0.560000, 0.540000, 0.600000, 0.500000, 0.520000, 0.500000, ]

6x R_rm5_2rnnBV_EP100_L1e-4_mask_one - 0.465986 (95% [0.375614, 0.556359]):

11299597.arien.ics.muni.cz.R_rm5_2rnnBV_EP100_L1e-4_mask_one etc.
[0.285714, 0.489796, 0.551020, 0.510204, 0.448980, 0.510204, ]

3x R_rm5_2rnnBV_EP100_L1e-4_mask_ranknet - 0.266667 (95% [0.182222, 0.351111]):

11299926.arien.ics.muni.cz.R_rm5_2rnnBV_EP100_L1e-4_mask_ranknet etc.
[0.280000, 0.220000, 0.300000, ]

6x R_rm5_2rnnBV_EP100_L1e-4_mask_ranknet_bal - 0.253333 (95% [0.230129, 0.276537]):

11299932.arien.ics.muni.cz.R_rm5_2rnnBV_EP100_L1e-4_mask_ranknet_bal etc.
[0.220000, 0.240000, 0.280000, 0.280000, 0.260000, 0.240000, ]

6x R_rm5_2rnnBV_EP100_L1e-5_mask - 0.573333 (95% [0.533757, 0.612910]):

11304854.arien.ics.muni.cz.R_rm5_2rnnBV_EP100_L1e-5_mask etc.
[0.580000, 0.520000, 0.620000, 0.540000, 0.560000, 0.620000, ]

5x R_rm5_2rnnBV_EP100_L1e-4_mask_i13d13 - 0.508000 (95% [0.424006, 0.591994]):

11304860.arien.ics.muni.cz.R_rm5_2rnnBV_EP100_L1e-4_mask_i13d13 etc.
[0.560000, 0.560000, 0.500000, 0.540000, 0.380000, ]

Rejecting i13d13 (carried over from the other models) is not necessary.

7x R_rm5_2rnnBV_EP100_L1e-4_mask_i13d13_ranknet_balBS8 - 0.302857 (95% [0.243183, 0.362531]):

11311040.arien.ics.muni.cz.R_rm5_2rnnBV_EP100_L1e-4_mask_i13d13_ranknet_balBS8 etc.
[0.380000, 0.280000, 0.260000, 0.280000, 0.200000, 0.320000, 0.400000, ]

Contrastive training bad.

Transfer check

4x R_urm_11299592rnnBV_EP100_mask_rmsprop_dot - 0.359375 (95% [0.285788, 0.432962]):

11305163.arien.ics.muni.cz.R_urm_11299592rnnBV_EP100_mask_rmsprop_dot etc.
[0.400000, 0.350000, 0.287500, 0.400000, ]

15x R_urm_11299592rnnBV_EP100_mask_rmsprop_mlp - 0.568333 (95% [0.555478, 0.581189]):

11305164.arien.ics.muni.cz.R_urm_11299592rnnBV_EP100_mask_rmsprop_mlp etc.
[0.562500, 0.587500, 0.537500, 0.575000, 0.575000, 0.625000, 0.575000, 0.575000, 0.550000, 0.587500, 0.562500, 0.587500, 0.537500, 0.537500, 0.550000, ]