1605 ai2-8grade HypEv Experiments

All these experiments are done with the new BV_EP100 vocabulary mode as considered in 1605BigVocab.

We have three variants: r8c for ck12 memory snippet sources, r8e for enwiki memory snippet sources, and r8 which has the two merged together. We use r8c as the reference dataset.

Master Table

  • R_r8c_2avgBV_EP100_mask_L1e-5
  • R_r8c_2danBV_EP100_mask_L1e-5_W13
  • R_r8c_2rnnBV_EP100_L1e-4_mask_i13d13
  • R_r8c_2cnnBV_EP100_L1e-4_mask_i13d13 (8)
  • R_r8c_2rnncnnBV_EP100_L1e-4_mask_i13d13 (8)
  • R_r8c_2a51BV_EP100_L1e-4_mask_fasgmn_crelu (8)
  • R_ur8c11299592rnnBV_EP100_mask_rmsprop_mlp
| Model                    | trn Acc  | val Acc  | val MRR  | tst Acc  | tst MRR   | settings
|--------------------------|----------|----------|----------|----------|-----------|---------
| avg                      | 0.505379 | 0.442402 | 0.779080 | 0.400881 | 0.687528  | (defaults)
|                          |±0.024486 |±0.021844 |±0.018049 |±0.015593 |±0.013768  |
| DAN                      | 0.555840 | 0.491422 | 0.816840 | 0.390969 | 0.686856  | inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu'
|                          |±0.038450 |±0.014991 |±0.013183 |±0.007895 |±0.006521  |
|--------------------------|----------|----------|----------|----------|-----------|---------
| rnn                      | 0.711967 | 0.381176 | 0.730000 | 0.360705 | 0.658262  | dropout=1/3 inp_e_dropout=1/3
|                          |±0.052589 |±0.015708 |±0.012071 |±0.012326 |±0.008562  |
| cnn                      | 0.675973 | 0.442402 | 0.780961 | 0.384086 | 0.686716  | dropout=1/3 inp_e_dropout=1/3
|                          |±0.056218 |±0.012234 |±0.012751 |±0.011142 |±0.009110  |
| rnncnn                   | 0.582480 | 0.438725 | 0.780527 | 0.375826 | 0.679659  | dropout=1/3 inp_e_dropout=1/3
|                          |±0.056659 |±0.024469 |±0.023307 |±0.014320 |±0.011600  |
| attn1511                 | 0.724898 | 0.383578 | 0.724826 | 0.357654 | 0.658294  | focus_act='sigmoid/maxnorm' cnnact='relu'
|                          |±0.069447 |±0.011663 |±0.011245 |±0.015420 |±0.011558  |
| Ubu. RNN w/ MLP          | 0.569672 | 0.493873 | 0.827836 | 0.441355 | 0.728019  | vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add_kw':False} opt='rmsprop'
|                          |±0.058833 |±0.012373 |±0.011304 |±0.010668 |±0.007045  |
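
The settings column lists the per-run configuration overrides applied on top of each model's defaults. As a rough illustration of how such `key=value` strings map onto a configuration dict (a sketch only, for simple space-free overrides; the framework's own parsing may differ):

```python
# Illustrative only: turn an override string like "dropout=1/3 inp_e_dropout=1/3"
# into a config dict layered on top of model defaults.  Handles only simple
# whitespace-free key=value pairs; the real training tool may parse differently.
def parse_overrides(override_str, defaults=None):
    conf = dict(defaults or {})
    for item in override_str.split():
        key, val = item.split('=', 1)
        conf[key] = eval(val)  # values are Python literals, e.g. 1/3, 'relu', ['bm25']
    return conf

conf = parse_overrides("dropout=1/3 inp_e_dropout=1/3",
                       defaults={'dropout': 0, 'inp_e_dropout': 0, 'l2reg': 1e-4})
# conf == {'dropout': 0.333..., 'inp_e_dropout': 0.333..., 'l2reg': 0.0001}
```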

no-relevance variants

| Model                    | trn Acc  | val Acc  | val MRR  | tst Acc  | tst MRR   | settings
|--------------------------|----------|----------|----------|----------|-----------|---------
| avg                      | 0.445953 | 0.427696 | 0.776331 | 0.366189 | 0.660562  | rel_mode=None
|                          |±0.015408 |±0.020307 |±0.016901 |±0.010194 |±0.008976  |
| avg                      | 0.421363 | 0.411765 | 0.752170 | 0.364262 | 0.657202  | prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] rel_mode='bm25'
|                          |±0.025634 |±0.025858 |±0.025434 |±0.010684 |±0.008501  |
| avg                      | 0.508197 | 0.486520 | 0.813802 | 0.406663 | 0.698505  | prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] f_add_S2=['bm25']
|                          |±0.016470 |±0.007587 |±0.006840 |±0.009561 |±0.007658  |
| avg                      | 0.465676 | 0.479167 | 0.812211 | 0.414648 | 0.701333  | prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] rel_mode=None f_add_S1=['bm25']
|                          |±0.008130 |±0.009752 |±0.009105 |±0.007780 |±0.005593  |
|--------------------------|----------|----------|----------|----------|-----------|---------
| rnn                      | 0.694160 | 0.355392 | 0.712963 | 0.348568 | 0.648690  | dropout=1/3 inp_e_dropout=1/3 rel_mode=None
|                          |±0.106601 |±0.027719 |±0.022360 |±0.014551 |±0.010562  |
| cnn                      | 0.704918 | 0.419118 | 0.761285 | 0.384912 | 0.682740  | dropout=1/3 inp_e_dropout=1/3 rel_mode=None
|                          |±0.090636 |±0.024503 |±0.022660 |±0.019742 |±0.014265  |
| rnncnn                   | 0.549693 | 0.446078 | 0.785012 | 0.372247 | 0.675291  | inp_e_dropout=1/3 dropout=1/3 rel_mode=None
|                          |±0.050194 |±0.022817 |±0.016462 |±0.023073 |±0.015974  |
| attn1511                 | 0.783299 | 0.352941 | 0.699653 | 0.351322 | 0.650370  | rel_mode=None focus_act='sigmoid/maxnorm' cnnact='relu'
|                          |±0.060942 |±0.032785 |±0.032089 |±0.010044 |±0.005568  |
|--------------------------|----------|----------|----------|----------|-----------|---------
| Ubu. RNN w/ MLP          | 0.494109 | 0.463235 | 0.789641 | 0.416300 | 0.712618  | vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add':[]} opt='rmsprop' rel_mode=None
|                          |±0.030447 |±0.008954 |±0.007800 |±0.010789 |±0.006459  |
| Ubu. RNN w/ MLP          | 0.609631 | 0.433824 | 0.767361 | 0.417952 | 0.711358  | vocabt='ubuntu' pdim=1 ptscorer=B.mlp_ptscorer dropout=0 inp_e_dropout=0 task1_conf={'ptscorer':B.dot_ptscorer, 'f_add':[]} opt='rmsprop' prescoring='termfreq' prescoring_weightsf='weights-anssel-termfreq-3368350fbcab42e4-bestval.h5' prescoring_input='bm25' f_add=['bm25'] rel_mode=None f_add_S1=['bm25']
|                          |±0.103653 |±0.017266 |±0.020435 |±0.009468 |±0.006757  |
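
Several of the runs above add a 'bm25' feature produced by the term-frequency prescoring model. For orientation, the BM25 score itself is the standard Okapi formula; a self-contained sketch follows (illustrative only, with a common +1 idf variant, not the repository's prescoring code):

```python
import math
from collections import Counter

def bm25_score(query_tokens, doc_tokens, corpus, k1=1.5, b=0.75):
    """Okapi BM25 score of one document (memory snippet) for a query (hypothesis),
    against a token-list corpus.  Textbook formula, for illustration only."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / float(N)
    tf = Counter(doc_tokens)
    score = 0.0
    for term in query_tokens:
        df = sum(1 for d in corpus if term in d)
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)
        f = tf[term]
        score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * len(doc_tokens) / avgdl))
    return score

corpus = [["the", "cell", "membrane"], ["igneous", "rock", "forms"]]
print(bm25_score(["cell", "membrane"], corpus[0], corpus))
```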

Baselines

(acc is AbcdAccuracy)
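
AbcdAccuracy here is accuracy on the multiple-choice (A/B/C/D) questions; presumably the per-hypothesis scores are grouped by question and the highest-scoring option is taken as the answer. A minimal sketch of that interpretation, together with the MRR reported in the tables (both illustrative, not the evaluation code actually used):

```python
import numpy as np

def abcd_accuracy(qids, scores, is_correct):
    """Group hypothesis scores by question, pick the top-scoring option,
    count how often it is the correct one.  Illustrative interpretation only."""
    qids, scores, is_correct = map(np.asarray, (qids, scores, is_correct))
    questions = np.unique(qids)
    hits = 0
    for q in questions:
        idx = np.where(qids == q)[0]
        hits += int(is_correct[idx[np.argmax(scores[idx])]])
    return hits / float(len(questions))

def mrr(qids, scores, is_correct):
    """Mean reciprocal rank of the first correct option within each question."""
    qids, scores, is_correct = map(np.asarray, (qids, scores, is_correct))
    rr = []
    for q in np.unique(qids):
        idx = np.where(qids == q)[0]
        order = idx[np.argsort(-scores[idx])]
        rank = 1 + int(np.flatnonzero(is_correct[order])[0])
        rr.append(1.0 / rank)
    return float(np.mean(rr))

# Two questions with four options each; option scores and gold labels:
qids       = [0, 0, 0, 0, 1, 1, 1, 1]
scores     = [0.1, 0.9, 0.3, 0.2, 0.5, 0.4, 0.6, 0.1]
is_correct = [0,   1,   0,   0,   1,   0,   0,   0]
print(abcd_accuracy(qids, scores, is_correct), mrr(qids, scores, is_correct))
# -> 0.5 0.75
```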

r8c:

| Model                    | trn Acc  | val Acc  | val MRR  | tst Acc  | tst MRR   | settings
|--------------------------|----------|----------|----------|----------|-----------|---------
| avg                      | 0.290301 | 0.290850 | 0.638889 | 0.293686 | 0.592443  | (defaults)
|                          |±0.033995 |±0.024967 |±0.019934 |±0.011219 |±0.011900  |
| DAN                      | 0.245902 | 0.303922 | 0.408565 | 0.287812 | 0.427867  | inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu'
|                          |±0.013140 |±0.030866 |±0.057874 |±0.015715 |±0.046143  |
|--------------------------|----------|----------|----------|----------|-----------|---------
| rnn                      | 0.722678 | 0.424837 | 0.769676 | 0.390602 | 0.680556  | (defaults)
|                          |±0.156078 |±0.049935 |±0.041505 |±0.033515 |±0.028432  |
| cnn                      | 0.300546 | 0.271242 | 0.618056 | 0.284141 | 0.590576  | (defaults)
|                          |±0.040626 |±0.030094 |±0.028330 |±0.021643 |±0.011880  |
| rnncnn                   | 0.353825 | 0.323529 | 0.668981 | 0.298091 | 0.608498  | (defaults)
|                          |±0.092234 |±0.044053 |±0.043229 |±0.021297 |±0.019228  |
| attn1511                 | 0.477459 | 0.339869 | 0.687114 | 0.326725 | 0.630003  | (defaults)
|                          |±0.193444 |±0.038801 |±0.029131 |±0.037975 |±0.030858  |

r8e:

| Model                    | trn Acc  | val Acc  | val MRR  | tst Acc  | tst MRR   | settings
|--------------------------|----------|----------|----------|----------|-----------|---------
| avg                      | 0.280330 | 0.348958 | 0.636616 | 0.266049 | 0.547683  | (defaults)
|                          |±0.044841 |±0.037472 |±0.028759 |±0.018288 |±0.009624  |
| DAN                      | 0.244994 | 0.296875 | 0.534848 | 0.266049 | 0.513158  | inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu'
|                          |±0.027805 |±0.025047 |±0.082815 |±0.008811 |±0.056038  |
|--------------------------|----------|----------|----------|----------|-----------|---------
| rnn                      | 0.463486 | 0.361979 | 0.642424 | 0.295062 | 0.575405  | (defaults)
|                          |±0.193759 |±0.029051 |±0.018036 |±0.025256 |±0.018475  |
| cnn                      | 0.219670 | 0.299479 | 0.596212 | 0.261111 | 0.548752  | (defaults)
|                          |±0.014743 |±0.033359 |±0.022407 |±0.013417 |±0.009095  |
| rnncnn                   | 0.316254 | 0.335938 | 0.618182 | 0.258025 | 0.547402  | (defaults)
|                          |±0.068402 |±0.024596 |±0.010901 |±0.014658 |±0.018273  |
| attn1511                 | 0.397527 | 0.390625 | 0.664141 | 0.236420 | 0.532276  | (defaults)
|                          |±0.069729 |±0.042338 |±0.027718 |±0.028422 |±0.022669  |

r8:

| Model                    | trn Acc  | val Acc  | val MRR  | tst Acc  | tst MRR   | settings
|--------------------------|----------|----------|----------|----------|-----------|---------
| avg                      | 0.261484 | 0.315104 | 0.594303 | 0.275309 | 0.537929  | (defaults)
|                          |±0.040622 |±0.029051 |±0.018859 |±0.027908 |±0.019492  |
| DAN                      | 0.290342 | 0.302083 | 0.575893 | 0.289506 | 0.542025  | inp_e_dropout=0 inp_w_dropout=1/3 deep=2 pact='relu'
|                          |±0.017316 |±0.024444 |±0.016973 |±0.010625 |±0.009295  |
|--------------------------|----------|----------|----------|----------|-----------|---------
| rnn                      | 0.392815 | 0.341146 | 0.611529 | 0.300617 | 0.558611  | (defaults)
|                          |±0.165470 |±0.025782 |±0.035079 |±0.014786 |±0.010929  |
| cnn                      | 0.213781 | 0.328125 | 0.568729 | 0.267284 | 0.515060  | (defaults)
|                          |±0.009753 |±0.051853 |±0.039688 |±0.034358 |±0.066781  |
| rnncnn                   | 0.285630 | 0.351562 | 0.620571 | 0.273457 | 0.546830  | (defaults)
|                          |±0.041758 |±0.045155 |±0.037842 |±0.019228 |±0.011815  |
| attn1511                 | 0.476443 | 0.359375 | 0.619522 | 0.293210 | 0.551151  | (defaults)
|                          |±0.187898 |±0.035423 |±0.027051 |±0.044679 |±0.024191  |

It seems that r8c is the best dataset to use; neither r8e nor the merged r8 looks any good.

Re-testing r8c with masking enabled and l2reg=1e-4:
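
As a rough illustration of what these two run flags toggle (assuming they correspond to Keras input masking and an L2 weight penalty; spelled here with modern Keras kwargs, whereas the 2016-era code would have used Keras 1 names such as W_regularizer):

```python
from keras.layers import Embedding, GRU
from keras.regularizers import l2

vocab_size, emb_dim, rnn_dim = 10000, 300, 256   # arbitrary illustrative sizes

# "mask" in the run names: padding tokens get masked out of downstream layers.
emb = Embedding(input_dim=vocab_size, output_dim=emb_dim, mask_zero=True)

# "L1e-4" / "L1e-5" in the run names: L2 weight penalty of that strength.
rnn = GRU(rnn_dim, kernel_regularizer=l2(1e-4))
```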

6x R_r8c_2avgBV_EP100_L1e-4_mask - 0.398692 (95% [0.328744, 0.468641]):

11290443.arien.ics.muni.cz.R_r8c_2avgBV_EP100_L1e-4_mask etc.
[0.352941, 0.450980, 0.470588, 0.470588, 0.333333, 0.313725, ]
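
Each summary line reports the mean of the per-run accuracies with a 95% confidence interval; the interval appears to be a Student-t interval over the listed runs, which the following sketch reproduces for the run above:

```python
import numpy as np
from scipy import stats

def mean_ci95(accs):
    """Mean and 95% Student-t confidence interval over per-run accuracies
    (np.std default, i.e. ddof=0, matching the intervals reported here)."""
    accs = np.asarray(accs)
    m = accs.mean()
    half = stats.t.ppf(0.975, len(accs) - 1) * accs.std() / np.sqrt(len(accs))
    return m, (m - half, m + half)

print(mean_ci95([0.352941, 0.450980, 0.470588, 0.470588, 0.333333, 0.313725]))
# -> (0.398692..., (0.328744..., 0.468641...))
```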

6x R_r8c_2danBV_EP100_L1e-4_mask - 0.405229 (95% [0.359731, 0.450727]):

11290444.arien.ics.muni.cz.R_r8c_2danBV_EP100_L1e-4_mask etc.
[0.352941, 0.352941, 0.392157, 0.470588, 0.431373, 0.431373, ]

16x R_r8c_2avgBV_EP100_mask_L1e-5 - 0.442402 (95% [0.420557, 0.464246]):

11299356.arien.ics.muni.cz.R_r8c_2avgBV_EP100_mask_L1e-5 etc.
[0.470588, 0.450980, 0.352941, 0.490196, 0.333333, 0.431373, 0.470588, 0.450980, 0.431373, 0.470588, 0.470588, 0.470588, 0.450980, 0.450980, 0.450980, 0.431373, ]

16x R_r8c_2danBV_EP100_mask_L1e-5_W13 - 0.491422 (95% [0.476431, 0.506413]):

11299352.arien.ics.muni.cz.R_r8c_2danBV_EP100_mask_L1e-5_W13 etc.
[0.490196, 0.490196, 0.450980, 0.549020, 0.490196, 0.470588, 0.529412, 0.509804, 0.509804, 0.509804, 0.470588, 0.509804, 0.470588, 0.431373, 0.490196, 0.490196, ]

6x R_r8c_2rnnBV_EP100_L1e-4_mask - 0.326797 (95% [0.311460, 0.342134]):

11290445.arien.ics.muni.cz.R_r8c_2rnnBV_EP100_L1e-4_mask etc.
[0.313725, 0.313725, 0.352941, 0.313725, 0.333333, 0.333333, ]

6x R_r8c_2cnnBV_EP100_L1e-4_mask - 0.431372 (95% [0.402272, 0.460473]):

11290446.arien.ics.muni.cz.R_r8c_2cnnBV_EP100_L1e-4_mask etc.
[0.450980, 0.431373, 0.450980, 0.372549, 0.450980, 0.431373, ]

6x R_r8c_2rnncnnBV_EP100_L1e-4_mask - 0.382353 (95% [0.333731, 0.430975]):

11290447.arien.ics.muni.cz.R_r8c_2rnncnnBV_EP100_L1e-4_mask etc.
[0.392157, 0.450980, 0.392157, 0.294118, 0.372549, 0.392157, ]

6x R_r8c_2a51BV_EP100_L1e-4_mask - 0.405229 (95% [0.368292, 0.442166]):

11290448.arien.ics.muni.cz.R_r8c_2a51BV_EP100_L1e-4_mask etc.
[0.392157, 0.372549, 0.372549, 0.431373, 0.470588, 0.392157, ]

Model Tuning

Original, l2reg=1e-4

6x R_r8_2avgBV_EP100_L1e-4 - 0.463542 (95% [0.434107, 0.492976]):

11288548.arien.ics.muni.cz.R_r8_2avgBV_EP100_L1e-4 etc.
[0.468750, 0.468750, 0.468750, 0.468750, 0.500000, 0.406250, ]

6x R_r8_2danBV_EP100_L1e-4 - 0.460938 (95% [0.429898, 0.491977]):

11288549.arien.ics.muni.cz.R_r8_2danBV_EP100_L1e-4 etc.
[0.453125, 0.500000, 0.468750, 0.484375, 0.453125, 0.406250, ]

6x R_r8_2rnnBV_EP100_L1e-4 - 0.354167 (95% [0.326296, 0.382037]):

11288550.arien.ics.muni.cz.R_r8_2rnnBV_EP100_L1e-4 etc.
[0.312500, 0.328125, 0.359375, 0.390625, 0.359375, 0.375000, ]

5x R_r8_2cnnBV_EP100_L1e-4 - 0.400000 (95% [0.304481, 0.495519]):

11288551.arien.ics.muni.cz.R_r8_2cnnBV_EP100_L1e-4 etc.
[0.406250, 0.359375, 0.406250, 0.296875, 0.531250, ]

6x R_r8_2rnncnnBV_EP100_L1e-4 - 0.395833 (95% [0.319117, 0.472550]):

11288552.arien.ics.muni.cz.R_r8_2rnncnnBV_EP100_L1e-4 etc.
[0.484375, 0.406250, 0.437500, 0.421875, 0.375000, 0.250000, ]

6x R_r8_2a51BV_EP100_L1e-4 - 0.361979 (95% [0.324808, 0.399151]):

11288553.arien.ics.muni.cz.R_r8_2a51BV_EP100_L1e-4 etc.
[0.390625, 0.312500, 0.406250, 0.328125, 0.390625, 0.343750, ]

l2reg=1e-4 seems like a good idea!

RNN checks

6x R_r8c_2rnnBV_EP100_L1e-4_mask - 0.326797 (95% [0.311460, 0.342134]):

25x R_r8c_2rnnBV_EP100_L1e-4_mask_i13d13 - 0.381177 (95% [0.365469, 0.396884]):

11299310.arien.ics.muni.cz.R_r8c_2rnnBV_EP100_L1e-4_mask_i13d13 etc.
[0.431373, 0.411765, 0.372549, 0.411765, 0.352941, 0.333333, 0.372549, 0.352941, 0.352941, 0.411765, 0.333333, 0.411765, 0.411765, 0.352941, 0.372549, 0.392157, 0.411765, 0.431373, 0.313725, 0.294118, 0.431373, 0.392157, 0.431373, 0.372549, 0.372549, ]

6x R_r8c_2rnnBV_EP100_L1e-4_mask_i12d12 - 0.359477 (95% [0.309542, 0.409412]):

11301277.arien.ics.muni.cz.R_r8c_2rnnBV_EP100_L1e-4_mask_i12d12 etc.
[0.313725, 0.411765, 0.411765, 0.333333, 0.392157, 0.294118, ]

6x R_r8c_2rnnBV_EP100_L1e-4_mask_i23d23 - 0.326797 (95% [0.296123, 0.357472]):

11301264.arien.ics.muni.cz.R_r8c_2rnnBV_EP100_L1e-4_mask_i23d23 etc.
[0.333333, 0.313725, 0.372549, 0.294118, 0.352941, 0.294118, ]

6x R_r8c_2rnnBV_EP100_L1e-4_mask_i45d45 - 0.320261 (95% [0.285287, 0.355236]):

11299320.arien.ics.muni.cz.R_r8c_2rnnBV_EP100_L1e-4_mask_i45d45 etc.
[0.313725, 0.352941, 0.333333, 0.313725, 0.352941, 0.254902, ]

The i13d13 dropout setting (inp_e_dropout=1/3, dropout=1/3) is very worthwhile.

7x R_r8c_2rnnBV_EP100_L1e-5_mask_i13d13 - 0.366947 (95% [0.329764, 0.404129]):

11304784.arien.ics.muni.cz.R_r8c_2rnnBV_EP100_L1e-5_mask_i13d13 etc.
[0.392157, 0.392157, 0.313725, 0.352941, 0.431373, 0.372549, 0.313725, ]

L2 regularization not harmful.

4x R_r8c_2rnnBV_EP100_L1e-4_mask_s1_i13d13 - 0.382353 (95% [0.301291, 0.463414]):

11309750.arien.ics.muni.cz.R_r8c_2rnnBV_EP100_L1e-4_mask_s1_i13d13 etc.
[0.333333, 0.450980, 0.411765, 0.333333, ]

Fewer parameters are not clearly beneficial.

CNN checks

5x R_r8_2cnnBV_EP100_L1e-4 - 0.400000 (95% [0.304481, 0.495519]):

6x R_r8_2cnnBV_EP100_L1e-4_c121212 - 0.403646 (95% [0.356866, 0.450426]):

11288556.arien.ics.muni.cz.R_r8_2cnnBV_EP100_L1e-4_c121212 etc.
[0.406250, 0.312500, 0.453125, 0.437500, 0.406250, 0.406250, ]

6x R_r8c_2cnnBV_EP100_L1e-4_mask - 0.431372 (95% [0.402272, 0.460473]):

6x R_r8c_2cnnBV_EP100_L1e-4_mask_ranknet - 0.307190 (95% [0.281525, 0.332854]):

11299928.arien.ics.muni.cz.R_r8c_2cnnBV_EP100_L1e-4_mask_ranknet etc.
[0.313725, 0.352941, 0.313725, 0.294118, 0.294118, 0.274510, ]

6x R_r8c_2cnnBV_EP100_L1e-4_mask_ranknet_bal - 0.290850 (95% [0.256384, 0.325316]):

11299930.arien.ics.muni.cz.R_r8c_2cnnBV_EP100_L1e-4_mask_ranknet_bal etc.
[0.254902, 0.294118, 0.294118, 0.254902, 0.294118, 0.352941, ]

8x R_r8c_2cnnBV_EP100_L1e-4_mask_bal - 0.424020 (95% [0.404048, 0.443991]):

11304715.arien.ics.muni.cz.R_r8c_2cnnBV_EP100_L1e-4_mask_bal etc.
[0.411765, 0.411765, 0.450980, 0.411765, 0.431373, 0.411765, 0.470588, 0.392157, ]

Neither balancing nor ranknet is worth it.
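
For context, the ranknet runs swap the default pointwise objective for a pairwise RankNet-style loss over (correct, incorrect) hypothesis pairs; a minimal numpy sketch of that standard loss (not necessarily the exact implementation used in these runs):

```python
import numpy as np

def ranknet_loss(score_pos, score_neg):
    """Standard pairwise RankNet loss: penalize an incorrect hypothesis that
    scores close to (or above) a correct one for the same question."""
    return np.log1p(np.exp(-(score_pos - score_neg)))

print(ranknet_loss(0.9, 0.2))   # small loss: correct hypothesis well ahead
print(ranknet_loss(0.2, 0.9))   # large loss: incorrect hypothesis ranked higher
```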

8x R_r8c_2cnnBV_EP100_L1e-5_mask - 0.392157 (95% [0.354597, 0.429717]):

11304716.arien.ics.muni.cz.R_r8c_2cnnBV_EP100_L1e-5_mask etc.
[0.392157, 0.352941, 0.470588, 0.431373, 0.372549, 0.333333, 0.431373, 0.352941, ]

L2 regularization worthwhile.

8x R_r8c_2cnnBV_EP100_L1e-4_mask_i13d13 - 0.431373 (95% [0.417176, 0.445569]):

11310683.arien.ics.muni.cz.R_r8c_2cnnBV_EP100_L1e-4_mask_s1_i13d13 etc.
[0.411765, 0.431373, 0.450980, 0.450980, 0.431373, 0.450980, 0.411765, 0.411765, ]

Dropout inconclusive.

8x R_r8c_2cnnBV_EP100_L1e-4_mask_ranknet_balBS8 - 0.294118 (95% [0.268199, 0.320036]):

11311037.arien.ics.muni.cz.R_r8c_2cnnBV_EP100_L1e-4_mask_ranknet_balBS8 etc.
[0.254902, 0.254902, 0.274510, 0.313725, 0.294118, 0.352941, 0.294118, 0.313725, ]

No variant of the contrastive (ranknet) loss seems suitable.

attn1511 checks

6x R_r8_2a51BV_EP100_L1e-4 - 0.361979 (95% [0.324808, 0.399151]):

6x R_r8_2a51BV_EP100_L1e-4_s2 - 0.390625 (95% [0.339643, 0.441607]):

11289063.arien.ics.muni.cz.R_r8_2a51BV_EP100_L1e-4_s2 etc.
[0.375000, 0.421875, 0.296875, 0.453125, 0.390625, 0.406250, ]

Uh, maybe.

6x R_r8c_2a51BV_EP100_L1e-4_mask - 0.405229 (95% [0.368292, 0.442166]):

6x R_r8c_2a51BV_EP100_L1e-4_mask_1 - 0.366013 (95% [0.309452, 0.422575]):

11290450.arien.ics.muni.cz.R_r8c_2a51BV_EP100_L1e-4_mask_1 etc.
[0.392157, 0.392157, 0.431373, 0.392157, 0.313725, 0.274510, ]

Not as catastrophic as on rg, but...

8x R_r8c_2a51BV_EP100_L1e-4_mask_fasgmn_crelu - 0.394608 (95% [0.377342, 0.411874]):

11305165.arien.ics.muni.cz.R_r8c_2a51BV_EP100_L1e-4_mask_fasgmn_crelu etc.
[0.392157, 0.352941, 0.411765, 0.392157, 0.411765, 0.411765, 0.411765, 0.372549, ]

4x R_r8c_2a51BV_EP100_L1e-5_mask_fasgmn_crelu - 0.397059 (95% [0.356528, 0.437590]):

11305166.arien.ics.muni.cz.R_r8c_2a51BV_EP100_L1e-5_mask_fasgmn_crelu etc.
[0.372549, 0.431373, 0.372549, 0.411765, ]

cnnact='relu' and focus_act='sigmoid/maxnorm' again seem pretty good.
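
'sigmoid/maxnorm' presumably means the attention focus weights are passed through a sigmoid and then rescaled by their maximum rather than softmax-normalized; a numpy sketch of that reading (an assumption for illustration, not the repository's code):

```python
import numpy as np

def focus_sigmoid_maxnorm(attention_scores):
    """One reading of focus_act='sigmoid/maxnorm': squash per-token attention
    scores with a sigmoid, then rescale so the largest weight is 1 (instead of
    softmax-normalizing them to sum to 1).  Assumption, illustrative only."""
    w = 1.0 / (1.0 + np.exp(-np.asarray(attention_scores, dtype=float)))
    return w / w.max()

print(focus_sigmoid_maxnorm([2.0, 0.0, -1.0]))
```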

8x R_r8c_2a51BV_EP100_L1e-4_mask_fasgmn_crelu_i13d13 - 0.401961 (95% [0.378778, 0.425143]):

11309634.arien.ics.muni.cz.R_r8c_2a51BV_EP100_L1e-4_mask_fasgmn_crelu_i13d13 etc.
[0.450980, 0.411765, 0.352941, 0.411765, 0.372549, 0.392157, 0.411765, 0.411765, ]

RNNCNN checks

6x R_r8c_2rnncnnBV_EP100_L1e-4_mask - 0.382353 (95% [0.333731, 0.430975]):

8x R_r8c_2rnncnnBV_EP100_L1e-4_mask_i13d13 - 0.431372 (95% [0.388783, 0.473962]):

11310703.arien.ics.muni.cz.R_r8c_2rnncnnBV_EP100_L1e-4_mask_i13d13 etc.
[0.470588, 0.372549, 0.352941, 0.450980, 0.509804, 0.470588, 0.431373, 0.392157, ]

Dropout inconclusive.

Transfer checks

16x R_ur8c11299592rnnBV_EP100_mask_rmsprop_mlp - 0.493872 (95% [0.481499, 0.506246]):

11305161.arien.ics.muni.cz.R_ur8c11299592rnnBV_EP100_mask_rmsprop_mlp etc.
[0.509804, 0.470588, 0.450980, 0.490196, 0.549020, 0.490196, 0.490196, 0.490196, 0.470588, 0.490196, 0.470588, 0.490196, 0.490196, 0.509804, 0.529412, 0.509804, ]

4x R_ur8c11299592rnnBV_EP100_mask_rmsprop_dot - 0.382353 (95% [0.292736, 0.471969]):

11305162.arien.ics.muni.cz.R_ur8c11299592rnnBV_EP100_mask_rmsprop_dot etc.
[0.313725, 0.470588, 0.372549, 0.372549, ]
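
The _mlp vs. _dot suffix corresponds to the pair-scoring head used on top of the transferred sentence embeddings (ptscorer=B.mlp_ptscorer vs. B.dot_ptscorer). Roughly, the dot scorer just takes a dot product of the two embeddings, while the MLP scorer feeds a combination of them through a small dense layer; a schematic numpy sketch under those assumptions (the actual B.* scorers may combine the embeddings differently):

```python
import numpy as np

def dot_ptscorer(e0, e1):
    """Dot-product pair scorer: no trainable parameters beyond the embeddings."""
    return float(np.dot(e0, e1))

def mlp_ptscorer(e0, e1, W, b):
    """MLP-style pair scorer: a small dense layer over a combination of the two
    embeddings (here their elementwise product and sum; the real scorer may
    use a different combination and nonlinearity)."""
    features = np.concatenate([e0 * e1, e0 + e1])
    return float(np.tanh(features @ W + b))

rng = np.random.RandomState(0)
e0, e1 = rng.randn(300), rng.randn(300)
W, b = rng.randn(600) * 0.01, 0.0
print(dot_ptscorer(e0, e1), mlp_ptscorer(e0, e1, W, b))
```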

8x R_ur8ca51_11299592rnnBV_EP100_mask_rmsprop_mlp - 0.399510 (95% [0.385462, 0.413558]):

11310532.arien.ics.muni.cz.R_ur8ca51_11299592rnnBV_EP100_mask_rmsprop_mlp etc.
[0.392157, 0.411765, 0.411765, 0.372549, 0.392157, 0.392157, 0.431373, 0.392157, ]