ClashLuke opened this issue on May 17, 2022 · 0 comments
Labels
core: Improves core model while keeping core idea intact
ML: Requires machine-learning knowledge (can be built up on the fly)
research: Creative project that might fail but could give high returns
Currently, our model can be either an encoder or a decoder, but not both: combining the two, as T5 does, is not possible. The closest approximation available at the moment is to expand the context of our decoder, but a decoder-only model does not perform as well as an encoder-decoder one. Ideally, we could run full "attention" over one part of the sequence and sample autoregressively over the other.
This issue discusses ideas for implementing such a scheme and benchmarking it against the baseline fully-autoregressive model.
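One possible starting point, sketched here only as an illustration, is a prefix-LM style visibility mask: positions in a prefix see each other fully, while the remaining positions stay causal. The function name `prefix_lm_mask` and its parameters are hypothetical and not part of this repository, and the sketch assumes the model's mixing operation can be restricted by a boolean query/key mask, which may not hold for our "attention".

```python
def prefix_lm_mask(seq_len: int, prefix_len: int) -> list[list[bool]]:
    """Return mask[q][k] == True where query position q may see key position k.

    Hypothetical helper: the first `prefix_len` positions are mutually visible
    (encoder-like), the rest follow the usual causal rule (decoder-like).
    """
    if not 0 <= prefix_len <= seq_len:
        raise ValueError("prefix_len must be between 0 and seq_len")
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q             # standard autoregressive visibility
            in_prefix = k < prefix_len  # prefix keys are visible to every query
            row.append(causal or in_prefix)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Print a small mask: 'x' marks a visible key, '.' a hidden one.
    for row in prefix_lm_mask(seq_len=5, prefix_len=2):
        print("".join("x" if visible else "." for visible in row))
```

Setting `prefix_len=0` reduces this to the fully-autoregressive baseline mask, and `prefix_len=seq_len` gives an encoder-style full mask, so the same code path could in principle cover both ends of the benchmark.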
ClashLuke added the engineering, ML, core, and research labels and removed the engineering label on May 17, 2022.