Encoder-Decoder Architecture #44

Open
ClashLuke opened this issue May 17, 2022 · 0 comments
Labels

core: Improves core model while keeping core idea intact
ML: Requires machine-learning knowledge (can be built up on the fly)
research: Creative project that might fail but could give high returns

Comments

@ClashLuke
Member

Currently, our model can be either an encoder or a decoder; combining the two, as in T5, is not possible. The best approximation we can get at the moment is to expand the context of our decoder, but a decoder-only model doesn't perform as well. Ideally, we would run full (bidirectional) attention over one part of the input and sample autoregressively over the other.
This issue discusses ideas for implementing such a scheme and benchmarking it against the baseline fully-autoregressive model.
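One concrete way to get this behaviour in a single transformer stack is a prefix-LM attention mask (the "prefix LM" variant studied in the T5 paper): tokens in the prefix attend to each other bidirectionally, like an encoder, while the remaining tokens attend causally, like a decoder. A minimal PyTorch sketch; the function name and shapes are illustrative, not anything in this codebase:

```python
import torch

def prefix_lm_mask(seq_len: int, prefix_len: int) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask; True means attention is allowed.

    The first `prefix_len` positions attend bidirectionally
    (encoder-like); all later positions attend causally (decoder-like).
    """
    # Standard causal mask: position i attends to positions <= i.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Open up the prefix block so prefix tokens also see "future"
    # prefix tokens, giving full bidirectional attention there.
    mask[:prefix_len, :prefix_len] = True
    return mask

# Example: 8 tokens, the first 3 form the bidirectional prefix.
print(prefix_lm_mask(8, 3).int())
```

With a mask like this, one training forward pass scores the autoregressive part while the prefix stays fully visible, and at sampling time only suffix positions are generated. A fair comparison against the fully-autoregressive baseline would then restrict the loss to the suffix tokens.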

@ClashLuke added the engineering, ML, core, and research labels and removed the engineering label on May 17, 2022