From 7e5251f4b69191c578100c6baf57270c571d616a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alexandre=20D=C3=A9fossez?=
Date: Thu, 19 Sep 2024 16:56:20 +0200
Subject: [PATCH] new answer in faq

---
 FAQ.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/FAQ.md b/FAQ.md
index 0cda0bb..b5510cc 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -29,3 +29,10 @@ While we keep those limitations in mind for future versions, there is no immedia
 
 At the moment no, we might look into adding this feature when we get the time.
 At the moment it is however possible to use the Rust backend, which should run in int8 with CUDA.
+
+### Moshi stopped talking after 5 min.
+
+This is expected with the MLX and Rust implementations. We only use a fixed buffer, and we do not discard
+past entries. The PyTorch version should work for an unlimited time, although this is mostly untested and we
+expect the quality to degrade after a while (we have no attention sink or other mechanism to extend streaming
+beyond the finite context used at training).
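To illustrate what a "fixed buffer, no discarding" cache implies in practice, here is a minimal Python sketch. It is not Moshi's actual code: the class names, the API, and the buffer size are made up for illustration; only the 12.5 Hz frame rate comes from the Moshi paper. It contrasts a preallocated cache that simply fills up (roughly the behaviour of the MLX and Rust backends described above) with a ring buffer that overwrites the oldest entries and could keep going.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class FixedCache:
    """Hypothetical preallocated buffer: once `capacity` steps are cached,
    nothing more can be appended and generation has to stop."""
    capacity: int
    steps: List[Any] = field(default_factory=list)

    def append(self, frame: Any) -> bool:
        if len(self.steps) >= self.capacity:
            return False  # buffer exhausted
        self.steps.append(frame)
        return True


@dataclass
class RingCache:
    """Same capacity, but the oldest entries are overwritten, so generation can
    continue indefinitely while forgetting everything outside the window."""
    capacity: int
    steps: List[Any] = field(default_factory=list)
    pos: int = 0

    def append(self, frame: Any) -> bool:
        if len(self.steps) < self.capacity:
            self.steps.append(frame)
        else:
            self.steps[self.pos % self.capacity] = frame
        self.pos += 1
        return True


if __name__ == "__main__":
    # Illustrative capacity: at Moshi's 12.5 Hz frame rate,
    # 5 minutes is 12.5 * 60 * 5 = 3750 steps.
    capacity = 3750
    fixed, ring = FixedCache(capacity), RingCache(capacity)
    for step in range(4000):
        ring.append(step)  # never refuses, it just forgets old frames
        if not fixed.append(step):
            print(f"fixed buffer full after {step} steps (~{step / 12.5 / 60:.1f} min)")
            break
```

A ring buffer alone would not be enough to keep quality high indefinitely, which is the caveat the answer makes about the PyTorch version: without an attention sink or similar mechanism, the model still only ever sees the finite context it was trained with.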