Commit

clarify frame size
adefossez committed Sep 19, 2024
1 parent 7e5251f commit 3df7e80
Showing 2 changed files with 9 additions and 2 deletions.
5 changes: 3 additions & 2 deletions FAQ.md
@@ -32,7 +32,8 @@ it is however possible to use the Rust backend, which should run in int8 with CU

### Moshi stopped talking after 5 min.

-This is expected on the MLX and Rust implementation. We only use a fixed buffer, and we do not discard
-past entries. The PyTorch version should work for unlimited times, although this is mostly untested and we
+This is expected with the MLX and Rust implementations.
+We only use a fixed buffer, and we do not discard past entries.
+The PyTorch version should work for unlimited times, although this is mostly untested and we
 expect the quality to degrade after a bit (we have no attention sink or other mechanism to improve the streaming
 beyond the finite context used at training).
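
The fixed-buffer behavior described above can be illustrated with a small sketch. This is hypothetical code, not the actual MLX/Rust cache: `FixedCache` is an illustrative name, and the 3750-step budget assumes Mimi's 12.5 Hz frame rate (80 ms frames), so 5 minutes = 5 * 60 * 12.5 steps.

```python
# Hypothetical sketch of a fixed-size cache that simply stops accepting
# entries once full, as opposed to a rolling buffer that discards the
# oldest entries to keep streaming indefinitely.
class FixedCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = []

    def append(self, item) -> bool:
        """Store one entry; return False once the cache is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(item)
        return True

# At 12.5 Hz, a 5 minute budget is 5 * 60 * 12.5 = 3750 time steps.
cache = FixedCache(capacity=3750)
steps = 0
while cache.append(object()):
    steps += 1
# Generation halts after exactly `capacity` steps (i.e. ~5 minutes).
```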
6 changes: 6 additions & 0 deletions moshi/README.md
@@ -85,6 +85,12 @@ with torch.no_grad():
     codes = mimi.encode(frame)
     assert codes.shape[-1] == 1, codes.shape
     all_codes.append(codes)
+
+# WARNING: when streaming, make sure to always feed a total amount of audio that is a multiple
+# of the frame size (1920), otherwise the last frame will not be complete, and thus
+# will not be encoded. For simplicity, we recommend always feeding audio in multiples
+# of the frame size, so that you always know how many time steps you get back in `codes`.
+
 # Now if you have a GPU around.
 mimi.cuda()
 moshi_weight = hf_hub_download(loaders.DEFAULT_REPO, loaders.MOSHI_NAME)
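The warning added in this commit can be handled with a small re-chunking helper on the caller's side: accumulate incoming audio and only pass whole frames to the encoder. A minimal sketch, where plain Python lists stand in for audio tensors; `iter_full_frames`, `FRAME_SIZE`, and `carry` are illustrative names, not part of the moshi API.

```python
FRAME_SIZE = 1920  # samples per Mimi frame at 24 kHz (80 ms)

def iter_full_frames(buffer, chunk):
    """Accumulate incoming samples and yield only complete frames.

    `buffer` is a list used as carry-over state: samples that do not
    fill a whole frame stay buffered until more audio arrives.
    """
    buffer.extend(chunk)
    while len(buffer) >= FRAME_SIZE:
        yield buffer[:FRAME_SIZE]
        del buffer[:FRAME_SIZE]

# Feed arbitrary chunk sizes: only whole frames come out, the rest waits.
carry = []
frames = []
for chunk_len in (1000, 1000, 2000, 1920):
    frames.extend(iter_full_frames(carry, [0.0] * chunk_len))
# 5920 samples in total -> 3 full frames, 160 samples left in `carry`.
```

Each yielded frame would then be wrapped as a tensor and passed to `mimi.encode`, so the number of time steps returned in `codes` is always predictable.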
