Lighter version of the model #81

Open
0xBEEEF opened this issue Jan 31, 2020 · 5 comments

Comments

@0xBEEEF

0xBEEEF commented Jan 31, 2020

In the readme file in this repository you will find the following sentence:

I'll soon publish a lighter version of the model that should run with less RAM.

That was written a while ago, so I wanted to ask about its status. It would be really great if the lightweight model were released as announced.

@adefossez
Contributor

Hey @0xBEEEF, I'm currently under a heavy workload due to the upcoming ICML deadline. I will release the lighter models after that. Sorry for the wait...

@adefossez
Contributor

@0xBEEEF, I took some time today to upload the lighter version ;) You can use -n light instead of -n demucs to select it.
The light version is a 1GB download and runs about 4x faster than the normal one.
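
Switching models only changes the value passed to -n. Here is a minimal Python sketch that builds such a command. The `python3 -m demucs.separate` entry point and the input file name are assumptions for illustration; only the `-n light` / `-n demucs` values come from this thread, so check the readme for the exact invocation.

```python
import shlex


def build_separate_command(track, model="light"):
    """Return an argv list that runs the separation script on `track`.

    Assumption: the script is launched as `python3 -m demucs.separate`.
    Only the `-n` model switch is taken from this thread.
    """
    return ["python3", "-m", "demucs.separate", "-n", model, track]


# "my_song.mp3" is a hypothetical input file.
cmd = build_separate_command("my_song.mp3")

# Print a copy-pasteable shell line; pass `cmd` to subprocess.run to execute it.
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing `model="demucs"` gives the command for the normal model instead.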

@0xBEEEF
Author

0xBEEEF commented Jan 31, 2020

That's just great! You're really fast. I also think it's great that you made a few more optimizations.

But now I still have a few (maybe stupid) questions:

  1. If I only want to separate 2 sources instead of the suggested 4, will the load be reduced so that I need less memory? I am only interested in separating speech/voice from the accompaniment, as Spleeter does.

  2. A question about the training duration. Unfortunately, I can't afford the high-end graphics cards described in the readme, so I can't match your training speed. Do you have a rough estimate of the training time on an ordinary high-end graphics card such as the GeForce RTX 2080? It doesn't have to be exact to the second; an approximate figure is enough.

  3. Is further development of the model planned? I have tried various models so far, and this one delivers outstanding results, especially with very strongly overlapping signals. I have only noticed that, for example, very deep male voices are classified as "bass", and some speech sounds are classified as "drums". But all in all this model is ingenious! And the loss of quality is not as severe as with the spectrogram-based ones.

All in all, congratulations to you and your whole team for developing this great model.

@adefossez
Contributor

Thanks for the feedback :)

  1. That would require training a new model. It could reduce the model size or improve the speed, but not by much I think, in particular compared with the light model. The bulk of the model is shared by all sources, and only the last layer has specific weights per output source.

  2. You can train on a single GPU, but it might indeed take a while. You will have to use a small number of channels (48 or 64) and a batch size of 4. I estimate the training time to be on the order of a week in that case. You can train for slightly fewer epochs than I did (60 to 80 epochs will get you most of the performance, maybe even fewer because a smaller batch size means more iterations per epoch). Use --split_valid; this will limit the amount of memory used at evaluation time. If you start the training but interrupt it before completion, just rerun with the exact same command line flags plus --save_model. This will load the checkpoint and save the current best model to a corresponding file under the models folder, which you can then use with demucs.separate.

  3. I'm getting close to the end of my PhD. I have other projects to complete but I might still work a bit on the topic. Sadly I don't have the bandwidth to make this into a complete, nice package with docs etc.
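
To illustrate point 1, here is a rough parameter count in Python. All layer sizes are made-up placeholders, not the real Demucs architecture. The only thing taken from the answer above is the structure: a trunk shared by all sources, plus a last layer whose output width grows with the number of sources.

```python
def conv1d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a single 1-D convolution layer."""
    return in_channels * out_channels * kernel_size + out_channels


def model_params(n_sources, shared_params=50_000_000,
                 last_in_channels=64, audio_channels=2, kernel_size=8):
    """Toy model size: shared trunk plus a source-specific last layer.

    Every default value here is a hypothetical placeholder for illustration.
    """
    # The last layer emits one group of audio channels per output source.
    last = conv1d_params(last_in_channels, n_sources * audio_channels, kernel_size)
    return shared_params + last


four = model_params(n_sources=4)
two = model_params(n_sources=2)
saving = (four - two) / four

print(f"4 sources: {four:,} parameters")
print(f"2 sources: {two:,} parameters")
print(f"relative saving: {saving:.4%}")
```

With these placeholder numbers, going from 4 to 2 sources removes only a tiny fraction of the parameters. That is the qualitative point; the real figure depends on the actual layer sizes.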

@junh1024

junh1024 commented Mar 9, 2020

Re: the lighter model, is this about reducing the numerical precision of the tensors?

  1. Also, a Voice + others model would be nice.
