Hello. Thank you for your wonderful code :) I have a question about the freqs_cis term in the apply_rope function in modules/layers.py.
This function is used for attention, and if we look at model.py, we can see that the embeddings of txt_id and img_id are used as the freqs_cis term.
What are txt_id and img_id? Do we need any other terms besides the text and music pairs?
I commented out the apply_rope function and trained my model with just text/music pairs, but I didn't get good results.
It would be great if you could tell me what format txt_id and img_id are expected to be in.
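For context, here is a minimal sketch of how I currently understand rotary position embedding, in case my mental model is the problem. It is not the code from modules/layers.py: the helper names rope_angles and apply_rope_1d, the base theta=10000.0, and the idea that the ids are plain integer positions per token are all my own assumptions for illustration.

```python
import math

def rope_angles(pos, dim, theta=10000.0):
    """Rotation angles for one position id: pos * theta^(-2i/dim), i = 0..dim/2-1.

    Hypothetical helper; theta=10000.0 is an assumed default.
    """
    assert dim % 2 == 0, "RoPE rotates pairs, so dim must be even"
    return [pos * theta ** (-2.0 * i / dim) for i in range(dim // 2)]

def apply_rope_1d(x, pos, theta=10000.0):
    """Rotate each consecutive pair (x[2i], x[2i+1]) by the angle for `pos`.

    Hypothetical stand-in for apply_rope, written for a single vector.
    """
    out = []
    for i, a in enumerate(rope_angles(pos, len(x), theta)):
        x0, x1 = x[2 * i], x[2 * i + 1]
        out += [x0 * math.cos(a) - x1 * math.sin(a),
                x0 * math.sin(a) + x1 * math.cos(a)]
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

# Toy query and key vectors (made-up numbers).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]

# Assumed integer position ids. Both pairs below are 2 positions apart,
# so the attention scores should match: the score depends only on the
# relative offset between the two ids, not on their absolute values.
score_a = dot(apply_rope_1d(q, 3), apply_rope_1d(k, 1))
score_b = dot(apply_rope_1d(q, 7), apply_rope_1d(k, 5))
```

If this is roughly right, then removing apply_rope would leave attention with no position information at all, which might explain my poor results, and the ids would only need to tell the model where each text or music token sits.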
Thank you