Generate images from audio (optionally combined with a text prompt or an image) using ImageBind's unified latent space and stable-diffusion-2-1-unclip.
- No training is needed.
- Integration with ppdiffusers.
Support Tasks
Update
[2023/8/15]:
- [v0.0]: Support fusing audio, text (prompt) and image in ImageBind latent space.
Example: use audio to generate images across modalities (e.g. image, text and audio) with the ImageBind model and StableUnCLIPImg2ImgPipeline.
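Because ImageBind maps every modality into one latent space, embeddings from audio, text and image can in principle be combined with plain vector arithmetic. The sketch below is a minimal illustration, assuming a weighted average of L2-normalized vectors. The names `l2_normalize` and `fuse_embeddings` and the toy 4-dimensional vectors are hypothetical; they are not taken from audio2img_imagebind.py, whose actual fusion rule may differ.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse_embeddings(embeddings, weights=None):
    """Weighted average of L2-normalized embeddings from a shared latent space.

    Assumption: a weighted mean is used purely for illustration.
    """
    if not embeddings:
        raise ValueError("at least one embedding is required")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    if weights is None:
        weights = [1.0] * len(embeddings)
    if len(weights) != len(embeddings):
        raise ValueError("one weight per embedding is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    normalized = [l2_normalize(e) for e in embeddings]
    return [
        sum(w * e[i] for w, e in zip(weights, normalized)) / total
        for i in range(dim)
    ]


# Toy vectors standing in for real ImageBind outputs (hypothetical values).
audio_emb = [3.0, 0.0, 4.0, 0.0]
text_emb = [0.0, 2.0, 0.0, 0.0]
fused = fuse_embeddings([audio_emb, text_emb])
```

Normalizing first keeps one modality from dominating only because its raw embedding has a larger norm; the `weights` argument then controls the balance explicitly.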
cd applications/Audio2Img
python audio2img_imagebind.py \
--model_name_or_path imagebind-1.2b/ \
--stable_unclip_model_name_or_path stabilityai/stable-diffusion-2-1-unclip \
--input_audio https://paddlenlp.bj.bcebos.com/models/community/paddlemix/audio-files/bird_audio.wav
Input Audio | Output Image |
---|---|
bird_audio.wav |
cd applications/Audio2Img
python audio2img_imagebind.py \
--model_name_or_path imagebind-1.2b/ \
--stable_unclip_model_name_or_path stabilityai/stable-diffusion-2-1-unclip \
--input_audio https://paddlenlp.bj.bcebos.com/models/community/paddlemix/audio-files/bird_audio.wav \
--input_text 'A photo.'
Input Audio | Input Text | Output Image |
---|---|---|
bird_audio.wav | 'A photo.' |
cd applications/Audio2Img
python audio2img_imagebind.py \
--model_name_or_path imagebind-1.2b/ \
--stable_unclip_model_name_or_path stabilityai/stable-diffusion-2-1-unclip \
--input_audio https://paddlenlp.bj.bcebos.com/models/community/paddlemix/audio-files/wave.wav \
--input_image https://paddlenlp.bj.bcebos.com/models/community/paddlemix/audio-files/dog_image.jpg
Input Audio | Input Image | Output Image |
---|---|---|
wave.wav |
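The three invocations above differ only in which optional input is passed alongside the audio. As a convenience, the following sketch assembles the argument list from Python. The helper name `build_audio2img_command` is hypothetical; it uses only the flags shown in the commands above, and whether the script accepts `--input_text` and `--input_image` together is not confirmed here.

```python
import shlex


def build_audio2img_command(
    input_audio,
    input_text=None,
    input_image=None,
    model_path="imagebind-1.2b/",
    unclip_model="stabilityai/stable-diffusion-2-1-unclip",
):
    """Return the argument list for audio2img_imagebind.py (hypothetical helper)."""
    cmd = [
        "python",
        "audio2img_imagebind.py",
        "--model_name_or_path", model_path,
        "--stable_unclip_model_name_or_path", unclip_model,
        "--input_audio", input_audio,
    ]
    # Optional modalities are appended only when provided.
    if input_text is not None:
        cmd += ["--input_text", input_text]
    if input_image is not None:
        cmd += ["--input_image", input_image]
    return cmd


AUDIO_URL = (
    "https://paddlenlp.bj.bcebos.com/models/community/paddlemix/"
    "audio-files/bird_audio.wav"
)
cmd = build_audio2img_command(AUDIO_URL, input_text="A photo.")
print(shlex.join(cmd))  # shell-quoted form of the command, for inspection
```

The list can be handed to `subprocess.run(cmd, cwd="applications/Audio2Img", check=True)` to run the script from the directory used in the examples.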