Audio-to-Image Generation

1. Application introduction


Generate images from audio, optionally combined with a text prompt or an image, using ImageBind's unified latent space and stable-diffusion-2-1-unclip.
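To make the idea of combining modalities in one latent space concrete, here is a minimal sketch in plain Python. It assumes each modality has already been encoded into a vector of the same dimension, and it assumes a simple fusion rule: normalize each embedding, take a weighted average, and renormalize. The fusion rule actually used by `audio2img_imagebind.py` is not shown in this README, so `fuse_embeddings`, `l2_normalize`, and the toy vectors are hypothetical names and values chosen for illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def fuse_embeddings(embeddings, weights=None):
    """Weighted average of unit-length embeddings, renormalized.

    This is an assumed fusion rule, not necessarily the one the
    application uses.
    """
    if not embeddings:
        raise ValueError("need at least one embedding")
    if weights is None:
        weights = [1.0] * len(embeddings)
    if len(weights) != len(embeddings):
        raise ValueError("one weight per embedding is required")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must share one dimension")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")

    # Normalize first so no modality dominates purely by magnitude.
    normalized = [l2_normalize(e) for e in embeddings]
    fused = [
        sum(w * e[i] for w, e in zip(weights, normalized)) / total
        for i in range(dim)
    ]
    return l2_normalize(fused)


# Toy 3-dimensional stand-ins for an audio and a text embedding.
audio_emb = [3.0, 4.0, 0.0]
text_emb = [0.0, 0.0, 2.0]
fused = fuse_embeddings([audio_emb, text_emb])
print(fused)
```

In a real pipeline the fused vector would stand in for the image embedding that an unCLIP-style model is conditioned on. The weights give a simple knob for how strongly each modality steers the result.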


Supported Tasks

  • Audio to Image
  • Audio+Text to Image
  • Audio+Image to Image


Update

[2023/8/15]:

  • [v0.0]: Support fusing audio, text (prompt), and image in the ImageBind latent space.

2. Run


Example: generate an image from audio, optionally combined with other modalities (e.g. image and text), using the ImageBind model and StableUnCLIPImg2ImgPipeline.

cd applications/Audio2Img

python audio2img_imagebind.py \
--model_name_or_path imagebind-1.2b/ \
--stable_unclip_model_name_or_path stabilityai/stable-diffusion-2-1-unclip \
--input_audio https://paddlenlp.bj.bcebos.com/models/community/paddlemix/audio-files/bird_audio.wav

3. Visualization


Audio to Image

3.1.1 Instruction

cd applications/Audio2Img

python audio2img_imagebind.py \
--model_name_or_path imagebind-1.2b/ \
--stable_unclip_model_name_or_path stabilityai/stable-diffusion-2-1-unclip \
--input_audio https://paddlenlp.bj.bcebos.com/models/community/paddlemix/audio-files/bird_audio.wav

3.1.2 Result

| Input Audio | Output Image |
| --- | --- |
| bird_audio.wav | audio2img_output_bird |

Audio+Text to Image

3.2.1 Instruction

cd applications/Audio2Img

python audio2img_imagebind.py \
--model_name_or_path imagebind-1.2b/ \
--stable_unclip_model_name_or_path stabilityai/stable-diffusion-2-1-unclip \
--input_audio https://paddlenlp.bj.bcebos.com/models/community/paddlemix/audio-files/bird_audio.wav \
--input_text 'A photo.'

3.2.2 Result

| Input Audio | Input Text | Output Image |
| --- | --- | --- |
| bird_audio.wav | 'A photo.' | audio_text_to_img_output_bird_a_photo |

Audio+Image to Image

3.3.1 Instruction

cd applications/Audio2Img

python audio2img_imagebind.py \
--model_name_or_path imagebind-1.2b/ \
--stable_unclip_model_name_or_path stabilityai/stable-diffusion-2-1-unclip \
--input_audio https://paddlenlp.bj.bcebos.com/models/community/paddlemix/audio-files/wave.wav \
--input_image https://paddlenlp.bj.bcebos.com/models/community/paddlemix/audio-files/dog_image.jpg

3.3.2 Result

| Input Audio | Input Image | Output Image |
| --- | --- | --- |
| wave.wav | input_dog_image | audio_img_to_img_output_wave_dog |
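
The three invocations above differ only in the optional `--input_text` and `--input_image` flags. The following sketch assembles the argument list for any of them. The flag names, script name, and default paths are taken from the commands in this README; `build_command` itself is a hypothetical helper, not part of the application. Whether the script accepts text and image together is an assumption based on the v0.0 update note.

```python
import shlex

SCRIPT = "audio2img_imagebind.py"
AUDIO_URL = (
    "https://paddlenlp.bj.bcebos.com/models/community/"
    "paddlemix/audio-files/bird_audio.wav"
)


def build_command(
    input_audio,
    input_text=None,
    input_image=None,
    model_path="imagebind-1.2b/",
    unclip_model="stabilityai/stable-diffusion-2-1-unclip",
):
    """Return the argv list for one run of the audio-to-image script."""
    cmd = [
        "python", SCRIPT,
        "--model_name_or_path", model_path,
        "--stable_unclip_model_name_or_path", unclip_model,
        "--input_audio", input_audio,
    ]
    # Optional modalities are appended only when provided.
    if input_text is not None:
        cmd += ["--input_text", input_text]
    if input_image is not None:
        cmd += ["--input_image", input_image]
    return cmd


# Audio+Text variant, printed as a shell-safe command line.
cmd = build_command(AUDIO_URL, input_text="A photo.")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from inside `applications/Audio2Img`. Passing a list rather than a single string avoids shell quoting problems with prompts that contain spaces.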