Stable Diffusion 3 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model with greatly improved image quality, typography, complex prompt understanding, and resource efficiency. This application ports the open-source Stable Diffusion 3 Medium model to SG2300X-series products using the Sophon SDK, runs inference locally with TPU hardware acceleration to quickly generate stylized images that contain text, and uses Gradio for the user interface.
For more technical details on Stable Diffusion 3 Medium, please refer to the official website and the research paper.
Recommended TPU memory: NPU -> 7615 MB, VPU -> 2360 MB, VPP -> 2360 MB
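As a quick sanity check before changing the memory layout, the three reservations above can be added up and compared with the RAM available on your board. The sketch below is plain arithmetic on the listed values; the helper name `sum_mb` is illustrative and not part of the Sophon SDK.

```shell
# Illustrative helper (not part of any SDK): add up reservations given in MB.
sum_mb() {
    total=0
    for mb in "$@"; do
        total=$((total + mb))
    done
    echo "$total"
}

# NPU + VPU + VPP, using the recommended values listed above
echo "Total reserved: $(sum_mb 7615 2360 2360) MB"
```

With the recommended values this prints a total of 12335 MB.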
-
Clone the repository
git clone https://github.com/zifeng-radxa/SD3-Medium-TPU.git
-
Download the Stable Diffusion 3 Medium model package provided by Radxa
Alternatively, compile the Stable Diffusion 3 Medium model yourself by following Model Conversion
cd SD3-Medium-TPU/python_demo/
bash tar_downloader.sh
-
Extract the model in the current directory
tar -xvf models.tar.gz
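Because the model package is a large download, it can help to confirm that the archive is readable before (or after a failed attempt at) extracting it. The following is a minimal sketch that relies only on `tar -tf`, which lists an archive without unpacking it; the helper name `archive_ok` is hypothetical, and the check catches missing or corrupted files rather than verifying the contents.

```shell
# Hypothetical helper: succeeds only if tar can list the archive's contents.
archive_ok() {
    tar -tf "$1" > /dev/null 2>&1
}

if archive_ok models.tar.gz; then
    echo "models.tar.gz looks readable"
else
    echo "models.tar.gz is missing or damaged; consider re-running tar_downloader.sh"
fi
```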
-
Configure the environment
cd SD3-Medium-TPU/python_demo/
python3 -m virtualenv .venv
source .venv/bin/activate
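Once the activation command has run, the shell should have the `VIRTUAL_ENV` variable set, which is how the activate script marks an active environment. The sketch below checks for it so that packages are not installed system-wide by accident; the helper name `in_venv` is illustrative.

```shell
# The activate script exports VIRTUAL_ENV; an empty or unset value means
# no virtual environment is active in this shell.
in_venv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_venv; then
    echo "virtual environment active: $VIRTUAL_ENV"
else
    echo "no virtual environment active; run: source .venv/bin/activate"
fi
```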
-
Install dependencies
pip3 install --upgrade pip
pip3 install -r requirements.txt
-
Start the Web service
python3 gr.py
-
Open a browser and go to port 8999 on the Airbox's IP address
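If the page does not load, it can be useful to check from another machine whether the service answers at all. The sketch below builds the URL from an IP address and the port mentioned above, then probes it with `curl`; the helper names and the address `192.168.1.100` are placeholders, and it assumes `curl` is installed on the machine running the check.

```shell
# Build the web UI URL; the port defaults to 8999, the one named in the step above.
ui_url() {
    echo "http://$1:${2:-8999}/"
}

# Succeeds if an HTTP request to the UI gets any response within 5 seconds.
ui_reachable() {
    curl -s -o /dev/null --max-time 5 "$(ui_url "$1" "$2")"
}

# Placeholder address: substitute the IP address of your Airbox.
AIRBOX_IP=192.168.1.100
echo "Web UI: $(ui_url "$AIRBOX_IP")"

# Example probe (left commented out so the snippet does not touch the network):
# ui_reachable "$AIRBOX_IP" && echo "service is up" || echo "no response"
```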
Prompt: A cat with a sign text Welcome to radxa!
Community License: Free for research, non-commercial, and commercial use. You only need a paid Enterprise license if your yearly revenues exceed USD$1M and you use Stability AI models in commercial products or services. Read more: https://stability.ai/license
Companies above this revenue threshold should contact Stability AI: https://stability.ai/enterprise