You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In short, it is a watershed w.r.t. time step, before which position embedding is linearly scaled, and after which position embedding is NTK scaled.
More details: to make a model trained at 1k resolution to generate images at 1.5k or higher resolutions, an extrapolation on position embedding (i.e. RoPE in Lumina) is needed. We find that linear RoPE scaling leads to good global structure and composition, but the nearby pixels tend not to be harmonious; In contrast, NTK scaling makes good local texture, but global structure is usually unreasonable. Therefore, we use a combination of them two, applying linear scaling in the initial diffusion steps to define the global composition (intuitively like to draw a draft), and then switch to NTK for high-quality texture. It follows the same intuition as the method introduced in Sec 2.2 of the Lumina-Next paper but usually behaves more stably.
Hi. Thanks for sharing great works!
I wonder what is the role of
scale_watershed
inLumina-T2X/lumina_next_t2i/models/model.py
Line 921 in 7bc7d7d
Lumina-T2X/lumina_next_t2i/sample.py
Line 312 in 7bc7d7d
The text was updated successfully, but these errors were encountered: