
Effstabledreamfusion #492

Merged: 11 commits merged into threestudio-project:main on Oct 2, 2024

Conversation

@jadevaibhav (Contributor) commented Jul 25, 2024

Efficient training of DreamFusion-like systems on higher-resolution images

I am working on a feature for the DreamFusion system (which can be extended to others). The basic idea: to train at a higher image resolution, we subsample pixels from the full-resolution image with a mask and render only those rays with NeRF, then compute the SDS loss at the original resolution. The computational benefit comes from the reduced number of rays for NeRF rendering, while the diffusion model still sees higher-resolution images (for a better visual model), so the overall compute cost stays roughly the same.
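
A minimal sketch of the mechanics with stand-in tensors (the values and tensors here are illustrative, not the actual threestudio code):

```python
import torch

H, W = 128, 128      # resolution seen by the diffusion model / SDS loss
s_H, s_W = 64, 64    # number of rays actually rendered by NeRF per iteration

# Stand-ins: flat indices of the sampled pixels and their rendered colors
# (in the PR these come from the mask sub-sampler and the NeRF renderer).
idx = torch.randperm(H * W)[: s_H * s_W]
rendered = torch.rand(s_H * s_W, 3)

# Scatter the rendered colors into a full-resolution image; unsampled pixels
# stay zero. The SDS loss is then computed on this H x W image, so over many
# iterations every pixel of the high-resolution view receives supervision.
img = torch.zeros(H * W, 3)
img[idx] = rendered
img = img.view(H, W, 3)
```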

Testing with the demo prompt at 128x128 image resolution with 64x64 subsampling for NeRF training, I get the following result.
[Screenshot: preview of the generated result, 2024-07-25]
I would like any feedback on potential issues with this idea, and how to improve results. I am looking forward to hearing from this community! @DSaurus @voletiv @bennyguo @thuliu-yt16

@jadevaibhav (Contributor Author) commented Jul 29, 2024

For comparison, with the efficient sampling method described above, training the NeRF at 128x128 resolution (subsampled to 64x64) takes ~30 min. Without efficient sampling, training at 128x128 resolution takes ~41 min, keeping all other parameters the same.

@jadevaibhav marked this pull request as ready for review on August 5, 2024 04:46
DSaurus previously approved these changes Aug 9, 2024

@DSaurus (Collaborator) left a comment

Hi @jadevaibhav ,

Great job! Thank you for contributing to threestudio. Could you provide an example of how to run efficient DreamFusion and some 3D rendering videos of the results? Then I'd be glad to merge these commits and add this feature to the README.

@jadevaibhav (Contributor Author)

Hi @DSaurus, thanks for your approval! I have created a separate yaml config for this, so you just have to run:

python launch.py --config configs/dreamfusion-sd-eff.yaml --train  system.prompt_processor.prompt="a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes"

Here are the videos I generated, although they are not of good quality... I am still investigating where the issue with generation quality lies, and whether this method can be extended to other generative systems.

it10000-test.mp4
it10000-test.mp4

@DSaurus (Collaborator) commented Aug 26, 2024

Hi @jadevaibhav ,

Perhaps you could try caching the rendered images without gradients first. Then you sample some rays of this complete rendering, update the corresponding pixels, and run the SDS process on the full image. I think this would be more robust for 3D generation.

@jadevaibhav (Contributor Author)

@DSaurus, could you please explain what you mean here?
If I understand correctly, caching multiple images before updating through SDS would be equivalent to directly generating higher-resolution images, which defeats the purpose of generating a sub-sampled grid... My idea is essentially to take advantage of the continuous representation of 3D space learned by the MLP: at each iteration we randomly sub-sample a set of ray directions, and over the complete optimization process we learn at the original (higher) resolution.

Here's my sub-sampling code for clarity:

import math

import torch
from jaxtyping import Int
from torch import Tensor


def mask_ray_directions(
    H: int, W: int, s_H: int, s_W: int
) -> Int[Tensor, "s_H s_W"]:
    """
    Select a (s_H, s_W) grid of pixel indices from the full (H, W) image for
    efficient training at a higher image resolution. A fraction (1 - p) of the
    samples is drawn from the central (s_H, s_W) crop and a fraction p from the
    remaining pixels. The masking is applied just before calling get_rays().
    """
    indices_all = torch.meshgrid(
        torch.arange(W, dtype=torch.float32),
        torch.arange(H, dtype=torch.float32),
        indexing="xy",
    )

    # Boolean mask marking the central (s_H, s_W) crop of the (H, W) grid.
    mask = torch.zeros(H, W, dtype=torch.bool)
    mask[
        (H - s_H) // 2 : H - math.ceil((H - s_H) / 2),
        (W - s_W) // 2 : W - math.ceil((W - s_W) / 2),
    ] = True

    # Flat (row-major) pixel indices: x + W * y.
    flat_ind = indices_all[0] + W * indices_all[1]
    in_ind_1d = flat_ind[mask]
    out_ind_1d = flat_ind[torch.logical_not(mask)]

    # Tried p = (s_H * s_W) / (H * W); 0.5 works well since the smaller central
    # area already gets proportionally more samples anyway.
    p = 0.5
    n_out = int(p * s_H * s_W)
    n_in = s_H * s_W - n_out
    select_ind = in_ind_1d[
        torch.multinomial(torch.ones_like(in_ind_1d), n_in, replacement=False)
    ]
    select_ind = (
        torch.cat(
            [
                select_ind,
                out_ind_1d[
                    torch.multinomial(
                        torch.ones_like(out_ind_1d), n_out, replacement=False
                    )
                ],
            ],
            dim=0,
        )
        .to(dtype=torch.long)
        .view(s_H, s_W)
    )

    return select_ind
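
For context, the returned grid is meant to index into the flattened per-pixel ray directions before get_rays(), roughly like this (stand-in tensors, not the actual dataloader code):

```python
import torch

# Illustrative usage of mask_ray_directions (stand-in data).
H, W, s_H, s_W = 128, 128, 64, 64
select_ind = mask_ray_directions(H, W, s_H, s_W)              # (s_H, s_W) flat pixel indices

directions = torch.rand(H * W, 3)                             # stand-in for per-pixel ray directions
sub_dirs = directions[select_ind.view(-1)].view(s_H, s_W, 3)  # only these rays go through NeRF
```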

@DSaurus (Collaborator) commented Aug 27, 2024

@jadevaibhav Sure, my idea is to use these cached images multiple times; each time you apply your sub-sampler and update the corresponding pixels. If my understanding is correct, the current mask sub-sampler renders images that are not complete, and diffusion models like Stable Diffusion are not designed to recover such incomplete images. I think this is why the current mask sub-sampler leads to unstable results.
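
In pseudocode, roughly something like this (a sketch with stand-in tensors, just to illustrate the idea):

```python
import torch

# Rough sketch of the caching idea (illustrative only):
# 1) render the full image once without gradients and cache it;
# 2) for a few SDS steps, re-render only a sampled subset of rays with
#    gradients and overwrite those pixels, so the diffusion model always
#    sees a complete image.
H, W, n_sub = 128, 128, 64 * 64

cached = torch.rand(H * W, 3)                              # stand-in for a no-grad full render

for _ in range(4):                                         # reuse the cache for a few steps
    idx = torch.randperm(H * W)[:n_sub]                    # rays re-rendered this step
    rendered = torch.rand(n_sub, 3, requires_grad=True)    # stand-in for a grad-enabled render
    img = cached.clone()
    img[idx] = rendered                                    # complete image, gradients only at sampled pixels
    img = img.view(H, W, 3)
    # ... compute the SDS loss on img and backpropagate through the sampled pixels ...
```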

@jadevaibhav (Contributor Author)

@DSaurus the sub-sampler is applied to the generated directions, so we only pass the selected directions to NeRF. When calculating the SDS loss, I pass an original-resolution image with the rendered colors filled in at the selected indices and zeros elsewhere. I also believe that diffusion is unable to recover the incomplete image.
Rather than creating an incomplete image, I am thinking of interpolating from these rendered colors; that way, no gradients are wasted either. What do you think?
I will be happy to continue the caching discussion on Discord if you want. Also, should we merge the current version in the meantime?

@jadevaibhav (Contributor Author)

Hi @DSaurus, thanks for approving the PR! I don't have write access, so could you please merge?

I looked into the "interpolation" idea, but there is currently no straightforward way to do it with randomly sampled positions. I looked at the grid_sample() method, but I can't define a transformation or mapping from the original-resolution coordinate system to the sampled grid coordinates. I am now experimenting with uniform subsampling, with a random offset for the top-left grid corner.
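
A rough sketch of what I mean by uniform subsampling with a random offset (illustrative, not the final code):

```python
import torch

# Uniform subsampling with a random offset for the top-left grid corner.
H, W, s_H, s_W = 128, 128, 64, 64
stride_y, stride_x = H // s_H, W // s_W

off_y = torch.randint(0, stride_y, (1,)).item()     # random offset in rows
off_x = torch.randint(0, stride_x, (1,)).item()     # random offset in columns

ys = torch.arange(s_H) * stride_y + off_y           # row coordinates of the sampled pixels
xs = torch.arange(s_W) * stride_x + off_x           # column coordinates of the sampled pixels
yy, xx = torch.meshgrid(ys, xs, indexing="ij")

select_ind = (xx + W * yy).view(s_H, s_W)           # flat row-major pixel indices, as in mask_ray_directions
```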

@jadevaibhav (Contributor Author)

I finished the new experiment, and it works better than before! The training time is still the same (~33 mins)!

[Screenshot: preview of the new result, 2024-09-22]

it10000-test-new.mp4

@DSaurus (Collaborator) commented Sep 29, 2024

@jadevaibhav LGTM! Could you please create a file named eff_dreamfusion.py in the system folder and put your current code into this file?

@jadevaibhav (Contributor Author)

Sure!

@jadevaibhav (Contributor Author)

Done! Please review @DSaurus

DSaurus previously approved these changes Oct 2, 2024

@DSaurus (Collaborator) left a comment

@jadevaibhav Thanks!

@DSaurus merged commit bdd6db0 into threestudio-project:main on Oct 2, 2024 (1 check passed)
@jadevaibhav (Contributor Author)

Thanks! I would like to contribute more; are there any new papers or implementations we're looking at?

@DSaurus (Collaborator) commented Oct 2, 2024

@jadevaibhav I think it would be great if you are interested in implementing Wonder3D and its follow-up papers, which can generate 3D objects in seconds.
