
Metal support for training/inference #147

Open
almondai opened this issue Oct 17, 2023 · 4 comments

@almondai

Is there a timeline for adding metal backend to the Taichi training code? I have other GPUs but apple silicon macs have a lot of unified memory and are very energy efficient. I think it would be a good long term platform for experimenting with Gaussian splats.

@wanmeihuali
Owner

@almondai The latest Taichi version (see their website for instructions on compiling it from source) does run on Metal, but it isn't fully optimized yet, which causes low frame rates during inference and slower training. While memory isn't a bottleneck for 3D Gaussian splats (usually under 6 GB), the Metal backend on Apple Silicon may need backend-specific optimizations for better performance, e.g. tuning the tile/shared-memory size.

If you're keen, diving into these optimizations could be a fascinating challenge, though quite time-consuming.
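To make the tile/shared-memory tuning concrete, here is a rough back-of-envelope sketch (all numbers are illustrative assumptions, not values from this repo): a tile-based splat rasterizer typically stages a batch of projected Gaussians in on-chip memory, and a common choice is one Gaussian fetched per thread, so the staging buffer grows with the square of the tile side. Apple GPUs expose roughly 32 KB of threadgroup memory, which is one reason the optimal tile size on Metal may differ from CUDA.

```python
# Assumption: ~9 floats staged per Gaussian (2D mean: 2, unique entries of the
# inverse 2x2 covariance: 3, RGB color: 3, opacity: 1). Real layouts vary.
FLOATS_PER_GAUSSIAN = 9
BYTES_PER_FLOAT = 4

def tile_shared_mem_bytes(tile_side):
    # common choice: each of the tile_side * tile_side threads fetches one
    # Gaussian per batch, so the staging buffer holds tile_side**2 Gaussians
    batch = tile_side * tile_side
    return batch * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT

for side in (8, 16, 32):
    print(f"{side}x{side} tile -> {tile_shared_mem_bytes(side)} bytes staged")
```

With these assumptions a 16x16 tile stages ~9 KB, while a 32x32 tile would need ~36 KB and exceed a 32 KB threadgroup-memory budget, so the tile size that is fastest on CUDA hardware may not even fit on Metal.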

@almondai
Author

@wanmeihuali Thanks, I will check out the Taichi repo and hopefully share some numbers on training runtimes. I'm also curious about your code's lower memory usage (<8 GB), since the original repo from the authors suggests at least 24 GB of VRAM for the highest fidelity.

@wanmeihuali
Owner

Hi @almondai, I don't think the official implementation needs that much VRAM... Anyway, GPU memory usage depends heavily on the image resolution and the number of Gaussian points. For the current truck scene running on an AWS SageMaker T4 (16 GB VRAM), GPU memory utilization is around 21% (3.4 GB).
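A rough estimate makes it plausible that raw parameter storage is small. Assuming the standard 3D Gaussian splatting parameterization (position 3, rotation quaternion 4, scale 3, opacity 1, plus 3 x 16 spherical-harmonic color coefficients at degree 3, all fp32 — an assumption about the layout, not taken from this repo), each point costs 236 bytes, so even millions of points fit in a few GB before optimizer state and activations:

```python
def gaussian_param_bytes(num_points, sh_degree=3, dtype_bytes=4):
    # per-point parameters: position(3) + quaternion(4) + scale(3)
    # + opacity(1) + RGB spherical harmonics: 3 * (sh_degree + 1)**2
    per_point = 3 + 4 + 3 + 1 + 3 * (sh_degree + 1) ** 2
    return num_points * per_point * dtype_bytes

points = 3_000_000  # hypothetical scene size
raw = gaussian_param_bytes(points)
adam = 2 * raw  # Adam keeps two extra moment buffers per parameter
print(f"params: {raw / 2**30:.2f} GiB, with Adam state: {(raw + adam) / 2**30:.2f} GiB")
```

Under these assumptions, 3 million points need about 0.66 GiB of raw parameters (~2 GiB with Adam's moment buffers); the rest of the observed usage comes from gradients, rasterizer buffers, and framework overhead, which scale with image resolution.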

@almondai
Author

I was able to run it on Apple silicon/Metal after compiling Taichi from source (v1.7.0) to link against the Metal 3 API. I then trained the truck dataset on an M2 Ultra with 60 GPU cores. It was pretty slow, but I also noticed errors in the output:

Here is a screenshot of the Truck scene trained on Metal (and rendered via Metal)

mac_metal_Screenshot 2023-11-21 at 9 47 47 PM

So I ran the same training parameters on a CUDA/NVIDIA backend; here is a screenshot of the same scene trained on CUDA (rendered on Linux):

Linux_cuda_Screenshot from 2023-11-21 21-45-15

Linux_cuda_Screenshot from 2023-11-21 21-44-48

And here is the same parquet file (trained on CUDA) rendered on a Mac with the Metal backend:

mac_cuda_Screenshot 2023-11-21 at 9 44 24 PM

mac_cuda_Screenshot 2023-11-21 at 9 43 58 PM

It seems to me the forward pass on Metal is mostly correct. However, the backward pass is not working correctly, although training did complete all 30,000 iterations without crashing.
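One standard way to localize this kind of backward-pass bug is a finite-difference gradient check: perturb one parameter, rerun the forward loss, and compare the numeric slope against what the backward pass reports, per backend. A minimal sketch with a toy scalar loss (the function names here are illustrative, not from the repo):

```python
import math

def loss(x):
    # stand-in for a differentiable scalar loss
    return math.sin(x) * x * x

def analytic_grad(x):
    # d/dx [x^2 sin x] = 2x sin x + x^2 cos x
    # (stand-in for the gradient the backward pass reports)
    return 2 * x * math.sin(x) + x * x * math.cos(x)

def numeric_grad(f, x, eps=1e-5):
    # central finite difference: two forward passes, no backward pass needed
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = 1.3
a, n = analytic_grad(x), numeric_grad(loss, x)
rel_err = abs(a - n) / max(abs(a), abs(n))
print(f"analytic={a:.6f} numeric={n:.6f} rel_err={rel_err:.2e}")
```

Applied per-parameter to a handful of Gaussians on both backends, a large relative error on Metal but not on CUDA would pinpoint which gradient kernel misbehaves, without waiting for a full 30,000-iteration run.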
