Metal support for training/inference #147
Comments
@almondai The latest Taichi version (see their website for instructions on compiling from source) does run on Metal, but it is not yet fully optimized, which causes low frame rates during inference and slower training. While memory isn't a bottleneck for 3D Gaussian splats (usually under 6 GB), the Metal backend on Apple Silicon may need backend-specific optimizations for better performance, e.g., tuning tile/shared-memory sizes. If you're keen, diving into these optimizations could be a fascinating challenge, though quite time-consuming.
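For reference, a minimal sketch of selecting the Metal backend in a from-source Taichi build (assuming Metal support was enabled at compile time); the kernel is just a placeholder to confirm the backend compiles and executes:

```python
import taichi as ti

# Select the Metal backend on Apple Silicon; this raises an error if the
# installed Taichi build was not compiled with Metal support.
ti.init(arch=ti.metal)

# Tiny placeholder kernel to verify compilation and execution on Metal.
x = ti.field(dtype=ti.f32, shape=8)

@ti.kernel
def fill():
    for i in x:
        x[i] = i * 0.5

fill()
print(x.to_numpy())  # expect [0.0, 0.5, 1.0, ...]
```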
@wanmeihuali Thanks, I will check out the Taichi repo and hopefully share some numbers on training runtimes. I'm also curious about your code's lower memory usage (<8 GB), since the original authors' repo suggested at least 24 GB of VRAM for the highest fidelity.
Hi @almondai, I don't think the official implementation needs that much VRAM. In any case, GPU memory usage depends heavily on the image resolution and the number of Gaussian points. For the current truck scene running on an AWS SageMaker T4 (16 GB VRAM), GPU memory utilization is around 21% (3.4 GB).
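As a back-of-the-envelope illustration of why the point count dominates memory, here is a rough estimate; the 59-floats-per-point layout below is an assumption about a typical 3DGS parameterization, not this repo's exact storage format:

```python
# Rough estimate of parameter memory for a 3D Gaussian splat scene.
# Assumed per-point layout (hypothetical, not this repo's exact format):
#   3 position + 4 rotation (quaternion) + 3 scale + 1 opacity
#   + 48 spherical-harmonic coefficients (degree 3, RGB) = 59 floats
FLOATS_PER_POINT = 59
BYTES_PER_FLOAT = 4  # fp32

def param_memory_gb(num_points: int) -> float:
    return num_points * FLOATS_PER_POINT * BYTES_PER_FLOAT / 1024**3

for n in (1_000_000, 3_000_000, 6_000_000):
    print(f"{n:>9,d} points -> ~{param_memory_gb(n):.2f} GB of parameters")

# Gradients and optimizer state add a multiple of this, and the
# rasterizer's tile buffers scale with image resolution.
```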
I was able to run it on Apple Silicon/Metal after compiling Taichi from source (v1.7.0) to link against the Metal 3 API. I then trained the truck dataset on an M2 Ultra with 60 GPU cores. It was pretty slow, and I also noticed errors in the output. Here is a screenshot of the Truck scene trained on Metal (and rendered via Metal). I then ran the same training parameters on a CUDA/NVIDIA backend; here is a screenshot of the same scene trained on CUDA (rendered on Linux), and the same parquet file (CUDA) rendered on a Mac with the Metal backend. It seems to me the forward pass on Metal is mostly correct, but the backward pass is not working correctly, even though training completed all 30,000 iterations without crashing.
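One way to localize that kind of backward-pass discrepancy (a hypothetical debugging sketch, not part of this repo) is to run a small differentiable Taichi kernel on two backends and compare gradients; `suspect_kernel` below is a stand-in for whichever kernel you suspect:

```python
import numpy as np
import taichi as ti

def grad_on(arch):
    """Compute the gradient of a toy loss on the given backend."""
    ti.init(arch=arch)  # re-init resets the runtime for the new backend
    n = 1024
    x = ti.field(dtype=ti.f32, shape=n, needs_grad=True)
    loss = ti.field(dtype=ti.f32, shape=(), needs_grad=True)

    @ti.kernel
    def suspect_kernel():  # stand-in for the kernel under suspicion
        for i in x:
            loss[None] += ti.sin(x[i]) * x[i]

    x.from_numpy(np.linspace(-2.0, 2.0, n, dtype=np.float32))
    with ti.ad.Tape(loss=loss):
        suspect_kernel()
    return x.grad.to_numpy()

g_cpu = grad_on(ti.cpu)      # reference gradients
g_metal = grad_on(ti.metal)  # gradients from the Metal backend
print("max abs diff:", np.abs(g_cpu - g_metal).max())
```

If a kernel shows a large difference against the CPU reference while its forward output matches, that kernel's generated backward pass on Metal is a likely culprit.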
Is there a timeline for adding a Metal backend to the Taichi training code? I have other GPUs, but Apple Silicon Macs have a lot of unified memory and are very energy efficient, so I think they would be a good long-term platform for experimenting with Gaussian splats.