
Metal support for training/inference #147

Open
almondai opened this issue Oct 17, 2023 · 4 comments

@almondai

Is there a timeline for adding metal backend to the Taichi training code? I have other GPUs but apple silicon macs have a lot of unified memory and are very energy efficient. I think it would be a good long term platform for experimenting with Gaussian splats.

@wanmeihuali
Owner

@almondai The latest Taichi version (see their website for instructions on compiling it from source) does run on Metal, but it isn't fully optimized yet, which causes low frame rates during inference and slower training. While memory isn't a bottleneck for 3D Gaussian splats (usually under 6 GB), the Metal backend on Apple Silicon may need backend-specific optimizations for better performance, e.g. tuning the tile/shared-memory size.

If you're keen, diving into these optimizations could be a fascinating challenge, though quite time-consuming.
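To make the tile/shared-memory tuning concrete, here is a rough back-of-envelope sketch (all numbers are illustrative assumptions, not values from this repo): a tile-based splat rasterizer typically stages a batch of projected Gaussians in on-chip memory, and a common choice is one Gaussian fetched per thread, so the staging buffer grows with the square of the tile side. Apple GPUs expose roughly 32 KB of threadgroup memory, which is one reason the optimal tile size on Metal may differ from CUDA.

```python
# Assumption: ~9 floats staged per Gaussian (2D mean: 2, unique entries of the
# inverse 2x2 covariance: 3, RGB color: 3, opacity: 1). Real layouts vary.
FLOATS_PER_GAUSSIAN = 9
BYTES_PER_FLOAT = 4

def tile_shared_mem_bytes(tile_side):
    # common choice: each of the tile_side * tile_side threads fetches one
    # Gaussian per batch, so the staging buffer holds tile_side**2 Gaussians
    batch = tile_side * tile_side
    return batch * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT

for side in (8, 16, 32):
    print(f"{side}x{side} tile -> {tile_shared_mem_bytes(side)} bytes staged")
```

With these assumptions a 16x16 tile stages ~9 KB, while a 32x32 tile would need ~36 KB and exceed a 32 KB threadgroup-memory budget, so the tile size that is fastest on CUDA hardware may not even fit on Metal.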

@almondai
Author

@wanmeihuali Thanks, I will check out the Taichi repo and hopefully share some numbers on training runtimes. I'm also curious about your code's lower memory usage (<8 GB), since the original repo from the authors suggests at least 24 GB of VRAM for the highest fidelity.

@wanmeihuali
Owner

Hi @almondai, I don't think the official implementation needs that much VRAM... Anyway, GPU memory usage depends heavily on the image resolution and the number of Gaussian points. For the current truck scene running on an AWS SageMaker T4 (16 GB VRAM), GPU memory utilization is around 21% (3.4 GB).
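A rough estimate makes it plausible that raw parameter storage is small. Assuming the standard 3D Gaussian splatting parameterization (position 3, rotation quaternion 4, scale 3, opacity 1, plus 3 x 16 spherical-harmonic color coefficients at degree 3, all fp32 — an assumption about the layout, not taken from this repo), each point costs 236 bytes, so even millions of points fit in a few GB before optimizer state and activations:

```python
def gaussian_param_bytes(num_points, sh_degree=3, dtype_bytes=4):
    # per-point parameters: position(3) + quaternion(4) + scale(3)
    # + opacity(1) + RGB spherical harmonics: 3 * (sh_degree + 1)**2
    per_point = 3 + 4 + 3 + 1 + 3 * (sh_degree + 1) ** 2
    return num_points * per_point * dtype_bytes

points = 3_000_000  # hypothetical scene size
raw = gaussian_param_bytes(points)
adam = 2 * raw  # Adam keeps two extra moment buffers per parameter
print(f"params: {raw / 2**30:.2f} GiB, with Adam state: {(raw + adam) / 2**30:.2f} GiB")
```

Under these assumptions, 3 million points need about 0.66 GiB of raw parameters (~2 GiB with Adam's moment buffers); the rest of the observed usage comes from gradients, rasterizer buffers, and framework overhead, which scale with image resolution.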

@almondai
Author

I was able to run it on Apple silicon/Metal after compiling Taichi from source (v1.7.0) to link against the Metal 3 API. I then trained the truck dataset on an M2 Ultra with 60 GPU cores. It was pretty slow, but I also noticed errors in the output:

Here is a screenshot of the Truck scene trained on Metal (and rendered via Metal)

mac_metal_Screenshot 2023-11-21 at 9 47 47 PM

So I ran the same training parameters on a CUDA/NVIDIA backend; here is a screenshot of the same scene trained on CUDA (rendered on Linux):

Linux_cuda_Screenshot from 2023-11-21 21-45-15

Linux_cuda_Screenshot from 2023-11-21 21-44-48

And here is the same parquet file (trained on CUDA) rendered on a Mac with the Metal backend:

mac_cuda_Screenshot 2023-11-21 at 9 44 24 PM

mac_cuda_Screenshot 2023-11-21 at 9 43 58 PM

It seems to me the forward pass on Metal is mostly correct. However, the backward pass is not working correctly, although training did complete all 30,000 iterations without crashing.
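One standard way to localize this kind of backward-pass bug is a finite-difference gradient check: perturb one parameter, rerun the forward loss, and compare the numeric slope against what the backward pass reports, per backend. A minimal sketch with a toy scalar loss (the function names here are illustrative, not from the repo):

```python
import math

def loss(x):
    # stand-in for a differentiable scalar loss
    return math.sin(x) * x * x

def analytic_grad(x):
    # d/dx [x^2 sin x] = 2x sin x + x^2 cos x
    # (stand-in for the gradient the backward pass reports)
    return 2 * x * math.sin(x) + x * x * math.cos(x)

def numeric_grad(f, x, eps=1e-5):
    # central finite difference: two forward passes, no backward pass needed
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = 1.3
a, n = analytic_grad(x), numeric_grad(loss, x)
rel_err = abs(a - n) / max(abs(a), abs(n))
print(f"analytic={a:.6f} numeric={n:.6f} rel_err={rel_err:.2e}")
```

Applied per-parameter to a handful of Gaussians on both backends, a large relative error on Metal but not on CUDA would pinpoint which gradient kernel misbehaves, without waiting for a full 30,000-iteration run.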
