diff --git a/post/2024-10-07-metal-1.4.md b/post/2024-10-07-metal-1.4.md new file mode 100644 index 0000000..f7612be --- /dev/null +++ b/post/2024-10-07-metal-1.4.md @@ -0,0 +1,85 @@ ++++ +title = "Metal.jl 1.4: Improved random numbers" +author = "Christian Guinard" +abstract = """ + Metal.jl 1.4 adds higher-quality random number generators from the Metal Performance + Shaders library. Some limitations apply, with a fallback to the current implementation + in those situations.""" ++++ +{{abstract}} + + +## `Metal.rand` and friends + +Using functionality provided by the Metal Performance Shaders (MPS) library, Metal.jl now +comes with much improved GPU random number generators. Uniform distributions using +`Metal.rand` (and its in-place variant `Metal.rand!`) are available for all Metal-supported +integer types and `Float32`. However, due to [Metal API +limitations](https://developer.apple.com/documentation/metal/mtlblitcommandencoder/1400767-copyfrombuffer?language=objc), +8-bit and 16-bit integers may fall back to the lower-quality GPUArrays.jl random number +generator if their size in bytes is not a multiple of 4. Normally distributed `Float32` +values can be generated for with `Metal.randn` and `Metal.randn!`, while `Float16` is not +supported by the MPS library and will always fall back to the GPUArrays implementation. + +The easiest way to use these is to use the Metal convenience functions `Metal.rand[n][!]` as +you would the usual functions from the Random.jl standard library: + +```julia-repl +julia> a = Metal.rand(Float32, 2) +2-element MtlVector{Float32, Metal.PrivateStorage}: + 0.95755994 + 0.7110207 + +julia> Metal.randn!(a) +2-element MtlVector{Float32, Metal.PrivateStorage}: + 1.7230463 + 0.55636907 +``` + +However, the Random.jl methods can also be used by providing the appropriate `RNG` either +from `MPS.default_rng()` or `MPS.RNG()` to the standard `Random.rand[n][!]` functions: + + +```julia-repl +julia> using Random + +julia> rng = MPS.RNG(); + +julia> Random.rand(rng, 2) +2-element MtlVector{Float32, Metal.PrivateStorage}: + 0.8941469 + 0.67628527 +``` + +Seeding is done by calling `Metal.seed!` for the global RNG, or `Random.seed!` when working +with an explicit `RNG` object. + + +## Other improvements since the last blog post + +- Since v0.5: `MtlArray` storage mode has been parameterized, allowing one to create a + shared storage `MtlArray` by calling `MtlArray{eltype, ndims, Metal.SharedStorage}(...)`. +- Since v0.3: MPS-accelerated decompositions were added. +- Various performance improvements +- *Many* bug fixes. + + +## Future work + +Although Metal.jl is now in v1, there is still work to be done to make it as fast and +feature-complete as possible. In particular: + +- Metal.jl is now using native ObjectiveC FFI for wrapping Metal APIs. However, these + wrappers have to be written manually for every piece of Objective-C code. *We are looking + for help with improving Clang.jl and ObjectiveC.jl* to [enable the automatic generation of + these wrappers](https://github.com/JuliaInterop/ObjectiveC.jl/issues/41); +- The MPS wrappers are incomplete, automatic wrapper generation would greatly help with full + MPS support; +- To implement a full-featured KernelAbstractions.jl back-end, Metal atomic operations need + to [be hooked up to Atomix](https://github.com/JuliaGPU/Metal.jl/issues/218); +- [Full support for BFloat16 values](https://github.com/JuliaGPU/Metal.jl/issues/298), which + has been supported since Metal 3.1 (macOS 14), is not yet available in Metal.jl. There is, + however, a [draft PR](https://github.com/JuliaGPU/Metal.jl/pull/446) in the works. Check + it out if you're interested in helping out; +- Some functionality present in CUDA.jl [could be ported to Metal.jl to improve + usability](https://github.com/JuliaGPU/Metal.jl/issues/443).