Minor changes.
maleadt committed Oct 7, 2024
1 parent e839afc commit 217e5e1
Showing 2 changed files with 57 additions and 55 deletions.
55 changes: 0 additions & 55 deletions post/2024-10-02-metal-1.4.md

This file was deleted.

57 changes: 57 additions & 0 deletions post/2024-10-07-metal-1.4.md
@@ -0,0 +1,57 @@
+++
title = "Metal.jl 1.4: Improved random numbers"
author = "Christian Guinard"
abstract = """
Metal.jl 1.4 adds higher-quality random number generators from the Metal Performance
Shaders library. Some limitations apply, with a fallback to the current implementation
in those situations."""
+++
{{abstract}}


## `Metal.rand` and friends

Using functionality provided by the Metal Performance Shaders (MPS) library, Metal.jl now
comes with much improved GPU random number generators. Uniform distributions using
`Metal.rand` (and its in-place variant `Metal.rand!`) are available for all Metal-supported
integer types and `Float32`. However, due to [Metal API
limitations](https://developer.apple.com/documentation/metal/mtlblitcommandencoder/1400767-copyfrombuffer?language=objc),
arrays of 8-bit and 16-bit integers may fall back to the lower-quality GPUArrays.jl random
number generator if the array's size in bytes is not a multiple of 4. Normally distributed
`Float32` values can be generated with `Metal.randn` and `Metal.randn!`. `Float16` is not
supported by the MPS library and will always fall back to the GPUArrays.jl implementation.
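
As a minimal sketch of the behavior described above (it assumes an Apple GPU with a working
Metal.jl 1.4 installation; the array sizes are arbitrary and chosen only for illustration):

```julia
using Metal

# Uniform Float32 values, generated by the MPS-backed generator.
x = Metal.rand(Float32, 1024)

# In-place variant: overwrite an existing MtlArray with fresh uniform values.
Metal.rand!(x)

# Normally distributed Float32 values, out-of-place and in-place.
y = Metal.randn(Float32, 32, 32)
Metal.randn!(y)

# Integer element types are supported for uniform sampling as well.
i = Metal.rand(Int32, 256)

# 4 bytes in total (a multiple of 4), so the MPS generator can be used.
a = Metal.rand(UInt8, 4)
# 3 bytes in total (not a multiple of 4), so this may fall back to GPUArrays.jl.
b = Metal.rand(UInt8, 3)

# Float16 is not supported by MPS and always uses the GPUArrays.jl fallback.
h = Metal.rand(Float16, 16)
```

Which generator ends up being used is an internal detail; the calls look the same either way.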

The easiest way to use these generators is through the Metal convenience functions
`Metal.rand[n][!]`, which work like the usual functions from the Random.jl standard library.
Alternatively, the Random.jl methods can be used directly by passing an appropriate `RNG`,
obtained from either `MPS.default_rng()` or `MPS.RNG()`, to the standard
`Random.rand[n][!]` functions.
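
The Random.jl route might look like the following sketch. It assumes that `MPS` can be
imported from `Metal` as shown and that an Apple GPU is available; only the in-place
functions are used here.

```julia
using Metal
using Metal: MPS
using Random

# Destination array on the GPU.
A = MtlArray{Float32}(undef, 1024)

# Use the default MPS generator with the standard in-place Random.jl functions.
rng = MPS.default_rng()
Random.rand!(rng, A)    # uniform values
Random.randn!(rng, A)   # normally distributed values

# A separate generator instance can be created and passed in the same way.
own_rng = MPS.RNG()
Random.rand!(own_rng, A)
```

Passing the generator explicitly is mainly useful for generic code that already takes an
`rng` argument.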


## Other improvements since the last blog post

- Since v0.5: `MtlArray` storage mode has been parameterized, allowing one to create a
shared storage `MtlArray` by calling `MtlArray{eltype, ndims, Metal.SharedStorage}(...)`.
- Since v0.3: MPS-accelerated decompositions were added.
- Various performance improvements.
- *Many* bug fixes.
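
For the parameterized storage mode, a minimal sketch could look as follows. It assumes an
Apple GPU; the element type, length, and fill value are arbitrary.

```julia
using Metal

# Type parameters: element type, number of dimensions, storage mode.
a = MtlArray{Float32, 1, Metal.SharedStorage}(undef, 16)

# Fill the array on the GPU.
fill!(a, 1f0)

# Copy to a regular CPU Array to inspect the contents.
cpu = Array(a)
```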


## Future work

Although Metal.jl is now in v1, there is still work to be done to make it as fast and
feature-complete as possible. In particular:

- Metal.jl is now using native Objective-C FFI for wrapping Metal APIs. However, these
wrappers have to be written manually for every piece of Objective-C code. *We are looking
for help with improving Clang.jl and ObjectiveC.jl* to [enable the automatic generation of
these wrappers](https://github.com/JuliaInterop/ObjectiveC.jl/issues/41);
- The MPS wrappers are incomplete, and automatic wrapper generation would greatly help with
full MPS support;
- To implement a full-featured KernelAbstractions.jl back-end, Metal atomic operations need
to [be hooked up to Atomix](https://github.com/JuliaGPU/Metal.jl/issues/218);
- [Full support for BFloat16 values](https://github.com/JuliaGPU/Metal.jl/issues/298), which
have been supported since Metal 3.1 (macOS 14), is not yet available in Metal.jl. There is,
however, a [draft PR](https://github.com/JuliaGPU/Metal.jl/pull/446) in the works. Check
it out if you're interested in helping out;
- Some functionality present in CUDA.jl [could be ported to Metal.jl to improve
usability](https://github.com/JuliaGPU/Metal.jl/issues/443).
