Merge pull request #3 from chengchingwen/main

Enhancement
FluxML · Apr 20, 2024 · 35235fe
2 parents d0568a8 + 26165d2
commit 35235fe
Showing 8 changed files with 564 additions and 110 deletions.
diff --git a/.github/workflows/CI.yml b/.github/workflows/CI.yml
@@ -24,10 +24,12 @@ jobs:
           - 'nightly'
         os:
           - ubuntu-latest
+          - macOS-latest
+          - windows-latest
         arch:
           - x64
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
       - uses: julia-actions/setup-julia@v1
         with:
           version: ${{ matrix.version }}

diff --git a/Project.toml b/Project.toml
@@ -4,10 +4,17 @@ authors = ["pevnak <[email protected]> and contributors"]
 version = "1.0.0"
 
 [deps]
+BFloat16s = "ab4f0b2a-ad5b-11e8-123f-65d77653426b"
+DLFP8Types = "f4c16678-4a16-415b-82ef-ed337c5d6c7c"
 JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
+MappedArrays = "dbb5928d-eab1-5f90-85c2-b9b0edb7c900"
+Mmap = "a63ad114-7e13-5084-954f-fe012c677804"
 
 [compat]
+BFloat16s = "0.5"
+DLFP8Types = "0.1"
 JSON3 = "1"
+MappedArrays = "0.4"
 julia = "1.6"
 
 [extras]

diff --git a/README.md b/README.md
@@ -3,18 +3,14 @@
 
 [![Build Status](https://github.com/FluxML/SafeTensors.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/FluxML/SafeTensors.jl/actions/workflows/CI.yml?query=branch%3Amain)
 
-This packages loads data stored in [safetensor format](https://huggingface.co/docs/safetensors/index). 
+This packages loads data stored in [safetensor format](https://huggingface.co/docs/safetensors/index).
 Since Python is row-major and Julia is column-major, the dimensions are permuted such the tensor has the same shape as in python, but everything is correctly ordered. This includes a performance penalty in sense that we cannot be completely copy-free.
 
-The list of dependencies is kept minimal to `JSON3` for parsing the header.
-
-The package does not allow to save the data.
-
 The main function is `load_safetensors` which returns a `Dict{String,V}` where keys are names of tensors and values are tensors. An example from `runtests` is as follows
 ```julia
 julia> using SafeTensors
 
-julia> d = load_safetensors("model.safetensors")
+julia> d = load_safetensors("test/model.safetensors")
 Dict{String, Array} with 27 entries:
   "int32_357"   => Int32[0 7 … 21 28; 35 42 … 56 63; 70 77 … 91 98;;; 1 8 … 22 29…
   "uint8_3"     => UInt8[0x00, 0x01, 0x02]
@@ -45,9 +41,73 @@ Dict{String, Array} with 27 entries:
   "float64_3"   => [0.0, 1.0, 2.0]
 ```
 
-It is also possible to load just header using unexported `load_header` as 
+It can also perform a lazy loading with `SafeTensors.deserialize("model.safetensors")` which `mmap` the file and return a `Dict`-like object:
 ```julia
-julia> d = SafeTensors.load_header("model.safetensors")
+julia> tensors = SafeTensors.deserialize("test/model.safetensors"; mmap = true #= default to `true`=#);
+
+julia> tensors["float32_35"]
+3×5 mappedarray(ltoh, PermutedDimsArray(reshape(reinterpret(Float32, view(::Vector{UInt8}, 0x0000000000000ef5:0x0000000000000f30)), 5, 3), (2, 1))) with eltype Float32:
+  0.0   1.0   2.0   3.0   4.0
+  5.0   6.0   7.0   8.0   9.0
+ 10.0  11.0  12.0  13.0  14.0
+```
+
+Serialization is also supported:
+
+```julia
+julia> using Random, BFloat16s
+
+julia> weights = Dict("W"=>randn(BFloat16, 3, 5), "b"=>rand(BFloat16, 3))
+Dict{String, Array{BFloat16}} with 2 entries:
+  "W" => [0.617188 0.695312 … 0.390625 -2.0; -0.65625 -0.617188 … 0.652344 0.244141; 0.226562 2.70312 … -0.174805 -0.7773…
+  "b" => [0.111816, 0.566406, 0.283203]
+
+julia> f = tempname();
+
+julia> SafeTensors.serialize(f, weights)
+
+julia> loaded = SafeTensors.deserialize(f);
+
+julia> loaded["W"] ≈ weights["W"]
+true
+
+julia> SafeTensors.serialize(f, weights, Dict("Package"=>"SafeTensors.jl", "version"=>"1"))
+
+julia> loaded = SafeTensors.deserialize(f);
+
+julia> loaded.metadata
+Dict{String, String} with 2 entries:
+  "Package" => "SafeTensors.jl"
+  "version" => "1"
 ```
 
+Working with gpu:
+```julia
+julia> loaded["W"]
+3×5 mappedarray(ltoh, PermutedDimsArray(reshape(reinterpret(BFloat16, view(::Vector{UInt8}, 0x00000000000000b9:0x00000000000000d6)), 5, 3), (2, 1))) with eltype BFloat16:
+  0.542969    0.201172   1.38281    -0.255859  -1.55469
+  0.172852   -0.949219   0.0561523  -1.34375   -0.206055
+ -0.0854492   1.17969   -0.265625   -0.871094   2.25
+
+julia> using CUDA; CUDA.allowscalar(false)
 
+julia> CuArray(loaded["W"])
+3×5 CuArray{BFloat16, 2, CUDA.Mem.DeviceBuffer}:
+  0.542969    0.201172   1.38281    -0.255859  -1.55469
+  0.172852   -0.949219   0.0561523  -1.34375   -0.206055
+ -0.0854492   1.17969   -0.265625   -0.871094   2.25
+
+julia> gpu_weights = Dict("W"=>CuArray(loaded["W"]), "b"=>CuArray(loaded["b"]))
+Dict{String, CuArray{BFloat16, N, CUDA.Mem.DeviceBuffer} where N} with 2 entries:
+  "W" => [0.542969 0.201172 … -0.255859 -1.55469; 0.172852 -0.949219 … -1.34375 -0.206055; -0.0854492 1.17969 … -0.871094…
+  "b" => BFloat16[0.871094, 0.773438, 0.703125]
+
+julia> f = tempname();
+
+julia> SafeTensors.serialize(f, gpu_weights)
+
+julia> SafeTensors.deserialize(f)
+SafeTensors.SafeTensor{SubArray{UInt8, 1, Vector{UInt8}, Tuple{UnitRange{UInt64}}, true}} with 2 entries:
+  "W" => BFloat16[0.542969 0.201172 … -0.255859 -1.55469; 0.172852 -0.949219 … -1.34375 -0.206055; -0.0854492 1.17969 … -…
+  "b" => BFloat16[0.871094, 0.773438, 0.703125]
+```
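
The lazy array type printed in the README hunk above (`PermutedDimsArray(reshape(reinterpret(...), 5, 3), (2, 1))`) suggests how a row-major buffer can be exposed to Julia without copying. The following is only a minimal sketch of that idea using `Base` functions; it is not the package's actual implementation. The names `values`, `raw`, and `lazy` are illustrative, the host is assumed to be little-endian, and the `ltoh` `mappedarray` wrapper shown in the README output is left out.

```julia
# Illustrative data: the 15 elements of a 3×5 Float32 tensor in row-major order,
# as a Python writer would lay them out on disk.
values = Float32.(0:14)

# Raw byte buffer standing in for the file contents (assumes a little-endian host).
raw = collect(reinterpret(UInt8, values))

# View the bytes as Float32, reshape with the dimensions reversed (5, 3),
# then lazily swap the two axes so that indexing matches the Python shape (3, 5).
lazy = PermutedDimsArray(reshape(reinterpret(Float32, view(raw, 1:length(raw))), 5, 3), (2, 1))

size(lazy)    # (3, 5)
lazy[2, 1]    # 5.0f0, the first element of the second row

# Materialize a regular column-major Array when a copy is acceptable.
dense = Array(lazy)
```

Nothing is copied until `Array(lazy)` is called; until then every index into `lazy` reads directly from `raw`.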