How to properly use quimb with jax #219

stfnmangini · 2024-02-19T15:12:46Z

stfnmangini
Feb 19, 2024

I'd like to know what are the best practices and suggested ways to use quimb in combination with jax, ideally being general enough so that one could use custom update rules and operations apart from those implementable with TNOptimizer.

Consider the minimal example below of minimising the energy of a state. I have two main questions:

Is this the "most" correct way to use jax to compute gradients of quimb's tensors?
Is qtn.interface.jax_register_pytree() the proper way to make jax compatible with quimb's objects, and then use jax-native functionalities out-of-the box? Related to How does `vectorized_value_and_grad` work? #179 and Differentiate MPS-type calculations #183, I'd like to double-check.
Look at the different timings when using different combinations of no jitting, jitting and jitting with static_argnums keywords. How come that the fastest execution happens without jitting the function? It seems like jax is recompiling the function every time it is called inside the optimisation loop, something that can be partly stopped using the static_argnums argument, but not really solved.

import jax
import numpy as np
import quimb.tensor as qtn

qtn.interface.jax_register_pytree()

# Expectation value
def loss_fn(state, observable):
    s1, obs, s2 = qtn.tensor_network_align(state, observable, state.H)
    return (s1 | obs | s2).contract()

# Initial random state and Pauli-Z observable on all sites
num_sites = 5
mps = qtn.MPS_rand_state(num_sites, 1)
mpo = qtn.MPO_product_operator(np.array([[[1,0],[0,-1]],]*num_sites))

# Jax-ify the functions in different ways
nojit_fn = jax.value_and_grad(loss_fn)
jit_fn1 = jax.jit(jax.value_and_grad(loss_fn))
jit_fn2 = jax.jit(jax.value_and_grad(loss_fn), static_argnums=[0, 1])
print(f"Initial loss = {loss_fn(mps, mpo)}\n")

# Run these so that they are compiled, if needed
nojit_fn(mps, mpo)
jit_fn1(mps, mpo) 
jit_fn2(mps, mpo)

def optimize(update_function):
    """
    Basic gradient descent update rule.
    """
    psi = mps.copy()
    for i in range(30):
        psi = psi.multiply(1 / psi.norm(), spread_over='all')
        val, grad = update_function(psi, mpo)

        # Update parameters
        new_params = jax.tree_map(lambda x, y: x - 0.1 * y, psi.get_params(), grad.get_params())
        psi.set_params(new_params)
        print(f"Step: {i} — Loss: {val}", end = "\r")
    return psi.multiply(1 / psi.norm(), spread_over='all')

import time

print("No jit function")
start = time.time()
optimize(nojit_fn)
end = time.time()
print("\nExecution time [s]:", end - start)
print("")

print("Jit function")
start = time.time()
optimize(jit_fn1)
end = time.time()
print("\nExecution time [s]:", end - start)
print("")

print("Jit function with static_argnums")
start = time.time()
optimize(jit_fn2)
end = time.time()
print("\nExecution time [s]:", end - start)

Here is the output:

>>> Initial loss = 0.03858750129380334

>>> No jit function
>>> Step: 29 — Loss: -0.99956581221671596
>>> Execution time [s]: 0.40987300872802734

>>> Jit function
>>> Step: 29 — Loss: -0.99956581221671596
>>> Execution time [s]: 1.5457868576049805

>>> Jit function with static_argnums
>>> Step: 29 — Loss: -0.99956581221671596
>>> Execution time [s]: 0.7832067012786865

Note that if we don't redefine psi to account for the normalisation, jit_fn1 remains slow, hence it is probably recompiled at every iteration. On the other hand, jit_fn2 becomes very fast but never changes value, probably because both arguments were set as static (static_argnums = [0,1]), so it is compiled once and never updated again.

Similar findings are obtained by measuring runtimes using %timeit nojit_fn(mps, mpo), and similarly for the other grad functions jit_fn1/2.

The code above is just a simple example to compare the performances of the single gradient computation instructions (with or without jitting). Of course, the proper way to speed-up the whole optimization would be to properly write and jit the whole training loop.

Thank you so much!

Answered by jcmgray

Feb 27, 2024

Hi @stfnmangini, sorry to be slow getting to this and thanks for the detailed examples! Indeed I get the same results when I run them.

Yes high level the aim is allow both the "jax in quimb" (TNOptimizer) approach for simple things and the "quimb in jax" (where quimb just orchestrates various array operations) approach for detailed jax things.

Certainly my understanding of the jax_register_pytree functionality was that it should enable jittable functions to accept/return quimb structures. However I have actually not looked much into this direction and so am not aware if this re-compilation thing is a bug or some misunderstanding of how pytrees work in jax - I can try and look into it but …

View full answer

stfnmangini · 2024-02-20T08:24:21Z

stfnmangini
Feb 20, 2024
Author

I've explored a little bit deeper the issue, and I've seen that jit speedup is recovered by using quimb's pack and unpack functions, and declare all the loss functions in terms of the trainable parameters (and not of the tensor networks, as above).

See the modified example here:

import jax
import numpy as np
import quimb.tensor as qtn

# Expectation value
def loss_fn(params, skeleton):
    state = qtn.unpack(params, skeleton)
    s1, obs, s2 = qtn.tensor_network_align(state, observable, state.H)
    return (s1 | obs | s2).contract()

# Initial random state and Pauli-Z observable on all sites
num_sites = 5
state = qtn.MPS_rand_state(num_sites, 1)
params, skeleton = qtn.pack(state)

observable = qtn.MPO_product_operator(np.array([[[1,0],[0,-1]],]*num_sites))

# Jax-ify the functions in different ways
nojit_fn = jax.value_and_grad(loss_fn)
jit_fn = jax.jit(jax.value_and_grad(loss_fn), static_argnums=[1])
print(f"Initial loss = {loss_fn(params, skeleton)}\n")

# Run these so that they are compiled, if needed
print(nojit_fn(params, skeleton))
print(jit_fn(params, skeleton))
print("")

def optimize(update_function):
    """
    Basic gradient descent update rule.
    """
    psi = state.copy()
    params = psi.get_params()
    for i in range(50):
        # Normalize state
        psi.set_params(params)
        psi = psi.multiply(1 / psi.norm(), spread_over='all')
        params = psi.get_params()

        val, grad = update_function(params, skeleton)

        # Update parameters
        params = jax.tree_map(lambda x, y: x - 0.1 * y, params, grad)
        print(f"Step: {i} — Loss: {val}", end = "\r")
    return psi.multiply(1 / psi.norm(), spread_over='all')

import time

print("No jit function")
start = time.time()
optimize(nojit_fn)
end = time.time()
print("\nExecution time [s]:", end - start)
print("")

print("Jit function")
start = time.time()
optimize(jit_fn)
end = time.time()
print("\nExecution time [s]:", end - start)
print("")

Output:

>>> Initial loss = -0.02904531122882376

>>> (Array(-0.02904531, dtype=float32), {0: ..., ...})
>>> (Array(-0.02904531, dtype=float32), {0: ..., ...})

>>> No jit function
>>> Step: 49 — Loss: -1.09999964237213135
>>> Execution time [s]: 0.622642993927002

>>> Jit function
>>> Step: 49 — Loss: -1.09999964237213135
>>> Execution time [s]: 0.04969906806945801

I thus confirm my concerns that jax.jit functions are, for some reason which I don't understand, recompiled every time they are fed with quimb's objects as inputs, provided that qtn.interface.jax_register_pytree() was used. Is this an intended/known behaviour?

Given these observations, I would then conclude:

Don't use directly qtn.interface.jax_register_pytree() to make jax's automatic-differentiation compatible with quimb, as there may be problems with degraded performances when using jit. Rather, in simpler optimisation cases, just stick with TNOptimizer and let it do the magic under the hood;
When more control on the optimisation process is needed, then define every loss function just in terms of the trainable parameters, and then use quimb with pack and unpack functions to build the necessary tensor networks structures and define the contractions. So, for the sake of simpler implementations, would this low-level approach be favourable over defining a learning model with flax, as suggested in quimb's documentation?
https://quimb.readthedocs.io/en/latest/examples/ex_quimb_within_jax_flax_optax.html

Do you agree?

Sorry for the long and probably rambling questions, thank you so much in advance!

0 replies

jcmgray · 2024-02-27T21:27:25Z

jcmgray
Feb 27, 2024
Maintainer

Hi @stfnmangini, sorry to be slow getting to this and thanks for the detailed examples! Indeed I get the same results when I run them.

Yes high level the aim is allow both the "jax in quimb" (TNOptimizer) approach for simple things and the "quimb in jax" (where quimb just orchestrates various array operations) approach for detailed jax things.

Certainly my understanding of the jax_register_pytree functionality was that it should enable jittable functions to accept/return quimb structures. However I have actually not looked much into this direction and so am not aware if this re-compilation thing is a bug or some misunderstanding of how pytrees work in jax - I can try and look into it but don't have much time at the moment...

I think for the moment your conclusions are what I would also suggest:

use TNOptimizer when possible. It would be very nice and certainly is on the not-too-hard-to-do list to enable the target optimized object (currently just a single TN) to be a pytree itself, which would make a lot of things much more convenient.
write "array functions" using pack/unpack or really any way you like of inserting arrays into the TN structures. Then everything generalizes to torch etc as well. You can either use these as the low level building blocks, or sometimes it is convenient to wrap them into the 'Model' style approach since this is what other high level libraries like netket expect.

2 replies

stfnmangini Feb 29, 2024
Author

Thanks @jcmgray, this was very helpful!

Unfortunately I don't know much either regarding registering pytrees and jax, so I cannot really help here.

Nonetheless, on the practical side, I happy to know that I was not misusing the code, and that you agree that one should go either with the 'jax in quimb' or 'quimb in jax' solutions depending on the use case.

Thanks again!

jcmgray Feb 29, 2024
Maintainer

Yes to be clear, I don't mean using jax pytrees specifically, just the general concept of arbitrary nested containers of tuple, list, dict etc. (see quimb.utils.tree_map and friends).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to properly use quimb with jax #219

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 2 comments 2 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

Select a reply

How to properly use quimb with jax #219

stfnmangini Feb 19, 2024

Replies: 2 comments · 2 replies

stfnmangini Feb 20, 2024 Author

jcmgray Feb 27, 2024 Maintainer

stfnmangini Feb 29, 2024 Author

jcmgray Feb 29, 2024 Maintainer

stfnmangini
Feb 19, 2024

Replies: 2 comments 2 replies

stfnmangini
Feb 20, 2024
Author

jcmgray
Feb 27, 2024
Maintainer

stfnmangini Feb 29, 2024
Author

jcmgray Feb 29, 2024
Maintainer