Add performance warning for default.tensor (PennyLaneAI#6480)
### Before submitting

Please complete the following checklist when submitting a PR:

- [ ] All new features must include a unit test.
      If you've fixed a bug or added code that should be tested, add a test to the
      test directory!

- [ ] All new functions and code must be clearly commented and documented.
      If you do make documentation changes, make sure that the docs build and
      render correctly by running `make docs`.

- [X] Ensure that the test suite passes, by running `make test`.

- [ ] Add a new entry to the `doc/releases/changelog-dev.md` file, summarizing the
      change, and including a link back to the PR.

- [X] The PennyLane source code conforms to
      [PEP8 standards](https://www.python.org/dev/peps/pep-0008/).
      We check all of our code against [Pylint](https://www.pylint.org/).
      To lint modified files, simply `pip install pylint`, and then
      run `pylint pennylane/path/to/file.py`.

When all the above are checked, delete everything above the dashed
line and fill in the pull request template.


------------------------------------------------------------------------------------------------------------

**Context:**
- Because `default.tensor` depends on quimb for the actual simulation, we cannot optimize its performance directly within PennyLane. However, performance can be improved by setting environment variables.
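A minimal sketch of the environment-variable approach, assuming a NumPy build backed by OpenBLAS, MKL or an OpenMP-threaded BLAS: the variables are assigned at the very top of the user's own script, before NumPy (or PennyLane, which imports NumPy) is loaded, because BLAS libraries typically read them only once at load time. The matrix product is only a stand-in for an actual `default.tensor` simulation.

```python
import os

# Set the thread limits before NumPy/PennyLane is imported: BLAS libraries
# typically read these variables only once, when they are first loaded.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

import numpy as np  # noqa: E402 -- deliberately imported after the env setup

# Stand-in workload; in practice this would be the default.tensor simulation.
rng = np.random.default_rng(seed=0)
a = rng.standard_normal((256, 256))
product = a @ a.T
print(product.shape)  # (256, 256)
```

If NumPy has already been imported earlier in the same process (e.g. in a notebook), these assignments generally have no effect; exporting the variables in the shell before launching Python, e.g. `OPENBLAS_NUM_THREADS=1 python my_script.py` (script name hypothetical), avoids the ordering issue.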

**Description of the Change:**
- With `default.tensor`, the best performance on large systems is obtained by setting the environment variable `OPENBLAS_NUM_THREADS=1`.
- Unfortunately, setting `OPENBLAS_NUM_THREADS=1` at the top of `default_tensor.py` has no effect on the device 😞, presumably because NumPy, and with it the BLAS library, has already been loaded by then and only reads the variable at load time.
- An alternative way to control the number of threads used by BLAS libraries is the
[threadpoolctl](https://github.com/joblib/threadpoolctl) package, which
is also [recommended by
NumPy](https://numpy.org/doc/2.0/reference/routines.linalg.html#linear-algebra-numpy-linalg).
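A sketch of the threadpoolctl alternative, assuming `threadpoolctl` is installed (`pip install threadpoolctl`). `threadpool_limits` acts as a context manager that temporarily caps the BLAS thread pools and restores the previous limits on exit, while `threadpool_info` reports the limits currently in effect. The `workload` function is a hypothetical stand-in; in practice the `default.tensor` QNode would be executed inside the `with` block.

```python
import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits


def workload(n: int = 256) -> np.ndarray:
    """Hypothetical stand-in for a default.tensor circuit execution."""
    rng = np.random.default_rng(seed=0)
    a = rng.standard_normal((n, n))
    return a @ a.T


# Limit only the BLAS thread pools (OpenBLAS, MKL, BLIS, ...) to one thread
# for the duration of the block; previous limits are restored on exit.
with threadpool_limits(limits=1, user_api="blas"):
    blas_pools = [p for p in threadpool_info() if p["user_api"] == "blas"]
    result = workload()

# Show which BLAS implementation was detected and its thread count inside the block.
for pool in blas_pools:
    print(pool["internal_api"], pool["num_threads"])
```

Unlike the environment variables, this works even after NumPy has been imported, because threadpoolctl changes the limits of the already-loaded libraries at runtime.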

**Benefits:**
Benchmarking the following circuit shows the performance improvement:
```python
import numpy as np
import pennylane as qml


def test_qft(wires):
    """Test that the device can apply a multi-qubit QFT gate."""
    method = "mps"
    dev = qml.device("default.tensor", wires=wires, method=method, max_bond_dim=128)

    def circuit(basis_state):
        qml.BasisState(basis_state, wires=range(wires))
        qml.QFT(wires=range(wires))
        return qml.state()

    # Alternating |0101...> input state; assumes an even number of wires.
    result = qml.QNode(circuit, dev)(np.array([0, 1] * (wires // 2)))

    return result
```
Timing plot

![image](https://github.com/user-attachments/assets/f4e8d2fc-88fe-4191-9e57-7e2b5a640a51)

![image](https://github.com/user-attachments/assets/e8441ff7-f4cf-4587-8fae-327039b4b01b)

![image](https://github.com/user-attachments/assets/259137cf-d02e-462c-89f9-b9a6966b3331)

Speedup plot

![image](https://github.com/user-attachments/assets/6c08c402-856a-4d97-8b69-589610027f6c)

![image](https://github.com/user-attachments/assets/293c9b8c-c922-449b-918e-a4644b12fb1b)

![image](https://github.com/user-attachments/assets/187b40f1-bcfc-4e45-84e6-39721de7193f)
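To run a similar comparison on other hardware, one option is to time the same workload in fresh subprocesses with and without the thread limit, so that the environment variables are already in place when NumPy is loaded in each run. This sketch uses a hypothetical NumPy matrix-product workload rather than the QFT circuit (which would additionally require PennyLane and quimb), so its timings will not reproduce the plotted numbers.

```python
import os
import subprocess
import sys

# Hypothetical stand-in workload, timed inside a fresh interpreter.
# Swap in the default.tensor QFT circuit to reproduce the benchmark above.
CHILD = """
import time
import numpy as np
a = np.random.default_rng(0).standard_normal((512, 512))
start = time.perf_counter()
for _ in range(5):
    a @ a
print(time.perf_counter() - start)
"""


def time_with_threads(num_threads):
    """Run CHILD in a new process so the BLAS thread setting applies at load time."""
    env = dict(os.environ)
    for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        if num_threads is None:
            env.pop(var, None)  # fall back to the library's default thread count
        else:
            env[var] = str(num_threads)
    out = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return float(out.stdout.strip())


if __name__ == "__main__":
    default_time = time_with_threads(None)
    single_time = time_with_threads(1)
    print(f"default threads: {default_time:.4f} s, 1 thread: {single_time:.4f} s")
```

Using subprocesses instead of an in-process loop keeps each measurement independent of whatever BLAS state an earlier run left behind.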


**Possible Drawbacks:**

**Related GitHub Issues:**

[sc-77383]

---------

Co-authored-by: Luis Alfredo Nuñez Meneses
Co-authored-by: Thomas Germain
Co-authored-by: Lee James O'Riordan
Co-authored-by: Christina Lee
5 people authored Nov 8, 2024
1 parent 91913a7 commit e710ca6
Showing 1 changed file with 4 additions and 0 deletions.
pennylane/devices/default_tensor.py: 4 additions, 0 deletions
@@ -258,6 +258,10 @@ def circuit(num_qubits):
We can provide additional keyword arguments to the device to customize the simulation. These are passed to the ``quimb`` backend.
.. note::
Be aware that `quimb` uses multi-threading with `numba <https://numba.pydata.org/numba-doc/dev/user/threading-layer.html>`_ as well as for linear algebra operations with `numpy.linalg <https://numpy.org/doc/stable/reference/routines.linalg.html#linear-algebra-numpy-linalg>`_. Proper setting of the corresponding environment variables (e.g. `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS`, `NUMBA_NUM_THREADS` etc.) depending on your hardware is highly recommended and will have a strong impact on the device's performance.
To avoid a slowdown in performance for circuits with more than 10 wires, we recommend setting the environment variable relevant for your BLAS library backend (e.g. `OMP_NUM_THREADS=1`, `OPENBLAS_NUM_THREADS=1` or `MKL_NUM_THREADS=1`), depending on your NumPy package & associated libraries. Alternatively, you can use `threadpoolctl <https://github.com/joblib/threadpoolctl>`_ to limit the threads within your executing script. For optimal performance you can adjust the number of threads to find the best fit for your workload.
.. details::
:title: Usage with MPS Method