Add performance warning for default.tensor (PennyLaneAI#6480)

### Before submitting

Please complete the following checklist when submitting a PR:

- [ ] All new features must include a unit test. If you've fixed a bug or added code that should be tested, add a test to the test directory!
- [ ] All new functions and code must be clearly commented and documented. If you do make documentation changes, make sure that the docs build and render correctly by running `make docs`.
- [X] Ensure that the test suite passes, by running `make test`.
- [ ] Add a new entry to the `doc/releases/changelog-dev.md` file, summarizing the change, and including a link back to the PR.
- [X] The PennyLane source code conforms to [PEP8 standards](https://www.python.org/dev/peps/pep-0008/). We check all of our code against [Pylint](https://www.pylint.org/). To lint modified files, simply `pip install pylint`, and then run `pylint pennylane/path/to/file.py`.

When all the above are checked, delete everything above the dashed line and fill in the pull request template.

------------------------------------------------------------------------------------------------------------

**Context:**

- Because `default.tensor` depends on `quimb`, we cannot optimize it directly from within PennyLane. However, performance can be improved by setting environment variables.

**Description of the Change:**

- With `default.tensor`, maximum performance on large systems can be obtained by setting the environment variable `OPENBLAS_NUM_THREADS=1`.
- Unfortunately, setting `OPENBLAS_NUM_THREADS=1` at the top of `default_tensor.py` does not affect the device 😞.
- An alternative way to control the threads used by BLAS libraries is the [threadpoolctl](https://github.com/joblib/threadpoolctl) package, which is also [recommended by NumPy](https://numpy.org/doc/2.0/reference/routines.linalg.html#linear-algebra-numpy-linalg).

**Benefits:**

Benchmarking the following algorithm shows the performance improvement.
``` python
import numpy as np
import pennylane as qml


def test_qft(wires):
    """Test that the device can apply a multi-qubit QFT gate."""
    method = "mps"
    dev = qml.device("default.tensor", wires=wires, method=method, max_bond_dim=128)

    def circuit(basis_state):
        qml.BasisState(basis_state, wires=range(wires))
        qml.QFT(wires=range(wires))
        return qml.state()

    # The alternating 0/1 basis state has length 2 * (wires // 2),
    # so this benchmark assumes an even number of wires.
    result = qml.QNode(circuit, dev)(np.array([0, 1] * (wires // 2)))
    return result
```

Timing plot

![image](https://github.com/user-attachments/assets/f4e8d2fc-88fe-4191-9e57-7e2b5a640a51)
![image](https://github.com/user-attachments/assets/e8441ff7-f4cf-4587-8fae-327039b4b01b)
![image](https://github.com/user-attachments/assets/259137cf-d02e-462c-89f9-b9a6966b3331)

Speedup plot

![image](https://github.com/user-attachments/assets/6c08c402-856a-4d97-8b69-589610027f6c)
![image](https://github.com/user-attachments/assets/293c9b8c-c922-449b-918e-a4644b12fb1b)
![image](https://github.com/user-attachments/assets/187b40f1-bcfc-4e45-84e6-39721de7193f)

**Possible Drawbacks:**

**Related GitHub Issues:**

[sc-77383]

---------

Co-authored-by: Luis Alfredo Nuñez Meneses
Co-authored-by: Thomas Germain
Co-authored-by: Lee James O'Riordan
Co-authored-by: Christina Lee
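
For illustration, the two thread-limiting approaches described in this PR (environment variables and `threadpoolctl`) might be sketched as follows. This is a minimal sketch, not code from the PR: `limit_blas_threads` and `blas_thread_limit` are hypothetical helper names, and it assumes the environment variables are set before NumPy (and therefore the BLAS library) is first imported, since these variables are typically read when the library loads. `threadpoolctl` is treated as an optional dependency.

```python
import contextlib
import os

# Thread-count variables named in the docstring note (one per BLAS/OpenMP backend).
BLAS_THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limit_blas_threads(num_threads=1):
    """Hypothetical helper: set the thread-count variables for this process.

    Assumption: this runs before NumPy / quimb / PennyLane are imported.
    """
    for name in BLAS_THREAD_VARS:
        os.environ[name] = str(num_threads)
    return {name: os.environ[name] for name in BLAS_THREAD_VARS}


settings = limit_blas_threads(1)

# Optional runtime alternative: threadpoolctl can limit threads after import.
try:
    from threadpoolctl import threadpool_limits
except ImportError:
    threadpool_limits = None


def blas_thread_limit(num_threads=1):
    """Hypothetical helper: context manager limiting BLAS threads if possible."""
    if threadpool_limits is None:
        # threadpoolctl is not installed; fall back to a no-op context.
        return contextlib.nullcontext()
    return threadpool_limits(limits=num_threads, user_api="blas")


with blas_thread_limit(1):
    # Code that runs the `default.tensor` circuit would go here.
    pass
```

The best thread count depends on the hardware and workload, so the value `1` used here is only the starting point suggested in the description above.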
ldi18 · Nov 8, 2024 · e710ca6
1 parent 91913a7
Showing 1 changed file with 4 additions and 0 deletions.
diff --git a/pennylane/devices/default_tensor.py b/pennylane/devices/default_tensor.py
@@ -258,6 +258,10 @@ def circuit(num_qubits):
 
     We can provide additional keyword arguments to the device to customize the simulation. These are passed to the ``quimb`` backend.
 
+    .. note::
+        Be aware that `quimb` uses multi-threading with `numba <https://numba.pydata.org/numba-doc/dev/user/threading-layer.html>`_ as well as for linear algebra operations with `numpy.linalg <https://numpy.org/doc/stable/reference/routines.linalg.html#linear-algebra-numpy-linalg>`_. Proper setting of the corresponding environment variables (e.g. `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS`, `NUMBA_NUM_THREADS` etc.) depending on your hardware is highly recommended and will have a strong impact on the device's performance.
+        To avoid a slowdown in performance for circuits with more than 10 wires, we recommend setting the environment variable relevant for your BLAS library backend (e.g. `OMP_NUM_THREADS=1`, `OPENBLAS_NUM_THREADS=1` or `MKL_NUM_THREADS=1`), depending on your NumPy package & associated libraries. Alternatively, you can use  `threadpoolctl <https://github.com/joblib/threadpoolctl>`_  to limit the threads within your executing script. For optimal performance you can adjust the number of threads to find the best fit for your workload.
+
     .. details::
             :title: Usage with MPS Method