Commit
Add performance warning for `default.tensor` (PennyLaneAI#6480)
### Before submitting

Please complete the following checklist when submitting a PR:

- [ ] All new features must include a unit test. If you've fixed a bug or added code that should be tested, add a test to the test directory!
- [ ] All new functions and code must be clearly commented and documented. If you do make documentation changes, make sure that the docs build and render correctly by running `make docs`.
- [X] Ensure that the test suite passes, by running `make test`.
- [ ] Add a new entry to the `doc/releases/changelog-dev.md` file, summarizing the change, and including a link back to the PR.
- [X] The PennyLane source code conforms to [PEP8 standards](https://www.python.org/dev/peps/pep-0008/). We check all of our code against [Pylint](https://www.pylint.org/). To lint modified files, simply `pip install pylint`, and then run `pylint pennylane/path/to/file.py`.

When all the above are checked, delete everything above the dashed line and fill in the pull request template.

------------------------------------------------------------------------------------------------------------

**Context:**

- The dependency on quimb prevents us from optimizing performance directly in PennyLane. However, performance can be improved by setting environment variables.

**Description of the Change:**

- With `default.tensor`, maximum performance on large systems can be obtained by setting the environment variable `OPENBLAS_NUM_THREADS=1`.
- Unfortunately, if `OPENBLAS_NUM_THREADS=1` is set at the top of `default_tensor.py`, the value does not affect the device 😞.
- An alternative way to control the number of threads used by BLAS libraries is the [threadpoolctl](https://github.com/joblib/threadpoolctl) package, which is also [recommended by NumPy](https://numpy.org/doc/2.0/reference/routines.linalg.html#linear-algebra-numpy-linalg).

**Benefits:**

Benchmarking the following algorithm shows the performance improvement.
```python
import numpy as np
import pennylane as qml


def test_qft(wires):
    """Test that the device can apply a multi-qubit QFT gate."""
    method = "mps"
    dev = qml.device("default.tensor", wires=wires, method=method, max_bond_dim=128)

    def circuit(basis_state):
        qml.BasisState(basis_state, wires=range(wires))
        qml.QFT(wires=range(wires))
        return qml.state()

    result = qml.QNode(circuit, dev)(np.array([0, 1] * (wires // 2)))
    return result
```

Timing plot

![image](https://github.com/user-attachments/assets/f4e8d2fc-88fe-4191-9e57-7e2b5a640a51)
![image](https://github.com/user-attachments/assets/e8441ff7-f4cf-4587-8fae-327039b4b01b)
![image](https://github.com/user-attachments/assets/259137cf-d02e-462c-89f9-b9a6966b3331)

Speedup plot

![image](https://github.com/user-attachments/assets/6c08c402-856a-4d97-8b69-589610027f6c)
![image](https://github.com/user-attachments/assets/293c9b8c-c922-449b-918e-a4644b12fb1b)
![image](https://github.com/user-attachments/assets/187b40f1-bcfc-4e45-84e6-39721de7193f)

**Possible Drawbacks:**

**Related GitHub Issues:**

[sc-77383]

---------

Co-authored-by: Luis Alfredo Nuñez Meneses <[email protected]>
Co-authored-by: Thomas Germain <[email protected]>
Co-authored-by: Lee James O'Riordan <[email protected]>
Co-authored-by: Christina Lee <[email protected]>
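For readers who want to try the `threadpoolctl` route mentioned in the description, the following is a minimal sketch rather than the code added by this commit. The helper name `single_threaded_blas` is hypothetical; it assumes `threadpoolctl` may or may not be installed and falls back to a warning when it is missing.

```python
import warnings
from contextlib import contextmanager

try:
    # Optional dependency; its presence is an assumption, not a requirement.
    from threadpoolctl import threadpool_limits
except ImportError:
    threadpool_limits = None


@contextmanager
def single_threaded_blas(limit=1):
    """Hypothetical helper: cap the BLAS thread count inside a `with` block."""
    if threadpool_limits is None:
        # Without threadpoolctl the thread count cannot be changed here.
        warnings.warn(
            "threadpoolctl is not installed; the BLAS thread count is unchanged. "
            "Consider setting OPENBLAS_NUM_THREADS=1 before starting Python.",
            UserWarning,
        )
        yield
        return
    # Limit only BLAS thread pools (for example OpenBLAS) for the enclosed code.
    with threadpool_limits(limits=limit, user_api="blas"):
        yield


# Usage sketch (test_qft is the benchmark function from this commit message):
#     with single_threaded_blas():
#         state = test_qft(wires=20)
```

The limit applies only while the `with` block is active, so the rest of the program keeps its original thread settings. Whether a single thread is the best choice is likely to depend on the system size and hardware, so it is worth benchmarking on the target machine.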