Guided Quantum Compression for Higgs Identification

Many data sets are too complex for currently available quantum computers. Consequently, quantum machine learning applications conventionally resort to dimensionality reduction algorithms, e.g., auto-encoders, before passing data through the quantum models. We show that using a classical auto-encoder as an independent preprocessing step can significantly decrease the classification performance of a quantum machine learning algorithm. To ameliorate this issue, we design an architecture that unifies the preprocessing and quantum classification algorithms into a single trainable model: the guided quantum compression model. The utility of this model is demonstrated by using it to identify the Higgs boson in proton-proton collisions at the LHC, where the conventional approach proves ineffective. Conversely, the guided quantum compression model excels at solving this classification problem, achieving good accuracy. Additionally, the model developed herein outperforms the classical benchmark when using only low-level kinematic features.

This repository contains the source code for the paper Guided quantum compression for high dimensional data classification.

If you use this code or any part of it, please cite the paper:

@article{Belis_2024,
   title={Guided quantum compression for high dimensional data classification},
   volume={5},
   ISSN={2632-2153},
   url={http://dx.doi.org/10.1088/2632-2153/ad5fdd},
   DOI={10.1088/2632-2153/ad5fdd},
   number={3},
   journal={Machine Learning: Science and Technology},
   publisher={IOP Publishing},
   author={Belis, Vasilis and Odagiu, Patrick and Grossi, Michele and Reiter, Florentin and Dissertori, Günther and Vallecorsa, Sofia},
   year={2024},
   month={jul},
   pages={035010}
}

Installing Dependencies

We strongly recommend using conda to install the dependencies for this repository. If you have conda, go into the folder with the code you want to run and create an environment from the .yml file in that folder, then activate it. Now you can run the code; see the Running the Code section for further instructions.
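For example, a minimal sketch of the conda workflow (the environment file name and environment name below are placeholders; use the ones from the folder you are working in):

    # Create the environment from the .yml file shipped with the code folder.
    conda env create -f environment.yml
    # Activate it using the name declared inside that .yml file.
    conda activate gqc-env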

If you do not want to use conda, here is a list of the packages you would need to install:

Pre-processing

  • numpy
  • pandas
  • pytables
  • matplotlib
  • scikit-learn

Auto-encoders

  • numpy
  • matplotlib
  • scikit-learn
  • pytorch (follow the installation instructions at https://pytorch.org)
  • torchinfo
  • pykeops
    • g++ compiler version >= 7
    • cudatoolkit version >= 10
  • geomloss

Pennylane VQC

  • numpy
  • matplotlib
  • scikit-learn
  • pytorch (follow the installation instructions at https://pytorch.org)
  • torchinfo
  • pykeops
    • g++ compiler version >= 7
    • cudatoolkit version >= 10
  • geomloss
  • pennylane
  • pennylane-qiskit
  • pennylane-lightning[gpu]
    • NVidia cuQuantum SDK

The pykeops package is required to run the Sinkhorn auto-encoder. However, it is a tricky package to manage, so make sure that the gcc and g++ compilers in your path are compatible with the CUDA version you are running. We recommend using conda for exactly this reason: conda sets certain environment variables such that everything is configured correctly and pykeops can compile using CUDA.
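As a quick sanity check that pykeops can find your compilers and CUDA toolkit, you can run the self-test helpers that ship with the package:

    # Compile and run a tiny test kernel for each backend;
    # errors here usually point to a compiler/CUDA version mismatch.
    python -c "import pykeops; pykeops.test_numpy_bindings()"
    python -c "import pykeops; pykeops.test_torch_bindings()"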

If you encounter any bugs, please contact us at the email addresses listed in this repository.

Running the Code

The data preprocessing scripts are run from inside the preprocessing folder. These scripts were customised for the specific data set used by the authors. For access to this data, please contact us.

The preprocessing scripts produce normalised numpy arrays saved to three different files for training, validation, and testing.

The scripts to launch the autoencoder training on the data are in the bin folder. Look for the run.snip files to see the basic run cases for the code, and customise them from there.
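For example (no specific script names are assumed here; the .snip files themselves contain the actual commands):

    # Inspect the example run commands and use them as templates for your own runs.
    cd bin
    ls *.snip    # list the available run templates
    cat *.snip   # view the example commands, then adapt the paths and flags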
