sparsePKL is a pairwise kernel learning algorithm based on nonsmooth DC (difference of two convex functions) optimization. It learns sparse models for making predictions on pairwise data (e.g. drug-target interactions) by using double regularization with both the L1-norm and the L0-pseudonorm. The resulting nonsmooth DC optimization problem is solved with the limited memory bundle DC algorithm (LMB-DCA). In addition, sparsePKL uses pairwise Kronecker product kernels, computed via the generalized vec trick, to model interactions between drug and target features. The loss functions included for the pairwise kernel problem are listed below (rough sketches of the losses and of the vec trick follow the list):
- squared loss,
- squared epsilon-insensitive loss,
- epsilon-insensitive squared loss,
- epsilon-insensitive absolute loss,
- absolute loss.
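The exact formulations are implemented in objfun.f95; purely as a rough sketch, writing y for the true label, p for the predicted value and eps > 0 for the insensitivity parameter, these losses are commonly formulated as

    squared loss:                       (y - p)^2
    squared epsilon-insensitive loss:   max(0, |y - p| - eps)^2
    epsilon-insensitive squared loss:   max(0, (y - p)^2 - eps)
    epsilon-insensitive absolute loss:  max(0, |y - p| - eps)
    absolute loss:                      |y - p|

Schematically, the doubly regularized training problem is then of the form "loss term + rho1*||a||_1 + rho0*||a||_0" over the dual coefficients a, where rho1 and rho0 are hypothetically named regularization parameters; the precise DC formulation is given in the Karmitsa et al. reference below.

The Kronecker product kernel matrix itself is never formed explicitly. The key identity behind the (generalized) vec trick is that a Kronecker-structured matrix-vector product can be computed with two ordinary matrix products. A minimal NumPy sketch of the complete-data case, with hypothetical toy kernel matrices Kd (drugs) and Kt (targets):

    import numpy as np

    rng = np.random.default_rng(0)
    Kd = rng.standard_normal((4, 4))   # toy drug kernel
    Kt = rng.standard_normal((3, 3))   # toy target kernel
    A = rng.standard_normal((3, 4))    # dual coefficients, one per (target, drug) pair

    # Naive: build the full Kronecker product kernel and multiply.
    naive = np.kron(Kd, Kt) @ A.flatten(order="F")

    # Vec trick: vec(Kt A Kd^T) = (Kd kron Kt) vec(A), no Kronecker matrix needed.
    fast = (Kt @ A @ Kd.T).flatten(order="F")

    print(np.allclose(naive, fast))    # True

The generalized vec trick extends this identity to the case where only part of the drug-target pairs are labelled; see the Airola and Pahikkala (2018) and Viljanen et al. (2022) references below.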
-
sparsepkl.py
- Main Python file. Includes RLScore calls.
-
pkl_utility.py
- Python utility programs.
-
sparsepkl.f95
- Main Fortran file for sparsePKL software.
-
lmbdca.f95
- LMB-DCA - the limited memory bundle DC algorithm.
-
solvedca.f95
- Limited memory bundle method for solving convex DCA-type problems.
-
objfun.f95
- Computation of the function and subgradient values with different loss functions. The selection between loss functions is made in sparsepkl.py.
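  Purely as an illustration of what such an evaluation looks like (the actual implementation is the Fortran code above), a hypothetical NumPy sketch of the function value and one valid subgradient of the epsilon-insensitive absolute loss with respect to the predictions:

    import numpy as np

    def eps_insensitive_abs(y, p, eps):
        # Loss value and one valid subgradient w.r.t. the predictions p.
        r = np.asarray(p) - np.asarray(y)
        loss = np.maximum(0.0, np.abs(r) - eps)
        # Outside the insensitivity tube the loss is |r| - eps, so sign(r) is a subgradient;
        # inside the tube (and at the kink) 0 is a valid subgradient.
        subgrad = np.where(np.abs(r) > eps, np.sign(r), 0.0)
        return loss.sum(), subgrad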
-
initpkl.f95
- Initialization of parameters and variables in sparsePKL and LMB-DCA. Includes modules:
- initpkl - Initialization of parameters for pairwise learning.
- initlmbdca - Initialization of LMB-DCA.
-
parameters.f95
- Parameters for Fortran. Includes modules:
- r_precision - Precision for reals,
- param - Parameters,
- exe_time - Execution time.
-
subpro.f95
- Subprograms for LMB-DCA and LMBM.
-
data.py
- Contains functions to load the example data sets. The data files are assumed to be in a folder "data" located outside the current folder.
- Contains functions to create train-test-validation splits. Splits are created for every experimental setting S1-S4 (see the reference below).
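  The settings differ in whether the drug and the target of a test pair also occur among the training pairs; a hypothetical helper (not part of data.py) that classifies a test pair accordingly, with the exact mapping of these four cases to the labels S1-S4 given in the reference below:

    def pair_setting(drug_id, target_id, train_drugs, train_targets):
        # Classify a test pair by whether its drug/target occur in the training split.
        drug_seen = drug_id in train_drugs
        target_seen = target_id in train_targets
        if drug_seen and target_seen:
            return "both seen in training"
        if target_seen:
            return "new drug, seen target"
        if drug_seen:
            return "seen drug, new target"
        return "both new"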
-
Makefile
- Builds a shared library that allows sparsepkl (Fortran95 code) to be called from Python. Uses f2py and Python 3.7, and requires a Fortran compiler (gfortran) to be installed.
The source uses f2py and Python 3.7, and requires a Fortran compiler (gfortran by default) and RLScore to be installed.
To use the code:
- Select the data, loss function, and the desired sparsity level in the sparsepkl.py file.
- Run Makefile (by typing "make") to build a shared library that allows sparsepkl (Fortran95 code) to be called from Python.
- Finally, just type "python3.7 sparsepkl.py".
The algorithm returns a csv-file with performance measures (C-index and MSE) computed on the test set under different experimental settings S1-S4. The best results are selected using a separate validation set and validated with respect to the C-index. In addition, separate csv-files with predictions under the different experimental settings S1-S4 are returned.
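For reference, the C-index (concordance index) is the fraction of label pairs with different true values whose predictions are ordered the same way (prediction ties count as one half). A small sketch of how the reported measures can be recomputed from true labels y and predictions p (hypothetical helpers, not part of the returned files):

    import numpy as np

    def c_index(y, p):
        # Concordance index: share of correctly ordered pairs; ties in p count as 0.5.
        correct, total = 0.0, 0
        for i in range(len(y)):
            for j in range(i + 1, len(y)):
                if y[i] == y[j]:
                    continue
                total += 1
                hi, lo = (p[i], p[j]) if y[i] > y[j] else (p[j], p[i])
                if hi > lo:
                    correct += 1.0
                elif hi == lo:
                    correct += 0.5
        return correct / total

    def mse(y, p):
        return float(np.mean((np.asarray(y) - np.asarray(p)) ** 2))

    print(c_index([0.2, 1.5, 0.7], [0.1, 1.2, 0.9]), mse([0.2, 1.5, 0.7], [0.1, 1.2, 0.9]))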
- sparsePKL and LMB-DCA:
- N. Karmitsa, K. Joki, A. Airola, T. Pahikkala, "Limited memory bundle DC algorithm for sparse pairwise kernel learning", 2023.
- RLScore:
- T. Pahikkala, A. Airola, "RLScore: Regularized least-squares learners", Journal of Machine Learning Research, Vol. 17, No. 221, pp. 1-5, 2016.
- LMBM:
- N. Haarala, K. Miettinen, M.M. Mäkelä, "Globally Convergent Limited Memory Bundle Method for Large-Scale Nonsmooth Optimization", Mathematical Programming, Vol. 109, No. 1, pp. 181-205, 2007.
- M. Haarala, K. Miettinen, M.M. Mäkelä, "New Limited Memory Bundle Method for Large-Scale Nonsmooth Optimization", Optimization Methods and Software, Vol. 19, No. 6, pp. 673-692, 2004.
- Generalized vec trick and experimental settings:
- A. Airola, T. Pahikkala, "Fast Kronecker product kernel methods via generalized vec trick", IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, pp. 3374–3387, 2018.
- M. Viljanen, A. Airola, T. Pahikkala, "Generalized vec trick for fast learning of pairwise kernel models", Machine Learning, Vol. 111, pp. 543–573, 2022.
- Nonsmooth optimization:
- A. Bagirov, N. Karmitsa, M.M. Mäkelä, "Introduction to nonsmooth optimization: theory, practice and software", Springer, 2014.
This work was financially supported by the Research Council of Finland projects (Project Nos. 345804 and 345805) led by Antti Airola and Tapio Pahikkala.