Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement - Matrix Multiplication in make_cicero_cds #53

Open
Austin-s-h opened this issue May 21, 2020 · 0 comments
Open

Enhancement - Matrix Multiplication in make_cicero_cds #53

Austin-s-h opened this issue May 21, 2020 · 0 comments

Comments

@Austin-s-h
Copy link

Hi! I have a clunky enhancement that I made to make_cicero_cds that requires the use of Rcpp. I'm not the best at pull requests yet, so I'll report it here and if you choose to implement it, it's up to you :)

Basically, I am working with a very large set of imputed data (14,000 cells x 256,000 peaks). Normally this data is very sparse (~95%) but after imputation, less than half a percent is a 0. Previously make_cicero_cds has been able to (although not super efficiently) work with this by converting sparse matrices to dense ones within R. The specific step that was overloading my system (64 core 256GB RAM) was the Matrix multiplication on line 73.
new_exprs <- exprs_old %*% mask

So, I decided to spend some time trying to find alternatives.

Rcpp Implementation

I figured that a more efficient way of accomplishing this would be to use C++, implemented through Rcpp. I searched the internet and found this script.

// [[Rcpp::depends(RcppArmadillo, RcppEigen)]]

#include <RcppArmadillo.h>
#include <RcppEigen.h>

// [[Rcpp::export]]
SEXP armaMatMult(arma::mat A, arma::mat B){
    arma::mat C = A * B;
    
    return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMatMult(Eigen::MatrixXd A, Eigen::MatrixXd B){
    Eigen::MatrixXd C = A * B;
    
    return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMapMatMult(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B){
    Eigen::MatrixXd C = A * B;
    
    return Rcpp::wrap(C);
}

Following the article, I saved this as MatrixMtp.cpp and loaded it into my R session.
Next, I edited the make_cicero_cds function:

    #mask <- Matrix::Matrix(mask)
    #new_exprs <- exprs_old %*% mask
    mask <- as(mask, "matrix")
    new_exprs <- eigenMatMult(as(exprs_old,"matrix"), mask)

Now, this function eigenMatMult works, but the faster version eigenMapMatMult didn't, due to the error

Error in eigenMapMatMult(as(exprs_old, "matrix"), mask) : 
  Wrong R type for mapped matrix

I'm not a C++ expert so I just used the other version.

It worked! Although it took between 50-125GB of RAM to complete the function, it was able to actually make the dataset. I think there may be other people struggling with this problem, so maybe this will help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant