Breaking change in AUCell_buildRankings() on dgCMatrix objects #28

hpages · 2022-05-20T15:22:49Z

Hi,

This used to work:

library(Matrix)
library(AUCell)
exprMatrix <- cbind(cell1=100*4:0, cell2=c(500, 0, 90, 0, 750))
rownames(exprMatrix) <- sprintf("gene%02d", seq_len(nrow(exprMatrix)))
AUCell_buildRankings(as(exprMatrix, "dgCMatrix"), plotStats=FALSE, verbose=FALSE)

but with the latest AUCell (version 1.19.1), it no longer works:

AUCell_buildRankings(as(exprMatrix, "dgCMatrix"), plotStats=FALSE, verbose=FALSE)
# Error in .AUCell_buildRankings(exprMat = exprMat, featureType = featureType,  : 
#   To use a dgCMatrix as input set 'splitByBlocks=TRUE'.

Is this change intended?

This kind of change should be considered carefully as it breaks existing code, like the code used in the OSCA book. See https://bioconductor.org/checkResults/3.16/books-LATEST/OSCA.basic/nebbiolo2-buildsrc.html

If this change was introduced by mistake, I would strongly suggest (again) that you improve your unit tests to cover the various situations that AUCell_buildRankings() is expected to handle.

Thanks,
H.

The text was updated successfully, but these errors were encountered:

s-aibar · 2022-05-23T07:49:08Z

Dear @hpages ,

In the last release we have added support for sparse matrices, so that that is the current expected behaviour.

When using dense matrices (as before), the behavior should have not changed, and you can keep using the same code.

However, this was not optimal, so we have added the possibility to use sparse matrices, and run the rankings in chunks. This is more efficient and facilitates the analyses of bigger datasets or in more limited computers. Therefore, AUCell_buildRankings now requires to add the argument splitByBlocks=TRUE (see example below). Ideally, the behaviour of the fucntion would be exactly the same as with dense matrices (without requiring any difference in inputs), but I have not found the way to do so...
Sorry for the confusion and the inconvenience for the dependencies,
Sara

example:

library(Matrix)
library(AUCell)
exprMatrix <- cbind(cell1=100*4:0, cell2=c(500, 0, 90, 0, 750))
rownames(exprMatrix) <- sprintf("gene%02d", seq_len(nrow(exprMatrix)))

# With standard matrix: Same behaviour as before
AUCell_buildRankings(exprMatrix, plotStats=FALSE, verbose=FALSE)


# With sparse matrix: Neet to set the argument 'splitByBlocks'
AUCell_buildRankings(as(exprMatrix, "dgCMatrix"), plotStats=FALSE, verbose=FALSE, splitByBlocks=TRUE)

hpages · 2022-05-23T17:19:02Z

Can we please be less eager to close this?

In the last release we have added support for sparse matrices, so that that is the current expected behaviour.

There's a difference between adding support for sparse matrices and improving support for sparse matrices.

As I reported above, previous versions of the package (e.g. AUCell 1.16.0) already supported dgCMatrix objects:

library(Matrix)
library(SummarizedExperiment)
library(AUCell)
packageVersion("AUCell")
# [1] ‘1.16.0’

m1 <- cbind(cell1=100*4:0, cell2=c(500, 0, 90, 0, 750))
rownames(m1) <- sprintf("gene%02d", seq_len(nrow(m1)))
m2 <- as(m1, "dgCMatrix")
m2
# 5 x 2 sparse Matrix of class "dgCMatrix"
#        cell1 cell2
# gene01   400   500
# gene02   300     .
# gene03   200    90
# gene04   100     .
# gene05     .   750

rankings <- AUCell_buildRankings(m2, plotStats=FALSE, verbose=FALSE)
assay(rankings)
#         cells
# genes    cell1 cell2
#   gene01     1     2
#   gene02     2     5
#   gene03     3     3
#   gene04     4     4
#   gene05     5     1

Note that to make this work, the package was providing an AUCell_buildRankings method for dgCMatrix objects:

> selectMethod(AUCell_buildRankings, "dgCMatrix")
Method Definition:

function (exprMat, plotStats = TRUE, nCores = 1, mctype = c("domc")[1], 
    keepZeroesAsNA = FALSE, verbose = TRUE, ...) 
{
    .local <- function (exprMat, plotStats = TRUE, nCores = 1, 
        mctype = c("domc")[1], keepZeroesAsNA = FALSE, verbose = TRUE) 
    {
        exprMat <- as.matrix(exprMat)
        .AUCell_buildRankings(exprMat = exprMat, plotStats = plotStats, 
            nCores = nCores, mctype = mctype, keepZeroesAsNA = keepZeroesAsNA, 
            verbose = verbose)
    }
    .local(exprMat, plotStats, nCores, mctype, keepZeroesAsNA, 
        verbose, ...)
}
<bytecode: 0x55de0f23cde8>
<environment: namespace:AUCell>

Signatures:
        exprMat    
target  "dgCMatrix"
defined "dgCMatrix"

How this method was handling dgCMatrix objects was admittedly not efficient, and I can see that you decided to improve that.

However, adding the splitByBlocks argument and forcing the user to set it to TRUE on dgCMatrix objects is completely unnecessary. If the function receives a dgCMatrix object, then it should just do the right thing, and there should be no need for the end user to change the way they call AUCell_buildRankings(). In other words, ordinary (a.k.a. dense) matrices and sparse matrices should be interchangeable, in a manner that is transparent to the user.

As I said, changing the interface of the function in a way that breaks analyses that were already using the function on sparse data (like the OSCA book) should not be done lightly.

The Bioconductor project has a strong commitment to reproducibility, and Bioconductor packages are expected to avoid changes that break backward compatibility, or at least to minimize their impact when those changes can't be avoided. In your case breaking backward compatibility can and should be avoided.

Please reconsider.

Thanks,
H.

alanocallaghan · 2022-05-23T18:00:09Z

Furthermore if splitByBlocks should always be TRUE when the input is a dgCMatrix, then you could set this automatically in the argument list.

Something like this:

buildrankings <- function(matrix, splitByBlocks = inherits(matrix, "dgCMatrix")) {}

would set splitByBlocks to be TRUE as default iff matrix is a dgCMatrix.

s-aibar · 2022-05-24T07:38:08Z

Ok, thank you for the comments, I will look into it during the next weeks.

hpages · 2022-08-30T18:27:35Z

@s-aibar Just a reminder that this issue is still pending and has been breaking the OSCA book for several months: https://bioconductor.org/checkResults/3.15/books-LATEST/OSCA.basic/nebbiolo1-buildsrc.html Thanks!

@alanocallaghan Note that is() is preferred over inherits() for S4 classes so I would just use is(matrix, "dgCMatrix") here. It's also less typing 😉

alanocallaghan · 2022-10-06T20:14:27Z

Any chance this can be reviewed?

s-aibar · 2022-10-25T14:27:23Z

Hello,

I have updated the Development brach of the package to include your suggestions. They will be published with the next version of Bioconductor (scheduled for Nov 2).

Thank you for your feedback and suggestions!

hpages mentioned this issue May 20, 2022

'rand > 0.9 is not TRUE' error in BioC 3.15 and 3.16 OSCA-source/OSCA.basic#4

Closed

s-aibar closed this as completed May 23, 2022

s-aibar reopened this May 24, 2022

s-aibar self-assigned this May 24, 2022

alanocallaghan mentioned this issue Oct 4, 2022

Cell-annotation fix for AUCell OSCA-source/OSCA.basic#6

Closed

PeteHaitch mentioned this issue Oct 13, 2022

Cell type annotation chapter broken because of breaking changes in AUCell OSCA-source/OSCA.basic#8

Closed

s-aibar added the Solved? label Oct 31, 2022

ghuls mentioned this issue Apr 5, 2024

Error while running TILPRED #43

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Breaking change in AUCell_buildRankings() on dgCMatrix objects #28

Breaking change in AUCell_buildRankings() on dgCMatrix objects #28

hpages commented May 20, 2022 •

edited

Loading

s-aibar commented May 23, 2022

hpages commented May 23, 2022

alanocallaghan commented May 23, 2022 •

edited

Loading

s-aibar commented May 24, 2022

hpages commented Aug 30, 2022

alanocallaghan commented Oct 6, 2022

s-aibar commented Oct 25, 2022

Breaking change in AUCell_buildRankings() on dgCMatrix objects #28

Breaking change in AUCell_buildRankings() on dgCMatrix objects #28

Comments

hpages commented May 20, 2022 • edited Loading

s-aibar commented May 23, 2022

hpages commented May 23, 2022

alanocallaghan commented May 23, 2022 • edited Loading

s-aibar commented May 24, 2022

hpages commented Aug 30, 2022

alanocallaghan commented Oct 6, 2022

s-aibar commented Oct 25, 2022

hpages commented May 20, 2022 •

edited

Loading

alanocallaghan commented May 23, 2022 •

edited

Loading