cellxgene-schema must update validation for X (Matrix Layers) #1099

Open
brianraymor opened this issue Nov 14, 2024 · 5 comments
Labels: 5.3 (Next minor CELLxGENE schema version after 5.2), curation software

Comments

@brianraymor
Contributor

Changelog

X (Matrix Layers)

  • Updated Visium Spatial Gene Expression table row to Descendants of Visium Spatial Gene Expression
  • Added matrix requirements for Visium CytAssist Spatial Gene Expression, 11mm.

Design

Assay: Descendant of Visium Spatial Gene Expression

"raw" required?: REQUIRED. Values MUST be de-duplicated molecule counts. All non-zero values MUST be positive integers stored as numpy.float32.

If uns['spatial']['is_single'] is False then each cell MUST contain at least one non-zero value.

If uns['spatial']['is_single'] is True then the unfiltered feature-barcode matrix (raw_feature_bc_matrix) MUST be used. See Space Ranger Feature-Barcode Matrices.

If assay_ontology_term_id is "EFO:0022860" for Visium CytAssist Spatial Gene Expression, 11mm, this matrix MUST contain 14336 rows; otherwise, this matrix MUST contain 4992 rows.

If the obs['in_tissue'] value is 1, then the cell MUST contain at least one non-zero value. If any obs['in_tissue'] values are 0, then at least one cell corresponding to an obs['in_tissue'] with a value of 0 MUST contain a non-zero value.

"raw" location: AnnData.raw.X unless no "normalized" is provided, then AnnData.X

"normalized" required?: STRONGLY RECOMMENDED

"normalized" location: AnnData.X
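
As a rough illustration of the value rules and the obs['in_tissue'] rules in the Design table, here is a minimal sketch. It is not the cellxgene-schema implementation: the function names `check_raw_values` and `check_in_tissue` are hypothetical, and it assumes the raw matrix is available as a dense NumPy array with one row per observation (a real dataset would usually hold a sparse matrix).

```python
import numpy as np


def check_raw_values(raw: np.ndarray) -> list[str]:
    """Return error messages for the raw-value rules (empty list means OK).

    Hypothetical helper; assumes `raw` is a dense array.
    """
    errors = []
    if raw.dtype != np.float32:
        errors.append(f"raw matrix dtype is {raw.dtype}, expected float32")
    nonzero = raw[raw != 0]
    # Non-zero values must be positive and integer-valued (3.0 is fine, 2.5 is not).
    if np.any(nonzero < 0) or np.any(nonzero % 1 != 0):
        errors.append("raw matrix has non-zero values that are not positive integers")
    return errors


def check_in_tissue(raw: np.ndarray, in_tissue: np.ndarray) -> list[str]:
    """Apply the obs['in_tissue'] rules; rows of `raw` are observations.

    Hypothetical helper; `in_tissue` holds one 0/1 value per row of `raw`.
    """
    errors = []
    row_has_signal = (raw != 0).any(axis=1)
    in_mask = in_tissue == 1
    out_mask = in_tissue == 0
    # Every observation flagged in_tissue == 1 needs at least one non-zero value.
    if np.any(in_mask & ~row_has_signal):
        errors.append("an observation with in_tissue == 1 has only zero values")
    # If any observation has in_tissue == 0, at least one of them needs a non-zero value.
    if out_mask.any() and not np.any(out_mask & row_has_signal):
        errors.append("no observation with in_tissue == 0 has a non-zero value")
    return errors
```

Returning a list of messages rather than raising keeps the sketch close to how a validator typically accumulates errors, but that design choice is also an assumption.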
brianraymor added the curation software and 5.3 (Next minor CELLxGENE schema version after 5.2) labels on Nov 14, 2024
ejmolinelli self-assigned this on Nov 25, 2024
@ejmolinelli
Contributor

@brianraymor

For this requirement:

If uns['spatial']['is_single'] is True then the unfiltered feature-barcode matrix (raw_feature_bc_matrix) MUST be used. See Space Ranger Feature-Barcode Matrices.

Is this simply a check in the matrix dimensions? How else would we know that the matrix is or is not raw_feature_bc_matrix?

@ejmolinelli
Contributor

@brianraymor

When might a cell in a matrix have MORE than one value?

If uns['spatial']['is_single'] is False then each cell MUST contain at least one non-zero value.

@nayib-jose-gloria
Contributor

@brianraymor

When might a cell in a matrix have MORE than one value?

If uns['spatial']['is_single'] is False then each cell MUST contain at least one non-zero value.

The "Cell" in this context refers to a biological cell; each row in our matrix represents a cell from our obs dataframe (and each col is a gene). Therefore, we must check that each row in our matrix has at least one non-zero value (that is, ensure each cell has at least one gene with a gene expression value > 0)
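
The per-row check described here can be sketched roughly as follows. This is an illustration only, assuming a dense NumPy array with one row per cell and one column per gene; `rows_without_counts` is a hypothetical name, not a function from cellxgene-schema.

```python
import numpy as np


def rows_without_counts(X: np.ndarray) -> np.ndarray:
    """Return indices of rows (cells) whose gene values are all zero.

    Hypothetical helper; assumes `X` is dense, rows are cells, columns are genes.
    """
    has_nonzero = np.count_nonzero(X, axis=1) > 0
    return np.flatnonzero(~has_nonzero)
```

An empty result means every cell has at least one gene with a non-zero value. For a sparse matrix the same idea would use per-row stored non-zero counts instead; that variant is not shown here.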

@ejmolinelli
Contributor

@brianraymor
When might a cell in a matrix have MORE than one value?

If uns['spatial']['is_single'] is False then each cell MUST contain at least one non-zero value.

The "Cell" in this context refers to a biological cell; each row in our matrix represents a cell from our obs dataframe (and each col is a gene). Therefore, we must check that each row in our matrix has at least one non-zero value (that is, ensure each cell has at least one gene with a gene expression value > 0)

Thanks @nayib-jose-gloria, I was thinking of a cell as a column/row pair in a 2D matrix. This makes a ton of sense!

@nayib-jose-gloria
Contributor

Is this simply a check in the matrix dimensions? How else would we know that the matrix is or is not raw_feature_bc_matrix?

Correct, all our validator can do is enforce the right matrix dimensions. The rest is instructions for a curator/submitter to follow manually.
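
A dimension check along these lines might look like the sketch below. It assumes the row-count rule is only enforced when uns['spatial']['is_single'] is True, which is how this thread reads the requirement; the names `expected_row_count` and `check_unfiltered_shape` are hypothetical and not taken from cellxgene-schema.

```python
# Term ID quoted in the issue for Visium CytAssist Spatial Gene Expression, 11mm.
VISIUM_11MM_TERM_ID = "EFO:0022860"


def expected_row_count(assay_ontology_term_id: str) -> int:
    """Row count from the Design table: 14336 for the 11mm assay, otherwise 4992."""
    return 14336 if assay_ontology_term_id == VISIUM_11MM_TERM_ID else 4992


def check_unfiltered_shape(
    n_rows: int, assay_ontology_term_id: str, is_single: bool
) -> list[str]:
    """Return error messages if the matrix cannot be the unfiltered matrix.

    Assumption: the rule is skipped when is_single is False.
    """
    if not is_single:
        return []
    expected = expected_row_count(assay_ontology_term_id)
    if n_rows != expected:
        return [f"raw matrix has {n_rows} rows, expected {expected}"]
    return []
```

Passing this check only shows the matrix has the right shape; as noted above, it cannot prove the data actually came from raw_feature_bc_matrix.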


3 participants