From c8176e521f022d3d1376f956042125524ef18911 Mon Sep 17 00:00:00 2001
From: Nirmayi
Date: Mon, 28 Oct 2024 13:58:07 +0100
Subject: [PATCH] update readme

---
 README.md | 304 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 167 insertions(+), 137 deletions(-)

diff --git a/README.md b/README.md
index 9ddb773..e2f934e 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# Spatial decomposition
+# Spatial Decomposition
 file_single_cell
 comp_process_dataset-->file_spatial_masked
 comp_process_dataset-->file_solution
+ comp_process_dataset-->file_spatial_masked
 file_single_cell---comp_control_method
 file_single_cell---comp_method
- file_spatial_masked---comp_control_method
- file_spatial_masked---comp_method
 file_solution---comp_control_method
 file_solution---comp_metric
+ file_spatial_masked---comp_control_method
+ file_spatial_masked---comp_method
 comp_control_method-->file_output
 comp_method-->file_output
 comp_metric-->file_score
@@ -98,46 +99,43 @@ Format:
-Slot description:
+Data structure:
-| Slot | Type | Description |
-|:-----------------------------|:----------|:--------------------------------------------------------------------------------------------------------------------|
-| `obs["cell_type"]` | `string` | Cell type label IDs. |
-| `obs["batch"]` | `string` | A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. |
-| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. |
-| `var["hvg_score"]` | `double` | A ranking of the features by hvg. |
-| `obsm["X_pca"]` | `double` | (*Optional*) The resulting PCA embedding. |
-| `layers["counts"]` | `integer` | Raw counts. |
-| `uns["cell_type_names"]` | `string` | (*Optional*) Cell type names corresponding to values in `cell_type`. |
-| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
-| `uns["dataset_name"]` | `string` | Nicely formatted name. |
-| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
-| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
-| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
-| `uns["dataset_description"]` | `string` | Long description of the dataset. |
-| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obs["cell_type"]` | `string` | Cell type label IDs. |
+| `obs["batch"]` | `string` | A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. |
+| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. |
+| `var["hvg_score"]` | `double` | A ranking of the features by hvg. |
+| `obsm["X_pca"]` | `double` | (*Optional*) The resulting PCA embedding. |
+| `layers["counts"]` | `integer` | Raw counts. |
+| `uns["cell_type_names"]` | `string` | (*Optional*) Cell type names corresponding to values in `cell_type`. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| `uns["dataset_name"]` | `string` | Nicely formatted name. |
+| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
+| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
+| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
+| `uns["dataset_description"]` | `string` | Long description of the dataset. |
+| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
 ## Component type: Data processor
-Path:
-[`src/process_dataset`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/process_dataset)
-
 A spatial decomposition dataset processor.
 Arguments:
-| Name | Type | Description |
-|:--------------------------|:-------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `--input` | `file` | A subset of the common dataset. |
-| `--output_single_cell` | `file` | (*Output*) The single-cell data file used as reference for the spatial data. |
-| `--output_spatial_masked` | `file` | (*Output*) The spatial data file containing transcription profiles for each capture location, without cell-type proportions for each spot. |
-| `--output_solution` | `file` | (*Output*) The spatial data file containing transcription profiles for each capture location, with true cell-type proportions for each spot / capture location. |
+| Name | Type | Description |
+|:---|:---|:---|
+| `--input` | `file` | A subset of the common dataset. |
+| `--output_single_cell` | `file` | (*Output*) The single-cell data file used as reference for the spatial data. |
+| `--output_spatial_masked` | `file` | (*Output*) The spatial data file containing transcription profiles for each capture location, without cell-type proportions for each spot. |
+| `--output_solution` | `file` | (*Output*) The spatial data file containing transcription profiles for each capture location, with true cell-type proportions for each spot / capture location. |
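For illustration, the masking step performed by the data processor can be thought of as a key selection. Going by the Solution and Spatial masked data structures documented in this README, the masked file keeps `obsm["spatial"]`, `layers["counts"]`, `uns["cell_type_names"]` and `uns["dataset_id"]`, and drops `obsm["proportions_true"]` along with the remaining dataset metadata. The sketch below is not the repository's processor: it models each AnnData file as a plain nested `dict` so it runs without `anndata`, and the helper name `mask_solution` is hypothetical.

```python
import copy
from typing import Any, Dict

# Slots kept by the "Spatial masked" format, per its data structure table.
KEEP = {
    "obsm": ("spatial",),
    "layers": ("counts",),
    "uns": ("cell_type_names", "dataset_id"),
}


def mask_solution(solution: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Return a masked copy of a solution-like dict (hypothetical helper).

    Anything not listed in KEEP, notably obsm["proportions_true"] and the
    dataset metadata in uns, is dropped. Values are deep-copied so that
    editing the result never mutates the input.
    """
    masked: Dict[str, Dict[str, Any]] = {}
    for attr, keys in KEEP.items():
        source = solution.get(attr, {})
        masked[attr] = {key: copy.deepcopy(source[key]) for key in keys if key in source}
    return masked


if __name__ == "__main__":
    solution = {
        "obsm": {"spatial": [[0.0, 0.0]], "proportions_true": [[0.3, 0.7]]},
        "layers": {"counts": [[5, 0, 2]]},
        "uns": {"cell_type_names": ["a", "b"], "dataset_id": "toy", "dataset_name": "Toy"},
    }
    print(mask_solution(solution)["obsm"])  # {'spatial': [[0.0, 0.0]]}
```

Real files hold dense or sparse matrices rather than nested lists; the key-selection logic would stay the same.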
@@ -146,7 +144,7 @@ Arguments:
 The single-cell data file used as reference for the spatial data
 Example file:
-`resources_test/spatial_decomposition/cxg_mouse_pancreas_atlas/single_cell_ref.h5ad`
+`resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/single_cell_ref.h5ad`
 Format:
@@ -159,148 +157,139 @@ Format:
-Slot description:
+Data structure:
-| Slot | Type | Description |
-|:-------------------------|:----------|:---------------------------------------------------------------------------------------------------------------------------------|
-| `obs["cell_type"]` | `string` | Cell type label IDs. |
-| `obs["batch"]` | `string` | (*Optional*) A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. |
-| `layers["counts"]` | `integer` | Raw counts. |
-| `uns["cell_type_names"]` | `string` | Cell type names corresponding to values in `cell_type`. |
-| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obs["cell_type"]` | `string` | Cell type label IDs. |
+| `obs["batch"]` | `string` | (*Optional*) A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. |
+| `layers["counts"]` | `integer` | Raw counts. |
+| `uns["cell_type_names"]` | `string` | Cell type names corresponding to values in `cell_type`. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
-## File format: Spatial masked
+## File format: Solution
 The spatial data file containing transcription profiles for each capture
-location, without cell-type proportions for each spot.
+location, with true cell-type proportions for each spot / capture
+location.
 Example file:
-`resources_test/spatial_decomposition/cxg_mouse_pancreas_atlas/spatial_masked.h5ad`
+`resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/solution.h5ad`
 Format:
     AnnData object
-        obsm: 'coordinates'
+        obsm: 'spatial', 'proportions_true'
         layers: 'counts'
-        uns: 'cell_type_names', 'dataset_id'
+        uns: 'cell_type_names', 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id'
-Slot description:
+Data structure:
-| Slot | Type | Description |
-|:-------------------------|:----------|:--------------------------------------------------------------------------|
-| `obsm["coordinates"]` | `double` | XY coordinates for each spot. |
-| `layers["counts"]` | `integer` | Raw counts. |
-| `uns["cell_type_names"]` | `string` | Cell type names corresponding to columns of `proportions_pred` in output. |
-| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obsm["spatial"]` | `double` | XY coordinates for each spot. |
+| `obsm["proportions_true"]` | `double` | True cell type proportions for each spot. |
+| `layers["counts"]` | `integer` | Raw counts. |
+| `uns["cell_type_names"]` | `string` | Cell type names corresponding to columns of `proportions`. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| `uns["dataset_name"]` | `string` | Nicely formatted name. |
+| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
+| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
+| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
+| `uns["dataset_description"]` | `string` | Long description of the dataset. |
+| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
+| `uns["normalization_id"]` | `string` | Which normalization was used. |
-## File format: Solution
+## File format: Spatial masked
 The spatial data file containing transcription profiles for each capture
-location, with true cell-type proportions for each spot / capture
-location.
+location, without cell-type proportions for each spot.
 Example file:
-`resources_test/spatial_decomposition/cxg_mouse_pancreas_atlas/solution.h5ad`
+`resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/spatial_masked.h5ad`
 Format:
     AnnData object
-        obsm: 'coordinates', 'proportions_true'
+        obsm: 'spatial'
         layers: 'counts'
-        uns: 'cell_type_names', 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id'
+        uns: 'cell_type_names', 'dataset_id'
-Slot description:
+Data structure:
-| Slot | Type | Description |
-|:-----------------------------|:----------|:-------------------------------------------------------------------------------|
-| `obsm["coordinates"]` | `double` | XY coordinates for each spot. |
-| `obsm["proportions_true"]` | `double` | True cell type proportions for each spot. |
-| `layers["counts"]` | `integer` | Raw counts. |
-| `uns["cell_type_names"]` | `string` | Cell type names corresponding to columns of `proportions`. |
-| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
-| `uns["dataset_name"]` | `string` | Nicely formatted name. |
-| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
-| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
-| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
-| `uns["dataset_description"]` | `string` | Long description of the dataset. |
-| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
-| `uns["normalization_id"]` | `string` | Which normalization was used. |
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obsm["spatial"]` | `double` | XY coordinates for each spot. |
+| `layers["counts"]` | `integer` | Raw counts. |
+| `uns["cell_type_names"]` | `string` | Cell type names corresponding to columns of `proportions_pred` in output. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
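This patch renames the coordinate slot from `obsm["coordinates"]` to `obsm["spatial"]` in the Solution, Spatial masked and Output formats. A file written against the old layout could be migrated with a helper along these lines. The function name `migrate_coordinates` is hypothetical; it only assumes that `obsm` behaves like a mutable mapping, which a plain `dict` does and AnnData's `obsm` is expected to do as well (not verified here).

```python
from typing import Any, MutableMapping

OLD_KEY = "coordinates"  # slot name used before this patch
NEW_KEY = "spatial"      # slot name used after this patch


def migrate_coordinates(obsm: MutableMapping[str, Any]) -> bool:
    """Rename obsm["coordinates"] to obsm["spatial"] in place (hypothetical helper).

    Returns True if a rename happened and False if there was nothing to migrate.
    Raises ValueError when both keys exist, because it is unclear which wins.
    """
    if OLD_KEY not in obsm:
        return False
    if NEW_KEY in obsm:
        raise ValueError(f"both {OLD_KEY!r} and {NEW_KEY!r} are present in obsm")
    obsm[NEW_KEY] = obsm.pop(OLD_KEY)
    return True


if __name__ == "__main__":
    obsm = {"coordinates": [[0.0, 1.0], [2.0, 3.0]]}
    print(migrate_coordinates(obsm), sorted(obsm))  # True ['spatial']
```

With `anndata`, the intended call would be `migrate_coordinates(adata.obsm)` followed by `adata.write_h5ad(path)`. Refusing to overwrite an existing `spatial` entry is a deliberate choice, since silently discarding either array could hide a data problem.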
 ## Component type: Control method
-Path:
-[`src/control_methods`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/control_methods)
-
 Quality control methods for verifying the pipeline.
 Arguments:
-| Name | Type | Description |
-|:-------------------------|:-------|:-----------------------------------------------------------------------------------------------------------------------------------------------------|
-| `--input_single_cell` | `file` | The single-cell data file used as reference for the spatial data. |
-| `--input_spatial_masked` | `file` | The spatial data file containing transcription profiles for each capture location, without cell-type proportions for each spot. |
-| `--input_solution` | `file` | The spatial data file containing transcription profiles for each capture location, with true cell-type proportions for each spot / capture location. |
-| `--output` | `file` | (*Output*) Spatial data with estimated proportions. |
+| Name | Type | Description |
+|:---|:---|:---|
+| `--input_single_cell` | `file` | The single-cell data file used as reference for the spatial data. |
+| `--input_spatial_masked` | `file` | The spatial data file containing transcription profiles for each capture location, without cell-type proportions for each spot. |
+| `--input_solution` | `file` | The spatial data file containing transcription profiles for each capture location, with true cell-type proportions for each spot / capture location. |
+| `--output` | `file` | (*Output*) Spatial data with estimated proportions. |
 ## Component type: Method
-Path:
-[`src/methods`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/methods)
-
 A spatial composition method.
 Arguments:
-| Name | Type | Description |
-|:-------------------------|:-------|:--------------------------------------------------------------------------------------------------------------------------------|
-| `--input_single_cell` | `file` | The single-cell data file used as reference for the spatial data. |
+| Name | Type | Description |
+|:---|:---|:---|
+| `--input_single_cell` | `file` | The single-cell data file used as reference for the spatial data. |
 | `--input_spatial_masked` | `file` | The spatial data file containing transcription profiles for each capture location, without cell-type proportions for each spot. |
-| `--output` | `file` | (*Output*) Spatial data with estimated proportions. |
+| `--output` | `file` | (*Output*) Spatial data with estimated proportions. |
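The Method arguments above (plus `--input_solution`, which only Control methods receive) can be mirrored with the standard-library `argparse` module. This is an illustrative sketch of the argument contract only; components in this project are presumably wired up through their own framework configuration, and the names `build_parser` and `parse_args` are hypothetical.

```python
import argparse
from typing import Optional, Sequence


def build_parser(control: bool = False) -> argparse.ArgumentParser:
    """Build a parser for the Method arguments; control=True adds --input_solution."""
    parser = argparse.ArgumentParser(
        description="Spatial decomposition method (illustrative sketch)."
    )
    parser.add_argument("--input_single_cell", required=True,
                        help="Single-cell reference data (.h5ad).")
    parser.add_argument("--input_spatial_masked", required=True,
                        help="Spatial data without cell-type proportions (.h5ad).")
    if control:
        # Only control methods receive the ground truth.
        parser.add_argument("--input_solution", required=True,
                            help="Spatial data with true cell-type proportions (.h5ad).")
    parser.add_argument("--output", required=True,
                        help="Output: spatial data with estimated proportions (.h5ad).")
    return parser


def parse_args(argv: Optional[Sequence[str]] = None,
               control: bool = False) -> argparse.Namespace:
    """Parse argv; argparse falls back to sys.argv[1:] when argv is None."""
    return build_parser(control).parse_args(argv)


if __name__ == "__main__":
    args = parse_args([
        "--input_single_cell", "single_cell_ref.h5ad",
        "--input_spatial_masked", "spatial_masked.h5ad",
        "--output", "output.h5ad",
    ])
    print(args.output)  # output.h5ad
```

Keeping a single parser factory with a `control` flag makes the one difference between the two component types explicit: control methods get the solution file, regular methods must not.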
 ## Component type: Metric
-Path:
-[`src/metrics`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/metrics)
-
 A spatial decomposition metric.
 Arguments:
-| Name | Type | Description |
-|:-------------------|:-------|:-----------------------------------------------------------------------------------------------------------------------------------------------------|
-| `--input_method` | `file` | Spatial data with estimated proportions. |
+| Name | Type | Description |
+|:---|:---|:---|
+| `--input_method` | `file` | Spatial data with estimated proportions. |
 | `--input_solution` | `file` | The spatial data file containing transcription profiles for each capture location, with true cell-type proportions for each spot / capture location. |
-| `--output` | `file` | (*Output*) Metric score file. |
+| `--output` | `file` | (*Output*) Metric score file. |
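The README does not say which metrics the task computes. As a hypothetical example of a metric comparing `--input_method` with `--input_solution`, the root-mean-square error between `obsm["proportions_pred"]` and `obsm["proportions_true"]` could be computed as follows. It assumes both matrices list spots in the same order and share the column order given by `uns["cell_type_names"]`; `proportions_rmse` is a made-up name, and matrices are plain nested lists to keep the sketch dependency-free.

```python
import math
from typing import Sequence


def proportions_rmse(true: Sequence[Sequence[float]],
                     pred: Sequence[Sequence[float]]) -> float:
    """Root-mean-square error over all (spot, cell type) entries."""
    if len(true) != len(pred):
        raise ValueError("true and pred must have the same number of spots")
    total = 0.0
    count = 0
    for spot, (t_row, p_row) in enumerate(zip(true, pred)):
        if len(t_row) != len(p_row):
            raise ValueError(f"spot {spot}: number of cell types differs")
        for t, p in zip(t_row, p_row):
            total += (t - p) ** 2
            count += 1
    if count == 0:
        raise ValueError("no values to compare")
    return math.sqrt(total / count)


if __name__ == "__main__":
    true = [[0.5, 0.5], [1.0, 0.0]]
    pred = [[0.5, 0.5], [0.0, 1.0]]
    # Squared errors are 0, 0, 1, 1, so the mean is 0.5 and the RMSE is sqrt(0.5).
    print(proportions_rmse(true, pred))
```

A metric component following the Score format would then report this value under some metric identifier of its choosing.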
@@ -309,35 +298,31 @@ Arguments:
 Spatial data with estimated proportions.
 Example file:
-`resources_test/spatial_decomposition/cxg_mouse_pancreas_atlas/output.h5ad`
-
-Description:
-
-Spatial data file with estimated cell type proportions.
+`resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/output.h5ad`
 Format:
     AnnData object
-        obsm: 'coordinates', 'proportions_pred'
+        obsm: 'spatial', 'proportions_pred'
         layers: 'counts'
         uns: 'cell_type_names', 'dataset_id', 'method_id'
-Slot description:
+Data structure:
-| Slot | Type | Description |
-|:---------------------------|:----------|:-----------------------------------------------------------|
-| `obsm["coordinates"]` | `double` | XY coordinates for each spot. |
-| `obsm["proportions_pred"]` | `double` | Estimated cell type proportions for each spot. |
-| `layers["counts"]` | `integer` | Raw counts. |
-| `uns["cell_type_names"]` | `string` | Cell type names corresponding to columns of `proportions`. |
-| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
-| `uns["method_id"]` | `string` | A unique identifier for the method. |
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obsm["spatial"]` | `double` | XY coordinates for each spot. |
+| `obsm["proportions_pred"]` | `double` | Estimated cell type proportions for each spot. |
+| `layers["counts"]` | `integer` | Raw counts. |
+| `uns["cell_type_names"]` | `string` | Cell type names corresponding to columns of `proportions`. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| `uns["method_id"]` | `string` | A unique identifier for the method. |
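Before writing the Output file, a method might sanity-check `obsm["proportions_pred"]` against `uns["cell_type_names"]`. The table only states that the slot holds estimated proportions per spot; requiring non-negative values that sum to one per spot is an assumption made here because the values are proportions. The helper name `check_proportions_pred`, its tolerance and the cell type names in the example are hypothetical.

```python
import math
from typing import List, Sequence


def check_proportions_pred(
    proportions: Sequence[Sequence[float]],
    cell_type_names: Sequence[str],
    n_spots: int,
    tol: float = 1e-6,
) -> List[str]:
    """Return human-readable problems; an empty list means the matrix looks valid."""
    problems: List[str] = []
    if len(proportions) != n_spots:
        problems.append(f"expected {n_spots} spots, got {len(proportions)}")
    n_types = len(cell_type_names)
    for i, row in enumerate(proportions):
        if len(row) != n_types:
            problems.append(f"spot {i}: expected {n_types} columns, got {len(row)}")
            continue  # the remaining checks assume the right width
        if any(value < 0 for value in row):
            problems.append(f"spot {i}: negative proportion")
        total = sum(row)
        if not math.isclose(total, 1.0, abs_tol=tol):
            problems.append(f"spot {i}: proportions sum to {total:.4f}, not 1")
    return problems


if __name__ == "__main__":
    names = ["acinar", "beta"]  # hypothetical cell type names
    print(check_proportions_pred([[0.2, 0.8], [0.5, 0.4]], names, n_spots=2))
    # ['spot 1: proportions sum to 0.9000, not 1']
```

Returning a list of problems instead of raising on the first one makes it easier to report every issue in a malformed output at once.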
@@ -346,7 +331,7 @@ Slot description:
 Metric score file.
 Example file:
-`resources_test/spatial_decomposition/cxg_mouse_pancreas_atlas/score.h5ad`
+`resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/score.h5ad`
 Format:
@@ -357,16 +342,61 @@ Format:
-Slot description:
+Data structure:
-| Slot | Type | Description |
-|:-----------------------|:---------|:---------------------------------------------------------------------------------------------|
-| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
-| `uns["method_id"]` | `string` | A unique identifier for the method. |
-| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. |
+| Slot | Type | Description |
+|:---|:---|:---|
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| `uns["method_id"]` | `string` | A unique identifier for the method. |
+| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. |
 | `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. |
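The Score format requires `uns["metric_values"]` to have the same length as `uns["metric_ids"]`. A small hypothetical helper that builds the `uns` dictionary and enforces that rule could look like this; the identifiers in the example (`example_dataset`, `example_method`, `rmse`, `mae`) and the values are placeholders, not real results.

```python
from typing import Dict, List, Sequence, Union


def build_score_uns(
    dataset_id: str,
    method_id: str,
    metric_ids: Sequence[str],
    metric_values: Sequence[float],
) -> Dict[str, Union[str, List[str], List[float]]]:
    """Assemble the uns entries of a score file (hypothetical helper)."""
    if len(metric_ids) != len(metric_values):
        raise ValueError(
            f"got {len(metric_ids)} metric_ids but {len(metric_values)} metric_values"
        )
    return {
        "dataset_id": dataset_id,
        "method_id": method_id,
        "metric_ids": list(metric_ids),
        # The table types metric_values as double, so coerce to float.
        "metric_values": [float(v) for v in metric_values],
    }


if __name__ == "__main__":
    uns = build_score_uns("example_dataset", "example_method", ["rmse", "mae"], [0.12, 0.08])
    print(uns["metric_ids"], uns["metric_values"])  # ['rmse', 'mae'] [0.12, 0.08]
```

With `anndata` installed, `anndata.AnnData(uns=uns).write_h5ad("score.h5ad")` should turn the dictionary into a score file, though that step is not exercised here.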
+## File format: Common Dataset
+
+A subset of the common dataset.
+
+Example file:
+`resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/simulated_dataset.h5ad`
+
+Format:
+
+
+    AnnData object
+        obs: 'cell_type', 'batch'
+        var: 'hvg', 'hvg_score'
+        obsm: 'X_pca', 'spatial', 'proportions_true'
+        layers: 'counts'
+        uns: 'cell_type_names', 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism'
+
+
+Data structure:
+
+
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obs["cell_type"]` | `string` | Cell type label IDs. |
+| `obs["batch"]` | `string` | A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. |
+| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. |
+| `var["hvg_score"]` | `double` | A ranking of the features by hvg. |
+| `obsm["X_pca"]` | `double` | The resulting PCA embedding. |
+| `obsm["spatial"]` | `double` | (*Optional*) XY coordinates for each spot. |
+| `obsm["proportions_true"]` | `double` | (*Optional*) True cell type proportions for each spot. |
+| `layers["counts"]` | `integer` | Raw counts. |
+| `uns["cell_type_names"]` | `string` | (*Optional*) Cell type names corresponding to values in `cell_type`. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| `uns["dataset_name"]` | `string` | Nicely formatted name. |
+| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
+| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
+| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
+| `uns["dataset_description"]` | `string` | Long description of the dataset. |
+| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
+
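The Common Dataset table added by this patch marks some slots as optional and leaves the rest required (including `obsm["X_pca"]`). Reading the non-optional rows off that table, a hypothetical checker for missing required slots might look like the sketch below. It takes the available keys per attribute as plain iterables so it runs without `anndata`; the names `REQUIRED_SLOTS` and `missing_required_slots` are made up.

```python
from typing import Dict, Iterable, List, Mapping, Set

# Non-optional rows of the Common Dataset data structure table.
REQUIRED_SLOTS: Dict[str, Set[str]] = {
    "obs": {"cell_type", "batch"},
    "var": {"hvg", "hvg_score"},
    "obsm": {"X_pca"},
    "layers": {"counts"},
    "uns": {"dataset_id", "dataset_name", "dataset_summary", "dataset_description"},
}


def missing_required_slots(present: Mapping[str, Iterable[str]]) -> List[str]:
    """List required slots absent from `present`, formatted like the README table."""
    missing: List[str] = []
    for attr, required in REQUIRED_SLOTS.items():
        available = set(present.get(attr, ()))
        for key in sorted(required - available):
            missing.append(f'{attr}["{key}"]')
    return missing


if __name__ == "__main__":
    print(missing_required_slots({"obs": ["cell_type", "batch"], "layers": ["counts"]}))
```

For an AnnData object, the input could be assembled as `{"obs": adata.obs.columns, "var": adata.var.columns, "obsm": adata.obsm.keys(), "layers": adata.layers.keys(), "uns": adata.uns.keys()}`.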
+