-
Moved
src/tasks/batch_integration
totask_batch_integration
(PR #910). -
Moved
src/tasks/denoising
totask_denoising
(PR #910). -
Moved
src/tasks/dimensionality_reduction
totask_dimensionality_reduction
(PR #910). -
Moved
src/tasks/label_projection
totask_label_projection
(PR #910). -
Moved
src/tasks/match_modalities
totask_match_modalities
(PR #910). -
Moved
src/tasks/predict_modality
totask_predict_modality
(PR #910). -
Moved
src/tasks/spatial_decomposition
totask_spatial_decomposition
(PR #910). -
Moved
src/tasks/spatially_variable_genes
totask_spatially_variable_genes
(PR #910).
- Update Viash to 0.9.0 (PR #911).
-
Add the CELLxGENE immune cell atlas dataset as a common test resource (PR #907).
-
Update
dataset_id
fortenx_visium
,zenodo_spatial
,zenodo_spatial_slidetags
datasets and usemouse_brain_coronal
as a test resource in thespatially_variable_genes
task (PR #908). -
Update
get_task_info
,get_method_info
andget_metrics_info
reporting components with more info and extend output (PR #915).
-
Fix extracting metadata from anndata files in the
extract_metadata
component (PR #914). -
Fix path in
touch
cmd inreporting/process_task_results/run_test.sh
(PR #915).
A major update to the OpenProblems framework, switching from a Python-based framework to a Viash + Nextflow-based framework. This update features the same concepts as the previous version, but with a new implementation that is more flexible, scalable, and maintainable.
Most relevant parts of the overall structure:
-
src/tasks
: Benchmarking tasks:batch_integration
: Batch integrationdenoising
: Denoisingdimensionality_reduction
: Dimensionality reductionmatch_modalities
: Match modalitiespredict_modality
: Predict modalityspatial_decomposition
: Spatial decompositionspatially_variable_genes
: Spatially variable genes
-
src/datasets
: Components for creating common datasets. Loaders:cellxgene_census
: Query cells from a CellxGene Censusopenproblems_neurips2021_bmmc
: Fetch a dataset from the OpenProblems NeurIPS2021 competitionopenproblems_neurips2022_pbmc
: Fetch a dataset from the OpenProblems NeurIPS2022 competitionopenproblems_v1
: Fetch a legacy OpenProblems v1 datasetopenproblems_v1_multimodal
: Fetch a legacy OpenProblems v1 multimodal datasettenx_visium
: Fetch a and convert 10x Visium datasetzenodo_spatial
: Fetch and process an Anndata file containing DBiT seq, MERFISH, seqFISH, Slide-seq v2, STARmap, and Stereo-seq data from Zenodo.zenodo_spatial_slidetags
: Download a compressed file containing gene expression matrix and spatial locations from zenodo.
-
src/common
: Common components used by all tasks.check_dataset_schema
: Check whether an h5ad dataset adheres to a dataset schemacheck_yaml_schema
: Check whether a YAML adheres to a JSON schemacomp_tests
: Reusable component unit testscreate_component
: Create a component Viash component.create_task_readme
: Create a README for an OpenProblems task.extract_metadata
: Extract the.uns
metadata from an h5ad file.helper_functions
: Commonly used helper functions in Python or in R,process_task_results
: Process the raw tasks results (containing raw logs, unprocessed component configs, and various metrics) into nicely formatted task results.schemas
: JSON schemas for YAML files in the repositorysync_test_resources
: Synchronise the test resources from s3 to resources_test
For more information related to the structure of this repository, see the documentation.
Note: This changelog was automatically generated from the git log.
- Added
cell2location
to thespatial_decomposition
task. - Added nearest-neighbor ranking matrix computation to
_utils
. - Datasets now store nearest-neighbor ranking matrix in
adata.obsm["X_ranking"]
. - Added support for parsing Nextflow output and generating benchmark results for the website.
- Added
max_samples
parameter toqlocal
,qglobal
,qnn_auc
,lcmc
,qnn
, andcontinuity
metrics to allow for subsampling of data for faster computation. - Added new scArches based methods:
scarches_scanvi_xgb_all_genes
andscarches_scanvi_xgb_hvg
. - Added
prediction_method
parameter to_scanvi_scarches
to specify prediction method. - Added
_pred_xgb
function to perform XGBoost prediction based on latent representations. - Added
obsm
parameter to_xgboost
function to allow specifying the embedding space for XGBoost training.
- Updated
scvi-tools
to version0.20
in both Python and R environments. - Updated datasets to include nearest-neighbor ranking matrix.
- Modified dimensionality reduction task to include nearest-neighbor ranking matrix computation in dataset generation.
- The website update workflow was refactored to use a new workflow using json instead of markdown.
- Updated the website generation process to remove duplicate BibTex entries.
- Added a new
parse_metadata.py
script for generating metadata for the website. - Added a new function to
openproblems.utils.py
to get the member ID of a task, dataset, method or metric. - Removed the redundant computation and storage of the nearest-neighbor ranking matrix in datasets.
- Updated method names to be shorter and more consistent across tasks.
- Improved method summaries for clarity.
- Updated JAX and JAXlib versions to 0.4.6.
- Updated dependencies to support new versions of Snakemake and GitPython.
- Removed code related to "nbt2022-reproducibility" repo and merged it into the main website.
- Updated the schema for benchmark results to include submission time, code version, and resource usage metrics.
- Improved error handling and added logging to the parsing script.
- Removed the "raw.json" file from the results directory and merged all data into a single "results.json" file.
- Updated the workflow to upload the final results to the website's results directory instead of the data directory.
- Removed unnecessary code and refactored the parsing script for better readability.
- Added unit tests for the new parsing script.
- Updated the
run_tests
workflow to skip testing on thetest_website
branch. - Updated the
run_tests
workflow to skip testing on thetest_process
branch. - Updated the
create-pull-request
step to set the author for the pull request. - Updated the
run_tests
workflow to skip testing on pull request reviews. - Updated the
update_website_content
workflow to update the website on themain
branch. - Updated the
main.bib
file to fix a typo. - Removed extraneous headings from task README files.
- Updated
generate_test_matrix.py
to use the newopenproblems.utils.get_member_id
function. - Updated the website generation process to copy BibTex files to the correct location.
- Updated the
process_requires
section insetup.py
to includegitpython
. - Updated git commit hash generation for openproblems functions.
- Modified
_xgboost
to allow for specifyingtree_method
. - Modified
_scanvi_scarches
to consistently useunlabeled_category
. - Modified
_scanvi_scarches
to remove unnecessary copying oflabels
. - Removed
_scanvi_scarches
functions that were redundant with_scanvi_scarches
. - Removed unused
_scanvi
functions. - Modified
_scanvi_scarches
to allow for specifyingprediction_method
and handleunlabeled_category
consistently.
- Improved the documentation of the
auprc
metric. - Improved the documentation of the
cell2location
methods. - Document sub-stub task behaviour
- Fixed an error in
neuralee_default
where thesubsample_genes
argument could be too small. - Fixed an error in
knn_naive
where theis_baseline
argument was set toFalse
. - Fixed calculation of ranking matrix in
_utils
to include ties. - Fixed a bug in
load_tenx_5k_pbmc()
where a warning about non-unique variable names was being raised. - Removed the unused
_utils.py
file. - Removed the
X_ranking
entry from theobsm
attribute of datasets. - The
_fit()
function innn_ranking.py
now subsamples the data ifmax_samples
is specified. - The
nn_ranking
metrics now use subsampling in the_fit()
function to improve performance. - Fixed the git hash generation for openproblems functions
- Fixed a warning about
pkg_resources
being deprecated - Removed unnecessary
fetch-depth: 1
from workflow - Fixed potential issue in
_scanvi_scarches
wherelabels_pred
could be overwritten - Fixed potential issue in
_pred_xgb
wherenum_round
wasn't being used correctly - Fixed an issue where baseline methods were not being filtered correctly from the benchmark results.
- Fixed an issue where metrics with all NaN values were not being removed from the benchmark results.
- Fixed an issue where some metrics were not being parsed correctly from the Nextflow output.
- Fixed an issue where the "mean_score" field was not being calculated correctly for each method.
- Fixed an issue where the "code_version" field was not being populated correctly for each method.
- Fixed an issue where the "submission_time" field was not being populated correctly for each method.
- Fixed an issue where the resource usage metrics were not being parsed correctly from the Nextflow output.
- Updated the
run_tests
workflow to skip testing on thetest_website
branch. - Updated the
run_tests
workflow to skip testing on thetest_process
branch. - Updated the
create-pull-request
step to set the author for the pull request. - Updated the
run_tests
workflow to skip testing on pull request reviews. - Updated the `updatewebsite
Note: This changelog was automatically generated from the git log.
- Added the zebrafish_labs dataset to the dimensionality reduction task.
- Added the
diffusion_map
method to the dimensionality reduction task. - Added the
spectral_features
method to the dimensionality reduction task, which uses diffusion maps to create embedding features. - Added the
distance_correlation_spectral
metric to the dimensionality reduction task, which evaluates the similarity of the high-dimensional Laplacian eigenmaps on the full data matrix and the dimensionally-reduced matrix. - Added baseline methods for batch integration: no integration, random integration, random integration by cell type, random integration by batch.
- Added
alra_sqrt_reversenorm
,alra_log_reversenorm
methods for ALRA with reversed normalization order. - Added
celltype_random_embedding_jitter
method to randomize embedding with jitter.
- Improved the
density_preservation
metric calculation. - Updated the
distance_correlation
metric to use the newdiffusion_map
method. - Increased the default number of components used for
distance_correlation_spectral
to 1000. - Made metrics more robust by copying the AnnData object before passing it to the metric function.
- Added
is_baseline
flag toadata.uns
inmethod
decorator. - Added
is_baseline
field toadata.uns
for all methods. - Increased default values for
max_epochs_sp
andmax_epochs_sc
indestvi
method. - Changed default value of
early_stopping_monitor
toelbo_validation
fromreconstruction_loss_train
indestvi
method. - Added
train_size
andvalidation_size
arguments to thesc_model.train
call indestvi
method. - Added
batch_size
andplan_kwargs
arguments to thest_model.train
call indestvi
method. - Refactor ALRA methods for improved clarity and consistency.
- Added tests for new ALRA methods with reversed normalization order.
- Added jitter parameter to
_random_embedding
function. - Updated
celltype_random_embedding
to usejitter=None
in_random_embedding
. - Removed unnecessary parameters from the
sample_dataset
function. - Removed unnecessary checks for PCA and neighbors in the
check_dataset
function. - Updated
pytest.ini
to ignore deprecation warning related topkg_resources
. - Added permission to all workflows to read and write contents
- Added permission to write pull requests to several workflows
- Added permission to write packages to the
run_tests
workflow.
- Fixed a bug in
density_preservation
that caused it to return 0 when there were NaN values in the embedding. - Removed unused
true_features_log_cp10k
andtrue_features_log_cp10k_hvg
methods. - Removed unnecessary imports in metrics.
- Removed unnecessary
neighbors
calls in metrics. - Removed unused
_get_split
function. - Added
embedding_to_graph
andfeature_to_graph
functions for graph-based metrics. - Added
get_split
function for metrics that require splitting data into training and testing sets. - Added
feature_to_embedding
function for embedding-based metrics. - Fixed issue where baseline methods were not properly documented.
- Increased default maximum epochs for spatial models to improve performance.
- Improved training parameters for both spatial and single-cell models to improve stability and performance.
- Updated validation metric used for early stopping in spatial model to improve training quality.
- Updated documentation to clarify that the AnnData object passed to metric functions is a copy.
- Updated the documentation for batch integration tasks to reflect the change in the expected format of the dataset objects.
- Moved baseline methods from individual task modules to a common module.
- Removed redundant baseline methods from individual task modules.
- Increased default values for
max_epochs_sp
andmax_epochs_sc
indestvi
method.
Note: This changelog was automatically generated from the git log.
- Added metadata for all datasets, methods, and metrics.
- Updated nf-openproblems to v1.10.
- Added a new
docker_pull
rule to the Snakemake workflow to pull Docker images. - Added a new
docker
rule to the Snakemake workflow to build Docker images. - Changed the
pytest
command to include coverage for thetest
directory. - Added new environment variables for the TOWER_TEST_ACTION_ID and TOWER_FULL_ACTION_ID to the Snakemake workflow.
- Updated the
scripts/install_renv.R
script to increase the number of retry attempts.
Note: This changelog was automatically generated from the git log.
- Updated
scib
version to1.1.3
indocker/openproblems-r-extras/requirements.txt
anddocker/openproblems-r-pytorch/requirements.txt
.
- Added
pytest-timestamper
to test dependencies for better debugging.
Note: This changelog was automatically generated from the git log.
- Fixed an issue where pymde did not work on sparse data.
Note: This changelog was automatically generated from the git log.
- Added
hvg_unint
andn_genes_pre
to the lung batch.
Note: This changelog was automatically generated from the git log.
- Added a bibtex file
main.bib
for storing all references cited in the repository. - Added a section on adding paper references to
CONTRIBUTING.md
explaining how to add entries tomain.bib
and link to them in markdown documents. - Added new baseline methods for dimensionality reduction: "True Features (logCPM)", "True Features (logCPM, 1kHVG)".
- Added
alra_log
method, which implements ALRA with log normalization. - Added
alra_sqrt
method, which implements ALRA with square root normalization. - Added PyMDE dimensionality reduction methods
- Added citations for Chen et al. (2009) "Local Multidimensional Scaling for Nonlinear Dimension Reduction, Graph Drawing, and Proximity Analysis", Kraemer et al. (2018) "dimRed and coRanking - Unifying Dimensionality Reduction in R", Lee et al. (2009) "Quality assessment of dimensionality reduction: Rank-based criteria", Lueks et al. (2011) "How to Evaluate Dimensionality Reduction? - Improving the Co-ranking Matrix", Szubert et al. (2019) "Structure-preserving visualisation of high dimensional single-cell datasets", and Venna et al. (2006) "Local multidimensional scaling".
- Added
install_renv.R
script to install R packages usingrenv
with retries - Added a new metric to evaluate the conservation of highly variable genes (HVGs) after batch integration.
- Added support for lung data from Vieira Braga et al.
- Added
magic_reverse_norm
andmagic_approx_reverse_norm
methods which reverse the order of normalization and transformation in the MAGIC algorithm. - Added a new workflow to comment on pull request status.
- Updated the
openproblems
repository to cite papers using bibtex references. - Renamed
alra
method toalra_sqrt
. - Updated
spacexr
to latest version. - Added
fc_cutoff
andfc_cutoff_reg
parameters torctd
method to control minimum log-fold-change for genes in the normalization and RCTD steps. - Renamed the "multimodal_data_integration" task to "matching_modalities".
- Bumped version to 0.7.0.
- Added BibTex references to all data loaders in
openproblems/data
. - Added BibTex references to all methods in
openproblems/tasks
. - Added BibTex references to all metrics in
openproblems/tasks
. - Updated
update_website_content.yml
to copymain.bib
to the Open Problems website. - Added a BibTeX Tidy hook to
.pre-commit-config.yaml
. - Updated
scvi-tools
version to~0.19
in bothopenproblems-python-pytorch
andopenproblems-r-pytorch
dockerfiles. - Updated
cell2location
version to47c8d6dc90dd3f1ab639861e8617c6ef0b62bb89
in theopenproblems-python-pytorch
dockerfile. - Updated
bslib
to version 0.4.2. - Updated
htmltools
to version 0.5.4. - Updated the
alra_sqrt
method to use square root normalization. - Updated the
alra_log
method to use log normalization. - Updated the method names to reflect the normalization used.
- Updated dependencies for
gtfparse
andpolars
. - Added PyMDE dependency to requirements.txt
- Updated the API to specify that datasets should provide log CPM-normalized counts in `adata.X
Note: This changelog was automatically generated from the git log.
- Added
cell2location_detection_alpha_1
method, which usesdetection_alpha=1
and a hard-coded reference. - Added a new parameter
hard_coded_reference
tocell2location_detection_alpha_1
method. - Added a new baseline method for dimensionality reduction using high-dimensional Laplacian Eigenmaps.
- Added organism metadata to datasets.
- Added a new image,
openproblems-python-bedtools
, to contain packages required for runningpybedtools
andpyensembl
Python packages. - Added support for TensorFlow 2.9.0.
- Added a new schema for storing results in JSON format.
- Added a new function to parse Nextflow trace files to this JSON schema.
- Added
rmse_spectral
metric, which calculates the root mean squared error (RMSE) between high-dimensional Laplacian eigenmaps on the full (or processed) data matrix and the dimensionally-reduced matrix. - Added new methods to LIANA:
magnitude_max
,magnitude_sum
,specificity_max
, andspecificity_sum
. - Added
aggregate_how
parameter toliana
R function to allow aggregation by "magnitude" or "specificity". - Added
top_prop
parameter toodds_ratio
metric to allow specifying the proportion of interactions to consider for calculating the odds ratio.
- Removed unused
openproblems-python-batch-integration
docker image. - Moved
scanorama
,bbknn
,scVI
,mnnpy
andscib
fromopenproblems-python-batch-integration
toopenproblems-r-pytorch
. - Moved
cell2location
,molecular-cross-validation
,neuralee
,tangram
andphate
fromopenproblems-python-extras
toopenproblems-python-pytorch
. - Moved
pybedtools
,pyensembl
andscalex
fromopenproblems-python-extras
toopenproblems-python-pytorch
. - Moved
dca
andkeras
fromopenproblems-python-tf2.4
toopenproblems-python-tensorflow
. - Added
openproblems-python-bedtools
docker image. - Added
openproblems-python-tensorflow
docker image. - Added
openproblems-python-pytorch
docker image. - Moved
harmony-pytorch
fromopenproblems-r-extras
toopenproblems-r-pytorch
. - Added
openproblems-r-pytorch
docker image. - Updated
anndata2ri
version inopenproblems-r-base
. - Updated
kBET
version inopenproblems-r-extras
. - Updated
scib
version inopenproblems-r-extras
. - Updated
scvi-tools
version inopenproblems-r-pytorch
. - Updated
torch
version inopenproblems-r-pytorch
. - Moved the
codecov
action to run only on success - Updated the workflow to upload coverage reports to GitHub Actions as an artifact
- Renamed the
run_benchmark
job tosetup_benchmark
. - Added a new
run_benchmark
job that runs aftersetup_benchmark
. - Moved the benchmark running logic from the
run_benchmark
job to the newrun_benchmark
job. - Added a
setup-environment
step tosetup_benchmark
job. - Added outputs to the
setup_benchmark
job. - Renamed the
nbt2022-reproducibility
towebsite-experimental
- Updated
numpy
andscipy
dependencies in setup.py. - Updated
scikit-learn
,louvain
,python-igraph
,decorator
andcolorama
dependencies in setup.py. - Improved Docker image caching.
- Removed the
counts
layer from theimmune_cells
,pancreas
datasets, and thebatch_integration_feature
task. - Removed the
counts
layer fromgenerate_synthetic_dataset
functions in spatial decomposition datasets. - Updated the
normalize
functions to not modify the data in place. - Updated the
log_cpm_hvg
function to annotate HVGs instead of subsetting the data. - Updated the
_high_dim
function in thenn_ranking
metric to subset to HVGs. - Updated the
dimensionality_reduction
task README to clarify the role of thehighly_variable
key. - Reduced the random noise added to the one-hot embedding in the
_random_embedding
function from (-0.1, 0.1) to (-0.01, 0.01). - Removed
high_dim_pca
andhigh_dim_spectral
methods. - Updated the
random_features
method to use thecheck_version
function. - Moved raw output files from website to the NBT 2022 reproducibility repository.
- Updated the
process_results.yml
workflow to include the NBT 2022 reproducibility repository. - Updated the
run_tests.yml
workflow to skip tests when pushing to specific branches. - Removed
# ci skip
from commit message in CI workflow. - Removed redundant file deletion from
process_results.yml
workflow. - Added
update_website_content.yml
workflow to update benchmark content on the website repository. - Modified the
process_results.yml
workflow to update website content based on results. - Changed the
update_website_content.yml
workflow to trigger on both themain
andtest_website
branches. - Updated workflow to push changes to the website only if there are changes to the website content.
- Added environment variable to track changes.
- Removed unused git command.
- Decreased number of samples for testing.
- Updated
igraph
to 0.10.* insetup.py
. - Updated
anndata2ri
to 1.1.* inopenproblems-r-base/README.md
. - Updated
kBET
toa10ffea
inopenproblems-r-extras/r_requirements.txt
. - Updated
scib
tof0be826
inopenproblems-r-extras/requirements.txt
. - Updated
harmony-pytorch
to 0.1.* inopenproblems-r-pytorch/requirements.txt
. - Updated
torch
to 1.13.* inopenproblems-r-pytorch/requirements.txt
. - Updated
scanorama
to 1.7.0 inopenproblems-r-pytorch/requirements.txt
. - Updated
scvi-tools
to 0.16.* inopenproblems-r-pytorch/requirements.txt
. - Updated the
regulatory_effect_prediction
task to use
Note: This changelog was automatically generated from the git log.
- Added a new dataset: "Pancreas (inDrop)"
- Added a new function: "pancreas"
- Added a new utility function: "utils.split_data"
- Added
tabula_muris_senis_lung_random
dataset. - Added
celltype_random_embedding
baseline method for batch integration embedding. - Added
celltype_random_graph
baseline method for batch integration graph. - Added a new argument
sctransform_n_cells
to the seuratv3 function to allow users to specify the number of cells used to build the negative binomial regression in the SCTransform function. - Added a new sample dataset that is smaller and more efficient than the previous one.
- Added a "mean score" metric to the results table.
- Added support for loading the sample dataset in
load_sample_data
. - Added support for running benchmarks on pull requests.
- Added a new workflow for creating a test matrix.
- Added a new script to generate a test matrix for the
run_tester
workflow. - Added a new script for cleaning up runner diskspace.
- Added support for uploading docker images to ECR.
- Added
tabula_muris_senis
dataset toopenproblems/tasks/denoising/datasets/__init__.py
. - Updated
styler
to version 1.8.1. - Updated the method for normalizing scores to correctly account for baseline method scores.
- Improved the way NaN and infinite values are handled in the ranking calculation.
- Removed redundant code that was previously used to upload results and markdown artifacts to test.
- Removed the raw output files from the website data directory.
- Updated the list of reviewers for the pull request to include more relevant team members.
- Changed the reference to "Code" to "Library" in the JSON output to better reflect the data presented.
- Added a check to ensure that the task has a minimum number of non-baseline methods before processing results.
- Removed the check to ensure that the task has a minimum number of methods before processing results.
- Removed redundant code that was previously used to handle incomplete tasks.
- Updated the workflow to use a consistent version of Python across all jobs.
- Updated flake8 dependency to
https://github.com/pycqa/flake8
. - Improved random embedding for
celltype_random_embedding
andcelltype_random_graph
. - Removed
pip check
from Dockerfile. - Updated code to use a more consistent random number generator.
- Updated liana code to inverse the distribution of the aggregate rank.
- Improved the logic in
odds_ratio
to ensure that the numerator/denominator is not zero. - Removed unnecessary NXF_DEFAULT_DSL from
run_tester
workflow. - Increased the number of cells used to build the negative binomial regression in the SCTransform function from 3000 to 5000.
- Adjusted the default values for
n_pca
andsctransform_n_cells
in the seuratv3 function for test and non-test cases. - Updated the seuratv3_wrapper.R script to pass the
sctransform_n_cells
argument to the SCTransform function. - Moved the sample dataset from the
multimodal
folder to thesample
folder. - Refactored the sample data generation to be more efficient.
- Modified the
compute_ranking
function to calculate and add the "mean score" to thedataset_results
dictionary. - Updated the
dataset_results_to_json
function to include the "mean score" in the results table. - Updated the pull request template to reflect recent changes and improvements in the workflow.
- Updated the workflow to include a new
test_full_benchmark
branch. - Removed redundant code from the workflow.
Note: This changelog was automatically generated from the git log.
- Added a new metric, AUPRC, for evaluating cell-cell communication predictions.
- Added support for aggregating method scores using "max" and "sum" operations.
- Implemented a new method, true events, which predicts all possible interactions.
- Added a new method, random events, which randomly predicts interactions.
- Implemented LIANA, CellPhoneDB, Connectome, Log2FC, NATMI, and SingleCellSignalR methods with the option to aggregate scores using "max" or "sum."
- Added LIANA, CellPhoneDB, Connectome, Log2FC, NATMI, and SingleCellSignalR methods to the cell-cell communication ligand-target task.
- Added LIANA, CellPhoneDB, Connectome, Log2FC, NATMI, and SingleCellSignalR methods to the cell-cell communication source-target task.
- Fixed a bug where the odds ratio metric was not handling cases where the numerator or denominator was zero.
- Updated the IRkernel package version in the R base docker image to 1.3.1.
- Updated the saezlab/liana package version in the R extras docker image to 0.1.7.
- Updated the boto3 package version in the main docker image to 1.26.*.
- Added a check to the cell-cell communication dataset validation to ensure that there are no duplicate entries in the target data.
- Updated the documentation for the cell-cell communication ligand-target task.
- Updated the documentation for the cell-cell communication source-target task.
Note: This changelog was automatically generated from the git log.
- Fixed an issue where a sparse matrix was not being converted to CSR format.
- Fixed a bug in
docker_run.sh
where pip check was not being executed.
- Updated
pkgload
to version 1.3.1.
Note: This changelog was automatically generated from the git log.
- Converted sparse matrix to csr format.
Note: This changelog was automatically generated from the git log.
- Converted sparse matrices to CSR format.
Note: This changelog was automatically generated from the git log.
Note: This changelog was automatically generated from the git log.
- Fixed a bug where the bioconductor version was incorrect.
- Fixed a bug where the matrix in obs was incorrect.
- Updated the scran package to version 1.24.1.
- Updated the batchelor and scuttle packages.
Note: This changelog was automatically generated from the git log.
Note: This changelog was automatically generated from the git log.
- Updated workflow to run tests against
prod
branch.
Note: This changelog was automatically generated from the git log.
- Skip benchmark if tester fails.
Note: This changelog was automatically generated from the git log.
- Explicitly push prod images on tag
- Added short metric descriptions to README
- Added labels tests
Note: This changelog was automatically generated from the git log.
- Reverted bump of louvain to 0.8, which caused issues.
- Updated torch requirement to 1.13 in the openproblems-r-pytorch docker.
Note: This changelog was automatically generated from the git log.
- Added support for SCALEX version 1.0.2.
- Updated RcppAnnoy to version 0.0.20.
- Updated SageMaker requirement to version 2.116.*.
- Fixed a bug in the
docker_hash
function, which now returns a string instead of an integer. - Fixed a bug in the
scalex
method, which now correctly handles theoutdir
parameter.
Note: This changelog was automatically generated from the git log.
- Update rpy2 requirement from <3.5.5 to <3.5.6
- Update ragg to 1.2.4
- Don't fail job if hash fails
Note: This changelog was automatically generated from the git log.
- Updated scIB to 77ab015.
Note: This changelog was automatically generated from the git log.
- Added a new batch integration subtask for corrected feature matrices.
- Added a new sub-task for batch integration, "batch integration embed", which includes all methods that output a joint embedding of cells across batches.
- Added a new sub-task for batch integration, "batch integration graph", which includes all methods that output a cell-cell similarity graph (e.g., a kNN graph).
Note: This changelog was automatically generated from the git log.
- Fixed an issue where the
::
in branch names would cause problems. - Fixed an issue where the
check_r_dependencies.yml
workflow was not properly handling branch names with::
.
- Updated the
caret
package to version 6.0-93. - Updated the README to include information about the Open Problems team and task leaders.
- Replaced the
NuSVR
method with a faster alternative, improving performance.
- Added a new method for running Seuratv3 from a fork, allowing for more efficient use of resources.
- Added a new requirement to the
r_requirements.txt
file for thebslib
package. - Added a new requirement to the
r_requirements.txt
file for thecaret
package.
- Added a new section to the README to document the process of running Seuratv3 from a fork.
- Updated the README to include a list of all contributors to the Open Problems project.
Note: This changelog was automatically generated from the git log.
- Fix sampling and reindexing
- Fix docker unavailable error to include image name
- Require minimum celltype count for
spatial_decomposition
- Update Rcpp to 1.0.9
- Update to nf-openproblems v1.7
Note: This changelog was automatically generated from the git log.
- Fixed an issue where some cell types were missing from the output.
Note: This changelog was automatically generated from the git log.
- Fixed a bug in the rctd method where cell types with fewer than 25 cells were not being used.
Note: This changelog was automatically generated from the git log.
- Handle missing function error by catching FileNotFoundError and NoSuchFunctionError instead of just RuntimeError.
Note: This changelog was automatically generated from the git log.
- Updated
scipy
requirement from==1.8.*
to>=1.8,<1.10
. - Updated
igraph
to version1.3.4
.
- Changed the mnnpy dependency to use a patch version instead of a specific commit hash.
- Changed
docker_hash
to use the Docker API ifdocker
is not available. - Use
curl
to retrieve the Docker hash ifdocker
fails. - Fixed an issue with using
git+https
formnnpy
.
Note: This changelog was automatically generated from the git log.
- Updated several R package dependencies.
- Updated several Python package dependencies.
- Added several new methods for spatial decomposition: RCTD, DestVI, Stereoscope.
- Added a new dataset for dimensionality reduction: Mouse hematopoietic stem cell differentiation.
- Improved documentation for tasks and datasets.
- Fixed a bug where the
lintr
package was not being installed correctly. - Fixed a bug where the
BRANCH_PARSED
variable was not being properly sanitized in therun_tests.yml
workflow. - Fixed a bug in
_scanvi
and_scvi
functions where themax_epochs
parameter was not being passed to thescanvi
andscvi
functions. - Fixed a bug in
install_renv.R
causing incorrect installation of packages from R repositories. - Fixed an issue where the dependency upgrade script would fail to capture the output of the upgrade process.
- Fixed an issue where the dependency upgrade script would not correctly write updates to the requirements file.
- Fixed an issue where the
git_hash
function was not being called for external modules. - Fixed a bug in
openproblems/tasks/denoising/methods/__init__.py
that prevented DCA from being used. - Fixed a bug in
neuralee_default
where it could fail due to sparseness of data. - Fixed a bug in
scanvi_all_genes
where the code version was not being set correctly. - Fixed a bug in
scanvi_hvg
where the code version was not being set correctly. - Fixed a bug in
scarches_scanvi_all_genes
where the code version was not being set correctly. - Fixed a bug in
scarches_scanvi_hvg
where the code version was not being set correctly.
- Added a new denoising method called "DCA" based on a deep count autoencoder.
- Added
xgboost_log_cpm
andxgboost_scran
methods toopenproblems.tasks.label_projection
. - Added a new command-line interface for testing datasets, methods, and metrics.
- Added a new
install_renv.R
script to simplify the installation of therenv
package. - Added automated CI check to find and suggest available updates to R packages in docker images.
- Added a hash to the docker image that is based on the age of the code.
- Added data_reference to dataset metadata.
- Added
docker_hash
function to retrieve the docker image hash associated with an image. - Added support for retrieving the docker image hash for R functions that have a defined
__r_file__
.
- Updated contributing guide to reflect the
main
branch as the default branch. - Updated issue templates to reflect the
main
branch as the default branch. - Updated pull request template to reflect the
main
branch as the default branch.
Note: This changelog was automatically generated from the git log.
- Added a new docker image
openproblems-r-pytorch
for running Harmony in Python
- Moved
harmony
to Python-basedharmony-pytorch
- Fixed an issue where
adata.var
was not being correctly handled in_utils.py
- Updated the documentation for the
openproblems-r-extras
docker image
Note: This changelog was automatically generated from the git log.
- Added PHATE with sqrt potential
- Fixed path to R_HOME
- Fixed Dockerfile to use R 4.2
- Minor CI fixes
Note: This changelog was automatically generated from the git log.
- Run scran pooling in series, not in parallel.
Note: This changelog was automatically generated from the git log.
- Added
FastMNN
,Harmony
, andLiger
methods for batch integration. - Added
bbknn_full_unscaled
method. - Added Dependabot configuration for pip and GitHub Actions dependencies.
- Updated dependencies:
scib
,bbknn
,scanorama
,annoy
, andmnnpy
. - Improved the performance of several methods by pre-processing the data before running them.
- Fixed bugs in
fastMNN
,harmony
,liger
,scanorama
,scanvi
,scvi
,mnn
, andcombat
that caused incorrect embedding.
Note: This changelog was automatically generated from the git log.
- Added a new file
workflow/generate_website_markdown.py
to generate website markdown files for all tasks and datasets. - Updated Nextflow version to v1.5.
- Updated Nextflow version to v1.6.
- Added code version to the output of each method.
- Updated
nextflow
version tov1.3
. - Updated
nextflow
version tov1.4
. - Updated docker version to 20.10.15.
- Removed Docker setup from CI workflow.
- Updated Python version to 3.8.13.
- Updated dependencies for the Docker images.
- Updated pre-commit hooks to include
requirements-txt-fixer
. - Updated Nextflow workflow to version 1.4.
- Updated the location of method versions in the results directory.
- Updated the Tower action ID.
- Fixed a bug where Docker images were not properly pushed to Docker Hub.
- Updated
requirements.txt
files to fix dependency conflicts. - Removed unnecessary dependencies from CI workflows to reduce disk space usage on GitHub runners.
Note: This changelog was automatically generated from the git log.
- Added new integration methods: BBKNN, Combat, FastMNN feature, FastMNN embed, Harmony, Liger, MNN, Scanorama feature, Scanorama embed, Scanvi, Scvi
- Added new metrics: graph_connectivity, iso_label_f1, nmi
- Added _utils.py with functions: hvg_batch, scale_batch
- Added
run_bbknn
function. - Added a test for the trustworthiness metric, which now passes for sparse matrices.
- Added a test for the density preservation metric, which now passes against densmap for a reasonable degree of similarity.
- Added tests for all methods and metrics.
- Added a new workflow to automatically delete untagged images from the OpenProblems ECR repository.
- Added a new workflow to process results and create a PR to update the OpenProblems benchmark.
- Added support for running tests with the
process
extra insetup.py
. - Added
densmap
dimensionality reduction method. - Added
neuralee
dimensionality reduction method. - Added
alra
denoising method. - Added
scarches_scanvi
label projection method. - Added
bbknn
batch integration graph method. - Added
beta
regulatory effect prediction method. - Added a new
invite-contributors.yml
file to the repository.
- The
test_methods.py
file has been simplified by removing unused arguments. - The
test_metrics.py
file has been simplified by removing unused arguments. - The
test_utils/docker.py
file has been modified to allow specifying the docker image as a decorator argument. - Updated Nextflow version to 22.04.0.
- Modified the processing of Nextflow results to save them in a temporary directory.
- Modified
workflow/parse_nextflow.py
to parse results from Nextflow runs. - Modified
.github/workflows/run_tests.yml
to cancel previous runs when a new commit is pushed.
- Removed
.nextflow
,scratch/
,openproblems/results/
andopenproblems/work/
from.gitignore
. - Updated
CONTRIBUTING.md
- Methods should not edit
adata.obsm["train"]
oradata.obsm["test"]
. - Redirects stdout to stderr when running subcommands to ensure that output is printed correctly.
- Updated CI workflow to skip running tests on push if they failed on the
run_tester
job, unless the branch name starts withtest_benchmark
. - Refactored the Neuralee method to use a separate function for embedding.
- Improved performance by using a default value for
maxit
infine_tune_kwargs
. - Removed unnecessary code for storing raw counts in the
neuralee_default
method.
Note: This changelog was automatically generated from the git log.
- Added CeNGEN, Tabula Muris Senis, and Pancreas datasets to the label_projection task.
- Added scANVI and scArches+scANVI methods to the label_projection task.
- Added majority_vote and random_labels baseline methods to the label_projection task.
- Added new methods: densMAP, NeuralEE, scvis
- Added new metrics: NN Ranking (continuity, co-KNN size, co-KNN AUC, Local continuity meta criterion, Local property metric, Global property metric)
- Added pre-processing function: log_cpm_hvg()
- Added support for custom pre-processing functions
- Added support for variants of methods
- Added a new batch integration task.
- Added a batch integration graph subtask.
- Added a batch integration embedding subtask.
- Added a batch integration corrected feature matrix subtask.
- Added ivis method for dimensionality reduction to openproblems.
- Added self-hosted runner support for
run_benchmark
workflow using Cirun.io - Added a
--test
flag to therun
subcommand, allowing for running a test version of a method. - Added
test_load_dataset
totest/test__load_data.py
to test loading and caching of datasets. - Added
test_method
totest/test_methods.py
to test application of methods. - Added
test_trustworthiness_sparse
totest/test_metrics.py
to test trustworthiness metric on sparse data. - Added
test_density_preservation_matches_densmap
totest/test_metrics.py
to test density preservation metric against densmap. - Updated
test/utils/docker.py
to allow specifying the docker image as the last argument. - Added
--test
flag torun
subcommand to run the test version of a method. - Added Docker image building to
run_tests.yml
. - Added a new workflow to process Nextflow results
- Added a new workflow to run tests and benchmarks
- Added support for running benchmarks from tags
- Added support for running benchmarks from forks
- Added
openproblems-cli
command to run test-hash
Note: This changelog was automatically generated from the git log.
- Added support for balanced SCOT alignment.
- Updated the workflow to store benchmark results in
/tmp
.
- Fixed the parsing and committing of benchmark results on tag.
- Fixed the Github Actions badge link.
- Fixed the coverage badge.
- Fixed the benchmark commit.
- Ignored AWS warning and cleaned up S3 properly.
- Updated the workflow to continue on error for forks.
Note: This changelog was automatically generated from the git log.
- Added trustworthiness metric to the dimensionality reduction task.
- Added density preservation metric.
- Added several metrics based on nearest neighbor ranking: continuity, co-KNN size, co-KNN AUC, local continuity meta criterion, local property, global property.
- Added mouse blood data from Olsson et al. (2016) Nature to the
openproblems
dataset collection. - Added a test mode to the
load_olsson_2016_mouse_blood
function. - Added a dataset function for the
mouse_blood_olssen_labelled
dataset in theopenproblems.tasks.dimensionality_reduction.datasets
module. - Added ALRA denoising method.
- Added support for the Single Cell Optimal Transport (SCOT) method for multimodal data integration.
- SCOT implements Gromov-Wasserstein optimal transport to align single-cell multi-omics data.
- Added four variations of SCOT:
-
- sqrt CPM unbalanced
-
- sqrt CPM balanced
-
- log scran unbalanced
-
- log scran balanced
- Each variation implements different normalization strategies for the input data.
- Added
scot
method toopenproblems.tasks.multimodal_data_integration.methods
. - Added pre-processing to the
dimensionality_reduction
task. - Added pre-processing to all
dimensionality_reduction
methods. - Added Wagner_2018_zebrafish_embryo_CRISPR dataset loader
- Added PR review checklist to the pull request template.
- Added
cmake==3.18.4
to thedocker/openproblems-python-extras/requirements.txt
file. - Added
--version
flag to print the version. - Added
--test-hash
flag to print the current hash. - Added basic help message.
- Added
install_renv.R
script for installing R packages in Docker images. - Added
docker/.version
file to track Docker image version. - Added a new docker image for running GitHub Actions.
- Added a new utils.git module to determine which tasks have changed relative to base/main.
- Added support for running benchmark tests on tags.
- Added a test directory for use in the workflow.
Note: This changelog was automatically generated from the git log.
- Added chromatin potential task
- Added PHATE to the dimensional_reduction task.
- Added support for testing docker builds on a separate branch.
- Added support for building images and pushing them to docker hub.
- Added support for writing methods in R using
scprep
'sRFunction
class. - Added a CLI interface to
openproblems
. - Added
f1_micro
metric. - Added
mlp_log_cpm
andmlp_scran
methods for label projection. - Added
pancreas_batch
andpancreas_random
datasets for label projection. - Added
f1
metric for label projection. - Added metadata to methods and metrics.
- Added
openproblems.tools.decorators
for decorating methods and metrics. - Added
openproblems.tools.normalize
for common normalization functions. - Added methods for
logistic_regression
,mlp
,harmonic_alignment
,mnn
, andprocrustes
. - Added metrics for
accuracy
,f1
,knn_auc
, andmse
. - Added
openproblems.version
to provide package version. - Added
dataset
decorator for registering datasets. - Added
tools.decorators.profile
decorator to measure memory usage and runtime of methods. - Added
tools.normalize
module to provide normalization functions. - Added
tools.decorators.normalizer
decorator to normalize data prior to applying methods. - Added a new "data loader" component that loads data in a way that's formatted correctly for a given task.
- Added CITE-seq Cord Blood Mononuclear Cells dataset.
- Added snakemake support for automatic evaluation.
- Added zebrafish data to label projection task.
- Added a new task, "Link gene expression with chromatin accessibility"
- Added a new dataset, "sciCAR Mouse Kidney with cell clusters"
- Added a new method, "BETA"
- Added a new metric, "Correlation between RNA and ATAC"
- Added a new task, "Dimensional reduction"
- Added human blood dataset from Nestorowa et al. Blood. 2016
- Added 10x PBMC dataset
- Added
load_10x_5k_pbmc
function to load the 10x 5k PBMC dataset.
Note: This changelog was automatically generated from the git log.
- Added MLP method for label projection task.
- Added pancreas data loading to label projection task.
- Updated black.
- Updated test version of pancreas_batch to have test data.
- Added random pancreas train data.
- Fixed zebrafish code duplication.
- Fixed pancreas import location.
- Fixed bug in zebrafish data.
- Fixed bug in pancreas import.
- Removed normalization from loader.
- Removed dummy and cheat metrics/datasets.
- Removed excess covariates from pancreas dataset.
Note: This changelog was automatically generated from the git log.
- Added zebrafish label projection task
- Moved scIB, rpy2, harmonicalignment, and mnnpy to optional dependencies
- Improved n_components fix
- Moved URL into function for neater namespace
- Fixed n_svd for truncatedSVD
- Fixed data loader
- Fixed n_pca problem
- Scaled without mean if sparse
- Scaled data for regression
- Added check to ensure that data has nonzero size
Note: This changelog was automatically generated from the git log.
- Added a results page to the website.
- Added a new zebrafish dataset to the openproblems library.
- Added netlify.toml to deploy website.
- Updated documentation to reflect new features and datasets.
- Bumped version to 0.1.
- Improved the website's home menu link.
- Improved website links.
- Updated website's hero and social links.
- Updated website's task cards.
- Updated the website's demo.
- Improved website's frontmatter.
- Separated frontmatter from content in website's Markdown files.
- Fixed black syntax.
- Excluded website from black.
- Updated website content to display results.
- Updated the Travis CI configuration to exclude website from black.
- Fixed zebrafish data loader.
Note: This changelog was automatically generated from the git log.
- Added harmonic alignment method.
- Added scicar datasets.
- Added logistic regression methods.
- Added ability to normalize obsm.
- Added test suite.
- Added normalization tools.
- Updated documentation to reflect normalization changes.
- Migrated normalizations to openproblems.tools.normalize.
- Updated dataset specification to require normalization in methods.
- Removed zebrafish dataset.
- Moved dataset test spec.
- Removed "mode2_raw" and "raw" from datasets.
- Added test dataset spec.
- Consolidated scicar datasets.
- Migrated references to github repo.
- Improved sparse array equality test.
- Improved sparse inequality check.
- Increased test data size.
- Normalized mode2.
- Fixed decorator.
- Used uns.
- Used functools.wraps.
- Updated name of log_scran_pooling function.
- Fixed storing normalization results.
- Fixed zebrafish load caching.
- Fixed zebrafish test.
- Added normalization functions.
- Updated logistic regression function to work with anndata properly.
- Fixed cheat method.
- Fixed git upload.
- Fixed Travis CI.
- Fixed harmonic alignment import.
- Increased test coverage.
- Bugfix harmonic_alignment, closes #4.
- Bugfix harmonic alignment import.
- Normalized data inside methods, closes #19.
- Fix storing normalization results.
- Fixed zebrafish test.
- Fix zebrafish load caching.
- Fix decorator.
- Fix cheat method.
- Don't check for raw data -- we are no longer normalizing.
Note: This changelog was automatically generated from the git log.
- Added dummy dataset to
openproblems/data
- Added
load_dummy
function toopenproblems/data
- Added
loader
decorator toopenproblems/data
- Added loading functions for sciCAR datasets to
openproblems/data/scicar
- Added
scicar_cell_lines
dataset toopenproblems/tasks/multimodal_data_integration/datasets
- Added
scicar_mouse_kidney
dataset toopenproblems/tasks/multimodal_data_integration/datasets
- Added
dummy
dataset toopenproblems/tasks/label_projection/datasets
- Changed data structure for multimodal data integration tasks in
openproblems/tasks/multimodal_data_integration
- Bumped version to 0.0.2 in
openproblems/version.py
- Modified the way to run
evaluate.sh
in.travis.yml
- Added
chmod +x evaluate.sh
to.travis.yml
- Added documentation for adding a dataset to a task in
README.md
- Added documentation for dataset loading in
README.md
- Added documentation for adding a new dataset in
README.md
- Updated documentation in
openproblems/tasks/multimodal_data_integration/README.md
- Updated documentation in
openproblems/version.py
First release of OpenProblems.
methods, 1 metric)
- Multimodal data integration (2 datasets, 2 methods, 2 metrics)