How to reproduce paper results #80

Open
SZ-qing opened this issue May 6, 2023 · 5 comments

Comments

@SZ-qing

SZ-qing commented May 6, 2023

Hi, I am trying to reproduce the results of your article, but I have run into some environment problems.

As described in https://github.com/theislab/scgen-reproducibility, I created a virtual environment called scgen-reproduce with the packages tensorflow, scanpy, numpy, matplotlib, scipy, and wget. I then followed the instructions, changed to the code directory, and ran the following command (all data had already been downloaded):
python ModelTrainer.py all
It fails with this error:

Traceback (most recent call last):
  File "", line 1, in
  File "/public/home/nierq01/platform/scGen/reproduce/scgen-reproducibility-master/code/scgen/__init__.py", line 19, in
    __version__ = get_version(__file__)
  File "/public/home/nierq01/anaconda3/envs/scgen-reproduce/lib/python3.7/site-packages/get_version/__init__.py", line 280, in get_version
    raise NoVersionFound(Source.all, msg)
get_version.NoVersionFound: No version found:

  • Directory name: name of directory “/public/home/nierq01/platform/scGen/reproduce/scgen-reproducibility-master/code” does not contain a valid version.
  • VCS: could not find VCS from directory “/public/home/nierq01/platform/scGen/reproduce/scgen-reproducibility-master/code”.
  • Package metadata: could not find distribution “scgen”.

My understanding is that scgen does not need to be installed separately in the Python environment to reproduce the results. Is that right?

@SZ-qing
Author

SZ-qing commented May 6, 2023

Alternatively, could you tell me which versions of Python and TensorFlow are required?

@SZ-qing
Author

SZ-qing commented May 6, 2023

Which anndata version (or scanpy version) did you use to create the h5ad files? I currently cannot read the h5ad files you provided.
My package versions:

Package Version


absl-py 1.4.0
adjustText 0.8
aiohttp 3.8.4
aiosignal 1.3.1
anndata 0.8.0
astunparse 1.6.3
async-timeout 4.0.2
asynctest 0.13.0
cached-property 1.5.2
cachetools 5.3.0
certifi 2022.12.7
charset-normalizer 3.1.0
chex 0.1.5
cycler 0.11.0
dm-tree 0.1.8
docrep 0.3.2
dunamai 1.16.0
et-xmlfile 1.1.0
etils 0.9.0
exceptiongroup 1.1.1
flatbuffers 23.3.3
flax 0.6.4
fonttools 4.38.0
frozenlist 1.3.3
fsspec 2023.1.0
gast 0.4.0
get_version 3.5.4
google-auth 2.17.3
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.54.0
h5py 3.8.0
idna 3.4
importlib-metadata 6.6.0
iniconfig 2.0.0
jax 0.3.25
jaxlib 0.3.25
joblib 1.2.0
keras 2.11.0
kiwisolver 1.4.4
libclang 16.0.0
llvmlite 0.39.1
Markdown 3.3.4
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.5.3
mdurl 0.1.2
ml-collections 0.1.1
msgpack 1.0.5
mudata 0.2.1
multidict 6.0.4
multipledispatch 0.6.0
natsort 8.3.1
networkx 2.6.3
numba 0.56.4
numpy 1.21.6
numpyro 0.10.1
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
oauthlib 3.2.2
openpyxl 3.1.2
opt-einsum 3.3.0
optax 0.1.4
orbax 0.1.0
packaging 23.1
pandas 1.3.5
patsy 0.5.3
Pillow 9.5.0
pip 23.1.2
pluggy 1.0.0
protobuf 3.19.6
pyasn1 0.5.0
pyasn1-modules 0.3.0
pyDeprecate 0.3.2
pynndescent 0.5.7
pyparsing 3.0.9
pyro-api 0.1.2
pyro-ppl 1.8.4
pytest 7.2.2
python-dateutil 2.8.2
pytz 2023.3
requests 2.30.0
requests-oauthlib 1.3.1
rich 13.3.3
rsa 4.9
scanpy 1.9.3
scikit-learn 1.0.2
scipy 1.7.3
seaborn 0.12.2
session-info 1.0.0
setuptools 67.7.2
six 1.16.0
statsmodels 0.13.5
stdlib-list 0.8.0
tensorboard 2.11.2
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.11.0
tensorflow-estimator 2.11.0
tensorflow-io-gcs-filesystem 0.32.0
tensorstore 0.1.28
termcolor 2.3.0
threadpoolctl 3.1.0
toolz 0.12.0
torch 1.13.1+cpu
torchaudio 0.13.1+cpu
torchmetrics 0.11.4
torchvision 0.10.0a0+e04d001.dtk2210
tqdm 4.65.0
typing_extensions 4.5.0
umap-learn 0.5.3
urllib3 2.0.2
Werkzeug 2.2.3
wget 3.2
wheel 0.40.0
wrapt 1.15.0
yarl 1.8.2
zipp 3.15.0

The error message is as follows:
Traceback (most recent call last):
  File "./vec_arith_pca.py", line 144, in
    train("pbmc", "CD4T", "unbiased")
  File "./vec_arith_pca.py", line 119, in train
    ctrl_CD4T_PCA = pca.transform(adata_list[1].X)
  File "/public/home/nierq01/anaconda3/envs/scgen-reproduce/lib/python3.7/site-packages/sklearn/decomposition/_base.py", line 117, in transform
    X = self._validate_data(X, dtype=[np.float64, np.float32], reset=False)
  File "/public/home/nierq01/anaconda3/envs/scgen-reproduce/lib/python3.7/site-packages/sklearn/base.py", line 566, in _validate_data
    X = check_array(X, **check_params)
  File "/public/home/nierq01/anaconda3/envs/scgen-reproduce/lib/python3.7/site-packages/sklearn/utils/validation.py", line 726, in check_array
    accept_large_sparse=accept_large_sparse,
  File "/public/home/nierq01/anaconda3/envs/scgen-reproduce/lib/python3.7/site-packages/sklearn/utils/validation.py", line 441, in _ensure_sparse_format
    "A sparse matrix was passed, but dense "
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.

@yikang613

I ran into the same issue. What I did was move ModelTrainer.py from the code directory to the main directory. I think that could solve this problem.

@twytock

twytock commented Nov 16, 2023

Hi @SZ-qing,

Based on the error message, pca.transform expects a dense array, but the data matrix (.X) of this AnnData object is stored as a sparse matrix. Changing line 119 of vec_arith_pca.py from
ctrl_CD4T_PCA = pca.transform(adata_list[1].X)
to
ctrl_CD4T_PCA = pca.transform(adata_list[1].X.toarray())
should fix this error.
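
Other calls in the script may hit the same TypeError, and calling .toarray() on a matrix that is already dense would fail in turn. A small duck-typed helper could cover both cases. This is only a sketch: `to_dense` is a hypothetical name, not part of scgen, AnnData, or scikit-learn, and it assumes that sparse inputs expose a `.toarray()` method (as SciPy sparse matrices do) while dense NumPy arrays do not.

```python
def to_dense(matrix):
    """Return a dense array if `matrix` is sparse, otherwise return it unchanged.

    Assumption: sparse containers expose .toarray() (SciPy sparse matrices do),
    whereas plain NumPy arrays do not, so they pass through untouched.
    """
    if hasattr(matrix, "toarray"):
        return matrix.toarray()
    return matrix
```

With this helper, line 119 would read `ctrl_CD4T_PCA = pca.transform(to_dense(adata_list[1].X))`. Note that converting a large expression matrix to dense form can use a lot of memory.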

@yikang613: Did your solution work?

@printfisnotgood

For the get_version.NoVersionFound error, I think you could try this command
pip install get_version==2.2
to install an older version of this module, since the paper was published about four years ago.
I ran into this problem too, and this solution worked for me.

4 participants