Can physbo handle sparse features? #29

Open
UnixJunkie opened this issue Jun 7, 2021 · 7 comments
Labels
enhancement New feature or request

Comments

@UnixJunkie

from scipy import sparse
X = sparse.csr_matrix(...)
@UnixJunkie
Author

That would be cool.
Some molecular representations are very sparse.

@yomichi
Contributor

yomichi commented Jun 11, 2021

The present version of PHYSBO cannot directly deal with sparse matrices.

If zero in X means an exact zero, please convert to a dense matrix via X.todense().
If the conversion takes too long, please let us know.
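
A minimal sketch of this workaround, assuming zeros in X are true zeros. The toy matrix below is hypothetical and only stands in for real descriptors. Note that `todense()` on a `csr_matrix` returns a `numpy.matrix`, whereas `toarray()` returns a plain `numpy.ndarray`, which is usually the more convenient type to pass on to other libraries.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy data: 4 samples, 6 features, one nonzero per sample.
rows = np.array([0, 1, 2, 3])
cols = np.array([1, 3, 0, 5])
vals = np.array([1.0, 2.0, 3.0, 4.0])
X = sparse.csr_matrix((vals, (rows, cols)), shape=(4, 6))

# Densify: implicit zeros become explicit 0.0 entries.
X_dense = X.toarray()  # plain ndarray; X.todense() would give numpy.matrix

print(X_dense.shape, type(X_dense).__name__)
```

The dense array stores every entry explicitly, so its memory use grows with samples times features regardless of how many entries are zero.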

If zero in X means a missing value, replace the missing values with placeholder (dummy) values, such as the mean of that feature over the samples.
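
One way the mean replacement could look, as a sketch only. It assumes that every zero (stored or implicit) marks a missing value and that the mean is taken per feature over the samples where that feature is observed. The helper name `impute_missing_with_column_mean` is made up for illustration and is not part of PHYSBO.

```python
import numpy as np
from scipy import sparse

def impute_missing_with_column_mean(X_sparse):
    """Densify and replace zeros by the per-feature mean of observed entries.

    Assumption: a zero means "missing", not a measured value of 0.
    """
    X = X_sparse.toarray().astype(float)
    observed = X != 0
    counts = observed.sum(axis=0)   # observed samples per feature
    sums = X.sum(axis=0)            # zeros contribute nothing to the sum
    # Features with no observed sample keep 0.0 as their placeholder.
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return np.where(observed, X, means)

# Hypothetical example: 3 samples, 2 features.
X = sparse.csr_matrix(np.array([[1.0, 0.0], [3.0, 4.0], [0.0, 0.0]]))
X_filled = impute_missing_with_column_mean(X)
```

The result is dense, so this carries the same memory cost as the dense conversion.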

@UnixJunkie
Author

You should consider it: not all molecular representations are dense.
I will try the todense() method, but this will use a lot of memory.

@yomichi
Contributor

yomichi commented Jun 11, 2021

> but this will use a lot of memory.

Agreed. This is just a workaround.
Could you show the typical number of samples, features, and nonzero features?

@yomichi yomichi added the enhancement New feature or request label Jun 11, 2021
@UnixJunkie
Author

UnixJunkie commented Jun 11, 2021 via email

@UnixJunkie
Author

Samples are in the thousands usually (from hundreds to about 10k per dataset).
The number of features is ~17000.
The number of nonzero features is typically in the hundreds per molecule.
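
For a rough sense of scale, the figures above can be turned into a back-of-the-envelope memory estimate. This is a sketch under stated assumptions: 10,000 samples (the upper end quoted), 17,000 features, 300 nonzeros per molecule as a stand-in for "hundreds", 8-byte float64 values, and 4-byte int32 indices for the CSR layout.

```python
def dense_bytes(n_samples, n_features, value_bytes=8):
    """Memory for a dense float64 array: every entry is stored."""
    return n_samples * n_features * value_bytes

def csr_bytes(n_samples, nnz_per_row, value_bytes=8, index_bytes=4):
    """Memory for CSR: one value and one column index per nonzero,
    plus a row-pointer array of length n_samples + 1."""
    nnz = n_samples * nnz_per_row
    return nnz * (value_bytes + index_bytes) + (n_samples + 1) * index_bytes

n_samples, n_features, nnz_per_row = 10_000, 17_000, 300  # assumed values

dense = dense_bytes(n_samples, n_features)
csr = csr_bytes(n_samples, nnz_per_row)
print(f"dense: {dense / 1e9:.2f} GB, CSR: {csr / 1e6:.1f} MB")
```

Under these assumptions the dense form needs about 1.36 GB against roughly 36 MB for CSR, which is consistent with the memory concern raised earlier in the thread.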

@yomichi
Contributor

yomichi commented Jun 11, 2021

OK, we understand that support for sparse matrices is in high demand in molecular science.
We want to implement it in PHYSBO in the future, but I'm sorry that I cannot promise when.
Thank you for the suggestion.
