Sparse matrices and parallel fit #32

Open: wants to merge 6 commits into base: master

@mmakowski commented Jan 29, 2018

I have made a few changes for my own use that I think others might benefit from, hence this pull request. The new features are:

  • support for sparse matrices, which are common in NLP tasks; previously the transformation simply failed when given a sparse matrix, now it should process it
  • removal of features that end up with only a single bucket; this speeds up processing and reduces memory pressure in very high-dimensional problems
  • parallel fit, with the customary n_jobs parameter controlling the degree of parallelism. The transform has not been parallelised yet, but since it is orders of magnitude faster than the fit, I do not think that matters much.
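
To make the three changes concrete, here is a minimal sketch of how they might fit together. It is not the code in this PR: `find_cut_points`, `dense_column`, `fit` and `transform` are hypothetical names, the cut-point rule is a placeholder for the real per-feature fit, and using joblib with a thread backend is an assumption made to keep the example light.

```python
import numpy as np
from joblib import Parallel, delayed
from scipy import sparse


def find_cut_points(values, y):
    """Hypothetical stand-in for the per-feature fit step.

    Splits at the median of the distinct values; a real discretiser
    would also use the labels y to choose its cut points.
    """
    distinct = np.unique(values)
    if distinct.size < 2:
        return np.array([])  # no cut points: the feature has a single bucket
    return np.array([np.median(distinct)])


def dense_column(X, j):
    """Return column j as a 1-D dense array, for dense or sparse X."""
    if sparse.issparse(X):
        # Only one column is densified at a time, never the whole matrix.
        return X[:, [j]].toarray().ravel()
    return np.asarray(X)[:, j]


def fit(X, y, n_jobs=1):
    """Fit every feature independently; drop features left with one bucket."""
    if sparse.issparse(X):
        X = X.tocsc()  # CSC makes column slicing cheap
    # Assumption: threads are used here to avoid pickling; joblib's default
    # backend is process-based.
    cuts = Parallel(n_jobs=n_jobs, prefer="threads")(
        delayed(find_cut_points)(dense_column(X, j), y)
        for j in range(X.shape[1])
    )
    kept = [j for j, c in enumerate(cuts) if c.size > 0]
    return kept, [cuts[j] for j in kept]


def transform(X, kept, cuts):
    """Map each kept feature to bucket indices (serial, as in the PR text)."""
    if sparse.issparse(X):
        X = X.tocsc()
    columns = [
        np.searchsorted(cuts_j, dense_column(X, j))
        for j, cuts_j in zip(kept, cuts)
    ]
    return np.column_stack(columns)


dense = np.array(
    [[0, 5, 1], [0, 5, 2], [3, 5, 0], [0, 5, 4]], dtype=float
)
y = np.array([0, 0, 1, 1])
X = sparse.csc_matrix(dense)

kept, cuts = fit(X, y, n_jobs=2)
binned = transform(X, kept, cuts)
print(kept)  # [0, 2]: the constant middle column is dropped
```

The same `fit` call accepts a dense array, which is what backwards compatibility would look like in this sketch; `n_jobs=1` keeps the fit serial.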

All of these required changes only in the Python wrapper, not in the native part. They should be fully backwards-compatible.

If you would prefer only some of those features but not the others then I will be happy to split this PR up. I also appreciate that they might not fit with your concept of how the library should evolve; if so, no problem at all, I will just use my fork.
