Longitudinal preprocessor parallelization #246
@@ -355,7 +355,8 @@ def fit_kfold_cv(self, features, labels, censoring, C_tv_range: tuple = (),
features, labels, censoring)
# split the data with stratified KFold
kf = StratifiedKFold(n_folds, shuffle, self.random_state)
labels_interval = np.nonzero(p_labels)[1]
# labels_interval = np.nonzero(p_labels)[1]
Is there a reason to keep this?
Just a few comments about the code writing (I trust you on its validity :) )
Also, did you try to make your old tests faster?
@@ -68,14 +73,15 @@ void LongitudinalFeaturesLagger::sparse_lag_preprocessor(ArrayULong &row,
ArrayULong &out_col,
ArrayDouble &out_data,
ulong censoring) const {
// TODO: add checks here ? Or do them in Python ?
Checks must be done in C++ to avoid possible segfaults!
@@ -14,9 +15,14 @@ class LongitudinalPreprocessor(ABC, Base):
set to the number of cores.
"""

def __init__(self, n_jobs=-1):
_attrinfos = {'n_jobs': {'writable': True}}
This `_attrinfos` is not necessary.
caution.
"""
global _cpp_preprocessor
_cpp_preprocessor = _LongitudinalFeaturesLagger(n_intervals, n_lags)
Here is a clever way (using `__call__` in a class) to avoid the global variable. Did you give it a try?
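One possible reading of this suggestion, as a minimal sketch rather than the approach the linked page describes: a callable class carries only the constructor parameters and builds the heavy object lazily inside `__call__`. `FakeLagger` and `LagTask` are hypothetical names introduced for illustration; `FakeLagger` merely stands in for the compiled `_LongitudinalFeaturesLagger`, whose real interface is not shown in this thread.

```python
class FakeLagger:
    """Hypothetical stand-in for the compiled _LongitudinalFeaturesLagger."""

    def __init__(self, n_intervals, n_lags):
        self.n_intervals = n_intervals
        self.n_lags = n_lags

    def lag(self, row):
        # Toy "preprocessing": repeat each value n_lags + 1 times.
        return [x for x in row for _ in range(self.n_lags + 1)]


class LagTask:
    """Callable task: no module-level global is needed."""

    def __init__(self, n_intervals, n_lags):
        self.n_intervals = n_intervals
        self.n_lags = n_lags
        self._preprocessor = None  # built lazily, on first call

    def __getstate__(self):
        # Pickle only the parameters, never the built object, so that
        # sending the task to a worker stays cheap.
        state = self.__dict__.copy()
        state["_preprocessor"] = None
        return state

    def __call__(self, row):
        if self._preprocessor is None:
            self._preprocessor = FakeLagger(self.n_intervals, self.n_lags)
        return self._preprocessor.lag(row)
```

Such an object could be passed directly to a pool, e.g. `pool.map(LagTask(n_intervals, n_lags), rows)`. Note the trade-off: each unpickled copy of the task rebuilds its preprocessor, so the object is created once per chunk of work rather than strictly once per process.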
Has this been obsoleted by #373?
Yep
On Fri, Jul 12, 2019 at 22:46, ♦♣♠♥ <[email protected]> wrote:
… has this been obsoleted by #373?
Make the longitudinal preprocessors serializable and parallelize them with Python's `multiprocessing` module.
To avoid sending the whole instance of each class to the spawned or forked processes, I used an initializer in the process Pool to create one cpp object instance per process. I also used higher-order functions for the same purpose.
The drawback is that the pool initializers (the `_inject_cpp_object` methods) use `global` variables to store the cpp objects. This should not have dire consequences in the multiprocessing context, as each process has its own namespace. However, these methods could cause trouble if called outside of this context (rogue/monkey user).