Rows in the training set(s) could be composed entirely of missing values.
utilities.split implements a check to ensure that every row selected for the training set contains at least some threshold fraction of present (non-missing) values. However, applying this check often excludes rows, thereby reducing the number of rows in the training, validation and test sets.
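For reference, a minimal sketch of what that kind of present-values filter might look like; the function name, signature, and default threshold here are illustrative, not the actual utilities.split API:

```python
import numpy as np

def filter_rows_by_present_fraction(matrix, min_present_frac=0.5):
    """Keep only rows whose fraction of non-missing entries meets the threshold.

    Hypothetical stand-in for the present-values check described above;
    the real utilities.split signature may differ.
    """
    present_frac = np.mean(~np.isnan(matrix), axis=1)
    keep = present_frac >= min_present_frac
    return matrix[keep], keep
```

Note that the returned matrix can have fewer rows than the input, which is exactly the size mismatch described below.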
For the last step of the ms_imputer workflow, we want to impute missing values in the original (i.e. non-partitioned) matrix. To do this, we need to have trained on a matrix of equivalent size.
Right now, the present-values check in utilities.split is turned off. This ensures that the training, validation and test matrices are the same size as the initial matrix, so the workflow completes successfully. However, the model may behave unpredictably when it tries to learn from rows that are entirely np.nan.
There's a valid question of how much this actually matters. The sensible next step is probably to evaluate NMF models trained with slightly different present-value thresholds and see how much the reconstruction error changes.
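A minimal sketch of that kind of threshold sweep, assuming hypothetical train_nmf_model and reconstruct callables stand in for whatever ms_imputer actually exposes:

```python
import numpy as np

def sweep_present_value_thresholds(matrix, thresholds, train_nmf_model, reconstruct):
    """Train one NMF model per present-value threshold and compare
    reconstruction error on the observed entries.

    train_nmf_model and reconstruct are hypothetical callables; swap in
    the project's actual training/reconstruction functions.
    """
    results = {}
    for thresh in thresholds:
        # Drop rows below the present-value threshold (as in the filter above).
        present_frac = np.mean(~np.isnan(matrix), axis=1)
        filtered = matrix[present_frac >= thresh]

        # Fit and reconstruct; recon is assumed to have the same shape as filtered.
        model = train_nmf_model(filtered)
        recon = reconstruct(model, filtered)

        # Reconstruction MSE over the observed (non-missing) entries only.
        observed = ~np.isnan(filtered)
        results[thresh] = np.mean((filtered[observed] - recon[observed]) ** 2)
    return results
```

Comparing the resulting errors (e.g. for thresholds of 0.0, 0.25, 0.5) would indicate whether the all-NaN rows meaningfully degrade the fit or can safely be left in.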