Selecting the elastic net mixing parameter #56
Comments
Also, I'd rather spend more time optimizing …
Do not optimize `l1_ratio`. Instead use the default of 0.15. Search a denser grid for `alpha`. See cognoma#56
* Begin constructing an MVP machine learner
* Export JSON API input for Hippo pathway
* Ignore `__pycache__`
* `classify()` functioning with mock input
* Save output corresponding to hippo-input.json. From the `cognoml` directory, ran:

  ```
  python analysis.py > ../data/api/hippo-output.json
  ```

* Export model information to JSON output. Also filter zero-variance features.
* Return unselected observations. Unselected observations (samples in the dataset that were not selected by the user) are now returned. These observations receive predictions but are missing (-1 encoded) for fields such as `testing` and `status`. Sorted model parameters by key.
* Save grid_search performance metrics
* Move classifier and pipeline to its own module
* Add setup.py to make module installable
* Review comments: spacing and results doc
* Check whether pipeline has function before calling. Meant to address https://git.io/vPvtI
* Acquire data from figshare
* Update for sklearn 0.18.0, fix pipeline. Fix pipeline according to scikit-learn/scikit-learn#7536 (comment); extract selected feature names according to scikit-learn/scikit-learn#7536 (comment)
* Semantic improvements of get_feature_df
* Update API JSON files
* Mention hippo-output-schema.json in docstring
* Address @gwaygenomics review comments. Does not address "Lasso or Ridge only?"
* Grid search: optimize alpha not l1_ratio. Do not optimize `l1_ratio`. Instead use the default of 0.15. Search a denser grid for `alpha`. See cognoma#56
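A minimal sketch of what that final commit's strategy could look like, assuming the sklearn-0.18-era `SGDClassifier` API (`loss='log'` was renamed `'log_loss'` in recent scikit-learn releases); the grid values here are illustrative, not the repository's actual settings:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# Hold l1_ratio at its default (0.15) and search a denser logarithmic grid
# over alpha, the overall regularization strength.
clf = SGDClassifier(loss='log', penalty='elasticnet', l1_ratio=0.15, random_state=0)
param_grid = {'alpha': 10.0 ** np.linspace(-6, 1, 15)}  # denser grid for alpha
grid_search = GridSearchCV(clf, param_grid, cv=5, scoring='roc_auc')
# grid_search.fit(X, y)  # X: expression features, y: mutation status labels
```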
Agree with @dhimmel about ridge/lasso trade-offs. We could ask the user how much they value sparsity vs. performance if we can figure out a way that's not too confusing.
Agreed!
If we are performing PCA on the expression matrix to create our features, then I am not sure how important sparsity is going to be in the end classifier. This is probably even more true when the number of components we choose is <= 100.
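If that's the direction, a rough sketch of the setup (assumed component count and step names, not the project's actual pipeline):

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# With PCA features, every component is already a dense combination of genes,
# so zeroed classifier coefficients no longer translate into sparsity over genes.
pipeline = Pipeline([
    ('pca', PCA(n_components=100)),  # <= 100 components, per the comment above
    ('classify', SGDClassifier(loss='log', penalty='elasticnet', l1_ratio=0.15)),
])
# pipeline.fit(X, y)
```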
Closed by #114
Thus far we've been using grid search (cross validation) to select the optimal elastic net mixing parameter. For `SGDClassifier`, this mixing parameter is set using `l1_ratio`, where `l1_ratio = 0` performs ridge regularization and `l1_ratio = 1` performs lasso regularization.
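For reference, with `penalty='elasticnet'` scikit-learn's SGD models apply a regularization term of the form (paraphrasing the scikit-learn documentation):

$$
\alpha \left( \text{l1\_ratio} \cdot \lVert w \rVert_1 + \frac{1 - \text{l1\_ratio}}{2} \, \lVert w \rVert_2^2 \right)
$$

so `l1_ratio = 0` recovers the pure L2 (ridge) penalty and `l1_ratio = 1` the pure L1 (lasso) penalty.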
Here's what I'm thinking: grid search is not the appropriate way to select the mixing parameter. Ridge (with the optimal regularization penalty, `alpha`) will always perform better than the optimal Lasso. The reason is that there's a cost for the convenience of sparsity: Lasso makes difficult decisions about which features to select. The sparsity can therefore aid in model interpretation, but it weakens performance, because identifying only the predictive features is an impossible task.

For example, see our grid from this notebook (note this used MAD feature selection to select only 500 features, which likely accentuates the performance deficit as `l1_ratio` increases).
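To make that comparison concrete, here's a sketch that tunes `alpha` separately at each endpoint and compares best cross-validated scores (synthetic data and an illustrative grid, not the notebook's code):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the expression matrix; only a few features are informative.
X, y = make_classification(n_samples=300, n_features=200, n_informative=20, random_state=0)
param_grid = {'alpha': 10.0 ** np.linspace(-5, 0, 11)}
for name, l1_ratio in [('ridge', 0.0), ('lasso', 1.0)]:
    clf = SGDClassifier(loss='log', penalty='elasticnet', l1_ratio=l1_ratio, random_state=0)
    grid = GridSearchCV(clf, param_grid, cv=5, scoring='roc_auc')
    grid.fit(X, y)
    # Ridge at its best alpha is expected to match or beat lasso at its best alpha.
    print(name, round(grid.best_score_, 3))
```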
So my sense is that `l1_ratio` should be chosen based on what properties we want the model to have, not based on maximum CV performance. If we only care about performance, we might as well save ourselves the computation time and always go with ridge or the default `l1_ratio = 0.15`. `l1_ratio = 0.15` can still filter ~50% of features with little performance degradation. But if you want real sparsity (lasso), there's going to be a performance cost -- and the user, not grid search, will have to make this decision.
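For intuition, a sketch of how one could measure that sparsity on synthetic data (illustrative `alpha`; the ~50% figure above comes from the project's own runs):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=500, n_features=500, n_informative=25, random_state=0)
clf = SGDClassifier(loss='log', penalty='elasticnet', l1_ratio=0.15, alpha=0.01, random_state=0)
clf.fit(X, y)
# Fraction of coefficients the elastic net zeroed out at the default mixing parameter.
print('zeroed features: {:.0%}'.format(np.mean(clf.coef_ == 0)))
```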