
Selecting the elastic net mixing parameter #56

Closed · dhimmel opened this issue Oct 11, 2016 · 5 comments


dhimmel commented Oct 11, 2016

Thus far we've been using grid search (cross-validation) to select the optimal elastic net mixing parameter. For SGDClassifier, this mixing parameter is set using l1_ratio, where l1_ratio = 0 performs ridge (L2) regularization and l1_ratio = 1 performs lasso (L1) regularization.
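
For reference, here's a minimal sketch of that setup in scikit-learn (values are illustrative, not from our notebooks):

```
from sklearn.linear_model import SGDClassifier

# l1_ratio controls the elastic net mix of the penalty term:
#   l1_ratio = 0 -> pure ridge (L2)
#   l1_ratio = 1 -> pure lasso (L1)
ridge_like = SGDClassifier(penalty='elasticnet', l1_ratio=0.0)
lasso_like = SGDClassifier(penalty='elasticnet', l1_ratio=1.0)
```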

Here's what I'm thinking:

Grid search is not the appropriate way to select the mixing parameter. Ridge (with the optimal regularization strength, alpha) will always perform better than the optimal lasso, because there's a cost for the convenience of sparsity: lasso must make difficult decisions about which features to select. The resulting sparsity can aid model interpretation, but it weakens performance, because identifying only the predictive features is an impossible task.

For example, see our grid from this notebook (note this used MAD feature selection to select only 500 features, which likely accentuates the performance deficit as l1_ratio increases).

[figure: grid of cross-validation performance across l1_ratio values]

So my sense is that l1_ratio should be chosen based on what properties we want the model to have, not based on maximum CV performance. If we only care about performance, we might as well save ourselves the computation time and always go with ridge or the default l1_ratio = 0.15. l1_ratio = 0.15 can still filter out ~50% of features with little performance degradation. But if you want real sparsity (lasso), there's going to be a performance cost -- and the user, not grid search, will have to make that decision.
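
As a quick illustration of that filtering effect, here's a sketch that counts zeroed coefficients (synthetic stand-in data; the ~50% figure depends on the dataset and on alpha):

```
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic stand-in for the expression matrix (illustrative only).
X, y = make_classification(n_samples=500, n_features=500,
                           n_informative=20, random_state=0)

# Default mixing parameter: mostly ridge with a modest L1 component.
clf = SGDClassifier(penalty='elasticnet', l1_ratio=0.15,
                    alpha=0.01, random_state=0)
clf.fit(X, y)

# Fraction of coefficients driven exactly to zero by the L1 term.
zeroed = (clf.coef_ == 0).mean()
print('{:.0%} of features filtered out'.format(zeroed))
```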


dhimmel commented Oct 11, 2016

Also, I'd rather spend more time optimizing alpha (regularization strength). glmnet in R defaults to trying a sequence of 100 different regularization strengths.
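
Something along these lines (a sketch; the grid bounds are assumptions, not values from the repo):

```
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# Fix the mixing parameter and spend the computation on alpha instead,
# mimicking glmnet's default of scanning ~100 regularization strengths.
clf = SGDClassifier(penalty='elasticnet', l1_ratio=0.15, random_state=0)
param_grid = {'alpha': np.logspace(-10, 2, num=100)}
search = GridSearchCV(clf, param_grid, cv=5)
```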

dhimmel added a commit to dhimmel/machine-learning that referenced this issue Oct 11, 2016
Do not optimize `l1_ratio`. Instead use the default of 0.15. Search a denser grid for `alpha`. See cognoma#56
dhimmel added a commit that referenced this issue Oct 11, 2016
* Begin constructing a MVP machine learner

* Export JSON API input for Hippo pathway

* Ignore __pycache__

* classify() functioning with mock input

* Save output corresponding to hippo-input.json

From the `cognoml` directory, ran:

```
python analysis.py > ../data/api/hippo-output.json
```

* Export model information to JSON output

Also filter zero-variance features.

* Return unselected observations

Unselected observations (samples in the dataset that were not selected
by the user) are now returned. These observations receive predictions
but are missing (-1 encoded) for fields such as `testing` and `status`.

Sorted model parameters by key.

* Save grid_search performance metrics

* Move classifier and pipeline to its own module

* Add setup.py to make module installable

* Review comments: spacing and results doc

* Check whether pipeline has function before calling

Meant to address https://git.io/vPvtI

* Acquire data from figshare

* Update for sklearn 0.18.0, Fix pipeline

Fix pipeline according to:
scikit-learn/scikit-learn#7536 (comment)

Extract selected feature names according to:
scikit-learn/scikit-learn#7536 (comment)

* Semantic improvements of get_feature_df

* Update API JSON files

* Mention hippo-output-schema.json in docstring

* Address @gwaygenomics review comments

Does not address "Lasso or Ridge only?"

* Grid search: optimize alpha not l1_ratio

Do not optimize `l1_ratio`. Instead use the default of 0.15. Search a denser grid for `alpha`. See #56

cgreene commented Oct 11, 2016

Agree with @dhimmel about ridge/lasso trade-offs. We could ask the user how much they value sparsity vs performance if we can figure out a way that's not too confusing.
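
One hypothetical shape for that user-facing choice (all names here are invented for illustration, not part of the codebase):

```
# Hypothetical presets mapping a user's sparsity preference to l1_ratio.
SPARSITY_PRESETS = {
    'none': 0.0,   # pure ridge: best expected performance
    'some': 0.15,  # the default: filters many features cheaply
    'full': 1.0,   # pure lasso: maximal sparsity, at a performance cost
}

def choose_l1_ratio(preference):
    """Return the elastic net mixing parameter for a user preference."""
    return SPARSITY_PRESETS[preference]
```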


gwaybio commented Oct 17, 2016

> So my sense is that l1_ratio should be chosen based on what properties we want the model to have, not based on maximum CV performance.

Agreed!

dhimmel added a commit to cognoma/cognoml that referenced this issue Oct 25, 2016
@patrick-miller

If we are performing PCA on the expression matrix to create our features, then I am not sure how important sparsity is going to be in the final classifier. This is probably even more true when the number of components we choose is <= 100.
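
For concreteness, the pipeline shape being described might look like this (component count and estimator settings are illustrative):

```
from sklearn.decomposition import PCA
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# With <= 100 PCA components as features, zeroed coefficients no longer map
# to individual genes, so an L1 penalty buys little interpretability here.
pipeline = Pipeline([
    ('pca', PCA(n_components=100)),
    ('classify', SGDClassifier(penalty='elasticnet', l1_ratio=0.15)),
])
```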

@rdvelazquez

Closed by #114
