feat: add parameter selection methods for ridge and lasso regression
tianxuzhang committed Dec 3, 2023
1 parent 33e2370 commit 12fd94c
Showing 1 changed file with 93 additions and 1 deletion.
94 changes: 93 additions & 1 deletion docs/回归/正则化线性回归.ipynb
@@ -100,6 +100,53 @@
"可以看到,Lasso回归最终会趋于一条直线,原因就在于好多θ值已经均为0。LASSO回归使用L1范数作为惩罚项,具有稀疏性,即可以将某些系数压缩到零。这使得LASSO回归在**特征选择和变量筛选**方面非常有用。"
]
},
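{
"cell_type": "markdown",
"id": "lasso-sparsity-demo-md",
"metadata": {},
"source": [
"As a minimal sketch of this sparsity effect (an added illustration, not part of the original notebook; it uses a synthetic dataset from `make_regression` with assumed parameters rather than the notebook's own data), the cell below fits a Lasso model and counts how many coefficients are driven exactly to zero:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "lasso-sparsity-demo-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import make_regression\n",
"from sklearn.linear_model import Lasso\n",
"\n",
"# Synthetic data: only 10 of the 50 features are actually informative\n",
"X_demo, y_demo = make_regression(n_samples=200, n_features=50, n_informative=10, noise=5.0, random_state=0)\n",
"\n",
"# With a moderate L1 penalty, most uninformative coefficients should be shrunk exactly to zero\n",
"lasso_demo = Lasso(alpha=1.0)\n",
"lasso_demo.fit(X_demo, y_demo)\n",
"print('coefficients exactly zero:', int(np.sum(lasso_demo.coef_ == 0)), 'out of', X_demo.shape[1])"
]
},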
{
"cell_type": "markdown",
"id": "278e4b5b",
"metadata": {},
"source": [
"### lasso回归参数选择\n",
"\n",
"在Scikit-learn中,你可以使用LassoCV和LassoLarsCV来通过交叉验证来设置Lasso回归的alpha参数。\n",
"\n",
"* LassoCV:基于交叉验证的Lasso回归方法。它可以自动选择最佳的 $\\alpha$ 值,并进行模型拟合。LassoCV利用交叉验证来评估不同 $\\alpha$ 值下的性能,并选择使得模型性能达到最优的alpha值。对于具有许多线性回归的高维数据集,常常使用LassoCV。\n",
"\n",
"* LassoLarsCV:基于交叉验证的Lasso回归方法,并采用了最小角回归算法(LARS)来寻找最佳的alpha参数值。在样本数量比特征数量少得多的情况下,LassoLarsCV通常比LassoCV更快速且具有更好的性能。\n",
"\n",
"下面是使用LassoCV和LassoLarsCV的示例代码:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b2dfc4f6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.17465020215558855\n",
"0.1817744842195405\n"
]
}
],
"source": [
"from sklearn.linear_model import LassoCV, LassoLarsCV\n",
"\n",
"# 使用LassoCV进行Lasso回归及alpha参数选择\n",
"lasso_cv = LassoCV(cv=5)\n",
"lasso_cv.fit(X, y)\n",
"best_alpha_lasso = lasso_cv.alpha_\n",
"print(best_alpha_lasso)\n",
"\n",
"# 使用LassoLarsCV进行Lasso回归及alpha参数选择\n",
"lasso_lars_cv = LassoLarsCV(cv=5)\n",
"lasso_lars_cv.fit(X, y)\n",
"best_alpha_lassolars = lasso_lars_cv.alpha_\n",
"print(best_alpha_lassolars)\n"
]
},
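{
"cell_type": "markdown",
"id": "lassocv-inspect-md",
"metadata": {},
"source": [
"As an optional follow-up (a sketch that assumes the `lasso_cv` object fitted above), the cross-validation results behind the chosen alpha can be inspected through the `alphas_` and `mse_path_` attributes; the selected `alpha_` should match the candidate with the smallest mean CV error:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "lassocv-inspect-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# mse_path_ has shape (n_alphas, n_folds); average the error over the folds\n",
"mean_cv_mse = lasso_cv.mse_path_.mean(axis=1)\n",
"\n",
"# The alpha with the smallest mean CV error is the one stored in alpha_\n",
"print(lasso_cv.alphas_[np.argmin(mean_cv_mse)], lasso_cv.alpha_)"
]
},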
{
"cell_type": "markdown",
"id": "12cc34bc",
@@ -115,7 +162,9 @@
"\n",
"$$\\underset{w}{min} || Xw - y||_2^2 + \\alpha ||w||_2^2$$\n",
"\n",
"其中,$X$ 是输入数据矩阵,$y$ 是对应的观测值向量,$w$ 是待估计的参数向量,$\\alpha$ 是一个控制正则化强度的超参数。\n",
"其中,$X$ 是输入数据矩阵,$y$ 是对应的观测值向量,$w$ 是待估计的参数向量,$\\alpha$ 是一个控制正则化强度的超参数, $||w||_2 ^ 2$ 是L2范数的平方,也被称为岭项。\n",
"\n",
"L2范数的正则化效果,在岭回归中表现出一个典型的\"\"形状,即在目标函数中形成一个凸形的曲线。这个曲线在系数空间中的形状类似于山脊或岭,因此得名为岭回归。\n",
"\n",
"通过引入 $\\alpha$ 的值,岭回归可以对参数向量进行限制,使其不会过大,从而减小过拟合的风险。较大的 $\\alpha$ 值会增加正则化的程度,从而更加平衡模型的复杂度和拟合残差之间的权衡。"
]
@@ -179,6 +228,49 @@
"source": [
"从图中可以看出,岭回归的正则化参数变化对系数的影响是比较平滑的。另外不会变成0,所以计算量是比较大的。"
]
},
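{
"cell_type": "markdown",
"id": "ridge-shrinkage-demo-md",
"metadata": {},
"source": [
"As a small added sketch of this behaviour (not part of the original notebook; it reuses a synthetic `make_regression` dataset with assumed parameters), the cell below fits Ridge with increasing $\\alpha$ and shows that the coefficient norm shrinks smoothly while no coefficient is driven exactly to zero:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ridge-shrinkage-demo-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import make_regression\n",
"from sklearn.linear_model import Ridge\n",
"\n",
"X_demo, y_demo = make_regression(n_samples=200, n_features=50, n_informative=10, noise=5.0, random_state=0)\n",
"\n",
"for alpha in [0.1, 1.0, 10.0, 100.0]:\n",
"    ridge_demo = Ridge(alpha=alpha).fit(X_demo, y_demo)\n",
"    # The L2 norm of the coefficients decreases as alpha grows, but none of them becomes exactly zero\n",
"    print(alpha, round(float(np.linalg.norm(ridge_demo.coef_)), 2), int(np.sum(ridge_demo.coef_ == 0)))"
]
},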
{
"cell_type": "markdown",
"id": "ea28b24e",
"metadata": {},
"source": [
"\n",
"\n",
"### 岭回归参数选择\n",
"在Ridge回归中,可以使用广义交叉验证(Generalized Cross-Validation,GCV)来选择最优的正则化参数。Sklearn库中的RidgeCV类提供了自动进行岭回归和正则化参数选择的功能。\n",
"\n",
"RidgeCV类使用方法类似于GridSearchCV,可以指定一组候选的 $\\alpha$ 参数值。通过对这些参数值进行交叉验证,RidgeCV会选择具有最佳性能的 $\\alpha$ 值,并将其作为Ridge模型的正则化参数。\n",
"\n",
"以下是一个示例:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "f6eb6765",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.1\n"
]
}
],
"source": [
"from sklearn import linear_model\n",
"\n",
"# 定义一组候选的alpha参数值\n",
"alphas = [0.1, 1.0, 10.0]\n",
"\n",
"# 创建RidgeCV对象并进行拟合\n",
"reg = linear_model.RidgeCV(alphas=alphas)\n",
"reg.fit([[0, 0], [0, 0], [1, 1]], [0, 0.1, 1])\n",
"\n",
"# 输出选择的最优alpha值\n",
"print(reg.alpha_)"
]
}
],
"metadata": {