Skip to content

Commit

Permalink
feat: omp算法
Browse files Browse the repository at this point in the history
  • Loading branch information
tianxuzhang committed Dec 6, 2023
1 parent 46e9ed2 commit db84e2d
Show file tree
Hide file tree
Showing 2 changed files with 224 additions and 185 deletions.
185 changes: 0 additions & 185 deletions docs/回归/正则化线性回归.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -484,191 +484,6 @@
"print(\"模型系数:\")\n",
"print(multi_task_elastic_net.coef_)"
]
},
{
"cell_type": "markdown",
"id": "42f8c990",
"metadata": {},
"source": [
"## 最小角回归\n",
"最小角回归(Least Angle Regression,简称LARS)是一种用于线性回归和特征选择的迭代算法。它通过一系列步骤逐渐构建回归模型,并根据变量与目标变量之间的相关性来选择特征。\n",
"\n",
"LARS算法的主要思想是**每次选择与目标变量具有最大相关性的特征**,并沿着该方向移动。在每个步骤中,LARS将当前的解投影到残差向量上,然后确定一个新变量进入解集,并决定如何移动。\n",
"\n",
"> 残差:实际观测值与模型预测值之间的差异 $y - ŷ$\n",
"\n",
"算法步骤:\n",
"* 初始化:设置所有系数$β$为$0$,残差$r$为$y$,其中$y$是目标变量。\n",
"\n",
"* 在每一步中重复以下步骤:\n",
" * a. 计算预测变量$x$与残差$r$的相关性,选择与残差最相关的变量作为当前活动变量。\n",
" * b. 沿着与当前活动变量的相关方向移动,更新系数$β$。\n",
" * c. 更新残差$r$。\n",
"\n",
"* 重复步骤2直到达到所需的特征数量或满足其他停止准则(例如残差最小化,交叉验证等)。\n",
"\n",
"以下是LARS算法的伪代码表示:\n",
"\n",
"------\n",
"\n",
"* 输入:特征矩阵X,目标变量向量y\n",
"* 输出:系数估计值β\n",
"\n",
"* 初始化:$β = 0, r = y$\n",
"* while (某个停止准则未被满足) do:\n",
" * 计算与残差$r$的相关性:$c = X^T * r$\n",
" * 选择与残差最相关的特征:$j = argmax(|c|)$\n",
" * 选择$j$对应的特征向量:$x_j = X[:, j]$\n",
" * 计算沿着$x_j$方向的步长:$s = X^T * (r - X * β) / (n * ||x_j - X * β||_2)$\n",
" * 更新系数:$β = β + s * x_j$\n",
" * 更新残差:$r = y - X * β$\n",
"* 其中,$X$是特征矩阵,$y$是目标变量向量,$β$是系数估计值,$r$是残差,$n$是样本数量。$||.||_2$表示$L2$范数。\n",
"\n",
"------\n",
"\n",
"LARS算法可以用于拟合线性回归模型并进行特征选择。它能够处理**高维数据集**,同时具有较低的计算复杂度。\n",
"\n",
"其主要的优点有:\n",
"\n",
"* 特别适合于特征维度n 远高于样本数m的情况。\n",
"* 算法的最坏计算复杂度和最小二乘法类似,但是其计算速度几乎和前向选择算法一样\n",
"* 可以产生分段线性结果的完整路径,这在模型的交叉验证中极为有用。\n",
"\n",
"主要的缺点是:\n",
"\n",
"* 由于LARS的迭代方向是根据目标的残差而定,所以该算法对样本的噪声极为敏感。\n",
"\n",
"在Scikit-learn中,你可以使用Lars类来实现最小角回归。例如:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "6ed5133b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"模型系数:\n",
"[ 0.02913523 -0.00313884 -0.03431681 0.29380121 -0.13901173 -0.08586778\n",
" 0.11000887 0.00837283 0.08400827 -0.11083981]\n"
]
}
],
"source": [
"from sklearn.linear_model import Lars\n",
"\n",
"# 生成样本数据\n",
"np.random.seed(0)\n",
"n_samples, n_features = 100, 10\n",
"X = np.random.randn(n_samples, n_features)\n",
"Y = np.random.randn(n_samples, 5) # 生成5个相关联的目标变量\n",
"\n",
"# 创建Lars对象并进行拟合\n",
"lars = Lars()\n",
"lars.fit(X, y)\n",
"\n",
"# 输出模型系数\n",
"print(\"模型系数:\")\n",
"print(lars.coef_)"
]
},
{
"cell_type": "markdown",
"id": "36d1ba79",
"metadata": {},
"source": [
"### LARS Lasso\n",
"\n",
"LARS Lasso(Least Angle Regression Lasso)是一种结合了最小角回归(LARS)和Lasso回归的正则化方法。\n",
"\n",
"与传统的LASSO回归相比,LARS Lasso具有更高的计算效率,因为它使用了逐步向前的方式来确定特征的顺序,而不需要像坐标下降或拟梯度等方法那样进行迭代优化。\n",
"\n",
"在Scikit-learn中,你可以使用LassoLars类来实现LARS Lasso回归。以下是一个示例代码:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "a99de35e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"模型系数:\n",
"[ 0. 0. 0. 0.21248063 -0.00992147 0.\n",
" 0.00512876 0. 0.00215299 -0.00265399]\n"
]
}
],
"source": [
"from sklearn.linear_model import LassoLars\n",
"\n",
"# 创建LassoLars对象并进行拟合\n",
"lasso_lars = LassoLars(alpha=0.1)\n",
"lasso_lars.fit(X, y)\n",
"\n",
"# 输出模型系数\n",
"print(\"模型系数:\")\n",
"print(lasso_lars.coef_)"
]
},
{
"cell_type": "markdown",
"id": "2ce359a1",
"metadata": {},
"source": [
"## 正交匹配追踪法\n",
"了解即可\n",
"\n",
"正交匹配追踪法(Orthogonal Matching Pursuit,简称OMP)是一种用于稀疏信号恢复和特征选择的迭代算法。它通过逐步选择与残差具有最大相关性的原子(基向量),并将其投影到残差上,以逐步逼近原始信号或寻找最佳特征子集。\n",
"\n",
"OMP算法的主要思想是在每个迭代步骤中选择一个原子,并将其添加到表示信号的索引集合中。在每次迭代中,OMP会计算残差与各个原子之间的相关性,并选择与残差具有最大相关性的原子。然后,它使用这个原子来更新当前估计值,并将其投影到残差上。该过程重复执行,直到满足预定的停止准则或达到所需的稀疏度。\n",
"\n",
"以下是OMP算法的基本步骤:\n",
"\n",
"* 初始化:设置估计系数为零向量,设置残差为输入信号。\n",
"* 选择原子:计算残差与每个原子之间的相关性,并选择相关性最高的原子。\n",
"* 更新估计值:将选定的原子添加到表示信号的索引集合中,并使用最小二乘法更新估计系数。\n",
"* 更新残差:将当前估计值投影到残差上,得到新的残差。\n",
"* 重复步骤2-4,直到满足停止准则(如残差的能量降低到某个阈值)或达到所需的稀疏度。\n",
"\n",
"OMP算法可以用于恢复稀疏信号、特征选择和压缩感知等领域。它是一种简单而高效的迭代算法,适用于处理高维数据和大规模问题。\n",
"\n",
"在Python中,你可以使用sklearn.linear_model.OrthogonalMatchingPursuit类来实现OMP算法。以下是一个示例代码:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "e66662e4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"模型系数:\n",
"[ 0. 0. 0. 0.30219523 -0.13606972 -0.10775375\n",
" 0. 0. 0.09119663 -0.11162544]\n"
]
}
],
"source": [
"from sklearn.linear_model import OrthogonalMatchingPursuit\n",
"\n",
"# 创建OrthogonalMatchingPursuit对象并进行拟合\n",
"omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5) # 设置估计系数的非零数量\n",
"omp.fit(X, y)\n",
"\n",
"# 输出模型系数\n",
"print(\"模型系数:\")\n",
"print(omp.coef_)"
]
}
],
"metadata": {
Expand Down
Loading

0 comments on commit db84e2d

Please sign in to comment.