Commit
added markdown headings for quick navigation
raun1997 committed Apr 1, 2023
1 parent ecbd88b commit e25b59e
Showing 3 changed files with 125 additions and 45 deletions.
98 changes: 55 additions & 43 deletions .ipynb_checkpoints/Linear Regression From Scratch-checkpoint.ipynb

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions .ipynb_checkpoints/practice-checkpoint.ipynb
@@ -0,0 +1,6 @@
{
"cells": [],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}
66 changes: 64 additions & 2 deletions Linear Regression From Scratch.ipynb
@@ -1,5 +1,13 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "c33de807-3f38-4b78-b474-48f45c674e20",
"metadata": {},
"source": [
"# What on earth is this Linear Regression?"
]
},
{
"cell_type": "markdown",
"id": "5f0a1fc6-ff76-4e0f-b563-4d5af6d8d007",
@@ -173,6 +181,14 @@
"The number of rows corresponds to the total number of data samples. "
]
},
{
"cell_type": "markdown",
"id": "dd434bf8-3cb6-4073-93bc-bde8875ab105",
"metadata": {},
"source": [
"# Mean Square Error"
]
},
{
"cell_type": "markdown",
"id": "ae6fe76b",
@@ -233,10 +249,48 @@
},
{
"cell_type": "markdown",
"id": "90e9ecad",
"id": "adc034bd-8729-4eaf-a6c7-aa6ccf44e64a",
"metadata": {},
"source": [
"We get a loss of around `183`. As I wrote earlier, our aim in Linear Regression is to minimise this value. Hence, we perform the elegant [Gradient Descent](https://developer.ibm.com/learningpaths/learning-path-machine-learning-for-developers/learn-regression-algorithms/?mhsrc=ibmsearch_a&mhq=regression) Algorithm. Extensively used in Machine Learning, this process involves finding the **local minimum** of a function. The sole idea behind this algorithm is to, provided a certain learning rate $ \\gamma $, take iterative steps (slowly) in the direction of $ -\\nabla{E} $ (negative gradient) in order to minimise the error rate. Each iteration of gradient descent updates the `m` and the `c` (collectively denoted by $\\theta$) according to,</br>\n",
"We get a loss of around `183`. As I wrote earlier, our aim in Linear Regression is to minimise this value. Hence, we perform the elegant [Gradient Descent](https://developer.ibm.com/learningpaths/learning-path-machine-learning-for-developers/learn-regression-algorithms/?mhsrc=ibmsearch_a&mhq=regression) Algorithm. Extensively used in Machine Learning, this process involves finding the **local minimum** of a function. The sole idea behind this algorithm is akin to that of a blind, old man trying to go down a hill.</br>"
]
},
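For orientation, here is a minimal sketch of how that MSE loss could be computed. The notebook's own loss cell is collapsed in this diff, so the toy arrays `X`, `y` and the starting values of `m`, `c` below are placeholder assumptions, not the notebook's real data:

```python
import numpy as np

def mse_loss(m, c, X, y):
    """Mean squared error of the line y_hat = m * X + c against the targets y."""
    y_hat = m * X + c                # predictions of the current line
    return np.mean((y - y_hat) ** 2)

# Toy example (placeholder data):
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
print(mse_loss(0.0, 0.0, X, y))  # a large loss before any training
```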
{
"cell_type": "markdown",
"id": "aaa1dfc9-5cfb-474a-a72b-c191a3ca4d4c",
"metadata": {},
"source": [
"# Intuition Behind Gradient Descent"
]
},
{
"cell_type": "markdown",
"id": "abceb545-541d-41a8-aaf6-a5667f646172",
"metadata": {},
"source": [
"Let's consider a scenario where a blind man, equipped only with a blind cane, is standing at a hill's summit at some point. If we wishes to descent, he should take miniature steps towards the bottom of the hill lest he stumble. He will extensively use the local neighbourhood information to plan his next step carefully. He can stop when he is very close to the bottom of the hill. This bottom, is analogous to the **global minimizers** of any function. If he feels a flat surface or a small valley with his cane, he might deduce that he reached the bottom, when in fact he didn't. Such a flat surface is analogous to stationary points of a function, that are definitely not the **global minimizers**."
]
},
{
"cell_type": "markdown",
"id": "ddb6f7ef-0a4c-472a-9156-5f235ebdae0c",
"metadata": {},
"source": [
"# So What Is Gradient Descent?"
]
},
{
"cell_type": "markdown",
"id": "a2d14f69-b4b6-4ae1-99bb-82f7a8815331",
"metadata": {},
"source": [
"Similarly, the basic storyline of Gradient Descent is to\n",
"* start at some point (also known as the initialization),\n",
"* somehow move down slowly (with small step-size) using the local neighborhood\n",
"information towards the minimizer/stationary point,\n",
"* and when very close to or at the minimizer/stationary point, we stop.\n",
"\n",
"Mathematically, provided a certain learning rate $ \\gamma $, take iterative steps (slowly) in the direction of $ -\\nabla{E} $ (negative gradient) in order to minimise the error rate. Each iteration of gradient descent updates the `m` and the `c` (collectively denoted by $\\theta$) according to,</br>\n",
"$$ \\theta^{t+1} = \\theta^{t} - \\gamma \\frac{\\partial{E}(X, \\theta)}{\\partial{\\theta}} $$ where $ \\theta^t $ denotes the weights (slope) and the bias (intercept) at iteration $t$.\n",
"<!-- MORE EXPLANATION SHOULD BE PROVIDED --></br>"
]
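As a concrete illustration of this update rule applied to the MSE loss, here is a minimal sketch of one gradient step on `m` and `c`. The notebook's actual training loop is collapsed in this diff, so the function name `gradient_step`, the toy data, and the learning rate are illustrative assumptions:

```python
import numpy as np

def gradient_step(m, c, X, y, lr=0.01):
    """One iteration of theta(t+1) = theta(t) - lr * dE/dtheta for the MSE loss."""
    y_hat = m * X + c
    error = y - y_hat
    dm = -2.0 * np.mean(X * error)   # dE/dm
    dc = -2.0 * np.mean(error)       # dE/dc
    return m - lr * dm, c - lr * dc

# Repeated steps move (m, c) towards the minimiser of the loss (placeholder data):
m, c = 0.0, 0.0
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
for _ in range(1000):
    m, c = gradient_step(m, c, X, y, lr=0.05)
```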
@@ -415,6 +469,14 @@
"Having obtained the final values for both `m` and `c`, we now perform visualization using `matplotlib`.</br>"
]
},
{
"cell_type": "markdown",
"id": "b250cefa-5b99-44c0-9e65-670d1548152e",
"metadata": {},
"source": [
"# Data Visualization"
]
},
{
"cell_type": "code",
"execution_count": 123,
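The plotting cell itself is collapsed in this diff; the following is a minimal sketch of what visualization with `matplotlib` might look like, with the data and the fitted `m`, `c` standing in as placeholder values:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data and fitted parameters; the notebook's real values are collapsed above.
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
m, c = 2.0, 1.0

plt.scatter(X, y, label="data")                    # original samples
plt.plot(X, m * X + c, color="red", label="fit")   # fitted regression line
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
```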
