Code
import pandas as pd
@@ -364,7 +364,7 @@ 'Gross national income per capita, Atlas method: $: 2016':'gni'})
wb.head()
import seaborn as sns
import matplotlib.pyplot as plt
@@ -552,7 +552,7 @@
We’ll explain what a “kernel” is momentarily.
To make things simpler, let’s construct a KDE for a small, artificially generated dataset of 5 datapoints: \([2.2, 2.8, 3.7, 5.3, 5.7]\). In the plot below, each vertical bar represents one data point.
-
+
Code
data = [2.2, 2.8, 3.7, 5.3, 5.7]
@@ -573,7 +573,7 @@ data
Our goal is to create the following KDE curve, which was generated automatically by sns.kdeplot.
-
+
Code
@@ -590,30 +590,15 @@ sns.kdeplot(data)
-
-8.1.2.1 Step 1: Place a Kernel at Each Data Point
-To begin generating a density curve, we need to choose a kernel and bandwidth value (\(\alpha\)). What are these exactly?
-A kernel is a density curve. It is the mathematical function that attempts to capture the randomness of each data point in our sampled data. To explain what this means, consider just one of the datapoints in our dataset: \(2.2\). We obtained this datapoint by randomly sampling some information out in the real world (you can imagine \(2.2\) as representing a single measurement taken in an experiment, for example). If we were to sample a new datapoint, we may obtain a slightly different value. It could be higher than \(2.2\); it could also be lower than \(2.2\). We make the assumption that any future sampled datapoints will likely be similar in value to the data we’ve already drawn. This means that our kernel – our description of the probability of randomly sampling any new value – will be greatest at the datapoint we’ve already drawn but still have non-zero probability above and below it. The area under any kernel should integrate to 1, representing the total probability of drawing a new datapoint.
-A bandwidth value, usually denoted by \(\alpha\), represents the width of the kernel. A large value of \(\alpha\) will result in a wide, short kernel function, while a small value will result in a narrow, tall kernel.
-Below, we place a Gaussian kernel, plotted in orange, over the datapoint \(2.2\). A Gaussian kernel is simply the normal distribution, which you may have called a bell curve in Data 8.
-
+Alternatively, we can use sns.histplot. This plot also visualizes the underlying bins as a histogram.
+
Code
-def gaussian_kernel(x, z, a):
-    # We'll discuss where this mathematical formulation came from later
-    return (1/np.sqrt(2*np.pi*a**2)) * np.exp((-(x - z)**2 / (2 * a**2)))
-
-# Plot our datapoint
-sns.rugplot([2.2], height=0.3)
-
-# Plot the kernel
-x = np.linspace(-3, 10, 1000)
-plt.plot(x, gaussian_kernel(x, 2.2, 1))
-
-plt.xlabel("Data")
-plt.ylabel("Density")
-plt.xlim(-3, 10)
-plt.ylim(0, 0.5);
+sns.histplot(data, bins=2, kde=True, stat="density", kde_kws=dict(cut=3, bw_method=0.65))
+
+plt.xlabel("Data")
+plt.xlim(-3, 10)
+plt.ylim(0, 0.5);
@@ -623,45 +608,30 @@
+
+8.1.2.1 Step 1: Place a Kernel at Each Data Point
+To begin generating a density curve, we need to choose a kernel and bandwidth value (\(\alpha\)). What are these exactly?
+A kernel is a density curve. It is the mathematical function that attempts to capture the randomness of each data point in our sampled data. To explain what this means, consider just one of the datapoints in our dataset: \(2.2\). We obtained this datapoint by randomly sampling some information out in the real world (you can imagine \(2.2\) as representing a single measurement taken in an experiment, for example). If we were to sample a new datapoint, we may obtain a slightly different value. It could be higher than \(2.2\); it could also be lower than \(2.2\). We make the assumption that any future sampled datapoints will likely be similar in value to the data we’ve already drawn. This means that our kernel – our description of the probability of randomly sampling any new value – will be greatest at the datapoint we’ve already drawn but still have non-zero probability above and below it. The area under any kernel should integrate to 1, representing the total probability of drawing a new datapoint.
+A bandwidth value, usually denoted by \(\alpha\), represents the width of the kernel. A large value of \(\alpha\) will result in a wide, short kernel function, while a small value will result in a narrow, tall kernel.
+Below, we place a Gaussian kernel, plotted in orange, over the datapoint \(2.2\). A Gaussian kernel is simply the normal distribution, which you may have called a bell curve in Data 8.
+
Code
-# You will work with the functions below in Lab 4
-def create_kde(kernel, pts, a):
-    # Takes in a kernel, set of points, and alpha
-    # Returns the KDE as a function
-    def f(x):
-        output = 0
-        for pt in pts:
-            output += kernel(x, pt, a)
-        return output / len(pts) # Normalization factor
-    return f
+def gaussian_kernel(x, z, a):
+    # We'll discuss where this mathematical formulation came from later
+    return (1/np.sqrt(2*np.pi*a**2)) * np.exp((-(x - z)**2 / (2 * a**2)))
+
+# Plot our datapoint
+sns.rugplot([2.2], height=0.3)
+
+# Plot the kernel
+x = np.linspace(-3, 10, 1000)
plt.plot(x, gaussian_kernel(x, 2.2, 1))
-def plot_kde(kernel, pts, a):
-    # Calls create_kde and plots the corresponding KDE
-    f = create_kde(kernel, pts, a)
-    x = np.linspace(min(pts) - 5, max(pts) + 5, 1000)
-    y = [f(xi) for xi in x]
-    plt.plot(x, y);
-
-def plot_separate_kernels(kernel, pts, a, norm=False):
-    # Plots individual kernels, which are then summed to create the KDE
-    x = np.linspace(min(pts) - 5, max(pts) + 5, 1000)
-    for pt in pts:
-        y = kernel(x, pt, a)
-        if norm:
-            y /= len(pts)
-        plt.plot(x, y)
-
-    plt.show();
-
-plt.xlim(-3, 10)
-plt.ylim(0, 0.5)
-plt.xlabel("Data")
-plt.ylabel("Density")
-
-plot_separate_kernels(gaussian_kernel, data, a = 1)
+plt.xlabel("Data")
+plt.ylabel("Density")
+plt.xlim(-3, 10)
+plt.ylim(0, 0.5);
@@ -671,21 +641,45 @@
-8.1.2.2 Step 2: Normalize Kernels to Have a Total Area of 1
-
Above, we said that each kernel has an area of 1. Earlier, we also said that our goal is to construct a KDE curve using these kernels with a total area of 1. If we were to directly sum the kernels as they are, we would produce a KDE curve with an integrated area of (5 kernels) \(\times\) (area of 1 each) = 5. To avoid this, we will normalize each of our kernels. This involves multiplying each kernel by \(\frac{1}{\#\:\text{datapoints}}\).
-In the cell below, we multiply each of our 5 kernels by \(\frac{1}{5}\) to apply normalization.
-
+To begin creating our KDE, we place a kernel on each datapoint in our dataset. For our dataset of 5 points, we will have 5 kernels.
+
Code
-plt.xlim(-3, 10)
-plt.ylim(0, 0.5)
-plt.xlabel("Data")
-plt.ylabel("Density")
-
-# The `norm` argument specifies whether or not to normalize the kernels
-plot_separate_kernels(gaussian_kernel, data, a = 1, norm = True)
+# You will work with the functions below in Lab 4
+def create_kde(kernel, pts, a):
+    # Takes in a kernel, set of points, and alpha
+    # Returns the KDE as a function
+    def f(x):
+        output = 0
+        for pt in pts:
+            output += kernel(x, pt, a)
+        return output / len(pts) # Normalization factor
+    return f
+
+def plot_kde(kernel, pts, a):
+    # Calls create_kde and plots the corresponding KDE
+    f = create_kde(kernel, pts, a)
+    x = np.linspace(min(pts) - 5, max(pts) + 5, 1000)
+    y = [f(xi) for xi in x]
+    plt.plot(x, y);
+
+def plot_separate_kernels(kernel, pts, a, norm=False):
+    # Plots individual kernels, which are then summed to create the KDE
+    x = np.linspace(min(pts) - 5, max(pts) + 5, 1000)
+    for pt in pts:
+        y = kernel(x, pt, a)
+        if norm:
+            y /= len(pts)
+        plt.plot(x, y)
+
+    plt.show();
+
+plt.xlim(-3, 10)
+plt.ylim(0, 0.5)
+plt.xlabel("Data")
+plt.ylabel("Density")
+
+plot_separate_kernels(gaussian_kernel, data, a = 1)
@@ -696,10 +690,11 @@
-8.1.2.3 Step 3: Sum the Normalized Kernels
-
Our KDE curve is the sum of the normalized kernels. Notice that the final curve is identical to the plot generated by sns.kdeplot we saw earlier!
-
+
+8.1.2.2 Step 2: Normalize Kernels to Have a Total Area of 1
+Above, we said that each kernel has an area of 1. Earlier, we also said that our goal is to construct a KDE curve using these kernels with a total area of 1. If we were to directly sum the kernels as they are, we would produce a KDE curve with an integrated area of (5 kernels) \(\times\) (area of 1 each) = 5. To avoid this, we will normalize each of our kernels. This involves multiplying each kernel by \(\frac{1}{\#\:\text{datapoints}}\).
+In the cell below, we multiply each of our 5 kernels by \(\frac{1}{5}\) to apply normalization.
+
Code
-3, 10)
@@ -707,8 +702,8 @@ plt.xlim("Data")
"Density")
plt.ylabel(
-sns.kdeplot(data, bw_method=0.65)
-sns.histplot(data, stat="density", bins=2);
plt.xlabel(
+# The `norm` argument specifies whether or not to normalize the kernels
+plot_separate_kernels(gaussian_kernel, data, a = 1, norm = True)
@@ -718,8 +713,11 @@
+
+
+8.1.2.3 Step 3: Sum the Normalized Kernels
+Our KDE curve is the sum of the normalized kernels. Notice that the final curve is identical to the plot generated by sns.kdeplot we saw earlier!
+
Code
-3, 10)
@@ -727,7 +725,7 @@ plt.xlim("Data")
"Density")
plt.ylabel(
-sns.histplot(data, bins=2, kde=True, stat="density", kde_kws=dict(cut=3, bw_method=0.65))
plt.xlabel(
+plot_kde(gaussian_kernel, data, a=1)
@@ -829,7 +827,7 @@
The boxcar kernel is seldom used in practice – we include it here to demonstrate that a kernel function can take whatever form you would like, provided it integrates to 1 and does not output negative values.
-
+
Code
def boxcar_kernel(alpha, x, z):
@@ -876,7 +874,7 @@ . Note that here we’ve specified stat = density
to normalize the histogram such that the area under the histogram is equal to 1.
-
+
sns.displot(data=wb,
            x="gni",
            kind="hist",
@@ -891,7 +889,7 @@ kind!
-
+
sns.displot(data=wb,
            x="gni",
            kind='kde')
@@ -905,7 +903,7 @@ kind.
-
+
sns.displot(data=wb,
            x="gni",
            kind='ecdf')
@@ -926,7 +924,7 @@ kind
8.3.0.1 Scatter Plots
Scatter plots are one of the most useful tools in representing the relationship between pairs of quantitative variables. They are particularly important in gauging the strength, or correlation, of the relationship between variables. Knowledge of these relationships can then motivate decisions in our modeling process.
In matplotlib, we use the function plt.scatter to generate a scatter plot. Notice that, unlike our examples of plotting single-variable distributions, now we specify sequences of values to be plotted along the x-axis and the y-axis.
-
+
plt.scatter(wb["per capita: % growth: 2016"], \
            wb['Adult literacy rate: Female: % ages 15 and older: 2005-14'])
@@ -942,7 +940,7 @@
In seaborn, we call the function sns.scatterplot. We use the x and y parameters to indicate the values to be plotted along the x and y axes, respectively. By using the hue parameter, we can specify a third variable to be used for coloring each scatter point.
-
+
sns.scatterplot(data = wb, x = "per capita: % growth: 2016", \
                y = "Adult literacy rate: Female: % ages 15 and older: 2005-14",
                hue = "Continent")
@@ -965,7 +963,7 @@ hue
Jittering is the process of adding a small amount of random noise to all x and y values to slightly shift the position of each datapoint. By randomly shifting all the data by some small distance, we can discern individual points more clearly without modifying the major trends of the original dataset.
In the cell below, we first jitter the data using np.random.uniform, then re-plot it with smaller markers. The resulting plot is much easier to interpret.
-
+
# Setting a seed ensures that we produce the same plot each time
# This means that the course notes will not change each time you access them
np.random.seed(150)
@@ -999,7 +997,7 @@ np.random.seed(
8.3.0.2 lmplot and jointplot
seaborn also includes several built-in functions for creating more sophisticated scatter plots. Two of the most commonly used examples are sns.lmplot and sns.jointplot.
sns.lmplot plots both a scatter plot and a linear regression line, all in one function call. We’ll discuss linear regression in a few lectures.
-
+
sns.lmplot(data = wb, x = "per capita: % growth: 2016", \
           y = "Adult literacy rate: Female: % ages 15 and older: 2005-14")
@@ -1013,7 +1011,7 @@
sns.jointplot creates a visualization with three components: a scatter plot, a histogram of the distribution of x values, and a histogram of the distribution of y values.
-
+
sns.jointplot(data = wb, x = "per capita: % growth: 2016", \
              y = "Adult literacy rate: Female: % ages 15 and older: 2005-14")
@@ -1034,7 +1032,7 @@
For datasets with a very large number of datapoints, jittering is unlikely to fully resolve the issue of overplotting. In these cases, we can attempt to visualize our data by its density, rather than displaying each individual datapoint.
Hex plots can be thought of as two-dimensional histograms that show the joint distribution between two variables. This is particularly useful when working with very dense data. In a hex plot, the x-y plane is binned into hexagons. Hexagons that are darker in color indicate a greater density of data – that is, there are more data points that lie in the region enclosed by the hexagon.
We can generate a hex plot using sns.jointplot modified with the kind parameter.
-
+
sns.jointplot(data = wb, x = "per capita: % growth: 2016", \
              y = "Adult literacy rate: Female: % ages 15 and older: 2005-14", \
              kind = "hex")
@@ -1055,7 +1053,7 @@ kind
8.3.0.4 Contour Plots
Contour plots are an alternative way of plotting the joint distribution of two variables. You can think of them as the 2-dimensional versions of KDE plots. A contour plot can be interpreted in a similar way to a topographic map. Each contour line represents an area that has the same density of datapoints throughout the region. Contours marked with darker colors contain more datapoints (a higher density) in that region.
sns.kdeplot will generate a contour plot if we specify both x and y data.
-
+
sns.kdeplot(data = wb, x = "per capita: % growth: 2016", \
            y = "Adult literacy rate: Female: % ages 15 and older: 2005-14", \
            fill = True)
@@ -1077,7 +1075,7 @@ fill
Much of this was done to uncover insights in data, which will prove necessary when we begin building models of data later in the course. A strong graphical correlation between two variables hints at an underlying relationship that we may want to study in greater detail. However, relying on visual relationships alone is limiting - not all plots show association. The presence of outliers and other statistical anomalies makes it hard to interpret data.
Transformations are the process of manipulating data to find significant relationships between variables. These are often found by applying mathematical functions to variables that “transform” their range of possible values and highlight some previously hidden associations between data.
To see why we may want to transform data, consider the following plot of adult literacy rates against gross national income.
-
+
Code
# Some data cleaning to help with the next example
@@ -1121,7 +1119,7 @@ \(\log{(100)} = 4.61\) and \(\log{(10)} = 2.3\)).
In Data 100 (and most upper-division STEM classes), \(\log\) is used to refer to the natural logarithm with base \(e\).
-
+
# np.log takes the logarithm of an array or Series
plt.scatter(np.log(df["inc"]), df["lit"])
@@ -1144,7 +1142,7 @@ \(2^4 = 16\) and \(200^4 = 1600000000\)).
-
+
# Apply a log transformation to the x values and a power transformation to the y values
plt.scatter(np.log(df["inc"]), df["lit"]**4)
@@ -1165,7 +1163,7 @@ \[y^4 = m(\log{x}) + b\]
Where \(m\) represents the slope of the linear fit, while \(b\) represents the intercept.
The cell below computes \(m\) and \(b\) for our transformed data. We’ll discuss how this code was generated in a future lecture.
-
+
Code
# The code below fits a linear regression model. We'll discuss it at length in a future lecture
@@ -1203,7 +1201,7 @@ \(x\) and \(y\).
\[y = [m(\log{x}) + b]^{(1/4)}\]
When we plug in the values for \(m\) and \(b\) computed above, something interesting happens.
-
+
Code
# Now, plug the values for m and b into the relationship between the untransformed x and y
diff --git a/docs/visualization_2/visualization_2_files/figure-html/cell-10-output-1.png b/docs/visualization_2/visualization_2_files/figure-html/cell-10-output-1.png
index 27468a46..16876c4d 100644
Binary files a/docs/visualization_2/visualization_2_files/figure-html/cell-10-output-1.png and b/docs/visualization_2/visualization_2_files/figure-html/cell-10-output-1.png differ
diff --git a/docs/visualization_2/visualization_2_files/figure-html/cell-11-output-2.png b/docs/visualization_2/visualization_2_files/figure-html/cell-11-output-2.png
deleted file mode 100644
index e4a89e13..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-html/cell-11-output-2.png and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-html/cell-12-output-2.png b/docs/visualization_2/visualization_2_files/figure-html/cell-12-output-2.png
deleted file mode 100644
index c5644115..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-html/cell-12-output-2.png and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-html/cell-13-output-2.png b/docs/visualization_2/visualization_2_files/figure-html/cell-13-output-2.png
deleted file mode 100644
index c802680f..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-html/cell-13-output-2.png and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-html/cell-18-output-1.png b/docs/visualization_2/visualization_2_files/figure-html/cell-18-output-1.png
index 7d55dffe..e33716b1 100644
Binary files a/docs/visualization_2/visualization_2_files/figure-html/cell-18-output-1.png and b/docs/visualization_2/visualization_2_files/figure-html/cell-18-output-1.png differ
diff --git a/docs/visualization_2/visualization_2_files/figure-html/cell-18-output-2.png b/docs/visualization_2/visualization_2_files/figure-html/cell-18-output-2.png
deleted file mode 100644
index 9d20a774..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-html/cell-18-output-2.png and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-html/cell-19-output-2.png b/docs/visualization_2/visualization_2_files/figure-html/cell-19-output-2.png
deleted file mode 100644
index cf460dcf..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-html/cell-19-output-2.png and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-html/cell-20-output-2.png b/docs/visualization_2/visualization_2_files/figure-html/cell-20-output-2.png
deleted file mode 100644
index 916ff06f..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-html/cell-20-output-2.png and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-html/cell-24-output-2.png b/docs/visualization_2/visualization_2_files/figure-html/cell-24-output-2.png
deleted file mode 100644
index c416ca30..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-html/cell-24-output-2.png and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-html/cell-25-output-1.png b/docs/visualization_2/visualization_2_files/figure-html/cell-25-output-1.png
deleted file mode 100644
index 2ad64eb3..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-html/cell-25-output-1.png and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-html/cell-3-output-2.png b/docs/visualization_2/visualization_2_files/figure-html/cell-3-output-2.png
deleted file mode 100644
index d90bc5f6..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-html/cell-3-output-2.png and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-html/cell-4-output-2.png b/docs/visualization_2/visualization_2_files/figure-html/cell-4-output-2.png
deleted file mode 100644
index 04b3e63e..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-html/cell-4-output-2.png and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-html/cell-5-output-2.png b/docs/visualization_2/visualization_2_files/figure-html/cell-5-output-2.png
deleted file mode 100644
index d45c62e3..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-html/cell-5-output-2.png and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-html/cell-6-output-1.png b/docs/visualization_2/visualization_2_files/figure-html/cell-6-output-1.png
index 1e3539b2..27468a46 100644
Binary files a/docs/visualization_2/visualization_2_files/figure-html/cell-6-output-1.png and b/docs/visualization_2/visualization_2_files/figure-html/cell-6-output-1.png differ
diff --git a/docs/visualization_2/visualization_2_files/figure-html/cell-6-output-2.png b/docs/visualization_2/visualization_2_files/figure-html/cell-6-output-2.png
deleted file mode 100644
index 1e3539b2..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-html/cell-6-output-2.png and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-html/cell-7-output-1.png b/docs/visualization_2/visualization_2_files/figure-html/cell-7-output-1.png
index 9ba547f2..1e3539b2 100644
Binary files a/docs/visualization_2/visualization_2_files/figure-html/cell-7-output-1.png and b/docs/visualization_2/visualization_2_files/figure-html/cell-7-output-1.png differ
diff --git a/docs/visualization_2/visualization_2_files/figure-html/cell-8-output-1.png b/docs/visualization_2/visualization_2_files/figure-html/cell-8-output-1.png
index 35bf23c0..9ba547f2 100644
Binary files a/docs/visualization_2/visualization_2_files/figure-html/cell-8-output-1.png and b/docs/visualization_2/visualization_2_files/figure-html/cell-8-output-1.png differ
diff --git a/docs/visualization_2/visualization_2_files/figure-html/cell-9-output-1.png b/docs/visualization_2/visualization_2_files/figure-html/cell-9-output-1.png
index a6f13e27..35bf23c0 100644
Binary files a/docs/visualization_2/visualization_2_files/figure-html/cell-9-output-1.png and b/docs/visualization_2/visualization_2_files/figure-html/cell-9-output-1.png differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-10-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-10-output-1.pdf
deleted file mode 100644
index d6d1e7e5..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-10-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-11-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-11-output-1.pdf
deleted file mode 100644
index 408e75a9..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-11-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-11-output-2.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-11-output-2.pdf
deleted file mode 100644
index bda8cdf0..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-11-output-2.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-12-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-12-output-1.pdf
deleted file mode 100644
index a7a63844..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-12-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-12-output-2.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-12-output-2.pdf
deleted file mode 100644
index 882d332b..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-12-output-2.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-13-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-13-output-1.pdf
deleted file mode 100644
index a75e6c3f..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-13-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-13-output-2.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-13-output-2.pdf
deleted file mode 100644
index 788a3158..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-13-output-2.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-14-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-14-output-1.pdf
deleted file mode 100644
index c0cb322a..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-14-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-15-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-15-output-1.pdf
deleted file mode 100644
index 595bbf18..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-15-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-16-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-16-output-1.pdf
deleted file mode 100644
index 85c7cfd8..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-16-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-17-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-17-output-1.pdf
deleted file mode 100644
index 7e9a6959..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-17-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-18-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-18-output-1.pdf
deleted file mode 100644
index b2f5a4b2..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-18-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-18-output-2.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-18-output-2.pdf
deleted file mode 100644
index 9888e1bf..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-18-output-2.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-19-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-19-output-1.pdf
deleted file mode 100644
index 613781de..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-19-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-19-output-2.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-19-output-2.pdf
deleted file mode 100644
index b6c6327e..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-19-output-2.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-20-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-20-output-1.pdf
deleted file mode 100644
index 0488ae23..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-20-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-20-output-2.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-20-output-2.pdf
deleted file mode 100644
index 23c9dff9..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-20-output-2.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-21-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-21-output-1.pdf
deleted file mode 100644
index c657f3ef..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-21-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-22-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-22-output-1.pdf
deleted file mode 100644
index 8d6649fc..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-22-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-23-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-23-output-1.pdf
deleted file mode 100644
index 97f795d1..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-23-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-24-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-24-output-1.pdf
deleted file mode 100644
index 9bcedb5f..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-24-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-24-output-2.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-24-output-2.pdf
deleted file mode 100644
index 39d611fc..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-24-output-2.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-25-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-25-output-1.pdf
deleted file mode 100644
index 82ee7eb4..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-25-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-25-output-2.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-25-output-2.pdf
deleted file mode 100644
index 30720779..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-25-output-2.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-26-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-26-output-1.pdf
deleted file mode 100644
index 99efadda..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-26-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-3-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-3-output-1.pdf
deleted file mode 100644
index 6d22cde7..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-3-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-3-output-2.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-3-output-2.pdf
deleted file mode 100644
index 6bec915d..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-3-output-2.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-4-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-4-output-1.pdf
deleted file mode 100644
index b8c3370a..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-4-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-4-output-2.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-4-output-2.pdf
deleted file mode 100644
index a93e0bef..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-4-output-2.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-5-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-5-output-1.pdf
deleted file mode 100644
index 6c1fd3af..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-5-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-5-output-2.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-5-output-2.pdf
deleted file mode 100644
index b28e4d7d..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-5-output-2.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-6-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-6-output-1.pdf
deleted file mode 100644
index 2bb1512a..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-6-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-6-output-2.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-6-output-2.pdf
deleted file mode 100644
index 47a09495..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-6-output-2.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-7-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-7-output-1.pdf
deleted file mode 100644
index fb8aff75..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-7-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-8-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-8-output-1.pdf
deleted file mode 100644
index 188bdbfc..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-8-output-1.pdf and /dev/null differ
diff --git a/docs/visualization_2/visualization_2_files/figure-pdf/cell-9-output-1.pdf b/docs/visualization_2/visualization_2_files/figure-pdf/cell-9-output-1.pdf
deleted file mode 100644
index c6bbff68..00000000
Binary files a/docs/visualization_2/visualization_2_files/figure-pdf/cell-9-output-1.pdf and /dev/null differ
diff --git a/visualization_2/visualization_2.qmd b/visualization_2/visualization_2.qmd
index 5861eeed..1531766c 100644
--- a/visualization_2/visualization_2.qmd
+++ b/visualization_2/visualization_2.qmd
@@ -107,6 +107,17 @@ plt.xlim(-3, 10)
plt.ylim(0, 0.5);
```
+Alternatively, we can use `sns.histplot`. This plot also visualizes the underlying bins as a histogram.
+
+```{python}
+#| code-fold: true
+sns.histplot(data, bins=2, kde=True, stat="density", kde_kws=dict(cut=3, bw_method=0.65))
+
+plt.xlabel("Data")
+plt.xlim(-3, 10)
+plt.ylim(0, 0.5);
+```
+
#### Step 1: Place a Kernel at Each Data Point
To begin generating a density curve, we need to choose a **kernel** and **bandwidth value ($\alpha$)**. What are these exactly?
@@ -207,20 +218,7 @@ plt.ylim(0, 0.5)
plt.xlabel("Data")
plt.ylabel("Density")
-sns.kdeplot(data, bw_method=0.65)
-sns.histplot(data, stat="density", bins=2);
-```
-
-An alternative method to generate the above KDE is shown below, this time using `sns.histplot`'s arguments.
-
-```{python}
-#| code-fold: true
-plt.xlim(-3, 10)
-plt.ylim(0, 0.5)
-plt.xlabel("Data")
-plt.ylabel("Density")
-
-sns.histplot(data, bins=2, kde=True, stat="density", kde_kws=dict(cut=3, bw_method=0.65))
+plot_kde(gaussian_kernel, data, a=1)
```
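
As a quick sanity check on the three steps, the sketch below rebuilds the KDE for the same five datapoints with plain NumPy and approximates the area under the curve numerically. The helper name `kde`, the evaluation grid, and the Riemann-sum approximation are illustrative assumptions rather than part of the notes' own code; `gaussian_kernel` follows the formula used above.

```python
import numpy as np

def gaussian_kernel(x, z, a):
    # Gaussian kernel centered at datapoint z with bandwidth a, evaluated at x
    return (1 / np.sqrt(2 * np.pi * a**2)) * np.exp(-(x - z)**2 / (2 * a**2))

def kde(x, pts, a):
    # Hypothetical helper: place one kernel per datapoint, sum them,
    # and divide by the number of datapoints (the normalization step)
    return sum(gaussian_kernel(x, pt, a) for pt in pts) / len(pts)

data = [2.2, 2.8, 3.7, 5.3, 5.7]

# Assumed grid: wide enough that almost all of the density lies inside it
grid = np.linspace(-10, 20, 10_001)
density = kde(grid, data, a=1)

# Riemann-sum approximation of the area under the KDE curve
dx = grid[1] - grid[0]
area = np.sum(density) * dx
print(round(area, 3))
```

The printed area should be very close to 1, which is what the normalization in Step 2 is meant to guarantee. Dropping the division by `len(pts)` would give an area near 5 instead, one unit per kernel.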
### Kernel Functions and Bandwidths