Making changes to Matplotlib document

TheDataMine · Jan 2, 2025 · 5789d3f
1 parent d4a52af

Showing 5 changed files with 129 additions and 67 deletions.
diff --git a/tools-appendix/modules/python/images/matplot-histogram-aa.png b/tools-appendix/modules/python/images/matplot-histogram-aa.png
diff --git a/tools-appendix/modules/python/images/matplot-scatterplot-aa.png b/tools-appendix/modules/python/images/matplot-scatterplot-aa.png
diff --git a/tools-appendix/modules/python/pages/filtering-and-selecting.adoc b/tools-appendix/modules/python/pages/filtering-and-selecting.adoc
@@ -246,20 +246,21 @@ myDF[['ResidentStatus', 'Age']]
 The output of selecting multiple columns using the double brackets is a pandas `DataFrame`:
 
 ----
-ResidentStatus 	Age
-0 	1 	87
-1 	1 	58
-2 	1 	75
-3 	1 	74
-4 	1 	64
-... 	... 	...
-2631166 	3 	84
-2631167 	3 	74
-2631168 	3 	7
-2631169 	4 	49
-2631170 	3 	39
-
-2631171 rows × 2 columns
+   ResidentStatus  Age
+0               1   87
+1               1   58
+2               1   75
+3               1   74
+4               1   64
+...           ...  ...
+2631166         3   84
+2631167         3   74
+2631168         3    7
+2631169         4   49
+2631170         3   39
+
+[2631171 rows x 2 columns]
+
 ----
 
 == The iloc function 
@@ -295,9 +296,10 @@ AgeType                    1
 Age                       87
 AgeSubstitutionFlag        0
 Name: 0, dtype: object
+
 ----
 
-We can also use `iloc[]` to select the first row (index 0) and all columns using (:):
+We can also use `iloc[]` to select the first row (index 0) and all columns by passing a colon (`:`) as the column selector:
 [source,python]
 ----
 myDF.iloc[0, :]
@@ -379,16 +381,17 @@ filtered_myDF
 
 
 ----
-        Id 	ResidentStatus 	Sex 	Age 	Race 	MaritalStatus
-0 	    1 	        1 	    M 	    87 	    1 	        M
-1 	    2 	        1 	    M 	    58 	    1 	        D
-2 	    3 	        1 	    F 	    75 	    1 	        W
-3 	    4 	        1 	    M 	    74 	    1 	        D  
-4 	    5 	        1 	    M 	    64 	    1 	        D
-... 	... 	    ... 	... 	...     ...         ... 
+   Id  ResidentStatus Sex  Age  Race MaritalStatus
+0   1              1   M   87     1             M
+1   2              1   M   58     1             D
+2   3              1   F   75     1             W
+3   4              1   M   74     1             D
+4   5              1   M   64     1             D
+... ...            ... ...  ...   ...           ...
+
 ----
 
-Finally, let's try selecting multiple rows and multiple columns at the same time. When selecting multiple rows and multiple columns using iloc, the output is a subset of the DataFrame that contains the specified rows and all the columns. In this example, myDF.iloc[[0, 7, 9, 10], :] specifies the selection of rows 0, 7, 9, and 10 and all columns:
+Finally, let's select multiple rows and multiple columns at the same time. With `iloc`, the output is the subset of the DataFrame at the specified row and column positions; passing `:` for the columns keeps all of them. In this example, `filtered_myDF.iloc[[0, 7, 9, 10], :]` selects rows 0, 7, 9, and 10 and all columns:
 
 [source,python]
 ----
@@ -397,11 +400,12 @@ filtered_myDF.iloc[[0, 7, 9, 10], :]
 
 
 ----
-    Id 	ResidentStatus 	Sex 	Age 	Race 	MaritalStatus
-0 	1 	    1 	        M 	    87 	    1 	        M
-7 	8 	    1 	        M 	    55 	    2 	        S
-9 	10 	    1 	        M 	    23 	    1       	S
-10 	11 	    1 	        F 	    79 	    1 	        W
+    Id  ResidentStatus Sex  Age  Race MaritalStatus
+0    1              1   M   87     1             M
+7    8              1   M   55     2             S
+9   10              1   M   23     1             S
+10  11              1   F   79     1             W
+
 ----
 
 == The loc function 
@@ -426,13 +430,14 @@ filtered_myDF.loc[:, filtered_myDF.columns != 'Race']
 ----
 
 ----
-        Id 	ResidentStatus 	Sex 	Age 	MaritalStatus
-0 	    1 	        1 	    M 	    87 	        M
-1 	    2 	        1 	    M 	    58 	        D
-2 	    3 	        1 	    F 	    75 	        W
-3 	    4 	        1 	    M 	    74 	        D
-4 	    5 	        1 	    M 	    64 	        D
-... 	... 	    ... 	...    ... 	       ...
+   Id  ResidentStatus Sex  Age MaritalStatus
+0   1              1   M   87             M
+1   2              1   M   58             D
+2   3              1   F   75             W
+3   4              1   M   74             D
+4   5              1   M   64             D
+... ...            ... ...  ...           ...
+
 ----
 
 
@@ -493,15 +498,15 @@ filtered_myDF[filtered_myDF['Sex'] == "F"]
 ----
 
 ----
-    Id 	ResidentStatus 	Sex 	Age 	Race 	MaritalStatus
-2 	3 	    1 	        F 	    75 	    1 	        W
-5 	6 	    1 	        F 	    93 	    1 	        W
-8 	9 	    1 	        F 	    86 	    1 	        W
-10 	11  	1 	        F 	    79 	    1 	        W
-12 	13 	    1 	        F 	    85 	    1 	        W
-    ... 	... 	    ... 	... 	... 	    ... 	
+         Id  ResidentStatus Sex  Age  Race MaritalStatus
+2         3              1   F   75     1             W
+5         6              1   F   93     1             W
+8         9              1   F   86     1             W
+10       11              1   F   79     1             W
+12       13              1   F   85     1             W
+...     ...            ... ...  ...   ...           ...
+
+[1299710 rows x 6 columns]
 
-1299710 rows × 6 columns
 ----
 
 We can also use `.loc` to filter for females.
@@ -512,13 +517,15 @@ filtered_myDF.loc[filtered_myDF['Sex'] == "F"]
 ----
 
 ----
-    Id 	ResidentStatus 	Sex 	Age 	Race 	MaritalStatus
-2 	3 	    1 	        F 	    75 	    1 	        W
-5 	6 	    1 	        F 	    93 	    1 	        W
-8 	9 	    1 	        F 	    86 	    1 	        W
-10 	11  	1 	        F 	    79 	    1 	        W
-12 	13 	    1 	        F 	    85 	    1 	        W
-    ... 	... 	    ... 	... 	... 	    ... 		
+         Id  ResidentStatus Sex  Age  Race MaritalStatus
+2         3              1   F   75     1             W
+5         6              1   F   93     1             W
+8         9              1   F   86     1             W
+10       11              1   F   79     1             W
+12       13              1   F   85     1             W
+...     ...            ... ...  ...   ...           ...
+
+[1299710 rows x 6 columns]
+
 ----
 
 Now let's filter on two conditions at once: females who are 114 years old. Surprisingly, our dataset does contain people who lived that long!
@@ -529,12 +536,13 @@ filtered_myDF[(filtered_myDF['Sex'] == "F") & (filtered_myDF['Age'] == 114)]
 ----
 
 ----
-            Id 	    ResidentStatus 	Sex 	Age 	Race 	MaritalStatus
-265482 	    265483 	    1 	        F 	    114 	1 	        W
-1304830 	1304831 	1 	        F 	    114 	1 	        W
-1372655 	1372656 	1 	        F 	    114 	2 	        W
-1981235 	1981236 	1 	        F 	    114 	2 	        W
-2407245 	2407246 	1 	        F 	    114 	4 	        M	
+          Id  ResidentStatus Sex  Age  Race MaritalStatus
+265482  265483              1   F  114     1             W
+1304830 1304831             1   F  114     1             W
+1372655 1372656             1   F  114     2             W
+1981235 1981236             1   F  114     2             W
+2407245 2407246             1   F  114     4             M
+
 ----
 
 Another method that would get us the same results: 
@@ -546,12 +554,13 @@ filtered_myDF.loc[(filtered_myDF['Sex'] == "F") & (filtered_myDF['Age'] == 114)]
 ----
 
 ----
-            Id 	    ResidentStatus 	Sex 	Age 	Race 	MaritalStatus
-265482 	    265483 	    1 	        F 	    114 	1 	        W
-1304830 	1304831 	1 	        F 	    114 	1 	        W
-1372655 	1372656 	1 	        F 	    114 	2 	        W
-1981235 	1981236 	1 	        F 	    114 	2 	        W
-2407245 	2407246 	1 	        F 	    114 	4 	        M	
+          Id  ResidentStatus Sex  Age  Race MaritalStatus
+265482  265483              1   F  114     1             W
+1304830 1304831             1   F  114     1             W
+1372655 1372656             1   F  114     2             W
+1981235 1981236             1   F  114     2             W
+2407245 2407246             1   F  114     4             M
+
 ----
 
 === Filtering and Modifying the Dataset

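The selection and filtering rules in the `filtering-and-selecting.adoc` changes above can be checked on a small scale. The sketch below builds a hypothetical five-row DataFrame (the column names match the examples, but every value is made up) instead of reading the real dataset, whose path does not appear in this diff. It shows double versus single brackets, `iloc` by position, `loc` with a column mask, and combining two conditions with `&`.

```python
import pandas as pd

# Hypothetical five-row DataFrame: the column names match the examples,
# but every value here is made up for illustration.
myDF = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "ResidentStatus": [1, 1, 1, 1, 3],
    "Sex": ["M", "M", "F", "M", "F"],
    "Age": [87, 58, 75, 74, 114],
    "Race": [1, 1, 1, 1, 4],
    "MaritalStatus": ["M", "D", "W", "D", "M"],
})

# Double brackets return a DataFrame; single brackets return a Series.
two_cols = myDF[["ResidentStatus", "Age"]]
one_col = myDF["Age"]

# iloc selects by integer position: rows 0 and 2, every column.
rows_by_position = myDF.iloc[[0, 2], :]

# loc with a boolean mask over the column labels drops the 'Race' column.
no_race = myDF.loc[:, myDF.columns != "Race"]

# Combine conditions with &, wrapping each comparison in parentheses
# because & binds more tightly than == in Python.
old_females = myDF[(myDF["Sex"] == "F") & (myDF["Age"] == 114)]

# A filtered DataFrame keeps the original row labels (here 2 and 4),
# so the position passed to iloc and the label passed to loc differ
# even when they refer to the same row.
females = myDF[myDF["Sex"] == "F"]

print(type(two_cols).__name__, type(one_col).__name__)
print(rows_by_position["Id"].tolist())
print(no_race.columns.tolist())
print(old_females["Id"].tolist())
print(females.index.tolist())
print(females.iloc[1]["Id"], females.loc[4, "Id"])
```

Because a boolean mask keeps the original labels, the filtered outputs shown in this commit start at row labels such as 2, 5, and 8 rather than 0, 1, and 2.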
diff --git a/tools-appendix/modules/python/pages/index.adoc b/tools-appendix/modules/python/pages/index.adoc
@@ -18,6 +18,7 @@ Python is largely known for its readability and versatility. Its design philosop
 * xref:plotly-examples.adoc[Data Visualization with plotly]
 * xref:writing-functions.adoc[Writing Functions in Python]
 * xref:writing-scripts.adoc[Writing Scripts in Python]
+* xref:pandas-series.adoc[Pandas Series]
 * xref:pandas-dates-and-times.adoc[Handling Dates and Times in pandas]
 * xref:pandas-aggregate-functions.adoc[Applying Aggregate Functions in pandas]
 * xref:pandas-reshaping.adoc[Reshaping Data in pandas]

diff --git a/tools-appendix/modules/python/pages/matplotlib.adoc b/tools-appendix/modules/python/pages/matplotlib.adoc
@@ -1,11 +1,13 @@
-= matplotlib
+= Matplotlib
 
 When starting with Python, the most common plotting package is often `matplotlib`. It is an easy and straightforward plotting tool, with a surprising amount of depth. Like any package, it also has pluses and minuses. 
 
 Importing `matplotlib` for use in a project is pretty straightforward: 
 
-* <<barplot, barplot>>
-* <<boxplot, boxplot>>
+* <<Barplots Using Matplotlib, Barplots Using Matplotlib>>
+* <<Boxplots Using Matplotlib, Boxplots Using Matplotlib>>
+* <<Histograms Using Matplotlib, Histograms Using Matplotlib>>
+* <<Scatterplots Using Matplotlib, Scatterplots Using Matplotlib>>
 
 [source,python]
 ----
@@ -26,7 +28,7 @@ For those of us who aren't familiar with MATLAB the `pyplot` functionality creat
 
 {sp}+
 
-== barplot
+== Barplots Using Matplotlib
 
 Barplots can take many forms. They are most often used to compare values across categories or to show change over time. As with many of the plotting types, `matplotlib` has a built-in function for creating them: `bar` (or `barh` for horizontal bars).
 
@@ -289,7 +291,7 @@ plt.close()
 
 This just starts to scratch the surface of what is possible with `matplotlib` but it does show the deep customization that is possible via the package.
 
-== boxplot
+== Boxplots Using Matplotlib
 
 `boxplot` is a function that creates a https://en.wikipedia.org/wiki/Box_plot[boxplot]. While that may not be very surprising, it is surprising how helpful boxplots can be in summarizing your data. Boxplots show a number of different measures related to the data, such as quartiles, upper and lower bounds, and potential outliers. They can also be helpful for identifying general trends between groups or over time. However, other plot types may be better suited to specific use cases.
 
@@ -466,3 +468,53 @@ plt.close()
 image::box_6.png[Boxplot with better color, width=792, height=500, loading=lazy, title="Boxplot with better color"]
 
 Now we have a good-looking boxplot! Hopefully this demonstration showed how helpful boxplots can be when interpreting data. It also shows how `matplotlib` plots can be further customized to fit the needs of the visualization!
+
+== Histograms Using Matplotlib
+
+A histogram is a way to visualize the distribution of numerical data. It groups data points into intervals (called bins) and draws one bar per bin; the height of each bar shows how many data points fall within that interval.
+
+Let's visualize the precipitation data in our dataset by plotting a histogram with Matplotlib. 
+
+
+[source,python]
+----
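+import pandas as pd
+import matplotlib.pyplot as plt
+
+myDF = pd.read_csv("/anvil/projects/tdm/data/precip/precip.csv")
+
+# bins=10 splits the range of 'precip' into 10 equal-width intervals;
+# edgecolor draws an outline around each bar
+plt.hist(myDF['precip'], bins=10, edgecolor='black')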
+plt.title('Histogram of Precipitation')
+plt.xlabel('Precipitation (inches)')
+plt.ylabel('Frequency')
+plt.show()
+----
+
+
+image::matplot-histogram-aa.png[Plotting a histogram, width=792, height=500, loading=lazy, title="Histogram in Matplotlib"]
+
+
+
+== Scatterplots Using Matplotlib
+
+A scatter plot is a way to visualize the relationship between two variables. Each observation is drawn as a point on a Cartesian plane, with its horizontal and vertical position given by its values for the two variables. Scatter plots are useful for identifying patterns, trends, or correlations in the data.
+
+Let's visualize the precipitation data in our dataset by plotting the precipitation of the first 10 places as a scatter plot with Matplotlib.
+
+[source,python]
+----
+import pandas as pd
+import matplotlib.pyplot as plt
+
+myDF = pd.read_csv("/anvil/projects/tdm/data/precip/precip.csv")
+# Plot only the first 10 rows of the file (not the 10 places with the most precipitation)
+plt.scatter(myDF['place'].iloc[:10], myDF['precip'].iloc[:10], color='blue')
+
+plt.title("Scatter Plot of Precipitation (First 10 Places)")
+plt.xlabel("Place")
+plt.ylabel("Precipitation (inches)")
+
+plt.xticks(rotation=45)
+plt.tight_layout()
+plt.show()
+----
+
+When creating plots, it's important to understand the overall trends they reveal. In this plot, among the first 10 places, Mobile, Phoenix, and Little Rock have the highest precipitation levels.
+
+image::matplot-scatterplot-aa.png[Plotting a scatterplot, width=792, height=500, loading=lazy, title="Scatterplot in Matplotlib"]
+
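The histogram section above describes bins in words. The sketch below makes the binning concrete with ten hypothetical precipitation values (made up for illustration, not read from `precip.csv`) and prints the interval and count behind each bar. It assumes only that `numpy` and `matplotlib` are installed, and it renders off-screen so it also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical precipitation values in inches (made up; not read from precip.csv).
precip = np.array([3.0, 8.0, 10.0, 14.0, 15.0, 21.0, 22.0, 28.0, 30.0, 42.0])

# plt.hist returns the bar heights (counts), the bin edges, and the drawn bars.
counts, edges, _ = plt.hist(precip, bins=5, edgecolor="black")
plt.title("Histogram of Hypothetical Precipitation")
plt.xlabel("Precipitation (inches)")
plt.ylabel("Frequency")
plt.close()

# np.histogram performs the same binning without drawing anything.
np_counts, np_edges = np.histogram(precip, bins=5)

# bins=5 splits [min, max] = [3.0, 42.0] into 5 equal-width intervals.
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f} inches: {int(n)} value(s)")
```

Every interval except the last includes its left edge and excludes its right edge; the last interval includes both ends, which is why the maximum value (42.0) is still counted.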