diff --git a/src/notebooks/541-waffle-chart-with-additionnal-grouping.ipynb b/src/notebooks/541-waffle-chart-with-additionnal-grouping.ipynb new file mode 100644 index 0000000000..3a5c1d7985 --- /dev/null +++ b/src/notebooks/541-waffle-chart-with-additionnal-grouping.ipynb @@ -0,0 +1,220 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Libraries\n", + "\n", + "This post relies on the [PyWaffle library](https://github.com/gyli/PyWaffle), which is one of the most convenient ways to create a waffle chart with Python.\n", + "\n", + "The very first thing to do is to install the library:\n", + "\n", + "`pip install pywaffle`\n", + "\n", + "Then, we just have to import the following libraries:\n", + "- `pandas` for creating a dataframe with our data\n", + "- `matplotlib` for customizing the chart\n", + "- `pywaffle` for the **waffle** type figure" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "from pywaffle import Waffle\n", + "import pandas as pd" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dataset\n", + "\n", + "We create a simple dataset with the **number of vehicles** produced, broken down **by factory and vehicle type** (`Car`, `Truck` or `Motorcycle`). We then use the `set_index()` function to make the `labels` column the index of the dataframe." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "data = {'labels': ['Car', 'Truck', 'Motorcycle'],\n", + " 'Factory A': [32384, 13354, 5245],\n", + " 'Factory B': [22147, 6678, 2156],\n", + " 'Factory C': [8932, 3879, 896],\n", + " }\n", + "df = pd.DataFrame(data).set_index('labels')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Simple waffle chart\n", + "\n", + "First, let's create a **simple waffle chart** to see what it looks like. We define the properties of our chart in a dictionary and then pass it to the `figure()` function from [matplotlib](https://python-graph-gallery.com/matplotlib/), with `FigureClass=Waffle`." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot = {'values': [value/1000 for value in data['Factory A']], # Scale counts down: 1 block = 1000 vehicles\n", + " 'labels': [f\"{value} ({label})\" for value, label in zip(df['Factory A'], df.index)],\n", + " 'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 8},\n", + " 'title': {'label': 'Vehicle Production of Factory A', 'loc': 'left', 'fontsize': 12}\n", + " }\n", + "\n", + "fig = plt.figure(\n", + " FigureClass=Waffle,\n", + " plots={311: plot},\n", + " rows=5, # Shared parameter, set outside the plot dictionary\n", + " cmap_name=\"Accent\", # Change color with cmap\n", + " rounding_rule='ceil', # Round up, so that values below 1000 still get at least 1 block\n", + " figsize=(8, 6)\n", + ")\n", + "\n", + "# Display the chart\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Waffle chart with grouping\n", + "\n", + "This example can be found in the [original documentation of PyWaffle](https://github.com/gyli/PyWaffle) (the version here is slightly different). 
\n", + "\n", + "The first step is to create our **plot dictionaries**. Since we want 3 different charts (one per factory), we create 3 plot dictionaries with the **values and properties** we want each chart to have. We divide the values by 1000 only to **reduce the number of blocks** (1 block = 1000 vehicles); this step is optional. \n", + "\n", + "Then, we pass these dictionaries to the `plots` argument of the `figure()` function from matplotlib. The keys `311`, `312` and `313` are subplot positions (3 rows, 1 column), so the charts are stacked vertically. " + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plot1 = {'values': [value/1000 for value in data['Factory A']], # Scale counts down: 1 block = 1000 vehicles\n", + " 'labels': [f\"{value} ({label})\" for value, label in zip(df['Factory A'], df.index)],\n", + " 'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 8},\n", + " 'title': {'label': 'Vehicle Production of Factory A', 'loc': 'left', 'fontsize': 12}\n", + " }\n", + "\n", + "plot2 = {'values': [value/1000 for value in data['Factory B']],\n", + " 'labels': [f\"{value} ({label})\" for value, label in zip(df['Factory B'], df.index)],\n", + " 'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.2, 1), 'fontsize': 8},\n", + " 'title': {'label': 'Vehicle Production of Factory B', 'loc': 'left', 'fontsize': 12}\n", + " }\n", + "\n", + "plot3 = {'values': [value/1000 for value in data['Factory C']],\n", + " 'labels': [f\"{value} ({label})\" for value, label in zip(df['Factory C'], df.index)],\n", + " 'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.3, 1), 'fontsize': 8},\n", + " 'title': {'label': 'Vehicle Production of Factory C', 'loc': 'left', 'fontsize': 12}\n", + " }\n", + "\n", + "fig = plt.figure(\n", + " FigureClass=Waffle,\n", + " plots={\n", + " 311: plot1,\n", + " 312: plot2,\n", + " 313: plot3,\n", + " },\n", + " rows=5, # Outside 
parameter applied to all subplots, same as below\n", + " cmap_name=\"Accent\", # Change color with cmap\n", + " rounding_rule='ceil', # Round up, so that values below 1000 still get at least 1 block\n", + " figsize=(8, 6)\n", + ")\n", + "\n", + "# Add a main title and a small note at the bottom\n", + "fig.suptitle('Vehicle Production by Vehicle Type', fontsize=14, fontweight='bold')\n", + "fig.supxlabel('1 block = 1000 vehicles',\n", + " fontsize=8,\n", + " x=0.14, # place the note at 14% of the figure width\n", + " )\n", + "fig.set_facecolor('#EEEDE7')\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Going further\n", + "\n", + "This post explains how to create a [waffle chart](https://python-graph-gallery.com/waffle-chart/) with grouping.\n", + "\n", + "For more examples of **how to create or customize** your waffle chart, see the [waffle section](https://python-graph-gallery.com/waffle-chart/). You may also be interested in how to [change the icons](https://python-graph-gallery.com/503-waffle-chart-introduction/)." + ] + } + ], + "metadata": { + "chartType": "waffle", + "description": "A waffle chart is a graphical representation of a dataset where each category is represented by a number of small squares on a two-dimensional grid, proportional to its value. This type of plot allows us to visualize the distribution of categorical data by showing the proportion or count of each category within the grid.
Matplotlib and PyWaffle allow us to create waffle charts easily. You can check this introduction to waffle charts. In this post, we will explore how to leverage Matplotlib to customize waffle charts with a grouping variable.", + "family": "partOfAWhole", + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "keywords": "waffle, grouping, matplotlib, chart, plot", + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.13" + }, + "seoDescription": "How to create a waffle chart with additional grouping", + "slug": "541-waffle-chart-with-additionnal-grouping", + "title": "Waffle chart with additional grouping" + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/src/notebooks/547-stacked-barplots-with-pandas.ipynb b/src/notebooks/547-stacked-barplots-with-pandas.ipynb new file mode 100644 index 0000000000..8121ff4ffd --- /dev/null +++ b/src/notebooks/547-stacked-barplots-with-pandas.ipynb @@ -0,0 +1,199 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Libraries\n", + "\n", + "[Pandas](https://python-graph-gallery.com/pandas/) is a popular open-source Python library used for data manipulation and analysis. 
It provides data structures and functions that make working with structured data, such as tabular data (like `Excel` spreadsheets or `SQL` tables), easy and intuitive.\n", + "\n", + "To install [Pandas](https://python-graph-gallery.com/pandas/), you can use the **following command** in your command-line interface (such as `Terminal` or `Command Prompt`):\n", + "\n", + "`pip install pandas`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Pandas plotting is built on top of [Matplotlib](https://python-graph-gallery.com/matplotlib/): the `.plot` methods of `dataframes` and `series` call Matplotlib under the hood. For this reason, you will usually also **import the [matplotlib](https://python-graph-gallery.com/matplotlib/) library** when building charts with [Pandas](https://python-graph-gallery.com/pandas/), for example to call `plt.show()` or to adjust the legend.\n", + "\n", + "This also means that pandas plotting accepts many of the **same arguments** as Matplotlib, so if you already know [Matplotlib](https://python-graph-gallery.com/matplotlib/), you'll have no trouble learning to plot with [Pandas](https://python-graph-gallery.com/pandas/)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Dataset\n", + "\n", + "In order to create graphics with [Pandas](https://python-graph-gallery.com/pandas/), we need to use **pandas objects**: `Dataframes` and `Series`. A dataframe can be seen as an `Excel` table, and a series as a `column` in that table. This means that we must **systematically** convert our data into a format used by pandas.\n", + "\n", + "We create a small dataset with 3 variables: two qualitative ones (`Product` and `Segment`) and one quantitative one (`Amount_sold`), the number of units of each product sold in each segment. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "data = {\n", + " \"Product\": [\"Product A\", \"Product A\", \"Product A\", \"Product B\", \"Product B\", \"Product B\"],\n", + " \"Segment\": [\"Segment 1\", \"Segment 2\", \"Segment 3\", \"Segment 1\", \"Segment 2\", \"Segment 3\"],\n", + " \"Amount_sold\": [100, 120, 120, 80, 160, 150]\n", + "}\n", + "\n", + "df = pd.DataFrame(data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Simple stacked barplot\n", + "\n", + "Now that our dataset is ready, we can **create the graph**. \n", + "\n", + "This dataset represents sales data for different products (`Product A` and `Product B`) across various segments (`Segment 1`, `Segment 2`, and `Segment 3`). The `\"Amount_sold\"` column represents the **quantity of each product sold** within each segment.\n", + "\n", + "The `pivot()` function **reshapes the original DataFrame** into a wide format suitable for a stacked [barplot](https://python-graph-gallery.com/barplot/): each segment becomes a row (one bar on the x-axis) and each product becomes a column (one layer of each bar).\n", + "\n", + "Then, we set `stacked=True` to draw the products on top of each other instead of side by side." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Pivot the data to have 'Product' as columns and 'Segment' as the index\n", + "pivot_df = df.pivot(index='Segment',\n", + " columns='Product',\n", + " values='Amount_sold')\n", + "\n", + "# Create a stacked barplot\n", + "pivot_df.plot.bar(stacked=True,\n", + " grid=True)\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Stacked barplot 100%\n", + "\n", + "To make the graph **100% stacked**, we have to transform the pivoted data so that every bar is on the **same scale**, i.e. the values of each segment sum to 100%. We use the `div()` function from [pandas](https://python-graph-gallery.com/pandas/) to divide each row by its total, then multiply by 100. " + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Pivot the data to have 'Product' as columns and 'Segment' as the index\n", + "pivot_df = df.pivot(index='Segment',\n", + " columns='Product',\n", + " values='Amount_sold')\n", + "\n", + "# Divide each row by its total (axis=0 aligns the totals on the index), then scale to 100\n", + "pivot_df_percentage = pivot_df.div(pivot_df.sum(axis=1), axis=0) * 100\n", + "\n", + "# Create a 100% stacked barplot\n", + "pivot_df_percentage.plot.bar(stacked=True,\n", + " grid=True)\n", + "\n", + "# Move the legend outside the plot\n", + "plt.legend(bbox_to_anchor=(1.04, 1), # anchor just outside the right edge of the axes\n", + " loc='upper left')\n", + "\n", + "# Display the plot\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Going further\n", + "\n", + "This post explains how to create a stacked barplot with [pandas](https://python-graph-gallery.com/pandas/).\n", + "\n", + "For more examples of **how to create or customize** your barplots, see the [barplot section](https://python-graph-gallery.com/barplot/). You may also be interested in how to [customize your barplot with pandas](https://python-graph-gallery.com/539-customizing-barplot-with-pandas/)." + ] + } + ], + "metadata": { + "chartType": "barplot", + "description": "A barplot is a graphical representation of a dataset where each category is represented by a rectangular bar whose length is proportional to its value. This type of plot allows us to visualize the distribution of categorical data by showing the frequency or count of each category along the plot.
Pandas, a powerful data manipulation library in Python, allows us to easily create barplots: check this introduction to barplots with pandas. In this post, we will explore how to leverage Pandas to create a stacked barplot.", + "family": "ranking", + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "keywords": "barplot, stacked, pandas, matplotlib, chart", + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.13" + }, + "seoDescription": "How to create a stacked barplot with pandas", + "slug": "547-stacked-barplots-with-pandas", + "title": "Stacked Barplot with Pandas" + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/src/pages/barplot.js b/src/pages/barplot.js index ee958536df..0f71127bf2 100644 --- a/src/pages/barplot.js +++ b/src/pages/barplot.js @@ -164,6 +164,16 @@ export default function Barplot() { caption="Highly customized barplot with colors, legend, labels and more." linkTo="/10-barplot-with-number-of-observation" /> + + diff --git a/src/pages/waffle-chart.js b/src/pages/waffle-chart.js index b4d734cbdd..992113b9e5 100644 --- a/src/pages/waffle-chart.js +++ b/src/pages/waffle-chart.js @@ -122,6 +122,11 @@ export default function Waffle() { caption="Waffle chart with proportions in the legend." 
linkTo="/503-waffle-chart-introduction" /> + diff --git a/static/graph/541-waffle-chart-with-additionnal-grouping.png b/static/graph/541-waffle-chart-with-additionnal-grouping.png new file mode 100644 index 0000000000..659c4cf758 Binary files /dev/null and b/static/graph/541-waffle-chart-with-additionnal-grouping.png differ diff --git a/static/graph/547-stacked-barplots-with-pandas-1.png b/static/graph/547-stacked-barplots-with-pandas-1.png new file mode 100644 index 0000000000..ce8bf5ed79 Binary files /dev/null and b/static/graph/547-stacked-barplots-with-pandas-1.png differ diff --git a/static/graph/547-stacked-barplots-with-pandas-2.png b/static/graph/547-stacked-barplots-with-pandas-2.png new file mode 100644 index 0000000000..9991111c15 Binary files /dev/null and b/static/graph/547-stacked-barplots-with-pandas-2.png differ
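To see what `rounding_rule='ceil'` changes in the grouped waffle chart, here is a minimal sketch in plain Python that compares rounding down with rounding up on the notebook's vehicle counts. It assumes that PyWaffle scales each value and then rounds it up when `rounding_rule='ceil'` is set. The `blocks()` helper and the `VEHICLES_PER_BLOCK` constant are hypothetical: they only reproduce the arithmetic and are not part of the PyWaffle API.

```python
import math

# Vehicle counts copied from the waffle notebook's `data` dictionary
data = {
    "Factory A": [32384, 13354, 5245],
    "Factory B": [22147, 6678, 2156],
    "Factory C": [8932, 3879, 896],
}

# Hypothetical constant: 1 block = 1000 vehicles, as in the chart's bottom note
VEHICLES_PER_BLOCK = 1000


def blocks(values, rule):
    """Hypothetical helper: scale raw counts to block counts with 'floor' or 'ceil'."""
    rounders = {"floor": math.floor, "ceil": math.ceil}
    return [rounders[rule](v / VEHICLES_PER_BLOCK) for v in values]


for factory, values in data.items():
    print(factory, "floor:", blocks(values, "floor"), "ceil:", blocks(values, "ceil"))
```

With rounding down, the 896 motorcycles of Factory C would get no block at all and disappear from the chart; rounding up guarantees at least one block for every non-zero category, at the cost of slightly overstating small values.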
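The reshaping and normalization steps of the stacked barplot notebook can be checked without drawing anything. This sketch reuses the notebook's `data` dictionary, pivots it with `pivot()` and builds the 100% version with `div()`, then prints both tables. The name `row_totals` is illustrative and does not appear in the notebook.

```python
import pandas as pd

# Same sales data as in the stacked barplot notebook
data = {
    "Product": ["Product A", "Product A", "Product A", "Product B", "Product B", "Product B"],
    "Segment": ["Segment 1", "Segment 2", "Segment 3", "Segment 1", "Segment 2", "Segment 3"],
    "Amount_sold": [100, 120, 120, 80, 160, 150],
}
df = pd.DataFrame(data)

# One row per segment (one bar each), one column per product (one layer of each bar)
pivot_df = df.pivot(index="Segment", columns="Product", values="Amount_sold")

# Total sold per segment, i.e. the full height of each bar in the simple stacked plot
row_totals = pivot_df.sum(axis=1)

# Divide every row by its own total; axis=0 aligns row_totals with the index
pivot_df_percentage = pivot_df.div(row_totals, axis=0) * 100

print(pivot_df)
print(pivot_df_percentage.round(1))
```

Passing `axis=0` matters here: by default, `DataFrame.div()` aligns a Series on the columns, so pandas would try to match segment names against product names and return a frame full of `NaN` values.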