From 121cfce66fc455b91f1c2b13fbd540f78d66c208 Mon Sep 17 00:00:00 2001 From: Robin Engler Date: Mon, 25 Sep 2023 15:52:03 +0200 Subject: [PATCH] notebooks 0,1,3: cosmetic changes to standardize notebooks --- notebooks/00_jupyter_setup.ipynb | 9 +- notebooks/01_python_basics.ipynb | 38 +++-- notebooks/03_reading_writing_files.ipynb | 204 +++++++++++------------ 3 files changed, 126 insertions(+), 125 deletions(-) diff --git a/notebooks/00_jupyter_setup.ipynb b/notebooks/00_jupyter_setup.ipynb index 379b201..f092bc7 100644 --- a/notebooks/00_jupyter_setup.ipynb +++ b/notebooks/00_jupyter_setup.ipynb @@ -28,7 +28,7 @@ "    [exercise 0.1](#5.1) \n", "    [exercise 0.2](#5.2) \n", "\n", - "**[Additional material](#6)** \n", + "**[Additional Material](#6)** \n", "    [Restarting the Jupyter Notebook kernel](#6.1) \n", "    [Configuring Jupyter](#6.2) " ] @@ -299,9 +299,14 @@ "\n", "[Back to ToC](#toc)\n", "\n", + "
\n", + "\n", "# Additional material \n", "------------------------------\n", "\n", + "
\n", + "\n", + "\n", "
\n", "\n", "## Restarting the Jupyter Notebook kernel \n", @@ -400,7 +405,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.10" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/notebooks/01_python_basics.ipynb b/notebooks/01_python_basics.ipynb index e39926c..1ae33c3 100644 --- a/notebooks/01_python_basics.ipynb +++ b/notebooks/01_python_basics.ipynb @@ -50,7 +50,7 @@ "\n", "[**Exercises 1.1 - 1.4**](#27)\n", "\n", - "[**Additional Theory**](#28) \n", + "[**Additional Material**](#28) \n", "    [Mutability of objects in Python](#29) \n", "    [A solution: explicit deep copy](#30) " ] @@ -2304,22 +2304,26 @@ "\n", "## Exercises 1.1 - 1.5 \n", "\n", + "If you have time, feel free to try the **additional exercises** for module 1.\n", + "\n", "
\n", "
\n", "
\n", "\n", "[Back to ToC](#toc)\n", "\n", - "# Additional Theory \n", - "-----------------------------\n", + "
\n", "\n", - "If you have time, feel free to try the **additional exercises** for module 1.\n", + "# Additional Material \n", + "-------------------------------------\n", + "\n", + "
\n", "\n", "
\n", "\n", "### Mutability of objects in Python \n", "\n", - "All objects in Python can be either **mutable** or **immutable**. This is an important notation that newcomers to Python need to be aware of, which otherwise can lead to serious bugs in our codes.\n", + "All objects in Python can be either **mutable** or **immutable**. This is an important notion that newcomers to Python need to be aware of, as overlooking it can lead to serious bugs in our code.\n", "\n", "What do we mean by *mutable*? We learnt earlier that **everything in Python is an object** and every variable holds an instance of an object. Once its type is set at runtime it can never change. A list is always a list, an integer is always an integer. However its value can be modified if it is mutable.\n", "\n", @@ -2365,7 +2369,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's try to modify an element (an individual char) in a string: it raises a **`TypeError`** because a string in an immutable type." + "Let's try to modify an element (an individual char) in a string: it raises a **`TypeError`** because a string is an **immutable type**." ] }, { @@ -2387,7 +2391,7 @@ } }, "source": [ - "Let's try to modify an element in a list: this is possible, because a list is a mutable type." + "Let's try to modify an element in a list: this is possible, because a list is a **mutable type**." ] }, { @@ -2409,7 +2413,7 @@ } }, "source": [ - "However, the *immutable* cousin of list, the tuple, does not allow assignment:" + "However, the *immutable* cousin of `list`, the `tuple`, does not allow assignment:" ] }, { @@ -2459,9 +2463,10 @@ "metadata": {}, "outputs": [], "source": [ - "# Now let's modify my_dict\n", + "# Let's now modify my_dict...\n", "my_dict[\"list\"][0] = \"P\"\n", - "# and see what happens to both dictionaries\n", + "\n", + "# ... 
and see what happens to both dictionaries.\n", "print(\"my_dict:\", my_dict)\n", "print(\"another_dict:\", another_dict)" ] @@ -2504,6 +2509,7 @@ "outputs": [], "source": [ "my_dict[\"str\"] = \"Zython\"\n", + "\n", "print(\"my_dict:\", my_dict)\n", "print(\"another_dict:\", another_dict)" ] @@ -2517,6 +2523,7 @@ "a_third_dict = my_dict.copy()\n", "my_dict[\"str\"] = \"Kython\"\n", "my_dict[\"list\"][0] = \"K\"\n", + "\n", "print(\"my_dict:\", my_dict)\n", "print(\"third_dict:\", a_third_dict)" ] @@ -2589,17 +2596,17 @@ "source": [ "#### Python memory management: interned vs non-interned values\n", "\n", - "> Integer values from -5 to 256 are \"interned\", which means they are created once\n", + "> Integer values from -5 to 256 are **\"interned\"**, which means that they are created once\n", "> and then re-used over the entire runtime of the python program/session.\n", "\n", "```py\n", " a = 256\n", " b = 256\n", - " print(\"Are 'a' and 'b' pointing to the same object in memory:\", a is b)\n", + " print(\"Are 'a' and 'b' pointing to the same object in memory?:\", a is b)\n", " print(\"Memory locations of the 2 objects:\", id(a), id(b), sep=\"\\n\", end=\"\\n\\n\")\n", "```\n", "```text\n", - " Are 'a' and 'b' pointing to the same object in memory: True\n", + " Are 'a' and 'b' pointing to the same object in memory?: True\n", " Memory locations of the 2 objects:\n", " 9801248\n", " 9801248\n", @@ -2671,7 +2678,7 @@ "
\n", "\n", "### Benchmarking: looping speed of tuples vs lists\n", - "* As can be tested below, there is no speed difference between lists and tuples.\n", + "* As can be tested below, there is no speed difference between `lists` and `tuples`.\n", "* Generators are faster (probably because they skip the step where elements of the sequence must\n", " be stored in memory)." ] @@ -2683,6 +2690,7 @@ "outputs": [], "source": [ "# Functions that do nothing but loop through a list, tuple or generator.\n", + "\n", "loop_replicates = 1000000\n", "\n", "def loop_range():\n", @@ -2720,7 +2728,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.10" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/notebooks/03_reading_writing_files.ipynb b/notebooks/03_reading_writing_files.ipynb index 4def96b..91ddb51 100644 --- a/notebooks/03_reading_writing_files.ipynb +++ b/notebooks/03_reading_writing_files.ipynb @@ -4,36 +4,33 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Table of Content \n", - "\n", - "\n", - "    [Module 3 - Reading and writing files](#0)\n", - "\n", - "        [Where is my file](#1)\n", - "\n", - "        [File opening modes](#2)\n", - "\n", - "        [Reading from files](#3)\n", + "# Module 3 - Reading and writing files \n", + "--------------------------------------------------------\n", "\n", - "            [Reading lines manually](#4)\n", "\n", - "            [End-of-line characters](#5)\n", + "## Table of Content \n", "\n", - "        [Writing to files](#7)\n", "\n", - "        [Exercises 3.1, 3.2 and 3.3](#8)\n", + "    [**Introduction**](#0)\n", "\n", - "    [Additional Theory](#9)\n", + "    [**Where is my file**](#1)\n", "\n", - "        [Reading a file's entire content at once](#readlines)\n", + "    [**File opening modes**](#2)\n", "\n", - "        [Easier reading of .csv formatted file](#6)\n", + "    [**Reading from files**](#3) \n", + "        [Reading lines manually](#3.1) \n", + "  
      [End-of-line characters](#3.2) \n", "\n", - "        [Opening files without context managers](#10)\n", + "    [**Writing to files**](#4)\n", "\n", - "        [Reading files using a while loop](#11)\n", + "    [**Exercises 3.1 and 3.2**](#5)\n", "\n", - "            [Some new cool syntax for Python >= 3.8](#12)" + "    [**Additional Material**](#6) \n", + "        [Reading a file's entire content at once](#6.1) \n", + "        [Easier reading of .csv formatted file](#6.2) \n", + "        [Opening files without context managers](#6.3) \n", + "        [Reading files using a while loop](#6.4) \n", + "        [Some new cool syntax for Python >= 3.8](#6.5) " ] }, { @@ -42,21 +39,18 @@ "source": [ "
\n", "\n", - "# Module 3 - Reading and writing files \n", - "--------------------------------------------------------\n", "\n", + "## Introduction \n", "\n", "In many use cases, you will want your python code to **read/write from/to files** stored on your local hard drive. \n", "Here are a few important points to consider when working with files:\n", "\n", "* Where is my file ?\n", - "\n", "* Do I need to read the entire dataset/file into memory?\n", " * remember that accessing the hard drive is among the slower operations a computer performs. \n", " Reading an entire file when you only need the first few lines will be costly\n", " * if you are reading a very large file, then having the entire file in memory at \n", - " once may overburden your computer\n", - " \n", + " once may overburden your computer. \n", "* Are there concurrency issues?\n", " * if another software (or even your code if you have messed up) writes to a file you \n", " are currently reading, you could run into trouble.\n", @@ -79,14 +73,9 @@ "\n", "```\n", "\n", - "\n", - "
\n", - "\n", - "[back to the toc](#toc)\n", - "\n", "
\n", "\n", - "## Where is my file \n", + "## Where is my file ? \n", "\n", "This is the very first step. \n", "Without a good idea of\n", @@ -101,18 +90,20 @@ "\n", "If the file is the same folder as the code, then you can just use the name of the file, no need for further modification.\n", "\n", - "If the file is elsewhere, you will have to specify a path to the file, either:\n", - " * absolute path : from the root of the computer to the file. eg,\n", - " - `'C:\Users\JohnDoe\Desktop\ProjectP\myFile.txt'` (Windows)\n", - " - `'/home/JaneDoe/Documents/ProjectP/data/myFile.txt'` (Linux,Mac)\n", - " * relative path : from you code to the file. eg,\n", - " - `'data/myFile.txt'` (the file is in subfolder data)\n", - " - `'../otherProject/myFile.txt'` (more complex, the file is in a subfolder of the parent folder)\n", + "If the file is elsewhere, you will have to specify a path to the file. This can be either:\n", + " \n", + " * **Absolute path:** from the root of the computer to the file, e.g.,\n", + " - `\"C:\Users\JohnDoe\Desktop\ProjectP\myFile.txt\"` (Windows)\n", + " - `\"/home/JaneDoe/Documents/ProjectP/data/myFile.txt\"` (Linux, Mac)\n", + " \n", + " * **Relative path:** from your code to the file, e.g.,\n", + " - `\"data/myFile.txt\"` (the file is in subfolder data)\n", + " - `\"../otherProject/myFile.txt\"` (more complex, the file is in a subfolder of the parent folder)\n", "\n", + "
\n", "\n", "This last case depict a situation like this :\n", "```\n", - "\n", "parentFolder:\n", " | \n", " |- ProjectA:\n", @@ -122,19 +113,18 @@ " |- otherProject:\n", " |\n", " |- myFile.txt\n", - "\n", - "```\n", - "\n" + "```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "[back to the toc](#toc)\n", - "\n", + "
\n", "
\n", "\n", + "[Back to ToC](#toc)\n", + "\n", "## File opening modes \n", "When using the `open()` function, a **mode** can be passed as argument to the function. This specifies the type of access you will have on the file. For instance, the `'r'` mode will only allow to read the content of a file, and will not allow writing to it (this is useful to avoid accidental writing to the file).\n", "\n", @@ -153,22 +143,18 @@ "cell_type": "markdown", "metadata": {}, "source": [ + "
\n", "
\n", "\n", + "[Back to ToC](#toc)\n", "\n", - "[back to toc](#toc)\n", - "\n", - "
\n", "\n", "## Reading from files \n", "To start reading a file, one creates a **file object** using `open` function with `mode='r'` . \n", "\n", - "\n", - "[back to the toc](#toc)\n", - "\n", "
\n", "\n", - "### Reading lines manually \n", + "### Reading lines manually \n", "\n", "When reading a file with python, you have to consider your **file object** a little bit like a cursor which starts at the very beginning of your file, and progresses toward the end of the file (it can go backward, but it is often a bit hacky to do so).\n", "\n", @@ -207,33 +193,33 @@ "metadata": {}, "outputs": [], "source": [ - "with open(\"data/fresh_fruits.txt\" , mode='r') as reading_handle:\n", + "with open(\"data/fresh_fruits.txt\" , mode=\"r\") as reading_handle:\n", "\n", " line = reading_handle.readline() # This function reads a single line from the file.\n", " print(line) # Print the line to the screen.\n", "\n", " line = reading_handle.readline()\n", - " print('line:') \n", + " print(\"line:\")\n", " print(line) \n", " line = reading_handle.readline()\n", - " print('line:') \n", + " print(\"line:\")\n", " print(line) \n", " line = reading_handle.readline()\n", - " print('line:') \n", + " print(\"line:\")\n", " print(line) \n", " line = reading_handle.readline()\n", - " print('line:') \n", + " print(\"line:\")\n", " print(line) \n", " # problem : how many time should I do this ?\n", "\n", " line = reading_handle.readline()\n", - " print('line:') \n", + " print(\"line:\")\n", " print(line) \n", " line = reading_handle.readline()\n", - " print('line:') \n", + " print(\"line:\")\n", " print(line) \n", " line = reading_handle.readline()\n", - " print('line:') \n", + " print(\"line:\")\n", " print(line) \n" ] }, @@ -271,13 +257,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\n", "
\n", - "
\n", - "\n", - "[back to toc](#toc)\n", "\n", - "### End-of-line characters \n", + "### End-of-line characters \n", "As you can see in the example above, there are additional empty lines in between our prints. This is because the lines are read from the file with their **end-of-line** characters, which generally is `\\n` . \n", "\n", "To avoid this issue, one typically uses the **`.strip()`** method of strings, which removes any whitespace or end-of-line character at the start or end of the string.\n", @@ -327,9 +309,9 @@ "
\n", "\n", "\n", - "[back to the toc](#toc)\n", + "[Back to ToC](#toc)\n", "\n", - "## Writing to files \n", + "## Writing to files \n", "Writing to a file is achieved in pretty much the same way as reading from it, but the opening mode is now **`\"w\"`**. \n", "And instead of reading lines, we now `print()` them to the file." ] @@ -387,11 +369,15 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Additional info\n", + "
\n", + "\n", + "#### Additional material\n", "\n", "You might sometimes see some Python code - especially older ones - that uses the **`.write()`** method of the **file object**.\n", "\n", - "There are some differences between the `print()` method and `.write()`; the most important one is that `.write()` will not do any formatting and even the end-of-line (carriage return) characters need to be manually written." + "There are some differences between the `print()` method and `.write()`; the most important one is that `.write()` will not do any formatting and even the end-of-line (carriage return) characters need to be manually written.\n", + "\n", + "
" ] }, { @@ -440,7 +426,7 @@ "
\n", "
\n", "\n", - "## Exercises 3.1 and 3.2 \n" + "## Exercises 3.1 and 3.2 \n" ] }, { @@ -451,20 +437,22 @@ "
\n", "
\n", "\n", - "[back to toc](#toc)\n", + "[Back to ToC](#toc)\n", "\n", + "
\n", "\n", - "# Additional Theory \n", - "-----------------------------" + "# Additional Material \n", + "-------------------------------------\n", + "\n", + "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "
\n", "\n", - "### Reading a file's entire content at once \n", + "### Reading a file's entire content at once \n", "\n", "Here is another way to read the fruity content of our file: the **`readlines()`** function (note the \"s\" in the name).\n", "\n", @@ -505,12 +493,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "
\n", "
\n", "\n", - "[back to the toc](#toc)\n", - "\n", - "### Easier reading of .csv formatted file \n", + "### Easier reading of .csv formatted file \n", "\n", "`csv` (**C**omma **S**eparated **V**alue) is one of the most common file format when it comes to storing tabular data. In this format, each line contain a fixed number of values (columns), separated by a specific character (typically `','`).\n", "\n", @@ -528,22 +513,23 @@ "source": [ "data = []\n", "\n", - "with open('data/titanic_head.csv') as IN:\n", + "with open(\"data/titanic_head.csv\") as IN:\n", " \n", " line = IN.readline()\n", - " # the column names are in the first line\n", + " \n", + " # The column names are in the first line\n", " columnNames = line.strip().split(',') # .split(',') is our best ally here : it cuts a str into a list \n", " \n", " for line in IN:\n", - " sl = line.strip().split(',') ## split the line in its different fields\n", + " sl = line.strip().split(',') ## Split the line in its different fields.\n", " \n", - " # now we map the fields onto their constituent columns\n", + " # Now we map the fields onto their constituent columns.\n", " row = {}\n", " for i in range( len(sl) ):\n", " row[ columnNames[i] ] = sl[i]\n", "\n", "\n", - " data.append(row) # store the row dictionary\n", + " data.append(row) # Store the row dictionary.\n", " \n", "\n", "print('full data:')\n", @@ -561,7 +547,7 @@ "source": [ "OK, this works, but it is a bit tedious to write.\n", "\n", - "Because csv is such a classical format, python actually contains things that can help us out:" + "Because csv is such a classical format, python comes with functions that can help us out:" ] }, { @@ -576,10 +562,9 @@ "\n", "with open('data/titanic_head.csv') as f:\n", " \n", - " reader = csv.DictReader(f) # returns a DictReader object.\n", + " reader = csv.DictReader(f) # Returns a DictReader object.\n", " for row in reader:\n", - " ## row is a dictionary whose keys correspond to the columns!\n", - " 
data.append(row)\n", + " data.append(row) # Row is a dictionary whose keys correspond to the columns!\n", "\n", "for row in data:\n", " print(row)" @@ -591,13 +576,13 @@ "metadata": {}, "outputs": [], "source": [ - "print('full data:')\n", + "print(\"full data:\")\n", "for row in data:\n", " print(row)\n", "\n", - "print('***')\n", - "print('Name of passenger 4 : ' , data[4]['Name'])\n", - "print('Age of passenger 4 : ' , data[4]['Age'])" + "print(\"***\")\n", + "print(\"Name of passenger 4 : \" , data[4][\"Name\"])\n", + "print(\"Age of passenger 4 : \" , data[4][\"Age\"])" ] }, { @@ -622,25 +607,20 @@ "metadata": {}, "outputs": [], "source": [ - "import pandas as pd #ignore this, we'll talk about it in the next notebook\n", + "import pandas as pd # ignore this, we'll talk about it in the next notebook.\n", "\n", - "df = pd.read_csv( 'data/titanic_head.csv' ) \n", - "# reading the csv file as a pandas.DataFrame, their custom type for tabular data\n", + "# Reading the csv file as a pandas.DataFrame, their custom type for tabular data.\n", + "df = pd.read_csv(\"data/titanic_head.csv\") \n", "df" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Opening files without context managers \n", + "
\n", + "\n", + "### Opening files without context managers \n", "\n", "Rather than creating a specific block using a `with` statement, you can `open()` a file directly:" ] @@ -651,7 +631,7 @@ "metadata": {}, "outputs": [], "source": [ - "fileHandle = open(\"data/fresh_fruits.txt\", 'r')\n", + "fileHandle = open(\"data/fresh_fruits.txt\", \"r\")\n", "for i, line in enumerate(fileHandle):\n", " print(\"line\", i, \":\", line.strip())\n", " \n", @@ -672,7 +652,7 @@ "source": [ "
\n", "\n", - "### Reading files using a while loop \n", + "### Reading files using a while loop \n", "\n", "Here is an example of file reading where, instead of a for loop, we use a while loop and `.readine()`." ] @@ -683,7 +663,7 @@ "metadata": {}, "outputs": [], "source": [ - "reading_handle = open(\"data/fresh_fruits.txt\", 'r')\n", + "reading_handle = open(\"data/fresh_fruits.txt\", \"r\")\n", "i = 0\n", "line = reading_handle.readline()\n", "\n", @@ -705,7 +685,7 @@ "source": [ "
\n", "\n", - "### Some new cool syntax for Python >= 3.8 \n", + "### The walrus operator: a new syntax for Python >= 3.8 \n", "\n", "Starting with Python 3.8, a new operator **`:=`** (a.k.a, the **walrus operator**) allows to do a variable assignment (`line` in the example below), while at the same time evaluating an expression.\n", "\n", @@ -720,13 +700,21 @@ }, "outputs": [], "source": [ - "\n", "with open(\"data/fresh_fruits.txt\", \"r\") as f:\n", " i = 0\n", " while (line := f.readline()): # := assigns values to variables as part of a larger expression. \n", " print(\"line\", i, \":\", line.strip()) # It is known as the \"walrus operator” and it works really well\n", " i += 1 # together with the while-loop\n" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "\n", + "[Back to ToC](#toc)" + ] } ], "metadata": { @@ -745,7 +733,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.10" + "version": "3.10.12" } }, "nbformat": 4,