diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 0000000..e69de29 diff --git a/01-introduction.html b/01-introduction.html new file mode 100644 index 0000000..c91ff79 --- /dev/null +++ b/01-introduction.html @@ -0,0 +1,648 @@ + +Python for Official Statistics: Introduction +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Introduction

+

Last updated on 2024-07-11

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • What is programming?
  • +
  • How do I document code?
  • +
  • How do I find reliable and safe resources or code online?
  • +
+
+
+
+
+
+

Objectives

+
  • identify basic concepts in programming
  • +
+
+
+
+
+

Programming in Python +

+

In most general terms, programming is the process of writing +instructions for a computer. In this course we will be using Python as +the language to communicate with the computer.

+
+

Strictly speaking, Python is an interpreted language rather than a compiled language: our source code is not translated into the computer’s native machine instructions before it runs. Instead, when we run Python code, the source code is first translated into bytecode, which is then executed by the Python virtual machine.
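To see this translation step for yourself, here is a small sketch using the built-in `compile()` function and the standard-library `dis` module. The exact bytecode instructions printed by `dis.dis` differ between Python versions, so treat the listing as illustrative only.

```python
import dis

# Translate one line of source code into a code object that holds bytecode
code = compile("1 + 1", "<example>", "eval")

# Show the bytecode instructions the Python virtual machine will execute
# (the exact listing varies between Python versions)
dis.dis(code)

# Ask the virtual machine to execute the bytecode and keep the result
result = eval(code)
print(result)
```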

+
+

Programming is a wide topic including a variety of techniques and +tools. In this course we’ll be focusing on programming for statistical +analysis.

+
+

IDEs

+

IDE stands for Integrated Development Environment. IDEs are where you will write, edit, and debug Python scripts, so you want to choose one that makes you feel comfortable and includes the functionality that you need. Some open-source IDEs for Python include JupyterLab and Visual Studio Code.

+
+
+

Packages

+

Packages, or libraries, are extensions to the Python language. They contain code, data, and documentation in a standardised collection format that can be installed by users, typically via a centralised software repository. A typical Python workflow will use base Python (the core operations and functions provided by your Python installation) as well as specialised data analysis and scientific packages like NumPy, SciPy and pandas.

+
+

Best Practices +

+

Let’s overview some base concepts that any programmer should always +keep in mind.

+
+

Documentation

+

Have you ever returned to a task and tried to read a note that you +quickly scrawled for yourself the last time you were working on it? Have +you ever inherited a project from a colleague and found you have no idea +what remains to be done?

+

It can be very challenging to return to your own work or a +colleague’s and this goes doubly for programming. Documentation is one +way we can reduce the burden on future selves and our colleagues.

+
+

Inline Documentation

+

As a new programmer, inline documentation can be the most helpful. +Inline documentation refers to writing comments on the same line as your +code. For example, if we wrote a line of code to sum 1+1, we might +document it as follows:

+
+

PYTHON +

+
1+1         # adding the numbers 1 and 1 together.
+
+

Although this is a very simple line of code and it might seem like +overkill to document it in this way, these types of comments can be very +helpful in jogging your memory when returning to a project. Inline +comments can also help you to break multi-step programs into digestible +and readable pieces.
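As an illustration of using inline comments to break a multi-step calculation into readable pieces, here is a short sketch. The variable names and the temperature value are made up for the example.

```python
# Convert a temperature from Fahrenheit to Celsius in small, commented steps
temp_f = 98.6                     # starting temperature in degrees Fahrenheit
temp_shifted = temp_f - 32        # step 1: subtract the 32-degree offset
temp_c = temp_shifted * 5 / 9     # step 2: rescale to Celsius degrees
print(temp_c)                     # step 3: display the result
```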

+
+
+

External Documentation

+

Sometimes you require more detail than you can comfortably fit in your inline documentation. In this case it can be helpful to create separate files to document your project. This type of documentation will typically focus on the goals, scope, and any special instructions relating to your project rather than the details of your code. The most common type of external documentation is a README file. It is best practice to create a basic README file for any project. A basic README should include:

+
  • a brief description of the project,
  • +
  • any special instructions for installation or use,
  • +
  • the authors and any references.
  • +

README files are plain text files, and it is best practice to save your README file as a README.md Markdown document. This file format is automatically recognised by code repositories like GitHub, so your README contents are displayed alongside your code repository.

+
+
+

DocStrings

+

In chapter 7: functions we’ll learn +about documentation specific to functions known as DocStrings.

+
+
+

Getting Help +

+

Later on, in chapter 10: Errors +and Exceptions we will cover errors in more detail. However, before +we get there it’s very likely you’ll need some assistance writing Python +code.

+
+

Built-in Help

+

There is a help +function built into base Python. You can use it to investigate +built-in functions, data types, and more. For example, say we want to +know more about the print() function in Python:

+
+

PYTHON +

+
help(print)
+
+
+

OUTPUT +

+
Help on built-in function print in module builtins:
+
+print(...)
+    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
+
+    Prints the values to a stream, or to sys.stdout by default.
+    Optional keyword arguments:
+    file:  a file-like object (stream); defaults to the current sys.stdout.
+    sep:   string inserted between values, default a space.
+    end:   string appended after the last value, default a newline.
+-- More  --
+
+
+
+
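`help()` also works on methods, not just on stand-alone functions. As a small sketch, the snippet below asks for help on the string method `upper`, then reads the same documentation text directly through the `__doc__` attribute, which holds the raw documentation string.

```python
# Show the documentation for the upper() method of strings
help(str.upper)

# The raw documentation text is also stored in the __doc__ attribute
doc_text = str.upper.__doc__
print(doc_text)
```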

Finding Resources online

+

Stack Overflow is a valuable +resource for programmers of all levels. It can be daunting to post your +own question! Fortunately, chances are someone else has already asked a +similar question!

+

The Official Python +Documentation is another great resource.

+

It can also be helpful to do a general search for a particular topic or error message. It’s very likely the first few results will be from Stack Overflow, followed by a few from official documentation, and then you may start seeing results from personal blogs or third parties. These third-party results can sometimes be valuable, but we should be cautious! Here are a few things to keep in mind when you are looking for online resources:

+
  1. Don’t download or install anything unless you are certain of what it is and why you need it.
  2. Don’t copy or run code unless you fully understand what it does.
  3. Python is an open-source language; official documentation and resources will not be behind a paywall.
  4. You may not find a resource or solution to fit your exact needs. Try to be flexible and adapt online solutions to fit your needs.
+
+ +
+
+

Key Points +

+
+
  • Python is an interpreted language.
  • +
  • Code is commonly developed inside an integrated development +environment.
  • +
  • A typical Python workflow uses base Python and additional Python +packages developed for statistical programming purposes.
  • +
  • In-line and external documentation helps ensure that your code is +readable.
  • +
  • You can find help through the built-in help function and external +resources.
  • +
+
+
+
+
+
+ + +
+
+
+ +
+
+
+ + diff --git a/02-python_fundamentals.html b/02-python_fundamentals.html new file mode 100644 index 0000000..bf8cacc --- /dev/null +++ b/02-python_fundamentals.html @@ -0,0 +1,849 @@ + +Python for Official Statistics: Python Fundamentals +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Python Fundamentals

+

Last updated on 2024-07-11

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • What basic data types can I work with in Python?
  • +
  • How can I create a new variable in Python?
  • +
  • How do I use a function?
  • +
  • Can I change the value associated with a variable after I create +it?
  • +
+
+
+
+
+
+

Objectives

+
  • Assign values to variables.
  • +
+
+
+
+
+

Variables +

+

Any Python interpreter can be used as a calculator:

+
+

PYTHON +

+
3 + 5 * 4
+
+
+

OUTPUT +

+
23
+
+

This is great but not very interesting. To do anything useful with data, we need to assign its value to a variable. In Python, we can assign a value to a variable using the equals sign =. For example, we can track the weight of a patient who weighs 60 kilograms by assigning the value 60 to a variable weight_kg:

+
+

PYTHON +

+
weight_kg = 60
+
+

From now on, whenever we use weight_kg, Python will +substitute the value we assigned to it. In layperson’s terms, a +variable is a name for a value.

+

In Python, variable names:

+
  • can include letters, digits, and underscores
  • +
  • cannot start with a digit
  • +
  • are case sensitive.
  • +

This means that, for example:

+
  • +weight0 is a valid variable name, whereas +0weight is not
  • +
  • +weight and Weight are different +variables
  • +
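You can check these naming rules without triggering an error by using the string method `isidentifier()`, which reports whether a piece of text would be a syntactically valid name. The second half of this sketch shows case sensitivity in action; the values 60 and 75 are arbitrary.

```python
# isidentifier() checks the naming rules (letters, digits, underscores; no leading digit)
# note: it does not reject reserved keywords such as 'for'
print('weight0'.isidentifier())   # True: a valid name
print('0weight'.isidentifier())   # False: starts with a digit

# Names are case sensitive: these are two separate variables
weight = 60
Weight = 75
print(weight, Weight)             # 60 75
```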

Types of data +

+

Python knows various types of data. Three common ones are:

+
  • integer numbers
  • +
  • floating point numbers, and
  • +
  • strings.
  • +

In the example above, variable weight_kg has an integer +value of 60. If we want to more precisely track the weight +of our patient, we can use a floating point value by executing:

+
+

PYTHON +

+
weight_kg = 60.3
+
+

To create a string, we add single or double quotes around some text. +To identify and track a patient throughout our study, we can assign each +person a unique identifier by storing it in a string:

+
+

PYTHON +

+
patient_id = '001'
+
+

Using Variables in Python +

+

Once we have data stored with variable names, we can make use of it +in calculations. We may want to store our patient’s weight in pounds as +well as kilograms:

+
+

PYTHON +

+
weight_lb = 2.2 * weight_kg
+
+

We might decide to add a prefix to our patient identifier:

+
+

PYTHON +

+
patient_id = 'inflam_' + patient_id
+
+

Built-in Python functions +

+

To carry out common tasks with data and variables in Python, the +language provides us with several built-in functions. To display information to +the screen, we use the print function:

+
+

PYTHON +

+
print(weight_lb)
+print(patient_id)
+
+
+

OUTPUT +

+
132.66
+inflam_001
+
+

When we want to make use of a function, referred to as calling the +function, we follow its name by parentheses. The parentheses are +important: if you leave them off, the function doesn’t actually run! +Sometimes you will include values or variables inside the parentheses +for the function to use. In the case of print, we use the +parentheses to tell the function what value we want to display. We will +learn more about how functions work and how to create our own in later +episodes.
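To see what happens when the parentheses are left off, compare the lines below. Without parentheses, Python does not run `print`; it simply hands back the function object itself. The variable name `no_call` is just for illustration.

```python
# With parentheses: the function is called and the text is displayed
print('hello')

# Without parentheses: nothing runs; we only get a reference to the function
no_call = print
print(no_call)            # displays the function object, e.g. <built-in function print>
print(callable(no_call))  # True: it is a function that could be called
```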

+

We can display multiple things at once using only one +print call:

+
+

PYTHON +

+
print(patient_id, 'weight in kilograms:', weight_kg)
+
+
+

OUTPUT +

+
inflam_001 weight in kilograms: 60.3
+
+

We can also call a function inside of another function call. For example, +Python has a built-in function called type that tells you a +value’s data type:

+
+

PYTHON +

+
print(type(60.3))
+print(type(patient_id))
+
+
+

OUTPUT +

+
<class 'float'>
+<class 'str'>
+
+

Moreover, we can do arithmetic with variables right inside the +print function:

+
+

PYTHON +

+
print('weight in pounds:', 2.2 * weight_kg)
+
+
+

OUTPUT +

+
weight in pounds: 132.66
+
+

The above command, however, did not change the value of +weight_kg:

+
+

PYTHON +

+
print(weight_kg)
+
+
+

OUTPUT +

+
60.3
+
+

To change the value of the weight_kg variable, we have +to assign weight_kg a new value using the +equals = sign:

+
+

PYTHON +

+
weight_kg = 65.0
+print('weight in kilograms is now:', weight_kg)
+
+
+

OUTPUT +

+
weight in kilograms is now: 65.0
+
+
+
+ +
+
+

Variables as Sticky Notes +

+
+

A variable in Python is analogous to a sticky note with a name +written on it: assigning a value to a variable is like putting that +sticky note on a particular value.

+
Value of 65.0 with weight_kg label stuck on it

Using this analogy, we can investigate how assigning a value to one +variable does not change values of other, seemingly +related, variables. For example, let’s store the subject’s weight in +pounds in its own variable:

+
+

PYTHON +

+
# There are 2.2 pounds per kilogram
+weight_lb = 2.2 * weight_kg
+print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
+
+
+

OUTPUT +

+
weight in kilograms: 65.0 and in pounds: 143.0
+
+

Everything in a line of code following the ‘#’ symbol is a comment that is ignored by Python. +Comments allow programmers to leave explanatory notes for other +programmers or their future selves.

+
Value of 65.0 with weight_kg label stuck on it, and value of 143.0 with weight_lb label stuck on it

Similar to above, the expression 2.2 * weight_kg is +evaluated to 143.0, and then this value is assigned to the +variable weight_lb (i.e. the sticky note +weight_lb is placed on 143.0). At this point, +each variable is “stuck” to completely distinct and unrelated +values.

+

Let’s now change weight_kg:

+
+

PYTHON +

+
weight_kg = 100.0
+print('weight in kilograms is now:', weight_kg, 'and weight in pounds is still:', weight_lb)
+
+
+

OUTPUT +

+
weight in kilograms is now: 100.0 and weight in pounds is still: 143.0
+
+
Value of 100.0 with label weight_kg stuck on it, and value of 143.0 with label weight_lb stuck on it

Since weight_lb doesn’t “remember” where its value comes +from, it is not updated when we change weight_kg.

+
+
+
+
+
+ +
+
+

Check Your Understanding +

+
+

What values do the variables mass and age +have after each of the following statements? Test your answer by +executing the lines.

+
+

PYTHON +

+
mass = 47.5
+age = 122
+mass = mass * 2.0
+age = age - 20
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
`mass` holds a value of 47.5, `age` does not exist
+`mass` still holds a value of 47.5, `age` holds a value of 122
+`mass` now has a value of 95.0, `age`'s value is still 122
+`mass` still has a value of 95.0, `age` now holds 102
+
+
+
+
+
+
+
+ +
+
+

Sorting Out References +

+
+

Python allows you to assign multiple values to multiple variables in +one line by separating the variables and values with commas. What does +the following program print out?

+
+

PYTHON +

+
first, second = 'Grace', 'Hopper'
+third, fourth = second, first
+print(third, fourth)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
Hopper Grace
+
+
+
+
+
+
+
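The same multiple-assignment syntax gives a convenient way to swap the values of two variables in one line. Because the right-hand side is evaluated completely before anything is assigned, no temporary variable is needed.

```python
first, second = 'Grace', 'Hopper'

# Swap the two values in a single statement
first, second = second, first
print(first, second)   # Hopper Grace
```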
+ +
+
+

Seeing Data Types +

+
+

What are the data types of the following variables?

+
+

PYTHON +

+
planet = 'Earth'
+apples = 5
+distance = 10.5
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
print(type(planet))
+print(type(apples))
+print(type(distance))
+
+
+

OUTPUT +

+
<class 'str'>
+<class 'int'>
+<class 'float'>
+
+
+
+
+
+
+
+ +
+
+

Key Points +

+
+
  • Basic data types in Python include integers, strings, and +floating-point numbers.
  • +
  • Use variable = value to assign a value to a variable in +order to record it in memory.
  • +
  • Variables are created on demand whenever a value is assigned to +them.
  • +
  • Use print(something) to display the value of +something.
  • +
  • Use # some kind of explanation to add comments to +programs.
  • +
  • Built-in functions are always available to use.
  • +
+
+
+
+
+ + +
+
+
+ +
+
+
+ + diff --git a/03-data_transformation.html b/03-data_transformation.html new file mode 100644 index 0000000..c6a82aa --- /dev/null +++ b/03-data_transformation.html @@ -0,0 +1,863 @@ + +Python for Official Statistics: Data Transformation +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Data Transformation

+

Last updated on 2024-07-11

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I process tabular data files in Python?
  • +
+
+
+
+
+
+

Objectives

+
  • Explain what a library is and what libraries are used for.
  • +
  • Import a Python library and use the functions it contains.
  • +
  • Read tabular data from a file into a program.
  • +
  • Select individual values and subsections from data.
  • +
  • Perform operations on arrays of data.
  • +
+
+
+
+
+

Words are useful, but what’s more useful are the sentences and +stories we build with them. Similarly, while a lot of powerful, general +tools are built into Python, specialized tools built up from these basic +units live in libraries that can be +called upon when needed.

+

Loading data into Python +

+

To begin processing the clinical trial inflammation data, we need to +load it into Python. Python can work with many different file types. +Text files can be loaded into Python by using the base Python +function

+
+

PYTHON +

+
open("filename.txt", "r")
+
+

where “r” opens the file for reading only; if you want to write to the file instead, use “w” (note that “w” replaces any existing contents of the file).
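A common way to use `open` is inside a `with` block, which closes the file automatically when the block ends. The sketch below writes a small file and reads it back; the file name `example.txt` and its contents are purely illustrative, and the file is created in the current working directory.

```python
# "w" opens the file for writing (creating it, or overwriting any existing contents)
with open("example.txt", "w") as f:
    f.write("patient_id,weight_kg\n")
    f.write("001,60.3\n")

# "r" opens the same file read-only; the file is closed automatically after the block
with open("example.txt", "r") as f:
    lines = f.readlines()

print(lines)
```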

+

However, our patient data is in a CSV file, which is more commonly loaded by using a library. Python has hundreds of thousands of libraries to choose from to help carry out your work. Importing a library is like getting a piece of lab equipment out of a storage locker and setting it up on the bench. Libraries provide additional functionality to the basic Python package, much like a new piece of equipment adds functionality to a lab space. Just like in the lab, importing too many libraries can sometimes complicate and slow down your programs, so we only import what we need for each program. There are a couple of common Python libraries for loading and working with data.

+

pandas +

+

The first library we will present is called pandas. pandas is a Python library containing a set of functions and specialised data structures that have been designed to help Python programmers perform data analysis tasks in a structured way.

+

Most of the things that pandas can do can be done with basic Python, but the collected set of pandas functions and data structures makes data analysis tasks more consistent in terms of syntax and therefore aids readability.

+

Remember to write the library name with a lower case ‘p’, because Python is case sensitive and the package is named pandas, not Pandas.

+
+

Importing the pandas library

+

Importing the pandas library is done in exactly the same way as for +any other library. In almost all examples of Python code using the +pandas library, it will have been imported and given an alias of +pd. We will follow the same convention.

+
+

PYTHON +

+
import pandas as pd
+
+
+
+

Pandas data structures

+

There are two main data structures used by pandas: the Series and the DataFrame. A Series broadly equates to a vector or a list, while a DataFrame is equivalent to a table. Each column in a pandas DataFrame is a pandas Series.
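As a small sketch of the relationship between the two structures, the code below builds a DataFrame directly from a Python dictionary (the column names and values are made up) and then pulls out one column, which pandas returns as a Series.

```python
import pandas as pd

# Build a small table: each dictionary key becomes a column name
df = pd.DataFrame({
    'patient_id': ['001', '002', '003'],
    'weight_kg': [60.3, 72.5, 55.0],
})

# Selecting a single column gives back a Series
weights = df['weight_kg']
print(type(df))
print(type(weights))
print(weights)
```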

+

We will mainly be looking at the Dataframe.

+

We can easily create a Pandas Dataframe by reading a .csv file

+
+
+

Reading a csv file

+

When we read a CSV dataset in base Python, we open the dataset, read and process one record at a time, and then close the dataset after we have read the last record. Reading datasets in this way is slow and places all of the responsibility for extracting individual data items from the records on the programmer.

+

The main advantage of this approach, however, is that you only have +to store one dataset record in memory at a time. This means that if you +have the time, you can process datasets of any size.
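For comparison, here is a minimal sketch of the record-at-a-time approach using the standard-library `csv` module. To keep it self-contained, an `io.StringIO` object stands in for a real file; with a file on disk you would use `open(...)` in the same place. The data values are invented.

```python
import csv
import io

# A stand-in for a CSV file on disk (invented values)
csv_file = io.StringIO("patient_id,weight_kg\n001,60.3\n002,72.5\n003,55.0\n")

reader = csv.reader(csv_file)
header = next(reader)          # the first record holds the column names

total_weight = 0.0
n_records = 0
for record in reader:          # one record is processed at a time
    total_weight += float(record[1])   # values arrive as strings, so convert them
    n_records += 1

print(header)
print(n_records, total_weight / n_records)
```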

+

In pandas, CSV files are read as complete datasets. You do not have to explicitly open and close the dataset. All of the dataset records are assembled into a DataFrame. If your dataset has column headers in the first record, then these can be used as the DataFrame column names. You can explicitly state this in the parameters to the call, but pandas is usually able to infer that there is a header row and use it automatically.

+

To tell Python that we’d like to start using pandas, we need to import it:

+
+

PYTHON +

+
import pandas as pd
+
+

Often, libraries are given an alias or a short form name, in this +case pandas is given the alias “pd”. Aliases for common data analysis +libraries include:

+
+

PYTHON +

+
import pandas as pd
+import numpy as np
+import matplotlib.pyplot as plt
+import seaborn as sns
+
+

Once we’ve imported the library, we can ask the library to read our +data file for us:

+
+

PYTHON +

+
pd.read_csv("filename.csv")
+
+
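`pd.read_csv` also accepts file-like objects, which makes it easy to try out without a file on disk. In this sketch an `io.StringIO` object holds a few invented rows, and pandas detects the header row and uses it for the column names. By default pandas would read the identifiers as the integers 1, 2 and 3, so `dtype={'patient_id': str}` is passed to keep them as text.

```python
import io
import pandas as pd

# Invented CSV contents standing in for a real file
csv_text = "patient_id,weight_kg\n001,60.3\n002,72.5\n003,55.0\n"

# Keep the identifiers as strings so the leading zeros survive
df = pd.read_csv(io.StringIO(csv_text), dtype={'patient_id': str})
print(df)
print(df.columns.tolist())
```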

pandas is a commonly used library for working with and analysing +data. However, we will be working with a different package for the +remainder of this course. If you would like to learn more about data +manipulation and analysis using pandas, we recommend checking out Data Analysis and +Visualization with Python for Social Scientists.

+
+

numpy +

+

The second package that we will present is called NumPy, which stands for Numerical Python. In general, you should use this library when you want to do fancy things with lots of numbers, especially if you have matrices or arrays. NumPy arrays are typically lighter weight than pandas DataFrames and can offer better performance, particularly when working with large numerical datasets.

+

We will be using this package to work with our clinical trial +inflammation data.

+

To tell Python that we’d like to start using NumPy, we need to import it:

+
+

PYTHON +

+
import numpy as np
+
+

Now that we have imported the library, we can ask the library (by using the alias np) to read our data file for us:

+
+

PYTHON +

+
np.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+
+

OUTPUT +

+
array([[ 0.,  0.,  1., ...,  3.,  0.,  0.],
+       [ 0.,  1.,  2., ...,  1.,  0.,  1.],
+       [ 0.,  1.,  1., ...,  2.,  1.,  1.],
+       ...,
+       [ 0.,  1.,  1., ...,  1.,  1.,  1.],
+       [ 0.,  0.,  0., ...,  0.,  2.,  0.],
+       [ 0.,  0.,  1., ...,  1.,  1.,  0.]])
+
+

The expression np.loadtxt(...) is a function call that asks Python to run the function loadtxt, which belongs to the np library. In Python, dot notation is used mainly to access an object’s attributes (properties) or to call its methods: object_name.attribute gives you the value of that attribute, while object_name.method() calls that method on object_name.

+

As an example, John Smith is the John that belongs to the Smith +family. We could use the dot notation to write his name +smith.john, just as loadtxt is a function that +belongs to the np library.
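The difference between an attribute and a method shows up in whether parentheses are needed. In this small sketch, using a made-up two-by-three array, `shape` is an attribute (a stored value, no parentheses) while `sum()` is a method (an action we call, with parentheses). `np.mean` is a function that belongs to the library, reached through the `np` alias.

```python
import numpy as np

values = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(values.shape)     # attribute: (2, 3)
print(values.sum())     # method call: 21
print(np.mean(values))  # library function accessed through the np alias: 3.5
```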

+

np.loadtxt has two parameters: the name of the file we +want to read and the delimiter +that separates values on a line. These both need to be character strings +(or strings for short), so we put +them in quotes.

+

Since we haven’t told it to do anything else with the function’s +output, the notebook displays it. +In this case, that output is the data we just loaded. By default, only a +few rows and columns are shown (with ... to omit elements +when displaying big arrays). Note that, to save space when displaying +NumPy arrays, Python does not show us trailing zeros, so +1.0 becomes 1..

+

Our call to np.loadtxt read our file but didn’t save the +data in memory. To do that, we need to assign the array to a variable. +In a similar manner to how we assign a single value to a variable, we +can also assign an array of values to a variable using the same syntax. +Let’s re-run np.loadtxt and save the returned data:

+
+

PYTHON +

+
data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+

This statement doesn’t produce any output because we’ve assigned the +output to the variable data. If we want to check that the +data have been loaded, we can print the variable’s value:

+
+

PYTHON +

+
print(data)
+
+
+

OUTPUT +

+
[[ 0.  0.  1. ...,  3.  0.  0.]
+ [ 0.  1.  2. ...,  1.  0.  1.]
+ [ 0.  1.  1. ...,  2.  1.  1.]
+ ...,
+ [ 0.  1.  1. ...,  1.  1.  1.]
+ [ 0.  0.  0. ...,  0.  2.  0.]
+ [ 0.  0.  1. ...,  1.  1.  0.]]
+
+
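If you don’t have `inflammation-01.csv` to hand, you can still experiment with `np.loadtxt`, since it also accepts file-like objects. This sketch uses an `io.StringIO` stand-in with three invented rows in the same comma-separated, no-header layout.

```python
import io
import numpy as np

# Three invented patients (rows) by four days (columns), no header row
fake_file = io.StringIO("0,0,1,3\n0,1,2,1\n0,1,1,3\n")

sample = np.loadtxt(fname=fake_file, delimiter=',')
print(sample)
```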

Now that the data are in memory, we can manipulate them. First, let’s +ask what type of thing +data refers to:

+
+

PYTHON +

+
print(type(data))
+
+
+

OUTPUT +

+
<class 'numpy.ndarray'>
+
+

The output tells us that data currently refers to an +N-dimensional array, the functionality for which is provided by the +NumPy library. These data correspond to arthritis patients’ +inflammation. The rows are the individual patients, and the columns are +their daily inflammation measurements.

+
+
+ +
+
+

Data Type +

+
+

A NumPy array contains one or more elements of the same type. The type function will only tell you that a variable is a NumPy array but won’t tell you the type of thing inside the array. We can find out the type of the data contained in the NumPy array using its dtype attribute:

+
+

PYTHON +

+
print(data.dtype)
+
+
+

OUTPUT +

+
float64
+
+

This tells us that the NumPy array’s elements are floating-point +numbers.
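Because every element of an array shares one type, NumPy picks a type that can hold all of the values you give it. In this sketch, mixing integers with a single decimal number makes the whole array floating-point. The exact integer type name (for example `int64`) can depend on your platform, so the comment only gives it as an example.

```python
import numpy as np

whole_numbers = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2, 3.5])   # one float is enough to make every element a float

print(whole_numbers.dtype)   # an integer type, e.g. int64 on many systems
print(mixed_numbers.dtype)   # float64
print(mixed_numbers)
```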

+
+
+
+

With the following command, we can see the array’s shape:

+
+

PYTHON +

+
print(data.shape)
+
+
+

OUTPUT +

+
(60, 40)
+
+

The output tells us that the data array variable +contains 60 rows and 40 columns. When we created the variable +data to store our arthritis data, we did not only create +the array; we also created information about the array, called members or attributes. This extra +information describes data in the same way an adjective +describes a noun. data.shape is an attribute of +data which describes the dimensions of data. +We use the same dotted notation for the attributes of variables that we +use for the functions in libraries because they have the same +part-and-whole relationship.
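`shape` is not the only attribute an array carries. As a sketch, the array below is created with `np.zeros` purely so that it has the same 60 by 40 dimensions as the inflammation data; `ndim` gives the number of dimensions and `size` the total number of elements.

```python
import numpy as np

example = np.zeros((60, 40))   # 60 rows, 40 columns, all zeros

print(example.shape)   # (60, 40)
print(example.ndim)    # 2
print(example.size)    # 2400, i.e. 60 * 40
```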

+

If we want to get a single number from the array, we must provide an +index in square brackets after the +variable name, just as we do in math when referring to an element of a +matrix. Our inflammation data has two dimensions, so we will need to use +two indices to refer to one specific value:

+
+

PYTHON +

+
print('first value in data:', data[0, 0])
+
+
+

OUTPUT +

+
first value in data: 0.0
+
+
+

PYTHON +

+
print('middle value in data:', data[29, 19])
+
+
+

OUTPUT +

+
middle value in data: 16.0
+
+

The expression data[29, 19] accesses the element at row +30, column 20. While this expression may not surprise you, +data[0, 0] might. Programming languages like Fortran, +MATLAB and R start counting at 1 because that’s what human beings have +done for thousands of years. Languages in the C family (including C++, +Java, Perl, and Python) count from 0 because it represents an offset +from the first value in the array (the second value is offset by one +index from the first value). This is closer to the way that computers +represent arrays (if you are interested in the historical reasons behind +counting indices from zero, you can read Mike +Hoye’s blog post). As a result, if we have an M×N array in Python, +its indices go from 0 to M-1 on the first axis and 0 to N-1 on the +second. It takes a bit of getting used to, but one way to remember the +rule is that the index is how many steps we have to take from the start +to get the item we want.

+
'data' is a 3 by 3 numpy array containing row 0: ['A', 'B', 'C'], row 1: ['D', 'E', 'F'], and row 2: ['G', 'H', 'I']. Starting in the upper left hand corner, data[0, 0] = 'A', data[0, 1] = 'B', data[0, 2] = 'C', data[1, 0] = 'D', data[1, 1] = 'E', data[1, 2] = 'F', data[2, 0] = 'G', data[2, 1] = 'H', and data[2, 2] = 'I', in the bottom right hand corner.
+
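The figure’s 3 by 3 array of letters can be rebuilt directly, which makes it easy to check the (row, column) positions for yourself. Negative indices count back from the end of each axis, so `[-1, -1]` is the bottom-right element.

```python
import numpy as np

letters = np.array([['A', 'B', 'C'],
                    ['D', 'E', 'F'],
                    ['G', 'H', 'I']])

print(letters[0, 0])    # A: row 0, column 0 (top left)
print(letters[1, 2])    # F: row 1, column 2
print(letters[2, 2])    # I: row 2, column 2 (bottom right)
print(letters[-1, -1])  # I again: the last row and the last column
```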
+ +
+
+

In the Corner +

+
+

What may also surprise you is that when Python displays an array, it +shows the element with index [0, 0] in the upper left +corner rather than the lower left. This is consistent with the way +mathematicians draw matrices but different from the Cartesian +coordinates. The indices are (row, column) instead of (column, row) for +the same reason, which can be confusing when plotting data.

+
+
+
+

Slicing data +

+

An index like [30, 20] selects a single element of an +array, but we can select whole sections as well. For example, we can +select the first ten days (columns) of values for the first four +patients (rows) like this:

+
+

PYTHON +

+
print(data[0:4, 0:10])
+
+
+

OUTPUT +

+
[[ 0.  0.  1.  3.  1.  2.  4.  7.  8.  3.]
+ [ 0.  1.  2.  1.  2.  1.  3.  2.  2.  6.]
+ [ 0.  1.  1.  3.  3.  2.  6.  2.  5.  9.]
+ [ 0.  0.  2.  0.  4.  2.  2.  1.  6.  7.]]
+
+

The slice 0:4 means, +“Start at index 0 and go up to, but not including, index 4”. Again, the +up-to-but-not-including takes a bit of getting used to, but the rule is +that the difference between the upper and lower bounds is the number of +values in the slice.
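The difference-of-bounds rule is easy to check. In this sketch, `np.arange(10)` simply creates the numbers 0 to 9 to slice; the slice `3:7` keeps indices 3, 4, 5 and 6, which is 7 - 3 = 4 values.

```python
import numpy as np

numbers = np.arange(10)        # the integers 0, 1, ..., 9
part = numbers[3:7]            # start at index 3, stop before index 7

print(part)                    # [3 4 5 6]
print(len(part) == 7 - 3)      # True: upper bound minus lower bound
```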

+

We don’t have to start slices at 0:

+
+

PYTHON +

+
print(data[5:10, 0:10])
+
+
+

OUTPUT +

+
[[ 0.  0.  1.  2.  2.  4.  2.  1.  6.  4.]
+ [ 0.  0.  2.  2.  4.  2.  2.  5.  5.  8.]
+ [ 0.  0.  1.  2.  3.  1.  2.  3.  5.  3.]
+ [ 0.  0.  0.  3.  1.  5.  6.  5.  5.  8.]
+ [ 0.  1.  1.  2.  1.  3.  5.  3.  5.  8.]]
+
+

We also don’t have to include the upper and lower bound on the slice. +If we don’t include the lower bound, Python uses 0 by default; if we +don’t include the upper, the slice runs to the end of the axis, and if +we don’t include either (i.e., if we use ‘:’ on its own), the slice +includes everything:

+
+

PYTHON +

+
small = data[:3, 36:]
+print('small is:')
+print(small)
+
+

The above example selects rows 0 through 2 and columns 36 through to +the end of the array.

+
+

OUTPUT +

+
small is:
+[[ 2.  3.  0.  0.]
+ [ 1.  1.  0.  1.]
+ [ 2.  2.  1.  1.]]
+
+
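To see the default bounds on their own, this sketch applies the same shorthand to a small invented 3 by 4 array, leaving out the start, the end, or both.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)   # rows: 0 to 3, 4 to 7, 8 to 11

print(grid[:2, :])    # rows 0 and 1, all columns
print(grid[1:, 2:])   # rows 1 to the end, columns 2 to the end
print(grid[:, 0])     # every row, column 0 only
```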
+
+ + +
+
+
+ +
+
+
+ + diff --git a/04-lists.html b/04-lists.html new file mode 100644 index 0000000..2c34ab1 --- /dev/null +++ b/04-lists.html @@ -0,0 +1,1105 @@ + +Python for Official Statistics: List and Dictionary Methods +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

List and Dictionary Methods

+

Last updated on 2024-07-11

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I store many values together?
  • +
  • How can I create a list succinctly?
  • +
  • How can I efficiently access nested data?
  • +
+
+
+
+
+
+

Objectives

+
  • Identify and create lists and dictionaries
  • +
  • Understand the properties and behaviours of lists and +dictionaries
  • +
  • Access values in lists and dictionaries
  • +
  • Create and access values from nest lists and dictionaries
  • +
+
+
+
+
+

Values can also be stored in other Python data types such as lists, dictionaries, sets and tuples. Storing objects in a list is a fast and versatile way to apply transformations across a sequence of values. Storing objects in a dictionary as key-value pairs is useful for extracting specific values, i.e. performing lookup operations.

+

Create and access lists +

+

Lists have the following properties and behaviours:

+
  • A single list can store different primitive object types and even +other lists
  • +
  • Lists are ordered and have a 0-based index
  • +
  • Values can be added to a list using the methods append() or insert() +
  • +
  • Values inside a list can be removed using the methods +remove() or pop() +
  • +
  • Two lists can be concatenated with the operator + +
  • +
  • Values inside a list can be conditionally iterated through
  • +
  • A list is mutable i.e. the values inside a list can be modified in +place
  • +
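The behaviours listed above can be sketched in a few lines. The fruit names are just placeholders; `append`, `insert`, `remove` and `pop` are standard list methods, and `+` is the standard list concatenation operator.

```python
fruits = ['apple', 'pear']

fruits.append('plum')          # add to the end: ['apple', 'pear', 'plum']
fruits.insert(0, 'kiwi')       # add at index 0: ['kiwi', 'apple', 'pear', 'plum']
fruits.remove('pear')          # remove the first matching value
last = fruits.pop()            # remove and return the last value ('plum')

combined = fruits + ['fig', 'lime']   # + joins two lists into a new list

# Conditionally iterate: keep only names longer than four letters
long_names = []
for name in combined:
    if len(name) > 4:
        long_names.append(name)

print(fruits)       # ['kiwi', 'apple']
print(combined)     # ['kiwi', 'apple', 'fig', 'lime']
print(long_names)   # ['apple']
```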

To create a list, values are contained within square brackets +i.e. [] and individually separated by commas. The function +list() can also be used to create a list of values from an +iterable object like a string, set or tuple.

+
+

PYTHON +

+
# Create a list of integers using []
+list_1 = [1, 3, 5, 7]
+print(list_1)
+
+
+

OUTPUT +

+
[1, 3, 5, 7]
+
+
+

PYTHON +

+
# Unlike atomic vectors in R, a list can contain multiple primitive object types
+list_2 = [1, "one", 1.0, True]
+print(list_2)
+
+
+

OUTPUT +

+
[1, 'one', 1.0, True]
+
+
+

PYTHON +

+
# You can also use list() on an iterable object to convert it into a list
+string = 'abcdefg'  
+list_3 = list(string)  
+print(list_3)
+
+
+

OUTPUT +

+
['a', 'b', 'c', 'd', 'e', 'f', 'g']
+
+

Because lists have a 0-based index, we can access individual values +by their list index position. For 0-based indexes, the first value +always starts at position 0 i.e. the first element has an index of 0. +Accessing multiple values by their index positions is also referred to +as slicing or subsetting a list.

+

Note that we can use negative numbers as indices in Python. When we +do so, the index -1 gives us the last element in the list, +-2 gives us the second to last element in the list, and so +on.

+
+

PYTHON +

+
# Extract individual values from list_3
+print('first value:', list_3[0])
+print('second value:', list_3[1])
+print('last value:', list_3[-1])
+
+
+

OUTPUT +

+
first value: a
+second value: b
+last value: g
+
+
+

PYTHON +

+
# A slice [start:stop] includes start but excludes stop, so add 1 to the last index you want
+# To extract from index 0 to 2, we need to slice [0:2+1], i.e. [0:3]
+
+# Extract the first three values from list_3
+print('first 3 values:', list_3[0:3])
+
+# Start from index 0 and extract values from each subsequent second position
+print('every second value:', list_3[0::2])
+
+# Start from index 1, end at index 3 and extract from each subsequent second position
+print('every second value from index 1 to 3:', list_3[1:4:2])
+
+
+

OUTPUT +

+
first 3 values: ['a', 'b', 'c']
+every second value: ['a', 'c', 'e', 'g']
+every second value from index 1 to 3: ['b', 'd']
+
+
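Negative indices also work inside slices, and a negative step walks through the list backwards. A short sketch, assuming a fresh list named letters built the same way as list_3:

```python
# Rebuild a list of letters, equivalent to list_3 above
letters = list('abcdefg')

print(letters[-3:])   # last three values: ['e', 'f', 'g']
print(letters[:-1])   # every value except the last one
print(letters[::-1])  # the whole list in reverse order
```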

Change list values +

+

Data which can be modified in place is called mutable, while data +which cannot be modified is called immutable. Strings and numbers are +immutable in that when we want to change the value of a string or number +variable, we can only replace the old value with a completely new +value.

+
+

PYTHON +

+
string = 'abcde'
+string[0] = 'b' # Produces a type error as strings are immutable
+
+# TypeError: 'str' object does not support item assignment
+
+
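To "change" a string we therefore build a new string and assign it to a variable, leaving the original untouched. A minimal sketch; the name new_string is illustrative:

```python
string = 'abcde'

# Build a new string from a replacement first character plus a slice of the old string
new_string = 'b' + string[1:]
print(new_string)  # bbcde

# The original string is unchanged
print(string)      # abcde
```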

In contrast, lists are mutable and we can modify them after they have +been created. We can change individual values, append new values, or +reorder the whole list through sorting.

+
+

PYTHON +

+
list_4 = ['apple', 'pear', 'plum']
+print('original list_4:', list_4)
+
+# Change the first value i.e. modify the list in place
+list_4[0] = 'banana'
+print('modified list_4:', list_4)
+
+# Add new value to list using the method .insert(index number, value)
+list_4.insert(1, 'apple') # Index 1 refers to the second position
+print('appended list_4:', list_4)
+
+
+

OUTPUT +

+
original list_4: ['apple', 'pear', 'plum']
+modified list_4: ['banana', 'pear', 'plum']
+appended list_4: ['banana', 'apple', 'pear', 'plum']
+
+
+

PYTHON +

+
# Sorting a list also modifies it in place
+list_5 = [2, 1, 3, 7]
+list_5.sort()
+print('list_5:', list_5)
+
+
+

OUTPUT +

+
list_5: [1, 2, 3, 7]
+
+
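If you need a sorted version of a list but want to keep the original order, the built-in sorted() function returns a new list instead of modifying the original. A small sketch with an illustrative list named unsorted_values:

```python
unsorted_values = [2, 1, 3, 7]

# sorted() returns a new sorted list and leaves the original untouched
sorted_values = sorted(unsorted_values)
print('unsorted_values:', unsorted_values)  # [2, 1, 3, 7]
print('sorted_values:', sorted_values)      # [1, 2, 3, 7]

# .sort() modifies the list in place and returns None
result = unsorted_values.sort()
print('result of .sort():', result)             # None
print('unsorted_values now:', unsorted_values)  # [1, 2, 3, 7]
```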

However, be careful when modifying data in-place. If two variables +refer to the same list, and you modify the list value, it will change +for both variables!

+
+

PYTHON +

+
# When we assign list_5 to list_6, both list_6 and list_5 point to the
+# same list object; list_6 is not a copy of list_5.
+
+list_6 = list_5  
+print('list_5:', list_5)
+print('list_6:', list_6)
+
+# Change the first value in list_6 from 1 to 2 
+list_6[0] = 2 
+
+print('modified list_6:', list_6)
+print('list_5 after modifying list_6:', list_5)
+
+# Warning: list_5 was also modified, because list_5 and list_6 are the same object!
+
+
+

OUTPUT +

+
list_5: [1, 2, 3, 7]
+list_6: [1, 2, 3, 7]
+modified list_6: [2, 2, 3, 7]
+list_5 after modifying list_6: [2, 2, 3, 7]
+
+

Because of this behaviour, code which modifies data in place should be handled with care. You can avoid this behaviour by explicitly creating a copy of the original list and modifying only the copy, which leaves the original data object unchanged.

+
+

PYTHON +

+
list_5 = [1, 2, 3, 7]
+list_7 = list_5.copy()  
+print('list_5:', list_5)
+print('list_7:', list_7)
+
+# As list_7 is a completely new object copied from list_5, modifying list_7 does
+# not affect list_5.  
+
+list_7[0] = 2 
+print('modified list_7:', list_7)
+print('unmodified list_5:', list_5)
+
+
+

OUTPUT +

+
list_5: [1, 2, 3, 7]
+list_7: [1, 2, 3, 7]
+modified list_7: [2, 2, 3, 7]
+unmodified list_5: [1, 2, 3, 7]
+
+
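Besides .copy(), slicing the whole list with [:] or passing it to list() also creates a new list object. The is operator checks whether two variables refer to the same object, while == only compares values. A sketch using illustrative variable names:

```python
list_5 = [1, 2, 3, 7]

alias = list_5          # another name for the same list object
copy_1 = list_5.copy()  # new list object
copy_2 = list_5[:]      # slicing the whole list also creates a new list
copy_3 = list(list_5)   # so does calling list() on it

print(alias is list_5)   # True: same object
print(copy_1 is list_5)  # False: different objects
print(copy_1 == list_5)  # True: same values
```

These are shallow copies: if a list contains other lists, the inner lists are still shared between the original and the copy. The standard library function copy.deepcopy() creates fully independent copies in that case.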

Useful list functions +

+

There are many functions and methods that can be applied to lists, such as len(), max() and index(). However, arithmetic operators do not perform element-wise calculations on lists of numbers, and operators such as - and / raise a TypeError when used on two lists.

+

Note that + concatenates two lists into a single longer +list, rather than outputting the sum of two lists of numbers.

+
+

PYTHON +

+
list_8 = [1, 2, 3]
+list_9 = [4, 5, 6]
+
+list_8 + list_9 # This concatenates the lists and does not sum the two lists together
+
+
+

OUTPUT +

+
[1, 2, 3, 4, 5, 6]
+
+
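A few of the list functions and methods mentioned above, plus the * operator (which repeats a list) and one way to get an element-wise sum, are sketched below. The list comprehension with zip() is only one possible approach, chosen here for illustration; NumPy arrays are another common option for numerical work.

```python
list_8 = [1, 2, 3]
list_9 = [4, 5, 6]

print(len(list_8))       # number of values: 3
print(max(list_9))       # largest value: 6
print(list_9.index(5))   # index of the first occurrence of 5: 1

# * with an integer repeats the list rather than multiplying each value
print(list_8 * 2)        # [1, 2, 3, 1, 2, 3]

# An element-wise sum needs an explicit loop, here a list comprehension with zip()
pairwise_sum = [a + b for a, b in zip(list_8, list_9)]
print(pairwise_sum)      # [5, 7, 9]
```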

In your spare time after this workshop, you can search for different +list functions and methods and test them out yourselves.

+

Nested lists +

+

We have previously mentioned that lists can be used to store other +Python object types, including lists. This means that we can create +nested lists in Python i.e. lists containing lists containing values. +This property is useful when we have a collection of values that we want +to access or transform as a subgroup.

+

To create a nested list, we also use [] or +list() to contain one or more lists of values of +interest.

+
+

PYTHON +

+
veg_stock = [
+    ['lettuce', 'lettuce', 'tomato', 'zucchini'],
+    ['lettuce', 'lettuce', 'carrot', 'zucchini'],
+    ['lettuce', 'basil', 'tomato', 'zucchini']
+    ]
+
+# Check that veg_stock is a list object
+print(type(veg_stock))
+
+# Check that the first value in veg_stock is itself a list
+print(veg_stock[0], 'has type', type(veg_stock[0]))  
+
+
+

OUTPUT +

+
<class 'list'>
+['lettuce', 'lettuce', 'tomato', 'zucchini'] has type <class 'list'>
+
+

To extract a sub-list within the veg_stock list object, we refer to its index like we would with any other value inside a list, e.g. veg_stock[0] points to the first sub-list and veg_stock[1] points to the second sub-list within veg_stock.

+

To access an individual string value inside a sub-list, we make use +of a second index, which points to an individual value inside the +sub-list.

+
+

PYTHON +

+
print(veg_stock[0]) # Access the first sub-list 
+print(veg_stock[0][0]) # Access the first value in the first sub-list 
+
+print(type(veg_stock[0])) # The first value in veg_stock is a list
+print(type(veg_stock[0][0])) # The first value in the first list in veg_stock is a string
+
+
+

OUTPUT +

+
['lettuce', 'lettuce', 'tomato', 'zucchini']
+lettuce
+<class 'list'>
+<class 'str'>
+
+
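Because both levels are indexed, values inside a nested list can also be reached with negative indices and modified in place. A short sketch reusing the veg_stock data:

```python
veg_stock = [
    ['lettuce', 'lettuce', 'tomato', 'zucchini'],
    ['lettuce', 'lettuce', 'carrot', 'zucchini'],
    ['lettuce', 'basil', 'tomato', 'zucchini']
    ]

print(veg_stock[2][1])    # second value in the third sub-list: basil
print(veg_stock[-1][-1])  # last value in the last sub-list: zucchini

# Replace 'carrot' in the second sub-list, modifying it in place
veg_stock[1][2] = 'tomato'
print(veg_stock[1])       # ['lettuce', 'lettuce', 'tomato', 'zucchini']
```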

In general, however, when we are analysing a large collection of +values, the best practice is to structure those values in columns and +rows as a tabular Pandas data frame object. This is covered in another +Carpentries Course called Python +for Social Sciences.

+

Lists are still incredibly versatile and useful when you have a +collection of values that need to be efficiently accessed or +transformed. For example, data frame column names are commonly extracted +and stored inside a list, so that the same transformation can then be +mapped across multiple columns.

+

Create and access dictionaries +

+

A dictionary is a Python data type that is particularly suited for +enabling quick lookup operations on unstructured data sets.

+

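A dictionary can be thought of as a collection where every value is associated with a unique key, i.e. a self-defined index, instead of a position number. Keys are usually strings or numbers; more generally, keys must be hashable, so immutable objects such as tuples can be keys but lists cannot. Since Python 3.7, dictionaries keep items in the order they were inserted, but values are looked up by key rather than by position. A dictionary contains key-value pairs with the format {key: value}.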

+

Dictionaries can be created by listing individual key-value pairs inside {} or by using dict().

+
+

PYTHON +

+
# A key-value pair can contain single or multiple values  
+# Keys are treated as case sensitive and unique
+# Multiple values are first stored inside a list  
+
+teams = {
+    'data science': ['Mei Ling', 'Paul', 'Gwen', 'Suresh'],
+    'user design': ['Amy', 'Linh', 'Sasha'],
+    'software dev': ['David', 'Prya'],
+    'comms': 'Taylor' 
+    } 
+
+
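To make the lookup behaviour concrete, here is a small sketch: values are retrieved by key, new keys are kept in insertion order, and a mutable object such as a list cannot be used as a key. The dictionary name lookup is illustrative, and the try/except block is used only to show the error without stopping the script.

```python
lookup = {'b': 2, 'a': 1}
lookup['c'] = 3

print(lookup['a'])   # look up a value by its key: 1
print(list(lookup))  # keys in insertion order: ['b', 'a', 'c']

# Tuples are immutable, so they can be used as keys
lookup[('x', 'y')] = 4

# Lists are mutable (unhashable), so using one as a key raises a TypeError
try:
    lookup[['z']] = 5
except TypeError as err:
    print('TypeError:', err)
```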

When using dict(), we need to indicate which key is associated with which value. This can be done by passing a list of (key, value) tuples, by direct association using = (keyword arguments), or by using zip(), which pairs up the elements of two iterables (here a list of keys and a list of values) into tuples.

+
+

PYTHON +

+
# To use dict(), key-value pairs can be stored inside tuples
+ds_emp_status = dict([
+        ('Mei Ling', 'full time'),
+        ('Paul', 'full time'),
+        ('Gwen', 'part time'),
+        ('Suresh', 'part time')
+    ])  
+
+# Key-value pairs can also be assigned by direct association  
+# Keys are written without quotes in this approach, so they must be valid Python names (e.g. no spaces)
+ud_emp_status = dict(
+    Amy = 'full time',
+    Linh = 'full time',
+    Sasha = 'casual' 
+    ) 
+
+# zip() pairs each key in the first list with the value at the same position in the second list
+sd_emp_status = dict(zip(
+    ['David', 'Prya'],
+    ['full time', 'full time']
+    ))
+
+
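The three dict() approaches above all build ordinary dictionaries, so their results can be compared directly. A quick check using a single illustrative key-value pair:

```python
from_tuples = dict([('Amy', 'full time')])
from_keywords = dict(Amy='full time')
from_zip = dict(zip(['Amy'], ['full time']))

# All three approaches produce the same dictionary
print(from_tuples == from_keywords == from_zip)  # True
print(from_zip)  # {'Amy': 'full time'}
```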

To access a specific value inside a dictionary, we need to specify +its key using []. This is similar to slicing or subsetting +a list by specifying its index using [].

+
+

PYTHON +

+
# Access the values associated with the key 'data science'
+print(teams['data science'])
+
+print('The object teams is of type', type(teams))
+print('The dict value', teams['data science'], 'is of type', type(teams['data science']))
+
+
+

OUTPUT +

+
['Mei Ling', 'Paul', 'Gwen', 'Suresh']
+The object teams is of type <class 'dict'>
+The dict value ['Mei Ling', 'Paul', 'Gwen', 'Suresh'] is of type <class 'list'>
+
+

We can also access a value from a dictionary using the +get() method.

+
+

PYTHON +

+
print(teams.get('user design'))
+
+# get() also enables us to return an alternate string when the key is not found   
+# This prevents our code from returning an error message that halts the analysis
+
+print(teams.get('data engineering', 'WARNING: key does not exist'))
+
+
+

OUTPUT +

+
['Amy', 'Linh', 'Sasha']
+WARNING: key does not exist
+
+
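For comparison, accessing a missing key with [] raises a KeyError, while get() without a second argument returns None. A sketch using a separate, illustrative dictionary named team_lookup so that teams is left unchanged; the try/except block only serves to show the error without halting the script.

```python
team_lookup = {'data science': ['Mei Ling', 'Paul', 'Gwen', 'Suresh']}

# get() returns None by default when the key is missing
missing = team_lookup.get('data engineering')
print(missing)  # None

# Square brackets raise a KeyError for a missing key
try:
    team_lookup['data engineering']
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'data engineering'
```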

To access data inside a dictionary, we can also perform the following +other actions:

+
  • Check whether a key exists in a dictionary using the keyword +in +
  • +
  • Retrieve unique dictionary keys using dict.keys() +
  • +
  • Retrieve dictionary values using dict.values() +
  • +
  • Retrieve dictionary items using dict.items() +
  • +
+

PYTHON +

+
# Check whether a key exists in a dictionary 
+print('data science' in teams) 
+print('Data Science' in teams) # Keys are case sensitive  
+
+# Retrieve all dictionary keys  
+print(teams.keys())
+print(sd_emp_status.keys())
+
+# Retrieve all dictionary values  
+print(sd_emp_status.values())  
+
+# Retrieve all dictionary key-value pairs
+print(sd_emp_status.items())
+
+
+

OUTPUT +

+
True
+False
+dict_keys(['data science', 'user design', 'software dev', 'comms'])
+dict_keys(['David', 'Prya'])
+dict_values(['full time', 'full time'])
+dict_items([('David', 'full time'), ('Prya', 'full time')])
+
+
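The views returned by items(), keys() and values() can be looped over directly or converted to regular lists. A minimal sketch with an illustrative dictionary named emp_status:

```python
emp_status = {'David': 'full time', 'Prya': 'full time'}

# Unpack each key-value pair returned by items()
for name, status in emp_status.items():
    print(name, 'works', status)

# Convert the keys view into a regular list
names = list(emp_status.keys())
print(names)  # ['David', 'Prya']
```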

To add a new key-value pair to an existing dictionary, we can create +a new key and directly attach a new value to it using = or +alternatively use the method update().

+
+

PYTHON +

+
print('original dict items:', sd_emp_status.items())  
+
+# Add new key-value pair using direct assignment  
+sd_emp_status['Mohammad'] = 'full time'
+
+# Add new key-value pair using update({'key': 'value'})   
+sd_emp_status.update({'Carrie': 'part time'})
+
+print('updated dict items:', sd_emp_status.items())    
+
+
+

OUTPUT +

+
original dict items: dict_items([('David', 'full time'), ('Prya', 'full time')])
+updated dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
+('Mohammad', 'full time'), ('Carrie', 'part time')])
+
+

Because keys are unique, a dictionary cannot contain two keys with +the same name. This means that adding an item using a key that is +already present in the dictionary will cause the previous value to be +overwritten.

+
+

PYTHON +

+
print('original dict items:', sd_emp_status.items())  
+
+# As the key 'Carrie' already exists, its value will be overwritten
+sd_emp_status['Carrie'] = 'full time'
+print('updated dict items:', sd_emp_status.items())  
+
+
+

OUTPUT +

+
original dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
+('Mohammad', 'full time'), ('Carrie', 'part time')])
+updated dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
+('Mohammad', 'full time'), ('Carrie', 'full time')])
+
+

To remove a key-value pair from an existing dictionary, we can use the del keyword or the method pop(). pop() also lets us supply a default value to return if we try to remove a key that does not exist, which prevents our code from raising an error that halts the analysis.

+
+

PYTHON +

+
print('original dict items:', sd_emp_status.items())
+
+# Delete dictionary keys using del and pop()
+del sd_emp_status['Mohammad']
+sd_emp_status.pop('Carrie')
+sd_emp_status.pop('Anuradha', 'WARNING: key does not exist') # Does not generate an error
+
+print('modified dict items:', sd_emp_status.items())  
+
+
+

OUTPUT +

+
original dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
+('Mohammad', 'full time'), ('Carrie', 'full time')])
+modified dict items: dict_items([('David', 'full time'), ('Prya', 'full time')])
+
+
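pop() also returns the value it removes, which is handy when you want to use that value after deleting its key. A short sketch with an illustrative dictionary named status:

```python
status = {'David': 'full time', 'Carrie': 'part time'}

removed = status.pop('Carrie')        # remove the key and keep its value
print(removed)                        # part time

print(status.pop('Anuradha', None))   # missing key: returns the default, None

del status['David']                   # del removes the key without returning its value
print(status)                         # {}
```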

Nested dictionaries +

+

Similar to lists, dictionaries can be nested as we can also store +dictionaries as values inside a key-value pair using {}. +Nested dictionaries are useful when we need to store unstructured data +in a complex structure. For example, JSON data is commonly used for +transmitting data in web applications and often exists in a nested +structure that can be stored using nested dictionaries in Python.

+
+

PYTHON +

+
# Individual dictionaries are enclosed in {} and separated by a comma
+nested_dict = {
+    'dict_1': { # The value of the first key is a dictionary of key-value pairs
+        'key_1a': 'value_1a',
+        'key_1b': 'value_1b'
+                },
+    'dict_2': { # The value of the second key is another dictionary of key-value pairs
+        'key_2a': 'value_2a',
+        'key_2b': 'value_2b'
+                }
+            }
+
+print(nested_dict)
+
+
+

OUTPUT +

+
{'dict_1': {'key_1a': 'value_1a', 'key_1b': 'value_1b'},
+ 'dict_2': {'key_2a': 'value_2a', 'key_2b': 'value_2b'}}
+
+

Similar to working with nested lists, to extract a value from a sub-dictionary, we specify both the main dictionary key and the sub-dictionary key using [].

+
+

PYTHON +

+
# Extract the value for key 2a in dict_2
+print('original value:', nested_dict['dict_2']['key_2a'])
+
+# Adding or updating a value can be done through the same approach
+nested_dict['dict_2']['key_2a'] = "modified_value_2a"  
+
+print('modified value:', nested_dict['dict_2']['key_2a'])
+
+
+

OUTPUT +

+
original value: value_2a
+modified value: modified_value_2a
+
+
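As mentioned above, JSON data often arrives as nested objects. Python's standard json module converts a JSON string into nested dictionaries (and lists), which can then be accessed and updated with chained keys. The JSON record below is a made-up example:

```python
import json

# A hypothetical JSON record describing a team
json_text = '{"team": "data science", "members": {"Gwen": {"status": "part time"}, "Paul": {"status": "full time"}}}'

record = json.loads(json_text)               # parse the JSON string into Python objects
print(type(record))                          # <class 'dict'>
print(record['members']['Gwen']['status'])   # part time

# Nested values can be updated with the same chained-key syntax
record['members']['Gwen']['status'] = 'full time'
print(json.dumps(record['members']))         # convert back to a JSON string
```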

Optional: converting lists and dictionaries to Pandas data +frames +

+

Lists and dictionaries can be easily converted into a tabular Pandas +data frame format. This can be useful when you need to create a small +data set for unit testing purposes.

+
+

PYTHON +

+
# Import pandas library
+import pandas as pd
+
+# Create a dictionary with each key-value pair representing a data frame column
+data = {
+    'col_1': [3, 2, 1, 0],
+    'col_2': ['a', 'b', 'c', 'd']
+    }
+
+df = pd.DataFrame.from_dict(data) 
+
+print(df) # Outputs data as a tabular Pandas data frame   
+print(type(df))
+
+
+

OUTPUT +

+
   col_1 col_2
+0      3     a
+1      2     b
+2      1     c
+3      0     d
+<class 'pandas.core.frame.DataFrame'>
+
+
+
+ +
+
+

Key Points +

+
+
  • Lists can contain any Python object including other lists
  • +
  • Lists are ordered i.e. indexed and can therefore be sliced by index +number
  • +
  • Unlike strings and integers, the values inside a list can be +modified in place
  • +
  • A list which contains other lists is referred to as a nested +list
  • +
  • Dictionaries store key-value pairs, and values are accessed by key rather than by position
  • +
  • Dictionary keys are unique
  • +
  • A dictionary which contains other dictionaries is referred to as a +nested dictionary
  • +
  • Values inside nested lists and dictionaries can be accessed by an additional index or key
  • +
+
+
+
+
+ + +
+
+
+ +
+
+ + diff --git a/05-loops.html b/05-loops.html new file mode 100644 index 0000000..849a7bb --- /dev/null +++ b/05-loops.html @@ -0,0 +1,1591 @@ + +Python for Official Statistics: Loops and Conditional Logic +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Loops and Conditional Logic

+

Last updated on 2024-07-11

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I do the same operations on many different values?
  • +
  • How can my programs do different things based on data values?
  • +
+
+
+
+
+
+

Objectives

+
  • identify and create loops
  • +
  • use logical statements to allow for decision-based operations in +code
  • +
+
+
+
+
+

This episode contains two lessons:

+
  1. Repeating Actions with +Loops
  2. +
  3. Making Choices with +Conditional Logic
  4. +

Repeating Actions with Loops +

+

In the episode about visualizing +data, we will see Python code that plots values of interest from our +first inflammation dataset (inflammation-01.csv), which +revealed some suspicious features.

+
Line graphs showing average, maximum, and minimum inflammation across all patients over a 40-day period.

We have a dozen data sets right now and potentially more on the way +if Dr. Maverick can keep up their surprisingly fast clinical trial rate. +We want to create plots for all of our data sets with a single +statement. To do that, we’ll have to teach the computer how to repeat +things.

+

An example task that we might want to repeat is accessing numbers in +a list, which we will do by printing each number on a line of its +own.

+
+

PYTHON +

+
odds = [1, 3, 5, 7]
+
+

In Python, a list is basically an ordered +collection of elements, and every element has a unique number associated +with it — its index. This means that we can access elements in a list +using their indices. For example, we can get the first number in the +list odds, by using odds[0]. One way to print +each number is to use four print statements:

+
+

PYTHON +

+
print(odds[0])
+print(odds[1])
+print(odds[2])
+print(odds[3])
+
+
+

OUTPUT +

+
1
+3
+5
+7
+
+

This is a bad approach for three reasons:

+
  1. Not scalable. Imagine you need to print a list that has hundreds of elements: you would have to write hundreds of nearly identical print statements by hand.

  2. +
  3. Difficult to maintain. If we want to decorate +each printed element with an asterisk or any other character, we would +have to change four lines of code. While this might not be a problem for +small lists, it would definitely be a problem for longer ones.

  4. +
  5. Fragile. If we use it with a list that has more +elements than what we initially envisioned, it will only display part of +the list’s elements. A shorter list, on the other hand, will cause an +error because it will be trying to display elements of the list that do +not exist.

  6. +
+

PYTHON +

+
odds = [1, 3, 5]
+print(odds[0])
+print(odds[1])
+print(odds[2])
+print(odds[3])
+
+
+

OUTPUT +

+
1
+3
+5
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+IndexError                                Traceback (most recent call last)
+<ipython-input-3-7974b6cdaf14> in <module>()
+      3 print(odds[1])
+      4 print(odds[2])
+----> 5 print(odds[3])
+
+IndexError: list index out of range
+
+

Here’s a better approach: a for +loop

+
+

PYTHON +

+
odds = [1, 3, 5, 7]
+for num in odds:
+    print(num)
+
+
+

OUTPUT +

+
1
+3
+5
+7
+
+

This is shorter — certainly shorter than something that prints every +number in a hundred-number list — and more robust as well:

+
+

PYTHON +

+
odds = [1, 3, 5, 7, 9, 11]
+for num in odds:
+    print(num)
+
+
+

OUTPUT +

+
1
+3
+5
+7
+9
+11
+
+

The improved version uses a for +loop to repeat an operation — in this case, printing — once for each +thing in a sequence. The general form of a loop is:

+
+

PYTHON +

+
for variable in collection:
+    # do things using variable, such as print
+    print(variable)
+
+

Using the odds example above, the loop might look like this:

+
Loop variable 'num' being assigned the value of each element in the list odds in turn and then being printed

where each number (num) in the variable +odds is looped through and printed one number after +another. The other numbers in the diagram denote which loop cycle the +number was printed in (1 being the first loop cycle, and 6 being the +final loop cycle).

+

We can call the loop +variable anything we like, but there must be a colon at the end of +the line starting the loop, and we must indent anything we want to run +inside the loop. Unlike many other languages, there is no command to +signify the end of the loop body (e.g., end for); +everything indented after the for statement belongs to the +loop.
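Indentation is what decides which lines run once per element. In the sketch below (variable names are illustrative), the two indented lines run on every pass through the loop, while the final print runs only once, after the loop has finished:

```python
odds = [1, 3, 5]
total = 0

for num in odds:
    total = total + num              # indented: runs once per element
    print('running total:', total)   # indented: runs once per element

print('final total:', total)         # not indented: runs once, after the loop
```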

+
+
+ +
+
+

What’s in a name? +

+
+

In the example above, the loop variable was given the name +num as a mnemonic; it is short for ‘number’. We can choose +any name we want for variables. We might just as easily have chosen the +name banana for the loop variable, as long as we use the +same name when we invoke the variable inside the loop:

+
+

PYTHON +

+
odds = [1, 3, 5, 7, 9, 11]
+for banana in odds:
+    print(banana)
+
+
+

OUTPUT +

+
1
+3
+5
+7
+9
+11
+
+

It is a good idea to choose variable names that are meaningful, +otherwise it would be more difficult to understand what the loop is +doing.

+
+
+
+

Here’s another loop that repeatedly updates a variable:

+
+

PYTHON +

+
length = 0
+names = ['Curie', 'Darwin', 'Turing']
+for value in names:
+    length = length + 1
+print('There are', length, 'names in the list.')
+
+
+

OUTPUT +

+
There are 3 names in the list.
+
+

It’s worth tracing the execution of this little program step by step. +Since there are three names in names, the statement on line +4 will be executed three times. The first time around, +length is zero (the value assigned to it on line 1) and +value is Curie. The statement adds 1 to the +old value of length, producing 1, and updates +length to refer to that new value. The next time around, +value is Darwin and length is 1, +so length is updated to be 2. After one more update, +length is 3; since there is nothing left in +names for Python to process, the loop finishes and the +print function on line 5 tells us our final answer.

+

Note that a loop variable +is a variable that is being used to record progress in a loop. It still +exists after the loop is over, and we can re-use variables previously +defined as loop variables as +well:

+
+

PYTHON +

+
name = 'Rosalind'
+for name in ['Curie', 'Darwin', 'Turing']:
+    print(name)
+print('after the loop, name is', name)
+
+
+

OUTPUT +

+
Curie
+Darwin
+Turing
+after the loop, name is Turing
+
+

Note also that finding the length of an object is such a common +operation that Python actually has a built-in function to do it called +len:

+
+

PYTHON +

+
print(len([0, 1, 2, 3]))
+
+
+

OUTPUT +

+
4
+
+

len is much faster than any function we could write +ourselves, and much easier to read than a two-line loop; it will also +give us the length of many other data types we haven’t seen yet, so we +should always use it when we can.
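For example, len works on strings and dictionaries as well as lists, and for a nested list it counts only the top-level elements. A quick sketch:

```python
n_chars = len('oxygen')                                  # characters in a string
n_keys = len({'Amy': 'full time', 'Linh': 'full time'})  # keys in a dictionary
n_sublists = len([[1, 2], [3, 4, 5]])                    # top-level items only

print(n_chars, n_keys, n_sublists)  # 6 2 2
```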

+
+
+ +
+
+

From 1 to N +

+
+

Python has a built-in function called range that generates a sequence of numbers. range can accept 1, 2, or 3 parameters.

+
  • If one parameter is given, range generates a sequence +of that length, starting at zero and incrementing by 1. For example, +range(3) produces the numbers 0, 1, 2.
  • +
  • If two parameters are given, range starts at the first +and ends just before the second, incrementing by one. For example, +range(2, 5) produces 2, 3, 4.
  • +
  • If range is given 3 parameters, it starts at the first +one, ends just before the second one, and increments by the third one. +For example, range(3, 10, 2) produces +3, 5, 7, 9.
  • +
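range produces its numbers lazily rather than as a list, so wrapping it in list() is a convenient way to inspect the three cases described above:

```python
one_arg = list(range(3))
two_args = list(range(2, 5))
three_args = list(range(3, 10, 2))

print(one_arg)     # [0, 1, 2]
print(two_args)    # [2, 3, 4]
print(three_args)  # [3, 5, 7, 9]
```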

Write a loop that uses range to print the first 3 natural numbers:

+
+

OUTPUT +

+
1
+2
+3
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
for number in range(1, 4):
+    print(number)
+
+
+
+
+
+
+
+ +
+
+

Understanding the loops +

+
+

Given the following loop:

+
+

PYTHON +

+
word = 'oxygen'
+for letter in word:
+    print(letter)
+
+

How many times is the body of the loop executed?

+
  • 3 times
  • +
  • 4 times
  • +
  • 5 times
  • +
  • 6 times
  • +
+
+
+
+
+ +
+
+

The body of the loop is executed 6 times.

+
+
+
+
+
+
+ +
+
+

Computing Powers With Loops +

+
+

Exponentiation is built into Python:

+
+

PYTHON +

+
print(5 ** 3)
+
+
+

OUTPUT +

+
125
+
+

Write a loop that calculates the same result as 5 ** 3 +using multiplication (and without exponentiation).

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
result = 1
+for number in range(0, 3):
+    result = result * 5
+print(result)
+
+
+
+
+
+
+
+ +
+
+

Summing a List +

+
+

Write a loop that calculates the sum of elements in a list by adding +each element and printing the final value, so +[124, 402, 36] prints 562

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
numbers = [124, 402, 36]
+summed = 0
+for num in numbers:
+    summed = summed + num
+print(summed)
+
+
+
+
+
+
+
+ +
+
+

Computing the Value of a Polynomial +

+
+

The built-in function enumerate takes a sequence (e.g., +a list) and generates a new sequence of the +same length. Each element of the new sequence is a pair composed of the +index (0, 1, 2,…) and the value from the original sequence:

+
+

PYTHON +

+
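a_list = ['a', 'b', 'c']  # an example list to loop through
+for idx, val in enumerate(a_list):
+    # Do something using idx and val, such as printing them
+    print(idx, val)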
+
+

The code above loops through a_list, assigning the index +to idx and the value to val.
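To see what enumerate actually produces, it can be converted into a list of (index, value) tuples. The optional start argument changes the first index. A sketch with an illustrative list named letters:

```python
letters = ['x', 'y', 'z']

print(list(enumerate(letters)))  # [(0, 'x'), (1, 'y'), (2, 'z')]

# Count from 1 instead of 0
for position, letter in enumerate(letters, start=1):
    print(position, letter)
```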

+

Suppose you have encoded a polynomial as a list of coefficients in +the following way: the first element is the constant term, the second +element is the coefficient of the linear term, the third is the +coefficient of the quadratic term, etc.

+
+

PYTHON +

+
x = 5
+coefs = [2, 4, 3]
+y = coefs[0] * x**0 + coefs[1] * x**1 + coefs[2] * x**2
+print(y)
+
+
+

OUTPUT +

+
97
+
+

Write a loop using enumerate(coefs) which computes the +value y of any polynomial, given x and +coefs.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
y = 0
+for idx, coef in enumerate(coefs):
+    y = y + coef * x**idx
+print(y)
+
+
+
+
+
+

Making Choices with Conditional Logic +

+

How can we use Python to automatically recognize different situations +we encounter with our data and take a different action for each? In this +lesson, we’ll learn how to write code that runs only when certain +conditions are true.

+
+

Conditionals

+

We can ask Python to take different actions, depending on a +condition, with an if statement:

+
+

PYTHON +

+
num = 37
+if num > 100:
+    print('greater')
+else:
+    print('not greater')
+print('done')
+
+
+

OUTPUT +

+
not greater
+done
+
+

The second line of this code uses the keyword if to tell +Python that we want to make a choice. If the test that follows the +if statement is true, the body of the if +(i.e., the set of lines indented underneath it) is executed, and +“greater” is printed. If the test is false, the body of the +else is executed instead, and “not greater” is printed. +Only one or the other is ever executed before continuing on with program +execution to print “done”:

+
A flowchart diagram of the if-else construct that tests if variable num is greater than 100

Conditional +statements don’t have to include an else. If there +isn’t one, Python simply does nothing if the test is false:

+
+

PYTHON +

+
num = 53
+print('before conditional...')
+if num > 100:
+    print(num, 'is greater than 100')
+print('...after conditional')
+
+
+

OUTPUT +

+
before conditional...
+...after conditional
+
+

We can also chain several tests together using elif, +which is short for “else if”. The following Python code uses +elif to print the sign of a number.

+
+

PYTHON +

+
num = -3
+
+if num > 0:
+    print(num, 'is positive')
+elif num == 0:
+    print(num, 'is zero')
+else:
+    print(num, 'is negative')
+
+
+

OUTPUT +

+
-3 is negative
+
+

Note that to test for equality we use a double equals sign +== rather than a single equals sign = which is +used to assign values.

+
+
+ +
+
+

Comparing in Python +

+
+

Along with the > and == operators we +have already used for comparing values in our conditionals, there are a +few more options to know about:

+
  • +>: greater than
  • +
  • +<: less than
  • +
  • +==: equal to
  • +
  • +!=: does not equal
  • +
  • +>=: greater than or equal to
  • +
  • +<=: less than or equal to
  • +
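Each of these comparison operators evaluates to either True or False. A quick sketch with two illustrative numbers:

```python
a = 5
b = 7

print(a > b)   # False
print(a < b)   # True
print(a == b)  # False
print(a != b)  # True
print(a >= 5)  # True
print(b <= 6)  # False
```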
+
+
+

We can also combine tests using and and or. +and is only true if both parts are true:

+
+

PYTHON +

+
if (1 > 0) and (-1 >= 0):
+    print('both parts are true')
+else:
+    print('at least one part is false')
+
+
+

OUTPUT +

+
at least one part is false
+
+

while or is true if at least one part is true:

+
+

PYTHON +

+
if (1 < 0) or (1 >= 0):
+    print('at least one test is true')
+
+
+

OUTPUT +

+
at least one test is true
+
+
+
+ +
+
+

+True and False +

+
+

True and False are special words in Python +called booleans, which represent truth values. A statement +such as 1 < 0 returns the value False, +while -1 < 0 returns the value True.
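Because comparisons return booleans, their results can be stored in variables and used later in an if statement. A small sketch with an illustrative variable name:

```python
is_negative = -1 < 0
print(is_negative)        # True
print(type(is_negative))  # <class 'bool'>

if is_negative:
    print('the number is negative')
```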

+
+
+
+
+
+

Checking Our Data

+

Now that we’ve seen how conditionals work, we can use them to check +for the suspicious features we saw in our inflammation data. We are +about to use functions provided by the numpy module again. +Therefore, if you’re working in a new Python session, make sure to load +the module with:

+
+

PYTHON +

+
import numpy
+
+

From the first couple of plots, we saw that maximum daily inflammation exhibits a strange behavior and rises by one unit a day. Wouldn’t it be a good idea to detect such behavior and report it as suspicious? Let’s do that! However, instead of checking every single day of the study, let’s merely check whether the maximum inflammation at the beginning (day 0) and in the middle (day 20) of the study is equal to the corresponding day number.

+
+

PYTHON +

+
max_inflammation_0 = numpy.amax(data, axis=0)[0]
+max_inflammation_20 = numpy.amax(data, axis=0)[20]
+
+if max_inflammation_0 == 0 and max_inflammation_20 == 20:
+    print('Suspicious looking maxima!')
+
+

We also saw a different problem in the third dataset; the minima per +day were all zero (looks like a healthy person snuck into our study). We +can also check for this with an elif condition:

+
+

PYTHON +

+
elif numpy.sum(numpy.amin(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+
+

And if neither of these conditions is true, we can use else to give the all-clear:

+
+

PYTHON +

+
else:
+    print('Seems OK!')
+
+

Let’s test that out:

+
+

PYTHON +

+
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+max_inflammation_0 = numpy.amax(data, axis=0)[0]
+max_inflammation_20 = numpy.amax(data, axis=0)[20]
+
+if max_inflammation_0 == 0 and max_inflammation_20 == 20:
+    print('Suspicious looking maxima!')
+elif numpy.sum(numpy.amin(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+else:
+    print('Seems OK!')
+
+
+

OUTPUT +

+
Suspicious looking maxima!
+
+
+

PYTHON +

+
data = numpy.loadtxt(fname='inflammation-03.csv', delimiter=',')
+
+max_inflammation_0 = numpy.amax(data, axis=0)[0]
+max_inflammation_20 = numpy.amax(data, axis=0)[20]
+
+if max_inflammation_0 == 0 and max_inflammation_20 == 20:
+    print('Suspicious looking maxima!')
+elif numpy.sum(numpy.amin(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+else:
+    print('Seems OK!')
+
+
+

OUTPUT +

+
Minima add up to zero!
+
+

In this way, we have asked Python to do something different depending +on the condition of our data. Here we printed messages in all cases, but +we could also imagine not using the else catch-all so that +messages are only printed when something is wrong, freeing us from +having to manually examine every plot for features we’ve seen +before.

+
+
+ +
+
+

How Many Paths? +

+
+

Consider this code:

+
+

PYTHON +

+
if 4 > 5:
+    print('A')
+elif 4 == 5:
+    print('B')
+elif 4 < 5:
+    print('C')
+
+

Which of the following would be printed if you were to run this code? +Why did you pick this answer?

+
  1. A
  2. B
  3. C
  4. B and C
+
+
+
+
+ +
+
+

C gets printed because the first two conditions, 4 > 5 and 4 == 5, are not true, but 4 < 5 is true. In this case only one of these conditions can be true at a time, but in other scenarios multiple elif conditions could be met. In those scenarios, only the action associated with the first true condition will occur, starting from the top of the conditional section.

+
A flowchart diagram of a conditional section with multiple elif conditions and some possible outcomes.

This contrasts with the case of multiple if statements, +where every action can occur as long as their condition is met.

+
A flowchart diagram of a conditional section with multiple if statements and some possible outcomes.
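A small sketch of that contrast, using an arbitrary value of 4: with elif, only the first matching branch runs; with separate if statements, every matching branch runs. The branches append to lists rather than printing so the two results can be compared side by side.

```python
value = 4

# One conditional section: stops at the first true condition.
elif_results = []
if value > 2:
    elif_results.append('greater than 2')
elif value > 3:
    elif_results.append('greater than 3')

# Two independent if statements: each condition is checked.
if_results = []
if value > 2:
    if_results.append('greater than 2')
if value > 3:
    if_results.append('greater than 3')

print(elif_results)  # ['greater than 2']
print(if_results)    # ['greater than 2', 'greater than 3']
```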
+
+
+
+
+
+ +
+
+

What Is Truth? +

+
+

True and False booleans are not the only values in Python that are true and false. In fact, any value can be used in an if or elif. After reading and running the code below, explain what the rule is for which values are considered true and which are considered false.

+
+

PYTHON +

+
if '':
+    print('empty string is true')
+if 'word':
+    print('word is true')
+if []:
+    print('empty list is true')
+if [1, 2, 3]:
+    print('non-empty list is true')
+if 0:
+    print('zero is true')
+if 1:
+    print('one is true')
+
+
+
+
+
+
+ +
+
+

That’s Not Not What I Meant +

+
+

Sometimes it is useful to check whether some condition is +not true. The Boolean operator not can do this +explicitly. After reading and running the code below, write some +if statements that use not to test the rule +that you formulated in the previous challenge.

+
+

PYTHON +

+
if not '':
+    print('empty string is not true')
+if not 'word':
+    print('word is not true')
+if not not True:
+    print('not not True is true')
+
+
+
+
+
+
+ +
+
+

Close Enough +

+
+

Write some conditions that print True if the variable +a is within 10% of the variable b and +False otherwise. Compare your implementation with your +partner’s. Do you get the same answer for all possible pairs of +numbers?

+
+
+
+
+
+ +
+
+

There is a built-in +function abs that returns the absolute value of a +number:

+
+

PYTHON +

+
print(abs(-12))
+
+
+

OUTPUT +

+
12
+
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
a = 5
+b = 5.1
+
+if abs(a - b) <= 0.1 * abs(b):
+    print('True')
+else:
+    print('False')
+
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
print(abs(a - b) <= 0.1 * abs(b))
+
+

This works because the Booleans True and +False have string representations which can be printed.
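The solution measures the 10% tolerance against b only, which is one reason two partners' answers can disagree: swapping a and b can change the result. The sketch below shows this with one arbitrary pair of numbers and, for comparison, uses the standard library's math.isclose, which scales rel_tol by the larger of the two magnitudes and therefore gives the same answer in either order. Which definition of "within 10%" is appropriate depends on the question being asked.

```python
import math

a = 90.5
b = 100

# Tolerance measured against b, as in the solution above: 9.5 <= 10.0
a_close_to_b = abs(a - b) <= 0.1 * abs(b)
# Same test with the roles swapped: 9.5 <= 9.05
b_close_to_a = abs(b - a) <= 0.1 * abs(a)

# math.isclose uses max(abs(a), abs(b)) to scale rel_tol, so order does not matter.
isclose_ab = math.isclose(a, b, rel_tol=0.1)
isclose_ba = math.isclose(b, a, rel_tol=0.1)

print(a_close_to_b, b_close_to_a)  # True False
print(isclose_ab, isclose_ba)      # True True
```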

+
+
+
+
+
+
+ +
+
+

In-Place Operators +

+
+

Like most languages in the C family, Python provides in-place operators that work like this:

+
+

PYTHON +

+
x = 1  # original value
+x += 1 # add one to x, assigning result back to x
+x *= 3 # multiply x by 3
+print(x)
+
+
+

OUTPUT +

+
6
+
+

Write some code that sums the positive and negative numbers in a list +separately, using in-place operators. Do you think the result is more or +less readable than writing the same without in-place operators?

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
positive_sum = 0
+negative_sum = 0
+test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
+for num in test_list:
+    if num > 0:
+        positive_sum += num
+    elif num == 0:
+        pass
+    else:
+        negative_sum += num
+print(positive_sum, negative_sum)
+
+

Here pass means “don’t do anything”. In this particular +case, it’s not actually needed, since if num == 0 neither +sum needs to change, but it illustrates the use of elif and +pass.
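For comparison with the in-place version, here is the same calculation written with ordinary assignment. It also leaves out the zero case entirely, since zeros change neither sum, so no pass is needed.

```python
positive_sum = 0
negative_sum = 0
test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
for num in test_list:
    if num > 0:
        positive_sum = positive_sum + num
    elif num < 0:
        negative_sum = negative_sum + num
    # zeros fail both tests and are simply skipped
print(positive_sum, negative_sum)  # 21 -14
```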

+
+
+
+
+
+
+ +
+
+

Sorting a List Into Buckets +

+
+

In our data folder, large data sets are stored in files +whose names start with “inflammation-” and small data sets – in files +whose names start with “small-”. We also have some other files that we +do not care about at this point. We’d like to break all these files into +three lists called large_files, small_files, +and other_files, respectively.

+

Add code to the template below to do this. Note that the string +method startswith +returns True if and only if the string it is called on +starts with the string passed as an argument, that is:

+
+

PYTHON +

+
'String'.startswith('Str')
+
+
+

OUTPUT +

+
True
+
+

But

+
+

PYTHON +

+
'String'.startswith('str')
+
+
+

OUTPUT +

+
False
+
+

Use the following Python code as your starting point:

+
+

PYTHON +

+
filenames = ['inflammation-01.csv',
+         'myscript.py',
+         'inflammation-02.csv',
+         'small-01.csv',
+         'small-02.csv']
+large_files = []
+small_files = []
+other_files = []
+
+

Your solution should:

+
  1. loop over the names of the files
  2. figure out which group each filename belongs in
  3. append the filename to that list

In the end the three lists should be:

+
+

PYTHON +

+
large_files = ['inflammation-01.csv', 'inflammation-02.csv']
+small_files = ['small-01.csv', 'small-02.csv']
+other_files = ['myscript.py']
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
for filename in filenames:
+    if filename.startswith('inflammation-'):
+        large_files.append(filename)
+    elif filename.startswith('small-'):
+        small_files.append(filename)
+    else:
+        other_files.append(filename)
+
+print('large_files:', large_files)
+print('small_files:', small_files)
+print('other_files:', other_files)
+
+
+
+
+
+
+
+ +
+
+
  1. Write a loop that counts the number of vowels in a character string.
  2. Test it on a few individual words and full sentences.
  3. Once you are done, compare your solution to your neighbor’s. Did you make the same decisions about how to handle the letter ‘y’ (which some people think is a vowel, and some do not)?
+

Solution

+
vowels = 'aeiouAEIOU'
+sentence = 'Mary had a little lamb.'
+count = 0
+for char in sentence:
+    if char in vowels:
+        count += 1
+
+print('The number of vowels in this string is ' + str(count))
+
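One way to make the decision about ‘y’ explicit is a flag at the top of the code. In this sketch, count_y is a hypothetical name chosen for illustration, and lower() converts the sentence to lowercase so that only lowercase vowels need to be listed.

```python
sentence = 'Mary had a little lamb.'
count_y = True  # set to False to leave 'y' out of the vowels

vowels = 'aeiou'
if count_y:
    vowels = vowels + 'y'

count = 0
for char in sentence.lower():
    if char in vowels:
        count += 1

print('The number of vowels in this string is ' + str(count))  # 7 with count_y = True
```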


+
+
+
+
+
+
+
+ +
+
+

Key Points +

+
+
  • Use for variable in sequence to process the elements of a sequence one at a time.
  • The body of a for loop must be indented.
  • Use len(thing) to determine the length of something that contains other values.
  • Use if condition to start a conditional statement, elif condition to provide additional tests, and else to provide a default.
  • The bodies of the branches of conditional statements must be indented.
  • Use == to test for equality.
  • X and Y is only true if both X and Y are true.
  • X or Y is true if either X or Y, or both, are true.
  • Zero, the empty string, and the empty list are considered false; all other numbers, strings, and lists are considered true.
  • True and False represent truth values.
+
+
+
+
+
+ + +
+
+
+ +
Back To Top +
+
+ + diff --git a/06-alternative_loops.html b/06-alternative_loops.html new file mode 100644 index 0000000..6ee802d --- /dev/null +++ b/06-alternative_loops.html @@ -0,0 +1,489 @@ + +Python for Official Statistics: Alternatives to Loops +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Alternatives to Loops

+

Last updated on 2024-07-11 | + + Edit this page

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I vectorize my loops?
+
+
+
+
+
+

Objectives

+
  • identify what vectorized operations are
  • perform basic vectorized operations
+
+
+
+
+

FIXME

+
+
+ +
+
+

Key Points +

+
+
  • NULL
  • +
+
+
+ + + +
+
+ + +
+
+
+ +
Back To Top +
+
+ + diff --git a/07-functions.html b/07-functions.html new file mode 100644 index 0000000..10d1c22 --- /dev/null +++ b/07-functions.html @@ -0,0 +1,1504 @@ + +Python for Official Statistics: Creating Functions +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Creating Functions

+

Last updated on 2024-07-11 | + + Edit this page

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • What are functions, and how can I use them in Python?
  • How can I define new functions?
  • What’s the difference between defining and calling a function?
  • What happens when I call a function?
+
+
+
+
+
+

Objectives

+
  • identify what a function is
  • create new functions
  • Set default values for function parameters.
  • Explain why we should divide programs into small, single-purpose functions.
+
+
+
+
+

At this point, we’ve seen that code can have Python make decisions about what it sees in our data. What if we want to convert some of our data, for example taking a temperature in Fahrenheit and converting it to Celsius? We could write something like this to convert a single number:

+
+

PYTHON +

+
fahrenheit_val = 99
+celsius_val = ((fahrenheit_val - 32) * (5/9))
+
+

and for a second number we could just copy the lines and rename the variables:

+
+

PYTHON +

+
fahrenheit_val = 99
+celsius_val = ((fahrenheit_val - 32) * (5/9))
+
+fahrenheit_val2 = 43
+celsius_val2 = ((fahrenheit_val2 - 32) * (5/9))
+
+

But we would be in trouble as soon as we had to do this more than a couple of times. Cutting and pasting is going to make our code very long and very repetitive, very quickly. We’d like a way to package our code so that it is easier to reuse, a shorthand way of re-executing longer pieces of code. In Python we can use functions. Let’s start by defining a function fahr_to_celsius that converts temperatures from Fahrenheit to Celsius, along with explicit_fahr_to_celsius, a longer version that does the same thing:

+
+

PYTHON +

+
def explicit_fahr_to_celsius(temp):
+    # Assign the converted value to a variable
+    converted = ((temp - 32) * (5/9))
+    # Return the value of the new variable
+    return converted
+    
+def fahr_to_celsius(temp):
+    # Return the converted value directly with the return
+    # statement, without creating an intermediate variable.
+    # This does the same thing as explicit_fahr_to_celsius,
+    # just more concisely.
+    return ((temp - 32) * (5/9))
+
+
Labeled parts of a Python function definition

The function definition opens with the keyword def +followed by the name of the function (fahr_to_celsius) and +a parenthesized list of parameter names (temp). The body of the function — the statements +that are executed when it runs — is indented below the definition line. +The body concludes with a return keyword followed by the +return value.

+

When we call the function, the values we pass to it are assigned to +those variables so that we can use them inside the function. Inside the +function, we use a return +statement to send a result back to whoever asked for it.

+

Let’s try running our function.

+
+

PYTHON +

+
fahr_to_celsius(32)
+
+

This command calls our function with 32 as the input and returns the function’s value, 0.0.

+

In fact, calling our own function is no different from calling any +other function:

+
+

PYTHON +

+
print('freezing point of water:', fahr_to_celsius(32), 'C')
+print('boiling point of water:', fahr_to_celsius(212), 'C')
+
+
+

OUTPUT +

+
freezing point of water: 0.0 C
+boiling point of water: 100.0 C
+
+

We’ve successfully called the function that we defined, and we have +access to the value that we returned.
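Returning to the copy-and-paste problem from the start of this episode: with fahr_to_celsius defined, converting any number of values only needs a loop and one function call. The list of temperatures below is arbitrary.

```python
def fahr_to_celsius(temp):
    return ((temp - 32) * (5/9))

fahrenheit_values = [99, 43, 212]  # arbitrary example temperatures
celsius_values = []
for fahrenheit_val in fahrenheit_values:
    celsius_values.append(fahr_to_celsius(fahrenheit_val))

print(celsius_values)
```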

+

Composing Functions +

+

Now that we’ve seen how to turn Fahrenheit into Celsius, we can also +write the function to turn Celsius into Kelvin:

+
+

PYTHON +

+
def celsius_to_kelvin(temp_c):
+    return temp_c + 273.15
+
+print('freezing point of water in Kelvin:', celsius_to_kelvin(0.))
+
+
+

OUTPUT +

+
freezing point of water in Kelvin: 273.15
+
+

What about converting Fahrenheit to Kelvin? We could write out the +formula, but we don’t need to. Instead, we can compose the two functions we have +already created:

+
+

PYTHON +

+
def fahr_to_kelvin(temp_f):
+    temp_c = fahr_to_celsius(temp_f)
+    temp_k = celsius_to_kelvin(temp_c)
+    return temp_k
+
+print('boiling point of water in Kelvin:', fahr_to_kelvin(212.0))
+
+
+

OUTPUT +

+
boiling point of water in Kelvin: 373.15
+
+

This is our first taste of how larger programs are built: we define basic operations, then combine them in ever-larger chunks to get the effect we want. Real-life functions will usually be larger than the ones shown here (typically half a dozen to a few dozen lines), but they shouldn’t ever be much longer than that, or the next person who reads them won’t be able to understand what’s going on.

+

Variable Scope +

+

In composing our temperature conversion functions, we created variables inside those functions: temp, temp_c, temp_f, and temp_k. We refer to these variables as local variables because they no longer exist once the function is done executing. If we try to access their values outside of the function, we will encounter an error:

+
+

PYTHON +

+
print('Again, temperature in Kelvin was:', temp_k)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-1-eed2471d229b> in <module>
+----> 1 print('Again, temperature in Kelvin was:', temp_k)
+
+NameError: name 'temp_k' is not defined
+
+

If you want to reuse the temperature in Kelvin after you have +calculated it with fahr_to_kelvin, you can store the result +of the function call in a variable:

+
+

PYTHON +

+
temp_kelvin = fahr_to_kelvin(212.0)
+print('temperature in Kelvin was:', temp_kelvin)
+
+
+

OUTPUT +

+
temperature in Kelvin was: 373.15
+
+

The variable temp_kelvin, being defined outside any +function, is said to be global.

+

Inside a function, one can read the value of such global +variables:

+
+

PYTHON +

+
def print_temperatures():
+    print('temperature in Fahrenheit was:', temp_fahr)
+    print('temperature in Kelvin was:', temp_kelvin)
+
+temp_fahr = 212.0
+temp_kelvin = fahr_to_kelvin(temp_fahr)
+
+print_temperatures()
+
+
+

OUTPUT +

+
temperature in Fahrenheit was: 212.0
+temperature in Kelvin was: 373.15
+
+
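Reading globals works, but print_temperatures only behaves correctly if temp_fahr and temp_kelvin happen to exist when it is called. A sketch of an alternative design passes the values in as parameters, so the function depends only on its inputs; the name report_temperatures is hypothetical, and it returns the text as well as printing it so the result can be reused.

```python
def fahr_to_celsius(temp):
    return ((temp - 32) * (5/9))

def celsius_to_kelvin(temp_c):
    return temp_c + 273.15

def fahr_to_kelvin(temp_f):
    return celsius_to_kelvin(fahr_to_celsius(temp_f))

def report_temperatures(temp_fahr, temp_kelvin):
    # Build the report from the arguments only; no global variables are read.
    report = ('temperature in Fahrenheit was: ' + str(temp_fahr) + '\n'
              + 'temperature in Kelvin was: ' + str(temp_kelvin))
    print(report)
    return report

report = report_temperatures(212.0, fahr_to_kelvin(212.0))
```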

By giving our functions human-readable names, we can more easily read and understand what our code is doing. Even better, if at some later date we want to use either of those pieces of code again, we can do so in a single line.

+

Testing and Documenting +

+

Once we start putting things in functions so that we can re-use them, we need to start testing that those functions are working correctly. To see how to do this, let’s write a function to offset a dataset so that its mean value shifts to a user-defined value:

+
+

PYTHON +

+
def offset_mean(data, target_mean_value):
+    return (data - numpy.mean(data)) + target_mean_value
+
+

We could test this on our actual data, but since we don’t know what +the values ought to be, it will be hard to tell if the result was +correct. Instead, let’s use NumPy to create a matrix of 0’s and then +offset its values to have a mean value of 3:

+
+

PYTHON +

+
z = numpy.zeros((2,2))
+print(offset_mean(z, 3))
+
+
+

OUTPUT +

+
[[ 3.  3.]
+ [ 3.  3.]]
+
+

That looks right, so let’s try offset_mean on our real +data:

+
+

PYTHON +

+
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+print(offset_mean(data, 0))
+
+
+

OUTPUT +

+
[[-6.14875 -6.14875 -5.14875 ... -3.14875 -6.14875 -6.14875]
+ [-6.14875 -5.14875 -4.14875 ... -5.14875 -6.14875 -5.14875]
+ [-6.14875 -5.14875 -5.14875 ... -4.14875 -5.14875 -5.14875]
+ ...
+ [-6.14875 -5.14875 -5.14875 ... -5.14875 -5.14875 -5.14875]
+ [-6.14875 -6.14875 -6.14875 ... -6.14875 -4.14875 -6.14875]
+ [-6.14875 -6.14875 -5.14875 ... -5.14875 -5.14875 -6.14875]]
+
+

It’s hard to tell from the default output whether the result is +correct, but there are a few tests that we can run to reassure us:

+
+

PYTHON +

+
print('original min, mean, and max are:', numpy.amin(data), numpy.mean(data), numpy.amax(data))
+offset_data = offset_mean(data, 0)
+print('min, mean, and max of offset data are:',
+      numpy.amin(offset_data),
+      numpy.mean(offset_data),
+      numpy.amax(offset_data))
+
+
+

OUTPUT +

+
original min, mean, and max are: 0.0 6.14875 20.0
+min, mean, and max of offset data are: -6.14875 2.84217094304e-16 13.85125
+
+

That seems almost right: the original mean was about 6.1, so the lower bound from zero is now about -6.1. The mean of the offset data isn’t quite zero, because floating-point arithmetic introduces tiny rounding errors, but it’s pretty close. We can even go further and check that the standard deviation hasn’t changed:

+
+

PYTHON +

+
print('std dev before and after:', numpy.std(data), numpy.std(offset_data))
+
+
+

OUTPUT +

+
std dev before and after: 4.61383319712 4.61383319712
+
+

Those values look the same, but we probably wouldn’t notice if they +were different in the sixth decimal place. Let’s do this instead:

+
+

PYTHON +

+
print('difference in standard deviations before and after:',
+      numpy.std(data) - numpy.std(offset_data))
+
+
+

OUTPUT +

+
difference in standard deviations before and after: -3.5527136788e-15
+
+

Again, the difference is very small. It’s still possible that our +function is wrong, but it seems unlikely enough that we should probably +get back to doing our analysis.
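Rather than eyeballing printed values, these checks can be written as assertions, so Python stops with an error if a check fails. The sketch below uses numpy.isclose, which compares floating-point numbers within a small tolerance instead of requiring exact equality. The randomly generated array is a hypothetical stand-in for inflammation-01.csv so the example runs on its own.

```python
import numpy

def offset_mean(data, target_mean_value):
    return (data - numpy.mean(data)) + target_mean_value

# Hypothetical stand-in data: 60 patients x 40 days of whole-number counts.
rng = numpy.random.default_rng(seed=1)
data = rng.integers(0, 20, size=(60, 40)).astype(float)
offset_data = offset_mean(data, 0)

# Each assert raises AssertionError if its condition is False.
assert numpy.isclose(numpy.mean(offset_data), 0.0)
assert numpy.isclose(numpy.std(data), numpy.std(offset_data))
assert numpy.isclose(numpy.amin(offset_data), numpy.amin(data) - numpy.mean(data))
print('all checks passed')
```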

+

Documentation +

+

We have one more task first, though: we should write some documentation for our function +to remind ourselves later what it’s for and how to use it.

+

The usual way to put documentation in software is to add comments like this:

+
+

PYTHON +

+
# offset_mean(data, target_mean_value):
+# return a new array containing the original data with its mean offset to match the desired value.
+def offset_mean(data, target_mean_value):
+    return (data - numpy.mean(data)) + target_mean_value
+
+

There’s a better way, though. If the first thing in a function is a +string that isn’t assigned to a variable, that string is attached to the +function as its documentation:

+
+

PYTHON +

+
def offset_mean(data, target_mean_value):
+    """Return a new array containing the original data
+       with its mean offset to match the desired value."""
+    return (data - numpy.mean(data)) + target_mean_value
+
+

This is better because we can now ask Python’s built-in help system +to show us the documentation for the function:

+
+

PYTHON +

+
help(offset_mean)
+
+
+

OUTPUT +

+
Help on function offset_mean in module __main__:
+
+offset_mean(data, target_mean_value)
+    Return a new array containing the original data
+    with its mean offset to match the desired value.
+
+
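The docstring is stored on the function object itself, as its __doc__ attribute. help() shows it with the indentation shared by the later lines removed, and inspect.getdoc from the standard library applies the same kind of clean-up, which makes it a convenient way to check programmatically what readers will see.

```python
import inspect
import numpy

def offset_mean(data, target_mean_value):
    """Return a new array containing the original data
       with its mean offset to match the desired value."""
    return (data - numpy.mean(data)) + target_mean_value

# The raw docstring, including the indentation of the second line.
print(repr(offset_mean.__doc__))
# The cleaned-up version, with the common leading indentation removed.
print(inspect.getdoc(offset_mean))
```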

A string like this is called a docstring. We don’t need to use +triple quotes when we write one, but if we do, we can break the string +across multiple lines:

+
+

PYTHON +

+
def offset_mean(data, target_mean_value):
+    """Return a new array containing the original data
+       with its mean offset to match the desired value.
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3], 0)
+    array([-1.,  0.,  1.])
+    """
+    return (data - numpy.mean(data)) + target_mean_value
+
+help(offset_mean)
+
+
+

OUTPUT +

+
Help on function offset_mean in module __main__:
+
+offset_mean(data, target_mean_value)
+    Return a new array containing the original data
+       with its mean offset to match the desired value.
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3], 0)
+    array([-1.,  0.,  1.])
+
+

Defining Defaults +

+

We have passed parameters to functions in two ways: directly, as in +type(data), and by name, as in +numpy.loadtxt(fname='something.csv', delimiter=','). In +fact, we can pass the filename to loadtxt without the +fname=:

+
+

PYTHON +

+
numpy.loadtxt('inflammation-01.csv', delimiter=',')
+
+
+

OUTPUT +

+
array([[ 0.,  0.,  1., ...,  3.,  0.,  0.],
+       [ 0.,  1.,  2., ...,  1.,  0.,  1.],
+       [ 0.,  1.,  1., ...,  2.,  1.,  1.],
+       ...,
+       [ 0.,  1.,  1., ...,  1.,  1.,  1.],
+       [ 0.,  0.,  0., ...,  0.,  2.,  0.],
+       [ 0.,  0.,  1., ...,  1.,  1.,  0.]])
+
+

but we still need to say delimiter=:

+
+

PYTHON +

+
numpy.loadtxt('inflammation-01.csv', ',')
+
+
+

ERROR +

+
Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+  File "/Users/username/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1041, in loadtxt
+    dtype = np.dtype(dtype)
+  File "/Users/username/anaconda3/lib/python3.6/site-packages/numpy/core/_internal.py", line 199, in _commastring
+    newitem = (dtype, eval(repeats))
+  File "<string>", line 1
+    ,
+    ^
+SyntaxError: unexpected EOF while parsing
+
+

To understand what’s going on, and make our own functions easier to +use, let’s re-define our offset_mean function like +this:

+
+

PYTHON +

+
def offset_mean(data, target_mean_value=0.0):
+    """Return a new array containing the original data
+       with its mean offset to match the desired value (0 by default).
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3])
+    array([-1.,  0.,  1.])
+    """
+    return (data - numpy.mean(data)) + target_mean_value
+
+

The key change is that the second parameter is now written +target_mean_value=0.0 instead of just +target_mean_value. If we call the function with two +arguments, it works as it did before:

+
+

PYTHON +

+
test_data = numpy.zeros((2, 2))
+print(offset_mean(test_data, 3))
+
+
+

OUTPUT +

+
[[ 3.  3.]
+ [ 3.  3.]]
+
+

But we can also now call it with just one parameter, in which case +target_mean_value is automatically assigned the default value of 0.0:

+
+

PYTHON +

+
more_data = 5 + numpy.zeros((2, 2))
+print('data before mean offset:')
+print(more_data)
+print('offset data:')
+print(offset_mean(more_data))
+
+
+

OUTPUT +

+
data before mean offset:
+[[ 5.  5.]
+ [ 5.  5.]]
+offset data:
+[[ 0.  0.]
+ [ 0.  0.]]
+
+

This is handy: if we usually want a function to work one way, but +occasionally need it to do something else, we can allow people to pass a +parameter when they need to but provide a default to make the normal +case easier. The example below shows how Python matches values to +parameters:

+
+

PYTHON +

+
def display(a=1, b=2, c=3):
+    print('a:', a, 'b:', b, 'c:', c)
+
+print('no parameters:')
+display()
+print('one parameter:')
+display(55)
+print('two parameters:')
+display(55, 66)
+
+
+

OUTPUT +

+
no parameters:
+a: 1 b: 2 c: 3
+one parameter:
+a: 55 b: 2 c: 3
+two parameters:
+a: 55 b: 66 c: 3
+
+

As this example shows, parameters are matched up from left to right, +and any that haven’t been given a value explicitly get their default +value. We can override this behavior by naming the value as we pass it +in:

+
+

PYTHON +

+
print('only setting the value of c')
+display(c=77)
+
+
+

OUTPUT +

+
only setting the value of c
+a: 1 b: 2 c: 77
+
+
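Positional and named arguments can also be mixed in one call, as long as the positional ones come first. The sketch below uses a variant of display that also returns the line it prints (a change made only so the results can be checked); both calls produce a: 55 b: 2 c: 77.

```python
def display(a=1, b=2, c=3):
    # Variant of display that also returns the line it prints.
    line = 'a: ' + str(a) + ' b: ' + str(b) + ' c: ' + str(c)
    print(line)
    return line

mixed = display(55, c=77)        # a by position, c by name, b keeps its default
reordered = display(c=77, a=55)  # named arguments can be given in any order
```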

With that in hand, let’s look at the help for +numpy.loadtxt:

+
+

PYTHON +

+
help(numpy.loadtxt)
+
+
+

OUTPUT +

+
Help on function loadtxt in module numpy.lib.npyio:
+
+loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes')
+    Load data from a text file.
+
+    Each row in the text file must have the same number of values.
+
+    Parameters
+    ----------
+...
+
+

There’s a lot of information here, but the most important part is the +first couple of lines:

+
+

OUTPUT +

+
loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes')
+
+

This tells us that loadtxt has one parameter called fname that doesn’t have a default value, and nine others that do. If we call the function like this:

+
+

PYTHON +

+
numpy.loadtxt('inflammation-01.csv', ',')
+
+

then the filename is assigned to fname (which is what we want), but the delimiter string ',' is assigned to dtype rather than delimiter, because dtype is the second parameter in the list. However, ',' isn’t a known dtype, so our code produced an error message when we tried to run it. When we call loadtxt we don’t have to provide fname= for the filename because it’s the first item in the list, but if we want the ',' to be assigned to delimiter, we do have to provide delimiter=, since delimiter is not the second parameter in the list.
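Naming the delimiter argument avoids the problem. In this sketch an in-memory io.StringIO object stands in for a small CSV file (numpy.loadtxt accepts file-like objects as well as filenames), so the example runs without inflammation-01.csv.

```python
import io
import numpy

# Two rows of comma-separated values, standing in for a CSV file.
csv_text = io.StringIO('0,1,2\n3,4,5\n')

# fname is matched by position; delimiter is passed by name, so ','
# is not mistaken for the dtype parameter.
data = numpy.loadtxt(csv_text, delimiter=',')
print(data)
```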

+

Readable functions +

+

Consider these two functions:

+
+

PYTHON +

+
def s(p):
+    a = 0
+    for v in p:
+        a += v
+    m = a / len(p)
+    d = 0
+    for v in p:
+        d += (v - m) * (v - m)
+    return numpy.sqrt(d / (len(p) - 1))
+
+def std_dev(sample):
+    sample_sum = 0
+    for value in sample:
+        sample_sum += value
+
+    sample_mean = sample_sum / len(sample)
+
+    sum_squared_devs = 0
+    for value in sample:
+        sum_squared_devs += (value - sample_mean) * (value - sample_mean)
+
+    return numpy.sqrt(sum_squared_devs / (len(sample) - 1))
+
+

The functions s and std_dev are +computationally equivalent (they both calculate the sample standard +deviation), but to a human reader, they look very different. You +probably found std_dev much easier to read and understand +than s.
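Both functions divide by len(sample) - 1, which gives the sample standard deviation. Note that numpy.std divides by the number of values unless told otherwise (its ddof parameter defaults to 0), so it matches std_dev only when called with ddof=1. The sample values below are arbitrary.

```python
import numpy

def std_dev(sample):
    sample_sum = 0
    for value in sample:
        sample_sum += value

    sample_mean = sample_sum / len(sample)

    sum_squared_devs = 0
    for value in sample:
        sum_squared_devs += (value - sample_mean) * (value - sample_mean)

    return numpy.sqrt(sum_squared_devs / (len(sample) - 1))

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(std_dev(sample))              # sample standard deviation
print(numpy.std(sample, ddof=1))    # same value: divides by len(sample) - 1
print(numpy.std(sample))            # 2.0: divides by len(sample)
```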

+

As this example illustrates, both documentation and a programmer’s +coding style combine to determine how easy it is for others to +read and understand the programmer’s code. Choosing meaningful variable +names and using blank spaces to break the code into logical “chunks” are +helpful techniques for producing readable code. This is useful +not only for sharing code with others, but also for the original +programmer. If you need to revisit code that you wrote months ago and +haven’t thought about since then, you will appreciate the value of +readable code!

+
+
+ +
+
+

Combining Strings +

+
+

“Adding” two strings produces their concatenation: +'a' + 'b' is 'ab'. Write a function called +fence that takes two parameters called +original and wrapper and returns a new string +that has the wrapper character at the beginning and end of the original. +A call to your function should look like this:

+
+

PYTHON +

+
print(fence('name', '*'))
+
+
+

OUTPUT +

+
*name*
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def fence(original, wrapper):
+    return wrapper + original + wrapper
+
+
+
+
+
+
+
+ +
+
+

Return versus print +

+
+

Note that return and print are not interchangeable. print is a Python function that prints data to the screen. It enables us, the users, to see the data. A return statement, on the other hand, makes data visible to the program. Let’s have a look at the following function:

+
+

PYTHON +

+
def add(a, b):
+    print(a + b)
+
+

Question: What will we see if we execute the +following commands?

+
+

PYTHON +

+
A = add(7, 3)
+print(A)
+
+
+
+
+
+
+ +
+
+

Python will first execute the function add with a = 7 and b = 3 and therefore print 10. However, because add does not have a line that starts with return (no return statement), it will, by default, return nothing, which in the Python world is called None. Therefore, A will be assigned None and the last line (print(A)) will print None. As a result, we will see:

+
+

OUTPUT +

+
10
+None
+
+
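To make the result available to the rest of the program, add needs to return the sum rather than print it. A sketch of the corrected function:

```python
def add(a, b):
    # Send the result back to the caller instead of printing it.
    return a + b

A = add(7, 3)
print(A)  # 10
```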
+
+
+
+
+
+ +
+
+

Selecting Characters From Strings +

+
+

If the variable s refers to a string, then +s[0] is the string’s first character and s[-1] +is its last. Write a function called outer that returns a +string made up of just the first and last characters of its input. A +call to your function should look like this:

+
+

PYTHON +

+
print(outer('helium'))
+
+
+

OUTPUT +

+
hm
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def outer(input_string):
+    return input_string[0] + input_string[-1]
+
+
+
+
+
+
+
+ +
+
+

Rescaling an Array +

+
+

Write a function rescale that takes an array as input +and returns a corresponding array of values scaled to lie in the range +0.0 to 1.0. (Hint: If L and H are the lowest +and highest values in the original array, then the replacement for a +value v should be (v-L) / (H-L).)

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def rescale(input_array):
+    L = numpy.amin(input_array)
+    H = numpy.amax(input_array)
+    output_array = (input_array - L) / (H - L)
+    return output_array
+
+
+
+
+
+
+
+ +
+
+

Testing and Documenting Your Function +

+
+

Run the commands help(numpy.arange) and +help(numpy.linspace) to see how to use these functions to +generate regularly-spaced values, then use those values to test your +rescale function. Once you’ve successfully tested your +function, add a docstring that explains what it does.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def rescale(input_array):
+    """Takes an array as input, and returns a corresponding array scaled so
+    that 0 corresponds to the minimum and 1 to the maximum value of the input array.
+
+    Examples:
+    >>> rescale(numpy.arange(10.0))
+    array([ 0.        ,  0.11111111,  0.22222222,  0.33333333,  0.44444444,
+            0.55555556,  0.66666667,  0.77777778,  0.88888889,  1.        ])
+    >>> rescale(numpy.linspace(0, 100, 5))
+    array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])
+    """
+    L = numpy.amin(input_array)
+    H = numpy.amax(input_array)
+    output_array = (input_array - L) / (H - L)
+    return output_array
+
+
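The docstring examples can also be checked in code. This sketch repeats the rescale solution and uses numpy.allclose, which compares whole arrays within a floating-point tolerance, to confirm the expected values for evenly spaced inputs from numpy.linspace and numpy.arange.

```python
import numpy

def rescale(input_array):
    L = numpy.amin(input_array)
    H = numpy.amax(input_array)
    output_array = (input_array - L) / (H - L)
    return output_array

# numpy.linspace(0, 100, 5) gives [0, 25, 50, 75, 100].
linspace_result = rescale(numpy.linspace(0, 100, 5))
assert numpy.allclose(linspace_result, [0.0, 0.25, 0.5, 0.75, 1.0])

# numpy.arange(10.0) gives 0.0 to 9.0, so each step should be 1/9.
arange_result = rescale(numpy.arange(10.0))
assert numpy.allclose(arange_result, numpy.arange(10.0) / 9)
print('rescale passed both checks')
```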
+
+
+
+
+
+ +
+
+

Defining Defaults +

+
+

Rewrite the rescale function so that it scales data to +lie between 0.0 and 1.0 by default, but will +allow the caller to specify lower and upper bounds if they want. Compare +your implementation to your neighbor’s: do the two functions always +behave the same way?

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def rescale(input_array, low_val=0.0, high_val=1.0):
+    """rescales input array values to lie between low_val and high_val"""
+    L = numpy.amin(input_array)
+    H = numpy.amax(input_array)
+    intermed_array = (input_array - L) / (H - L)
+    output_array = intermed_array * (high_val - low_val) + low_val
+    return output_array
+
+
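One case where implementations can differ is an input whose values are all equal: then H - L is zero, and for float arrays NumPy produces nan values (with a RuntimeWarning) instead of raising an error. The sketch below adds a guard for that case. Mapping a constant array to low_val is an assumption made for illustration; returning the midpoint or raising an error would be equally defensible choices.

```python
import numpy

def rescale(input_array, low_val=0.0, high_val=1.0):
    """Rescale input array values to lie between low_val and high_val.

    If every value is the same, return an array filled with low_val
    (an assumed convention) instead of dividing by zero.
    """
    input_array = numpy.asarray(input_array, dtype=float)
    L = numpy.amin(input_array)
    H = numpy.amax(input_array)
    if H == L:
        return numpy.full_like(input_array, low_val)
    intermed_array = (input_array - L) / (H - L)
    output_array = intermed_array * (high_val - low_val) + low_val
    return output_array

print(rescale(numpy.array([2.0, 4.0, 6.0]), -1.0, 1.0))  # [-1.  0.  1.]
print(rescale(numpy.array([5.0, 5.0, 5.0])))             # [0. 0. 0.]
```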
+
+
+
+
+
+ +
+
+

Variables Inside and Outside Functions +

+
+

What does the following piece of code display when run — and why?

+
+

PYTHON +

+
f = 0
+k = 0
+
+def f2k(f):
+    k = ((f - 32) * (5.0 / 9.0)) + 273.15
+    return k
+
+print(f2k(8))
+print(f2k(41))
+print(f2k(32))
+
+print(k)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
259.81666666666666
+278.15
+273.15
+0
+
+

k is 0 because the k inside the function f2k doesn’t know about the k defined outside the function. When the f2k function is called, it creates a local variable k. The function returns the value of this local k but does not alter the global k, so the original value of k remains unchanged. Beware that a local k is created because the statements inside f2k assign a new value to it. If k were only read, Python would simply retrieve the value of the global k.
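A minimal sketch of the read-versus-assign rule, using two hypothetical functions: one that only reads k and one that assigns to it.

```python
k = 0

def read_k():
    # k is only read here, so Python looks it up in the global scope.
    return k

def assign_k():
    # Assigning to k makes k a local variable of assign_k.
    k = 5
    return k

print(read_k(), assign_k(), k)  # 0 5 0
```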

+
+
+
+
+
+
+ +
+
+

Mixing Default and Non-Default Parameters +

+
+

Given the following code:

+
+

PYTHON +

+
def numbers(one, two=2, three, four=4):
+    n = str(one) + str(two) + str(three) + str(four)
+    return n
+
+print(numbers(1, three=3))
+
+

what do you expect will be printed? What is actually printed? What +rule do you think Python is following?

+
  1. 1234
  2. one2three4
  3. 1239
  4. SyntaxError

Given that, what does the following piece of code display when +run?

+
+

PYTHON +

+
def func(a, b=3, c=6):
+    print('a: ', a, 'b: ', b, 'c:', c)
+
+func(-1, 2)
+
+
  1. a: b: 3 c: 6
  2. a: -1 b: 3 c: 6
  3. a: -1 b: 2 c: 6
  4. a: b: -1 c: 2
+
+
+
+
+ +
+
+

Attempting to define the numbers function results in +4. SyntaxError. The defined parameters two and +four are given default values. Because one and +three are not given default values, they are required to be +included as arguments when the function is called and must be placed +before any parameters that have default values in the function +definition.
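Moving the parameters without defaults to the front makes the definition valid. In this sketch the order is one, three, two, four (a reordering chosen for illustration), and the call from the exercise then works:

```python
def numbers(one, three, two=2, four=4):
    n = str(one) + str(two) + str(three) + str(four)
    return n

print(numbers(1, three=3))  # 1234
```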

+

The given call to func displays +a: -1 b: 2 c: 6. -1 is assigned to the first parameter +a, 2 is assigned to the next parameter b, and +c is not passed a value, so it uses its default value +6.

+
+
+
+
+
+
+ +
+
+

Readable Code +

+
+

Revise a function you wrote for one of the previous exercises to try +to make the code more readable. Then, collaborate with one of your +neighbors to critique each other’s functions and discuss how your +function implementations could be further improved to make them more +readable.

+
+
+
+
+
+ +
+
+

Key Points +

+
+
  • Define a function using def function_name(parameter).
  • The body of a function must be indented.
  • Call a function using function_name(value).
  • Numbers are stored as integers or floating-point numbers.
  • Variables defined within a function can only be seen and used within the body of the function.
  • Variables created outside of any function are called global variables.
  • Within a function, we can access global variables.
  • Variables created within a function override global variables if their names match.
  • Use help(thing) to view help for something.
  • Put docstrings in functions to provide help for that function.
  • Specify default values for parameters when defining a function using name=value in the parameter list.
  • Parameters can be passed by matching based on name, by position, or by omitting them (in which case the default value is used).
  • Put code whose parameters change frequently in a function, then call it with different parameter values to customize its behavior.
+
+
+
+
+ + +
+
+
+ +
Back To Top +
+
+ + diff --git a/08-data_analysis.html b/08-data_analysis.html new file mode 100644 index 0000000..3cc6743 --- /dev/null +++ b/08-data_analysis.html @@ -0,0 +1,491 @@ + +Python for Official Statistics: Data Analysis +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Data Analysis

+

Last updated on 2024-07-11

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I process tabular data files in Python?
  • How can I do the same operations on many different files?
+
+
+
+
+
+

Objectives

+
  • read in data files to Python
  • perform common operations on tabular data
  • write code to perform the same operation on multiple files
+
+
+
+
+

FIXME

+
+
+ +
+
+

Key Points +

+
+
  • NULL
+
+
+ + + +
+
+ + +
+
+
+ +
+
+ + diff --git a/09-visualizations.html b/09-visualizations.html new file mode 100644 index 0000000..067396b --- /dev/null +++ b/09-visualizations.html @@ -0,0 +1,490 @@ + +Python for Official Statistics: Visualizations +
+ +
+
+ + + + + +
+
+

Visualizations

+

Last updated on 2024-07-11

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I visualize tabular data in Python?
  • How can I group several plots together?
+
+
+
+
+
+

Objectives

+
  • create graphs and other visualizations using tabular data
  • group plots together to make comparative visualizations
+
+
+
+
+

FIXME

+
+
+ +
+
+

Key Points +

+
+
  • NULL
+
+
+ + + +
+
+ + +
+
+
+ +
+
+ + diff --git a/10-errors_exceptions.html b/10-errors_exceptions.html new file mode 100644 index 0000000..8989987 --- /dev/null +++ b/10-errors_exceptions.html @@ -0,0 +1,1184 @@ + +Python for Official Statistics: Errors and Exceptions +
+ +
+
+ + + + + +
+
+

Errors and Exceptions

+

Last updated on 2024-07-11

+ + + +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How does Python report errors?
  • How can I handle errors in Python programs?
+
+
+
+
+
+

Objectives

+
  • identify different errors and correct bugs associated with them
+
+
+
+
+

Every programmer encounters errors, both those who are just beginning, and those who have been programming for years. Encountering errors and exceptions can be very frustrating at times, and can make coding feel like a hopeless endeavour. However, understanding what the different types of errors are and when you are likely to encounter them can help a lot. Once you know why you get certain types of errors, they become much easier to fix.

+

Errors in Python have a very specific form, called a traceback. Let’s examine one:

+
+

PYTHON +

+
# This code has an intentional error. You can type it directly or
# use it for reference to understand the error message below.
def favorite_ice_cream():
    ice_creams = [
        'chocolate',
        'vanilla',
        'strawberry'
    ]
    print(ice_creams[3])

favorite_ice_cream()
+
+
+

ERROR +

+
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-1-70bd89baa4df> in <module>()
      9     print(ice_creams[3])
     10
---> 11 favorite_ice_cream()

<ipython-input-1-70bd89baa4df> in favorite_ice_cream()
      7         'strawberry'
      8     ]
----> 9     print(ice_creams[3])
     10
     11 favorite_ice_cream()

IndexError: list index out of range
+
+

This particular traceback has two levels. You can determine the number of levels by counting the arrows on the left-hand side. In this case:

+
  1. The first shows code from the cell above, with an arrow pointing to Line 11 (which is favorite_ice_cream()).
  2. The second shows some code in the function favorite_ice_cream, with an arrow pointing to Line 9 (which is print(ice_creams[3])).

The last level is the actual place where the error occurred. The other level(s) show which function calls the program made to get to the next level down. So, in this case, the program first called the function favorite_ice_cream. Inside this function, the program encountered an error on Line 9, when it tried to run the code print(ice_creams[3]).

+
+
+ +
+
+

Long Tracebacks +

+
+

Sometimes you might see a very long traceback, perhaps even 20 levels deep! This can make it seem like something horrible happened, but the length of the error message does not reflect severity; rather, it indicates that your program called many functions before it encountered the error. Most of the time, the actual place where the error occurred is at the bottom-most level, so you can skip down the traceback to the bottom.

+
+
+
+
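The sketch below shows why the bottom level matters most. It builds a chain of three functions (level_one, level_two and level_three are made-up names) and uses the standard library traceback module to list the levels of the resulting traceback. The last entry is the function where the error actually occurred.

```python
import traceback

def level_three(values):
    return values[10]          # the actual error happens here

def level_two(values):
    return level_three(values)

def level_one(values):
    return level_two(values)

try:
    level_one([1, 2, 3])
except IndexError as err:
    # One FrameSummary per level, from the outermost call down to the failure
    frames = traceback.extract_tb(err.__traceback__)
    print('levels:', len(frames))
    print('error raised in:', frames[-1].name)
```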

So what error did the program actually encounter? In the last line of the traceback, Python helpfully tells us the category or type of error (in this case, it is an IndexError) and a more detailed error message (in this case, it says “list index out of range”).
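A program can also read the error type and its message directly. It does this by catching the error with try and except. This sketch reuses the ice cream list from above:

```python
ice_creams = ['chocolate', 'vanilla', 'strawberry']

try:
    print(ice_creams[3])
except IndexError as err:
    error_type = type(err).__name__   # the category shown at the start of the last line
    error_message = str(err)          # the detailed message after the colon
    print(error_type, '->', error_message)  # prints: IndexError -> list index out of range
```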

+

If you encounter an error and don’t know what it means, it is still important to read the traceback closely. That way, if you fix the error, but encounter a new one, you can tell that the error changed. Additionally, sometimes knowing where the error occurred is enough to fix it, even if you don’t entirely understand the message.

+

If you do encounter an error you don’t recognize, try looking at the official documentation on errors. However, you may not always find the error there, because it is possible to create custom errors. In that case, hopefully the custom error message is informative enough to help you figure out what went wrong. Libraries like pandas and NumPy define their own custom errors, but the procedure for understanding them is the same: go to the last line of the traceback, and read the error type and message there. The documentation for these libraries often provides the information you need about the functions you are using. These libraries also have large user communities that can help.
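The sketch below illustrates a custom error. It defines a hypothetical InvalidRegionCodeError; the class, the region codes and check_region are all made up for this example. Subclassing a built-in error such as ValueError is a common convention, because code that already catches ValueError will also catch the custom error.

```python
class InvalidRegionCodeError(ValueError):
    """Hypothetical custom error for region codes that are not on the approved list."""

VALID_REGIONS = {'N', 'S', 'E', 'W'}

def check_region(code):
    """Return code unchanged, or raise InvalidRegionCodeError if it is not a known region."""
    if code not in VALID_REGIONS:
        raise InvalidRegionCodeError(
            f"unknown region code {code!r}; expected one of {sorted(VALID_REGIONS)}"
        )
    return code

check_region('N')  # a valid code passes without an error
try:
    check_region('X')
except InvalidRegionCodeError as err:
    # The last traceback line would show the custom type and this message
    print(type(err).__name__, ':', err)
```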

+
+
+ +
+
+

Reading Error Messages +

+
+

Read the Python code and the resulting traceback below, and answer the following questions:

+
  1. How many levels does the traceback have?
  2. What is the function name where the error occurred?
  3. On which line number in this function did the error occur?
  4. What is the type of error?
  5. What is the error message?
+

PYTHON +

+
# This code has an intentional error. Do not type it directly;
# use it for reference to understand the error message below.
def print_message(day):
    messages = [
        'Hello, world!',
        'Today is Tuesday!',
        'It is the middle of the week.',
        'Today is Donnerstag in German!',
        'Last day of the week!',
        'Hooray for the weekend!',
        'Aw, the weekend is almost over.'
    ]
    print(messages[day])

def print_sunday_message():
    print_message(7)

print_sunday_message()
+
+
+

ERROR +

+
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-7-3ad455d81842> in <module>
     16     print_message(7)
     17
---> 18 print_sunday_message()
     19

<ipython-input-7-3ad455d81842> in print_sunday_message()
     14
     15 def print_sunday_message():
---> 16     print_message(7)
     17
     18 print_sunday_message()

<ipython-input-7-3ad455d81842> in print_message(day)
     11         'Aw, the weekend is almost over.'
     12     ]
---> 13     print(messages[day])
     14
     15 def print_sunday_message():

IndexError: list index out of range
+
+
+
+
+
+
+ +
+
+
  1. 3 levels
  2. print_message
  3. 13
  4. IndexError
  5. list index out of range. You can then infer that 7 is not the right index to use with messages: the list has seven elements, so its valid indices run from 0 to 6.
+
+
+
+
+
+ +
+
+

Better errors on newer Pythons +

+
+

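Newer versions of Python have improved error messages. For example, Python 3.10 and later give more specific syntax error messages, and Python 3.11 and later point to the exact expression that failed within a line. If you are debugging errors, it is often helpful to use the latest Python version, even if your code must also support older versions.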

+
+
+
+

Type Errors +

+

One of the most common kinds of error in Python is the type error (TypeError). Type errors occur when you try to perform an operation on an object in Python that does not support it. This happens easily when working with large datasets in which each column is expected to hold values of one type, such as strings or integers. If a function expects integers but receives strings, it will not fail until it reaches an operation that cannot handle strings. For example:

+
+

PYTHON +

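+
def our_function():
    my_string = "Hello World"
    letter = my_string["e"]

our_function()
+
+
+

ERROR +

+
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-6bb841ea1423> in <module>()
      3     letter = my_string["e"]
      4
----> 5 our_function()

<ipython-input-3-6bb841ea1423> in our_function()
      1 def our_function():
      2     my_string = "Hello World"
----> 3     letter = my_string["e"]
      4
      5 our_function()

TypeError: string indices must be integers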
+
+

We get this error because we are trying to use an index to access part of our string, and string indices must be integers. Instead, we used the string “e” and received a type error. This is fixed by replacing “e” with 1, the position of the letter e in “Hello World” (Python counts positions from 0).

+

In the case of datasets, we often see type errors when a mathematical operation, such as taking a mean, is performed on a column that contains characters. These characters may come from formatting or may have been introduced through error. Correcting the error can therefore involve simply removing the characters from the strings using regular expressions. If the characters have resulted in incorrect data, it can instead mean removing those observations from the dataset.
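Below is a minimal sketch of this cleaning approach, using only the standard library. The column values are hypothetical. The cleaning rule (keep only digits and the decimal point) is a simplifying assumption, since real data may use other conventions such as decimal commas. Whether to correct or drop an observation is a methodological decision rather than a programming one.

```python
import re

# Hypothetical raw column: thousands separators and stray text make these strings
raw_values = ['1,200', '350', '4 100', 'n/a', '975']

def mean(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

try:
    mean(raw_values)  # sum() cannot add strings to its starting value of 0
except TypeError as err:
    print('TypeError:', err)

def clean_value(text):
    """Keep only digits and '.', then convert to float; return None if nothing numeric remains."""
    digits = re.sub(r'[^0-9.]', '', text)
    return float(digits) if digits else None

cleaned = [clean_value(v) for v in raw_values]
valid = [v for v in cleaned if v is not None]  # drop observations that could not be recovered
print(mean(valid))  # prints: 1656.25
```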

+

Syntax Errors +

+

When you forget a colon at the end of a line, accidentally add one space too many when indenting under an if statement, or forget a parenthesis, you will encounter a syntax error. This means that Python couldn’t figure out how to read your program. This is similar to forgetting punctuation in English: for example, this text is difficult to read there is no punctuation there is also no capitalization why is this hard because you have to figure out where each sentence ends you also have to figure out where each sentence begins to some extent it might be ambiguous if there should be a sentence break or not

+

People can typically figure out what is meant by text with no punctuation, but people are much smarter than computers. If Python doesn’t know how to read the program, it will give up and inform you with an error. For example:

+
+

PYTHON +

+
def some_function()
    msg = 'hello, world!'
    print(msg)
     return msg
+
+
+

ERROR +

+
  File "<ipython-input-3-6bb841ea1423>", line 1
    def some_function()
                       ^
SyntaxError: invalid syntax
+
+

Here, Python tells us that there is a SyntaxError on line 1, and even puts a little arrow in the place where there is an issue. In this case the problem is that the function definition is missing a colon at the end.

+

Actually, the function above has two issues with syntax. If we fix the problem with the colon, we see that there is also an IndentationError, which means that the lines in the function definition do not all have the same indentation:

+
+

PYTHON +

+
def some_function():
    msg = 'hello, world!'
    print(msg)
     return msg
+
+
+

ERROR +

+
  File "<ipython-input-4-ae290e7659cb>", line 4
    return msg
    ^
IndentationError: unexpected indent
+
+

Both SyntaxError and IndentationError indicate a problem with the syntax of your program, but an IndentationError is more specific: it always means that there is a problem with how your code is indented.

+
+
+ +
+
+

Tabs and Spaces +

+
+

Some indentation errors are harder to spot than others. In particular, mixing spaces and tabs can be difficult to see because both are whitespace. In the example below, the first two lines in the body of the function some_function are indented with tabs, while the third line is indented with spaces. If you’re working in a Jupyter notebook, be sure to copy and paste this example rather than trying to type it in manually, because Jupyter automatically replaces tabs with spaces.

+
+

PYTHON +

+
def some_function():
+	msg = 'hello, world!'
+	print(msg)
+        return msg
+
+

Visually it is impossible to spot the error. Fortunately, Python does +not allow you to mix tabs and spaces.

+
+

ERROR +

+
  File "<ipython-input-5-653b36fbcd41>", line 4
    return msg
              ^
TabError: inconsistent use of tabs and spaces in indentation
+
+
+
+
+

Variable Name Errors +

+

Another very common type of error is called a NameError, and occurs when you try to use a variable that does not exist. For example:

+
+

PYTHON +

+
print(a)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-7-9d7b17ad5387> in <module>()
----> 1 print(a)

NameError: name 'a' is not defined
+
+

Variable name errors come with some of the most informative error messages, which are usually of the form “name ‘the_variable_name’ is not defined”.

+

Why does this error message occur? That’s a harder question to answer, because it depends on what your code is supposed to do. However, there are a few very common reasons why you might have an undefined variable. The first is that you meant to use a string, but forgot to put quotes around it:

+
+

PYTHON +

+
print(hello)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-8-9553ee03b645> in <module>()
----> 1 print(hello)

NameError: name 'hello' is not defined
+
+

The second reason is that you might be trying to use a variable that does not yet exist. In the following example, count should have been defined (e.g., with count = 0) before the for loop:

+
+

PYTHON +

+
for number in range(10):
    count = count + number
print('The count is:', count)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-9-dd6a12d7ca5c> in <module>()
      1 for number in range(10):
----> 2     count = count + number
      3 print('The count is:', count)

NameError: name 'count' is not defined
+
+

Finally, the third possibility is that you made a typo when you were writing your code. Let’s say we fixed the error above by adding the line Count = 0 before the for loop. Frustratingly, this actually does not fix the error. Remember that variables are case-sensitive, so the variable count is different from Count. We still get the same error, because we still have not defined count:

+
+

PYTHON +

+
Count = 0
for number in range(10):
    count = count + number
print('The count is:', count)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-10-d77d40059aea> in <module>()
      1 Count = 0
      2 for number in range(10):
----> 3     count = count + number
      4 print('The count is:', count)

NameError: name 'count' is not defined
+
+

Index Errors +

+

Next up are errors having to do with containers (like lists and strings) and the items within them. If you try to access an item in a list or a string that does not exist, then you will get an error. This makes sense: if you asked someone what day they would like to get coffee, and they answered “caturday”, you might be a bit annoyed. Python gets similarly annoyed if you try to ask it for an item that doesn’t exist:

+
+

PYTHON +

+
letters = ['a', 'b', 'c']
print('Letter #1 is', letters[0])
print('Letter #2 is', letters[1])
print('Letter #3 is', letters[2])
print('Letter #4 is', letters[3])
+
+
+

OUTPUT +

+
Letter #1 is a
Letter #2 is b
Letter #3 is c
+
+
+

ERROR +

+
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-11-d817f55b7d6c> in <module>()
      3 print('Letter #2 is', letters[1])
      4 print('Letter #3 is', letters[2])
----> 5 print('Letter #4 is', letters[3])

IndexError: list index out of range
+
+

Here, Python is telling us that there is an IndexError in our code, meaning we tried to access a list index that did not exist.
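One way to avoid this error is to work out the valid range of indices from the length of the list. This sketch reuses the letters list. Checking the index first is one option among several.

```python
letters = ['a', 'b', 'c']

# Valid positive indices run from 0 to len(letters) - 1
last_valid_index = len(letters) - 1
print('Last valid index is', last_valid_index)   # prints: Last valid index is 2
print('Last letter is', letters[last_valid_index])

# Negative indices count back from the end, so -1 is also the last item
print('Last letter is', letters[-1])

# Check an index before using it instead of letting IndexError stop the program
index = 3
if 0 <= index < len(letters):
    print('Letter is', letters[index])
else:
    print('Index', index, 'is out of range for a list of length', len(letters))
```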

+

File Errors +

+

The last type of error we’ll cover today is the most common type when using Python with data: errors associated with reading and writing files. If you try to read a file that does not exist, you will receive a FileNotFoundError telling you so. If you attempt to write to a file that was opened read-only, Python 3 raises an UnsupportedOperation error (io.UnsupportedOperation). More generally, problems with input and output manifest as OSErrors, which may show up as a more specific subclass; you can see the list in the Python docs. Each of these subclasses corresponds to one or more UNIX errno codes, which you can see in the error message (for example, [Errno 2] below).

+
+

PYTHON +

+
file_handle = open('myfile.txt', 'r')
+
+
+

ERROR +

+
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-14-f6e1ac4aee96> in <module>()
----> 1 file_handle = open('myfile.txt', 'r')

FileNotFoundError: [Errno 2] No such file or directory: 'myfile.txt'
+
+

One reason for receiving this error is that you specified an incorrect path to the file. For example, suppose I am currently in a folder called myproject and I have a file in myproject/writing/myfile.txt. If I try to open myfile.txt, this will fail; the correct path would be writing/myfile.txt. It is also possible that the file name or its path contains a typo. If you are using shared, networked, or cloud-based drives, your organization may also have specific settings that affect file access. If you are still unable to read a file after troubleshooting, check with your IT administrators.
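The sketch below shows how a program can handle a missing file instead of stopping, using pathlib from the standard library. The function name read_text_or_none is hypothetical, and the writing/myfile.txt layout comes from the example above. Printing the current working directory is often the quickest way to see why a relative path does not resolve.

```python
from pathlib import Path

def read_text_or_none(path):
    """Return the text of the file at path, or None (with a diagnostic) if it does not exist."""
    path = Path(path)
    try:
        with open(path, 'r') as file_handle:
            return file_handle.read()
    except FileNotFoundError as err:
        # err.errno is the UNIX error code (2, ENOENT, for a missing file)
        print(f'Could not find {path} (errno {err.errno}: {err.strerror}).')
        print('Relative paths are resolved from:', Path.cwd())
        return None

# From inside "myproject", the file must be addressed through its subfolder
contents = read_text_or_none(Path('writing') / 'myfile.txt')
```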

+

A related issue can occur if you use the “write” flag instead of the “read” flag. Python will not give you an error if you try to open a file for writing when the file does not exist; it simply creates the file. Be careful, though: opening an existing file for writing erases its contents. If you meant to open a file for reading, but accidentally opened it for writing, and then try to read from it, you will get an UnsupportedOperation error telling you that the file was not opened for reading:

+
+

PYTHON +

+
file_handle = open('myfile.txt', 'w')
file_handle.read()
+
+
+

ERROR +

+
---------------------------------------------------------------------------
UnsupportedOperation                      Traceback (most recent call last)
<ipython-input-15-b846479bc61f> in <module>()
      1 file_handle = open('myfile.txt', 'w')
----> 2 file_handle.read()

UnsupportedOperation: not readable
+
+
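The sketch below reproduces this mode mix-up safely and catches the error. It works inside a temporary folder so that no real file is overwritten. The exception class is io.UnsupportedOperation, which inherits from both OSError and ValueError.

```python
import io
import os
import tempfile

# Work in a throwaway folder: opening an existing file with "w" would erase it
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'myfile.txt')

    with open(path, 'w') as file_handle:   # write mode creates the file
        file_handle.write('first line\n')
        try:
            file_handle.read()             # reading is not allowed in "w" mode
        except io.UnsupportedOperation as err:
            print('Caught:', type(err).__name__, '-', err)  # prints: Caught: UnsupportedOperation - not readable

    with open(path, 'r') as file_handle:   # read mode: reading works
        text = file_handle.read()

print(text)
```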

If you get a read or write error on a file or folder that you are able to open or edit with other programs, you may need to contact an IT administrator to check the permissions granted to you and to any programs you are using.

+

These are the most common errors with files, though many others exist. If you get an error that you’ve never seen before, searching the Internet for that error type often reveals common reasons why you might get that error.

+
+
+ +
+
+

Identifying Syntax Errors +

+
+
  1. Read the code below, and (without running it) try to identify what the errors are.
  2. Run the code, and read the error message. Is it a SyntaxError or an IndentationError?
  3. Fix the error.
  4. Repeat steps 2 and 3, until you have fixed all the errors.
+

PYTHON +

+
def another_function
  print('Syntax errors are annoying.')
   print('But at least Python tells us about them!')
  print('So they are usually not too hard to fix.')
+
+
+
+
+
+
+ +
+
+

A SyntaxError for the missing (): at the end of the first line, and an IndentationError for the mismatch between the second and third lines. A fixed version is:

+
+

PYTHON +

+
def another_function():
    print('Syntax errors are annoying.')
    print('But at least Python tells us about them!')
    print('So they are usually not too hard to fix.')
+
+
+
+
+
+
+
+ +
+
+

Identifying Variable Name Errors +

+
+
  1. Read the code below, and (without running it) try to identify what the errors are.
  2. Run the code, and read the error message. What type of NameError do you think this is? In other words, is it a string with no quotes, a misspelled variable, or a variable that should have been defined but was not?
  3. Fix the error.
  4. Repeat steps 2 and 3, until you have fixed all the errors.
+

PYTHON +

+
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (Number % 3) == 0:
        message = message + a
    else:
        message = message + 'b'
print(message)
+
+
+
+
+
+
+ +
+
+

Three NameErrors: Number is misspelled (it should be number), message is used before it is defined, and a is not in quotes.

+

Fixed version:

+
+

PYTHON +

+
message = ''
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (number % 3) == 0:
        message = message + 'a'
    else:
        message = message + 'b'
print(message)
+
+
+
+
+
+
+
+ +
+
+

Identifying Index Errors +

+
+
  1. Read the code below, and (without running it) try to identify what the errors are.
  2. Run the code, and read the error message. What type of error is it?
  3. Fix the error.
+

PYTHON +

+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print('My favorite season is ', seasons[4])
+
+
+
+
+
+
+ +
+
+

IndexError; the last entry is seasons[3], so seasons[4] doesn’t make sense. A fixed version is:

+
+

PYTHON +

+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print('My favorite season is ', seasons[-1])
+
+
+
+
+
+

A Final Note About Correcting Errors +

+

There are many helpful answers online for common error messages. However, when working with official statistics, we also need to exercise some caution. Be wary of any answers that ask you to download a package from someone’s personal GitHub repository or another file-sharing service. Try to identify the type of error and understand what the issue is before downloading anything claiming to fix it. If the error is the result of an issue with a version of a package, check whether that version has any known security vulnerabilities, and use a package manager to move between package versions.
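Before searching for a fix to a version-related error, it helps to know exactly which versions you have installed. This sketch uses importlib.metadata from the standard library (Python 3.8 and later). The helper name installed_version and the package list are illustrative assumptions.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

print('Python', sys.version.split()[0])
for name in ['numpy', 'scipy', 'pandas']:
    print(name, installed_version(name) or 'not installed')
```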

+
+
+ +
+
+

Key Points +

+
+
  • NULL
+
+
+
+
+ + +
+
+
+ +
+
+ + diff --git a/404.html b/404.html new file mode 100644 index 0000000..48eb95f --- /dev/null +++ b/404.html @@ -0,0 +1,445 @@ + +Python for Official Statistics: Page not found +
+ +
+
+ + + + + +
+
+

Page not found

+ +

Our apologies! +

+

We cannot seem to find the page you are looking for. Here are some tips that may help:

+
  1. try going back to the previous page or
  2. navigate to any other page using the navigation bar on the left.
  3. if the URL ends with /index.html, try removing that.
  4. head over to the home page of this lesson

If you came here from a link in this lesson, please contact the lesson maintainers using the links at the foot of this page.

+
+
+ + +
+
+
+ +
+
+ + diff --git a/CODE_OF_CONDUCT.html b/CODE_OF_CONDUCT.html new file mode 100644 index 0000000..c7a51e7 --- /dev/null +++ b/CODE_OF_CONDUCT.html @@ -0,0 +1,456 @@ + +Python for Official Statistics: Contributor Code of Conduct +
+ +
+
+ + + + + +
+
+

Contributor Code of Conduct

+

Last updated on 2024-07-11

+ + + +
+ +
+ + + +

As contributors and maintainers of this project, we pledge to follow The Carpentries Code of Conduct.

+

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by following our reporting guidelines.

+ + + +
+
+ + +
+
+
+ +
+
+ + diff --git a/LICENSE.html b/LICENSE.html new file mode 100644 index 0000000..71d016f --- /dev/null +++ b/LICENSE.html @@ -0,0 +1,507 @@ + +Python for Official Statistics: Licenses +
+ +
+
+ + + + + +
+
+

Licenses

+

Last updated on 2024-07-11

+ + + +
+ +
+ + + +

Instructional Material +

+

All Carpentries (Software Carpentry, Data Carpentry, and Library Carpentry) instructional material is made available under the Creative Commons Attribution license. The following is a human-readable summary of (and not a substitute for) the full legal text of the CC BY 4.0 license.

+

You are free:

+
  • to Share—copy and redistribute the material in any medium or format
  • to Adapt—remix, transform, and build upon the material

for any purpose, even commercially.

+

The licensor cannot revoke these freedoms as long as you follow the license terms.

+

Under the following terms:

+
  • Attribution—You must give appropriate credit (mentioning that your work is derived from work that is Copyright (c) The Carpentries and, where practical, linking to https://carpentries.org/), provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • No additional restrictions—You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits. With the understanding that:

Notices:

+
  • You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
  • No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.

Software +

+

Except where otherwise noted, the example programs and other software provided by The Carpentries are made available under the OSI-approved MIT license.

+

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

+

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

+

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

+

Trademark +

+

“The Carpentries”, “Software Carpentry”, “Data Carpentry”, and “Library Carpentry” and their respective logos are registered trademarks of Community Initiatives.

+
+
+ + +
+
+
+ +
+
+ + diff --git a/aio.html b/aio.html new file mode 100644 index 0000000..2d75453 --- /dev/null +++ b/aio.html @@ -0,0 +1,5371 @@ + + + + + +Python for Official Statistics: All in One View + + + + + + + + + + + + +
+ +
+
+ + + + + + +
+
+ + +

Content from Introduction

+
+

Last updated on 2024-07-11

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
  • What is programming?
  • How do I document code?
  • How do I find reliable and safe resources or code online?
+
+
+
+
+
+
+

Objectives

+
  • identify basic concepts in programming
+
+
+
+
+
+

Programming in Python + +

+
+

In most general terms, programming is the process of writing instructions for a computer. In this course we will be using Python as the language to communicate with the computer.

+
+

Strictly speaking, Python is an interpreted language, rather than a compiled language, meaning we are not communicating directly with the computer when we use Python. When we run Python code, our Python source code is first translated into byte code, which is then executed by the Python virtual machine.
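If you are curious about the byte code step, the standard library dis module can display the instructions the Python virtual machine runs for a function. The function add_one is a hypothetical example. The exact instructions printed differ between Python versions.

```python
import dis

def add_one(x):
    """Return x plus one."""
    return x + 1

# Print the byte code instructions that the Python virtual machine executes
dis.dis(add_one)
```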

+
+

Programming is a wide topic including a variety of techniques and tools. In this course we’ll be focusing on programming for statistical analysis.

+
+

IDEs +

+

IDE stands for Integrated Development Environment. IDEs are where you will write, edit, and debug Python scripts, so you want to choose one that makes you feel comfortable and includes the functionality that you need. Some open-source IDEs for Python include JupyterLab and Visual Studio Code.

+
+
+

Packages +

+

Packages, or libraries, are extensions to the core Python language. They contain code, data, and documentation in a standardised collection format that can be installed by users, typically via a centralised software repository such as the Python Package Index (PyPI). A typical Python workflow will use base Python (the core operations and functions provided by your Python installation) as well as specialised data analysis and scientific packages like NumPy, SciPy and Pandas.
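The short sketch below shows the difference. The statistics module is part of base Python, while packages such as pandas must be installed separately (for example with pip) before they can be imported. The try/except guard is only there so the example runs whether or not pandas is installed.

```python
# Base Python: the statistics module ships with every Python installation
import statistics

print(statistics.mean([2, 4, 9]))  # prints: 5

# Third-party packages (e.g. NumPy, SciPy, pandas) must be installed first,
# for example with `pip install pandas`; otherwise the import fails
try:
    import pandas as pd
    print('pandas version', pd.__version__)
except ModuleNotFoundError:
    print('pandas is not installed')
```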

+
+

Best Practices + +

+
+

Let’s review some basic concepts that any programmer should always keep in mind.

+
+

Documentation +

+

Have you ever returned to a task and tried to read a note that you quickly scrawled for yourself the last time you were working on it? Have you ever inherited a project from a colleague and found you have no idea what remains to be done?

+

It can be very challenging to return to your own work or a colleague’s, and this goes doubly for programming. Documentation is one way we can reduce the burden on our future selves and our colleagues.

+
+

Inline Documentation +

+

For new programmers, inline documentation can be the most helpful. Inline documentation refers to writing comments on the same line as your code. For example, if we wrote a line of code to sum 1+1, we might document it as follows:

+
+

PYTHON +

+
1+1         # adding the numbers 1 and 1 together.
+
+

Although this is a very simple line of code and it might seem like +overkill to document it in this way, these types of comments can be very +helpful in jogging your memory when returning to a project. Inline +comments can also help you to break multi-step programs into digestible +and readable pieces.
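Here is a short multi-step calculation, the average of three made-up patient weights, with an inline comment on each step:

```python
total_kg = 60.3 + 65.0 + 72.5   # step 1: add the three weights together
count = 3                       # step 2: record how many weights were added
mean_kg = total_kg / count      # step 3: divide the total by the count
print(round(mean_kg, 2))        # display the average rounded to 2 decimal places
```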

+
+
+

External Documentation +

+

Sometimes you require more detail than you can comfortably fit in your inline documentation. In this case, it can be helpful to create separate files to document your project. This type of documentation will typically focus on the goals, scope, and any special instructions relating to your project rather than the details of your code. The most common type of external documentation is a README file. It is best practice to create a basic README file for any project. A basic README should include:

+
    +
  • a brief description of the project,
  • +
  • any special instructions for installation or use,
  • +
  • the authors and any references.
  • +
+

README files are just text files, and it is best practice to save your README file as a README.md Markdown document. This file format is automatically recognised by code repositories like GitHub, so your README contents are displayed alongside your code repository.

+
+
+

DocStrings +

+

In chapter 7: functions we’ll learn +about documentation specific to functions known as DocStrings.
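As a brief preview, a docstring is a string placed on the first line inside a function definition. The built-in help() function, described below under Getting Help, displays it. The function kg_to_lb is a hypothetical example written for illustration.

```python
def kg_to_lb(weight_kg):
    """Convert a weight in kilograms to pounds, using 2.2 pounds per kilogram."""
    return 2.2 * weight_kg

help(kg_to_lb)           # shows the function signature and its docstring
print(kg_to_lb.__doc__)  # the docstring is also stored on the function itself
```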

+
+
+

Getting Help + +

+
+

Later on, in chapter 10: Errors and Exceptions, we will cover errors in more detail. However, before we get there, it’s very likely you’ll need some assistance writing Python code.

+
+

Built-in Help +

+

There is a help +function built into base Python. You can use it to investigate +built-in functions, data types, and more. For example, say we want to +know more about the print() function in Python:

+
+

PYTHON +

+
help(print)
+
+
+

OUTPUT +

+
Help on built-in function print in module builtins:
+
+print(...)
+    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
+
+    Prints the values to a stream, or to sys.stdout by default.
+    Optional keyword arguments:
+    file:  a file-like object (stream); defaults to the current sys.stdout.
+    sep:   string inserted between values, default a space.
+    end:   string appended after the last value, default a newline.
+-- More  --
+
+
+
+
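help() works on other objects too. This is a minimal sketch using only built-ins. Passing a method such as str.upper shows that method's documentation. The built-in dir() lists the names, such as methods, that are available on an object.

```python
help(str.upper)          # documentation for a single string method

names = dir(str)         # every attribute and method name defined on strings
print('upper' in names)  # True

# Names without a leading underscore are the ones you will usually call
print([n for n in names if not n.startswith('_')][:5])
```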

Finding Resources online +

+

Stack Overflow is a valuable +resource for programmers of all levels. It can be daunting to post your +own question! Fortunately, chances are someone else has already asked a +similar question!

+

The Official Python +Documentation is another great resource.

+

It can also be helpful to do a general search for a particular topic or error message. It’s very likely the first few results will be from Stack Overflow, followed by a few from official documentation, and then you may start seeing results from personal blogs or third parties. These third-party results can sometimes be valuable, but we should be cautious! Here are a few things to keep in mind when you are looking for online resources:

+
    +
  1. Don’t download or install anything unless you are certain of what it +is and why you need it.
  2. +
  3. Don’t copy or run code unless you fully understand what it +does.
  4. +
  5. Python is an open-source language; official documentation and +resources will not be behind a paywall.
  6. +
  7. You may not find a resource or solution to fit your exact needs. Try +to be flexible and adapt online solutions to fit your needs.
  8. +
+
+
+ +
+
+

Key Points +

+
+
    +
  • Python is an interpreted language.
  • +
  • Code is commonly developed inside an integrated development +environment.
  • +
  • A typical Python workflow uses base Python and additional Python +packages developed for statistical programming purposes.
  • +
  • In-line and external documentation helps ensure that your code is +readable.
  • +
  • You can find help through the built-in help function and external +resources.
  • +
+
+
+
+
+

Content from Python Fundamentals

+
+

Last updated on 2024-07-11

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • What basic data types can I work with in Python?
  • +
  • How can I create a new variable in Python?
  • +
  • How do I use a function?
  • +
  • Can I change the value associated with a variable after I create +it?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Assign values to variables.
  • +
+
+
+
+
+
+

Variables + +

+
+

Any Python interpreter can be used as a calculator:

+
+

PYTHON +

+
3 + 5 * 4
+
+
+

OUTPUT +

+
23
+
+

This is great but not very interesting. To do anything useful with data, we need to assign its value to a variable. In Python, we can assign a value to a variable using the equals sign =. For example, we can track the weight of a patient who weighs 60 kilograms by assigning the value 60 to a variable weight_kg:

+
+

PYTHON +

+
weight_kg = 60
+
+

From now on, whenever we use weight_kg, Python will +substitute the value we assigned to it. In layperson’s terms, a +variable is a name for a value.

+

In Python, variable names:

+
    +
  • can include letters, digits, and underscores
  • +
  • cannot start with a digit
  • +
  • are case sensitive.
  • +
+

This means that, for example:

+
    +
  • +weight0 is a valid variable name, whereas +0weight is not
  • +
  • +weight and Weight are different +variables
  • +
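You can check these naming rules directly with the built-in string method str.isidentifier(), which reports whether a piece of text is a syntactically valid name. The sketch also shows case sensitivity.

```python
# str.isidentifier() reports whether text is a valid variable name
print("weight0".isidentifier())    # True
print("0weight".isidentifier())    # False: starts with a digit
print("weight_kg".isidentifier())  # True: underscores are allowed

# Names are case sensitive, so these are two different variables
weight = 60
Weight = 65
print(weight, Weight)              # 60 65
```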

Types of data + +

+
+

Python knows various types of data. Three common ones are:

+
    +
  • integer numbers
  • +
  • floating point numbers, and
  • +
  • strings.
  • +
+

In the example above, the variable weight_kg has an integer value of 60. If we want to track the weight of our patient more precisely, we can use a floating point value by executing:

+
+

PYTHON +

+
weight_kg = 60.3
+
+

To create a string, we add single or double quotes around some text. +To identify and track a patient throughout our study, we can assign each +person a unique identifier by storing it in a string:

+
+

PYTHON +

+
patient_id = '001'
+
+
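Storing the identifier as a string matters here. Text keeps its leading zeros, but converting it to an integer with the built-in int() drops them. The sketch also shows that single and double quotes produce the same string.

```python
patient_id = '001'      # stored as text, so the leading zeros are kept
print(patient_id)       # 001
print(int(patient_id))  # 1: converting to an integer drops the leading zeros
print('001' == "001")   # True: single and double quotes create the same string
```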

Using Variables in Python + +

+
+

Once we have data stored with variable names, we can make use of it +in calculations. We may want to store our patient’s weight in pounds as +well as kilograms:

+
+

PYTHON +

+
weight_lb = 2.2 * weight_kg
+
+

We might decide to add a prefix to our patient identifier:

+
+

PYTHON +

+
patient_id = 'inflam_' + patient_id
+
+

Built-in Python functions + +

+
+

To carry out common tasks with data and variables in Python, the +language provides us with several built-in functions. To display information to +the screen, we use the print function:

+
+

PYTHON +

+
print(weight_lb)
+print(patient_id)
+
+
+

OUTPUT +

+
132.66
+inflam_001
+
+

When we want to make use of a function, referred to as calling the +function, we follow its name by parentheses. The parentheses are +important: if you leave them off, the function doesn’t actually run! +Sometimes you will include values or variables inside the parentheses +for the function to use. In the case of print, we use the +parentheses to tell the function what value we want to display. We will +learn more about how functions work and how to create our own in later +episodes.

+

We can display multiple things at once using only one +print call:

+
+

PYTHON +

+
print(patient_id, 'weight in kilograms:', weight_kg)
+
+
+

OUTPUT +

+
inflam_001 weight in kilograms: 60.3
+
+

We can also call a function inside of another function call. For example, +Python has a built-in function called type that tells you a +value’s data type:

+
+

PYTHON +

+
print(type(60.3))
+print(type(patient_id))
+
+
+

OUTPUT +

+
<class 'float'>
+<class 'str'>
+
+

Moreover, we can do arithmetic with variables right inside the +print function:

+
+

PYTHON +

+
print('weight in pounds:', 2.2 * weight_kg)
+
+
+

OUTPUT +

+
weight in pounds: 132.66
+
+

The above command, however, did not change the value of +weight_kg:

+
+

PYTHON +

+
print(weight_kg)
+
+
+

OUTPUT +

+
60.3
+
+

To change the value of the weight_kg variable, we have +to assign weight_kg a new value using the +equals = sign:

+
+

PYTHON +

+
weight_kg = 65.0
+print('weight in kilograms is now:', weight_kg)
+
+
+

OUTPUT +

+
weight in kilograms is now: 65.0
+
+
+
+ +
+
+

Variables as Sticky Notes +

+
+

A variable in Python is analogous to a sticky note with a name +written on it: assigning a value to a variable is like putting that +sticky note on a particular value.

+
Value of 65.0 with weight_kg label stuck on it

Using this analogy, we can investigate how assigning a value to one +variable does not change values of other, seemingly +related, variables. For example, let’s store the subject’s weight in +pounds in its own variable:

+
+

PYTHON +

+
# There are 2.2 pounds per kilogram
+weight_lb = 2.2 * weight_kg
+print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
+
+
+

OUTPUT +

+
weight in kilograms: 65.0 and in pounds: 143.0
+
+

Everything in a line of code following the ‘#’ symbol is a comment that is ignored by Python. +Comments allow programmers to leave explanatory notes for other +programmers or their future selves.

+
Value of 65.0 with weight_kg label stuck on it, and value of 143.0 with weight_lb label stuck on it

Similar to above, the expression 2.2 * weight_kg is +evaluated to 143.0, and then this value is assigned to the +variable weight_lb (i.e. the sticky note +weight_lb is placed on 143.0). At this point, +each variable is “stuck” to completely distinct and unrelated +values.

+

Let’s now change weight_kg:

+
+

PYTHON +

+
weight_kg = 100.0
+print('weight in kilograms is now:', weight_kg, 'and weight in pounds is still:', weight_lb)
+
+
+

OUTPUT +

+
weight in kilograms is now: 100.0 and weight in pounds is still: 143.0
+
+
Value of 100.0 with label weight_kg stuck on it, and value of 143.0 with label weight_lb stuck on it

Since weight_lb doesn’t “remember” where its value comes +from, it is not updated when we change weight_kg.

+
+
+
+
+
+ +
+
+

Check Your Understanding +

+
+

What values do the variables mass and age +have after each of the following statements? Test your answer by +executing the lines.

+
+

PYTHON +

+
mass = 47.5
+age = 122
+mass = mass * 2.0
+age = age - 20
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
`mass` holds a value of 47.5, `age` does not exist
+`mass` still holds a value of 47.5, `age` holds a value of 122
+`mass` now has a value of 95.0, `age`'s value is still 122
+`mass` still has a value of 95.0, `age` now holds 102
+
+
+
+
+
+
+
+ +
+
+

Sorting Out References +

+
+

Python allows you to assign multiple values to multiple variables in +one line by separating the variables and values with commas. What does +the following program print out?

+
+

PYTHON +

+
first, second = 'Grace', 'Hopper'
+third, fourth = second, first
+print(third, fourth)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
Hopper Grace
+
+
+
+
+
+
+
+ +
+
+

Seeing Data Types +

+
+

What are the data types of the following variables?

+
+

PYTHON +

+
planet = 'Earth'
+apples = 5
+distance = 10.5
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
print(type(planet))
+print(type(apples))
+print(type(distance))
+
+
+

OUTPUT +

+
<class 'str'>
+<class 'int'>
+<class 'float'>
+
+
+
+
+
+
+
+ +
+
+

Key Points +

+
+
    +
  • Basic data types in Python include integers, strings, and +floating-point numbers.
  • +
  • Use variable = value to assign a value to a variable in +order to record it in memory.
  • +
  • Variables are created on demand whenever a value is assigned to +them.
  • +
  • Use print(something) to display the value of +something.
  • +
  • Use # some kind of explanation to add comments to +programs.
  • +
  • Built-in functions are always available to use.
  • +
+
+
+
+

Content from Data Transformation

+
+

Last updated on 2024-07-11

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I process tabular data files in Python?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Explain what a library is and what libraries are used for.
  • +
  • Import a Python library and use the functions it contains.
  • +
  • Read tabular data from a file into a program.
  • +
  • Select individual values and subsections from data.
  • +
  • Perform operations on arrays of data.
  • +
+
+
+
+
+
+

Words are useful, but what’s more useful are the sentences and +stories we build with them. Similarly, while a lot of powerful, general +tools are built into Python, specialized tools built up from these basic +units live in libraries that can be +called upon when needed.

+

Loading data into Python + +

+
+

To begin processing the clinical trial inflammation data, we need to load it into Python. Python can work with many different file types. Text files can be opened in base Python using the built-in open function:

+
+

PYTHON +

+
open("filename.txt", "r")
+
+

where “r” opens the file read-only. If you want to write to the file instead, you can use “w”, which creates the file or overwrites any existing contents.

+
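The sketch below writes a small text file and then reads it back. It works inside a temporary folder so it does not touch your own files. The file name and contents are made up for illustration. The with statement closes the file automatically when its indented block ends.

```python
import os
import tempfile

# Work in a temporary folder so the example does not touch your own files
folder = tempfile.mkdtemp()
path = os.path.join(folder, "filename.txt")  # hypothetical file name

# "w" opens the file for writing (creating it, or overwriting existing content)
with open(path, "w") as f:
    f.write("patient,day1,day2\n")
    f.write("001,0,1\n")

# "r" opens the file read-only
with open(path, "r") as f:
    lines = f.readlines()

print(lines)
```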

However, our patient data is in a CSV file, which is more commonly loaded using a library. Python has hundreds of thousands of libraries to choose from to help carry out your work. Importing a library is like getting a piece of lab equipment out of a storage locker and setting it up on the bench. Libraries provide additional functionality to base Python, much like a new piece of equipment adds functionality to a lab space. Just like in the lab, importing too many libraries can sometimes complicate and slow down your programs, so we only import what we need for each program. There are a couple of common Python libraries for loading (and working with) data.

+

pandas + +

+
+

The first library we will present is called pandas. pandas is a Python library containing a set of functions and specialised data structures that have been designed to help Python programmers perform data analysis tasks in a structured way.

+

Most of the things that pandas can do can be done with base Python, but the collected set of pandas functions and data structures makes data analysis tasks more consistent in terms of syntax and therefore aids readability.

+

Remember to write the library name with a lower case ‘p’, because package names in Python are case sensitive.

+
+

Importing the pandas library +

+

Importing the pandas library is done in exactly the same way as for +any other library. In almost all examples of Python code using the +pandas library, it will have been imported and given an alias of +pd. We will follow the same convention.

+
+

PYTHON +

+
import pandas as pd
+
+
+
+

Pandas data structures +

+

There are two main data structures used by pandas: the Series and the DataFrame. A Series equates in general to a vector or a list, while a DataFrame is equivalent to a table. Each column in a pandas DataFrame is a pandas Series.

+

We will mainly be looking at the DataFrame.

+

We can easily create a pandas DataFrame by reading a .csv file.

+
+
+

Reading a csv file +

+

When we read a CSV dataset in base Python, we open the file, read and process one record at a time, and then close the file after we have read the last record. Reading datasets in this way is slow and places all of the responsibility for extracting individual data items from the records on the programmer.

+

The main advantage of this approach, however, is that you only have +to store one dataset record in memory at a time. This means that if you +have the time, you can process datasets of any size.
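The record-at-a-time approach can be sketched with the standard library's csv module. To keep the example self-contained, the CSV contents are made up and held in memory with io.StringIO. With a real file you would use open("filename.csv", "r") instead.

```python
import csv
import io

# Made-up CSV contents held in memory (stand-in for a real file)
csv_text = "patient_id,weight_kg\n001,60.3\n002,65.0\n"

total = 0.0
count = 0
with io.StringIO(csv_text) as f:
    reader = csv.reader(f)
    header = next(reader)          # the first record holds the column names
    for record in reader:          # each record is read and processed in turn
        total += float(record[1])  # we pick out and convert each field ourselves
        count += 1

print(header)
print('records:', count, 'mean weight:', round(total / count, 2))
```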

+

In pandas, CSV files are read as complete datasets. You do not have to explicitly open and close the file. All of the dataset records are assembled into a DataFrame. If your dataset has column headers in the first record, then these can be used as the DataFrame column names. You can explicitly state this in the parameters to the call, but pandas is usually able to infer that there is a header row and use it automatically.

+

To tell Python that we’d like to start using pandas, we need to import it:

+
+

PYTHON +

+
import pandas as pd
+
+

Libraries are often given an alias, or short-form name; in this case, pandas is given the alias “pd”. Aliases for common data analysis libraries include:

+
+

PYTHON +

+
import pandas as pd
+import numpy as np
+import matplotlib.pyplot as plt
+import seaborn as sns
+
+

Once we’ve imported the library, we can ask the library to read our +data file for us:

+
+

PYTHON +

+
pd.read_csv("filename.csv")
+
+
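To try pd.read_csv without a file on disk, you can pass it an in-memory text stream. This is a minimal sketch with made-up contents. Note that pandas uses the first record as the column names. The dtype argument keeps the identifiers as text so their leading zeros survive.

```python
import io
import pandas as pd

# Made-up CSV held in memory; with a real file you would pass its name instead
csv_text = "patient_id,weight_kg\n001,60.3\n002,65.0\n"

# The first record becomes the column names; dtype keeps the IDs as text
df = pd.read_csv(io.StringIO(csv_text), dtype={"patient_id": str})

print(df.columns.tolist())  # ['patient_id', 'weight_kg']
print(df.shape)             # (2, 2): two records, two columns
```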

pandas is a commonly used library for working with and analysing +data. However, we will be working with a different package for the +remainder of this course. If you would like to learn more about data +manipulation and analysis using pandas, we recommend checking out Data Analysis and +Visualization with Python for Social Scientists.

+
+

numpy + +

+
+

The second package that we will present is called NumPy, which stands for Numerical Python. In general, you should use this library when you want to do fancy things with lots of numbers, especially if you have matrices or arrays. NumPy arrays are typically more lightweight than pandas DataFrames and often perform better, particularly when working with large numerical datasets.

+
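One reason NumPy suits work with lots of numbers is that arithmetic applies to a whole array at once. This sketch uses made-up weights and contrasts the array with a plain Python list.

```python
import numpy as np

weights_kg = np.array([60.3, 65.0, 100.0])  # made-up weights in kilograms

# One expression multiplies every element; no loop is needed
weights_lb = weights_kg * 2.2
print(weights_lb)

# The same expression on a plain Python list raises an error instead:
# [60.3, 65.0, 100.0] * 2.2  ->  TypeError
```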

We will be using this package to work with our clinical trial +inflammation data.

+

To tell Python that we’d like to start using NumPy, we need to import it:

+
+

PYTHON +

+
import numpy as np
+
+

Now that we have imported the library, we can ask the library (by using the alias np) to read our data file for us:

+
+

PYTHON +

+
np.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+
+

OUTPUT +

+
array([[ 0.,  0.,  1., ...,  3.,  0.,  0.],
+       [ 0.,  1.,  2., ...,  1.,  0.,  1.],
+       [ 0.,  1.,  1., ...,  2.,  1.,  1.],
+       ...,
+       [ 0.,  1.,  1., ...,  1.,  1.,  1.],
+       [ 0.,  0.,  0., ...,  0.,  2.,  0.],
+       [ 0.,  0.,  1., ...,  1.,  1.,  0.]])
+
+

The expression np.loadtxt(...) is a function call that asks Python to run the function loadtxt, which belongs to the np library. In Python, dot notation is mostly used to access an object’s attributes (properties) or to call its methods: object.property gives the value of property, and object_name.method() calls method on object_name.

+

As an example, John Smith is the John that belongs to the Smith +family. We could use the dot notation to write his name +smith.john, just as loadtxt is a function that +belongs to the np library.
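The same dot notation covers both cases. This minimal sketch uses only the standard library: math.pi is an attribute of the math module, and upper() and startswith() are methods that belong to a string.

```python
import math

patient_id = 'inflam_001'

print(math.pi)                           # attribute: a value belonging to the math module
print(patient_id.upper())                # method: a function belonging to the string
print(patient_id.startswith('inflam_'))  # methods can take arguments too
```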

+

Here we pass np.loadtxt two arguments: the name of the file we want to read and the delimiter that separates values on a line. These both need to be character strings (or strings for short), so we put them in quotes.

+

Since we haven’t told it to do anything else with the function’s +output, the notebook displays it. +In this case, that output is the data we just loaded. By default, only a +few rows and columns are shown (with ... to omit elements +when displaying big arrays). Note that, to save space when displaying +NumPy arrays, Python does not show us trailing zeros, so +1.0 becomes 1..

+

Our call to np.loadtxt read our file but didn’t save the +data in memory. To do that, we need to assign the array to a variable. +In a similar manner to how we assign a single value to a variable, we +can also assign an array of values to a variable using the same syntax. +Let’s re-run np.loadtxt and save the returned data:

+
+

PYTHON +

+
data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+

This statement doesn’t produce any output because we’ve assigned the +output to the variable data. If we want to check that the +data have been loaded, we can print the variable’s value:

+
+

PYTHON +

+
print(data)
+
+
+

OUTPUT +

+
[[ 0.  0.  1. ...,  3.  0.  0.]
+ [ 0.  1.  2. ...,  1.  0.  1.]
+ [ 0.  1.  1. ...,  2.  1.  1.]
+ ...,
+ [ 0.  1.  1. ...,  1.  1.  1.]
+ [ 0.  0.  0. ...,  0.  2.  0.]
+ [ 0.  0.  1. ...,  1.  1.  0.]]
+
+
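If you do not have inflammation-01.csv to hand, you can try the same call on a few made-up rows. This sketch passes an in-memory text stream (io.StringIO) to np.loadtxt, which accepts file-like objects as well as file names.

```python
import io
import numpy as np

# Three made-up rows standing in for the contents of a CSV file
csv_text = "0,0,1,3\n0,1,2,1\n0,1,1,3\n"

small_data = np.loadtxt(fname=io.StringIO(csv_text), delimiter=',')
print(small_data)
print(small_data.shape)  # (3, 4): 3 rows, 4 columns
```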

Now that the data are in memory, we can manipulate them. First, let’s +ask what type of thing +data refers to:

+
+

PYTHON +

+
print(type(data))
+
+
+

OUTPUT +

+
<class 'numpy.ndarray'>
+
+

The output tells us that data currently refers to an +N-dimensional array, the functionality for which is provided by the +NumPy library. These data correspond to arthritis patients’ +inflammation. The rows are the individual patients, and the columns are +their daily inflammation measurements.

+
+
+ +
+
+

Data Type +

+
+

A NumPy array contains one or more elements of the same type. The type function will only tell you that a variable is a NumPy array; it won’t tell you the type of the elements inside the array. We can find out the type of the data contained in the NumPy array using its dtype attribute:

+
+

PYTHON +

+
print(data.dtype)
+
+
+

OUTPUT +

+
float64
+
+

This tells us that the NumPy array’s elements are floating-point +numbers.
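Because every element must share one type, NumPy converts mixed input to a common type. In this small sketch, a single float among integers makes the whole array floating-point.

```python
import numpy as np

mixed = np.array([1, 2.5, 3])  # one float among integers
print(mixed.dtype)             # float64: every element is stored as a float
print(type(mixed))             # <class 'numpy.ndarray'>
```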

+
+
+
+

With the following command, we can see the array’s shape:

+
+

PYTHON +

+
print(data.shape)
+
+
+

OUTPUT +

+
(60, 40)
+
+

The output tells us that the data array variable +contains 60 rows and 40 columns. When we created the variable +data to store our arthritis data, we did not only create +the array; we also created information about the array, called members or attributes. This extra +information describes data in the same way an adjective +describes a noun. data.shape is an attribute of +data which describes the dimensions of data. +We use the same dotted notation for the attributes of variables that we +use for the functions in libraries because they have the same +part-and-whole relationship.
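shape is only one of several attributes describing an array. This sketch uses a stand-in array of zeros with the same shape as the inflammation data, so it runs without the data file.

```python
import numpy as np

# A stand-in array of zeros with the same shape as the inflammation data
example = np.zeros((60, 40))

print(example.shape)  # (60, 40): number of rows and columns
print(example.ndim)   # 2: number of dimensions
print(example.size)   # 2400: total number of elements (60 * 40)
print(example.dtype)  # float64: np.zeros creates floating-point values by default
```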

+

If we want to get a single number from the array, we must provide an +index in square brackets after the +variable name, just as we do in math when referring to an element of a +matrix. Our inflammation data has two dimensions, so we will need to use +two indices to refer to one specific value:

+
+

PYTHON +

+
print('first value in data:', data[0, 0])
+
+
+

OUTPUT +

+
first value in data: 0.0
+
+
+

PYTHON +

+
print('middle value in data:', data[29, 19])
+
+
+

OUTPUT +

+
middle value in data: 16.0
+
+

The expression data[29, 19] accesses the element at row +30, column 20. While this expression may not surprise you, +data[0, 0] might. Programming languages like Fortran, +MATLAB and R start counting at 1 because that’s what human beings have +done for thousands of years. Languages in the C family (including C++, +Java, Perl, and Python) count from 0 because it represents an offset +from the first value in the array (the second value is offset by one +index from the first value). This is closer to the way that computers +represent arrays (if you are interested in the historical reasons behind +counting indices from zero, you can read Mike +Hoye’s blog post). As a result, if we have an M×N array in Python, +its indices go from 0 to M-1 on the first axis and 0 to N-1 on the +second. It takes a bit of getting used to, but one way to remember the +rule is that the index is how many steps we have to take from the start +to get the item we want.

+
'data' is a 3 by 3 NumPy array containing row 0: ['A', 'B', 'C'], row 1: ['D', 'E', 'F'], and row 2: ['G', 'H', 'I']. Starting in the upper left-hand corner, data[0, 0] = 'A', data[0, 1] = 'B', data[0, 2] = 'C', data[1, 0] = 'D', data[1, 1] = 'E', data[1, 2] = 'F', data[2, 0] = 'G', data[2, 1] = 'H', and data[2, 2] = 'I' in the bottom right-hand corner.
+
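The 3 by 3 grid of letters described above can be built directly to check the counting-from-zero rule. For an M×N array, the last valid indices are M-1 and N-1.

```python
import numpy as np

letters = np.array([['A', 'B', 'C'],
                    ['D', 'E', 'F'],
                    ['G', 'H', 'I']])

print(letters[0, 0])  # A: zero steps down, zero steps right
print(letters[1, 2])  # F: one step down, two steps right

# For an M x N array, the last valid indices are M - 1 and N - 1
m, n = letters.shape
print(letters[m - 1, n - 1])  # I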
+ +
+
+

In the Corner +

+
+

What may also surprise you is that when Python displays an array, it +shows the element with index [0, 0] in the upper left +corner rather than the lower left. This is consistent with the way +mathematicians draw matrices but different from the Cartesian +coordinates. The indices are (row, column) instead of (column, row) for +the same reason, which can be confusing when plotting data.

+
+
+
+

Slicing data + +

+
+

An index like [30, 20] selects a single element of an +array, but we can select whole sections as well. For example, we can +select the first ten days (columns) of values for the first four +patients (rows) like this:

+
+

PYTHON +

+
print(data[0:4, 0:10])
+
+
+

OUTPUT +

+
[[ 0.  0.  1.  3.  1.  2.  4.  7.  8.  3.]
+ [ 0.  1.  2.  1.  2.  1.  3.  2.  2.  6.]
+ [ 0.  1.  1.  3.  3.  2.  6.  2.  5.  9.]
+ [ 0.  0.  2.  0.  4.  2.  2.  1.  6.  7.]]
+
+

The slice 0:4 means, +“Start at index 0 and go up to, but not including, index 4”. Again, the +up-to-but-not-including takes a bit of getting used to, but the rule is +that the difference between the upper and lower bounds is the number of +values in the slice.
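The rule that the difference between the bounds gives the number of values can be checked on a made-up array. This sketch uses np.arange and reshape, so it does not need the inflammation data.

```python
import numpy as np

# A made-up 6 x 12 array holding the numbers 0 to 71
values = np.arange(72).reshape(6, 12)

block = values[0:4, 0:10]
print(block.shape)  # (4, 10): 4 - 0 = 4 rows and 10 - 0 = 10 columns
print(block[0])     # [0 1 2 3 4 5 6 7 8 9]: index 10 is not included
```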

+

We don’t have to start slices at 0:

+
+

PYTHON +

+
print(data[5:10, 0:10])
+
+
+

OUTPUT +

+
[[ 0.  0.  1.  2.  2.  4.  2.  1.  6.  4.]
+ [ 0.  0.  2.  2.  4.  2.  2.  5.  5.  8.]
+ [ 0.  0.  1.  2.  3.  1.  2.  3.  5.  3.]
+ [ 0.  0.  0.  3.  1.  5.  6.  5.  5.  8.]
+ [ 0.  1.  1.  2.  1.  3.  5.  3.  5.  8.]]
+
+

We also don’t have to include the upper and lower bound on the slice. +If we don’t include the lower bound, Python uses 0 by default; if we +don’t include the upper, the slice runs to the end of the axis, and if +we don’t include either (i.e., if we use ‘:’ on its own), the slice +includes everything:

+
+

PYTHON +

+
small = data[:3, 36:]
+print('small is:')
+print(small)
+
+

The above example selects rows 0 through 2 and columns 36 through to +the end of the array.

+
+

OUTPUT +

+
small is:
+[[ 2.  3.  0.  0.]
+ [ 1.  1.  0.  1.]
+ [ 2.  2.  1.  1.]]
+
+
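You can confirm the effect of leaving out slice bounds by checking the shapes of the results. This sketch uses a made-up array with the same 60 × 40 shape as the inflammation data.

```python
import numpy as np

# A made-up array with the same shape as the inflammation data (60 x 40)
stand_in = np.arange(60 * 40).reshape(60, 40)

print(stand_in[:3, 36:].shape)  # (3, 4): rows 0-2, columns 36 to the end
print(stand_in[57:, :].shape)   # (3, 40): the last three rows, every column
print(stand_in[:, :].shape)     # (60, 40): ':' on its own selects everything
```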

Content from List and Dictionary Methods

+
+

Last updated on 2024-07-11

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I store many values together?
  • +
  • How can I create a list succinctly?
  • +
  • How can I efficiently access nested data?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Identify and create lists and dictionaries
  • +
  • Understand the properties and behaviours of lists and +dictionaries
  • +
  • Access values in lists and dictionaries
  • +
  • Create and access values from nest lists and dictionaries
  • +
+
+
+
+
+
+

Values can also be stored in other Python data types such as lists, dictionaries, sets and tuples. Storing objects in a list is a fast and versatile way to apply transformations across a sequence of values. Storing objects in a dictionary as key-value pairs is useful for extracting specific values, i.e. performing lookup operations.

+

Create and access lists + +

+
+

Lists have the following properties and behaviours:

+
    +
  • A single list can store different primitive object types and even +other lists
  • +
  • Lists are ordered and have a 0-based index
  • +
  • Lists can be appended to using the methods append() or +insert() +
  • +
  • Values inside a list can be removed using the methods +remove() or pop() +
  • +
  • Two lists can be concatenated with the operator + +
  • +
  • Values inside a list can be conditionally iterated through
  • +
  • A list is mutable i.e. the values inside a list can be modified in +place
  • +
+

To create a list, values are contained within square brackets +i.e. [] and individually separated by commas. The function +list() can also be used to create a list of values from an +iterable object like a string, set or tuple.

+
+

PYTHON +

+
# Create a list of integers using []
+list_1 = [1, 3, 5, 7]
+print(list_1)
+
+
+

OUTPUT +

+
[1, 3, 5, 7]
+
+
+

PYTHON +

+
# Unlike atomic vectors in R, a list can contain multiple primitive object types
+list_2 = [1, "one", 1.0, True]
+print(list_2)
+
+
+

OUTPUT +

+
[1, 'one', 1.0, True]
+
+
+

PYTHON +

+
# You can also use list() on an iterable object to convert it into a list
+string = 'abcdefg'  
+list_3 = list(string)  
+print(list_3)
+
+
+

OUTPUT +

+
['a', 'b', 'c', 'd', 'e', 'f', 'g']
+
+

Because lists have a 0-based index, we can access individual values by their index position: the first value has an index of 0, the second an index of 1, and so on. Accessing multiple values by their index positions is also referred to as slicing or subsetting a list.

+

Note that we can use negative numbers as indices in Python. When we +do so, the index -1 gives us the last element in the list, +-2 gives us the second to last element in the list, and so +on.

+
+

PYTHON +

+
# Extract individual values from list_3
+print('first value:', list_3[0])
+print('second value:', list_3[1])
+print('last value:', list_3[-1])
+
+
+

OUTPUT +

+
first value: a
+second value: b
+last value: g
+
+
+

PYTHON +

+
# When slicing, the end index is excluded, so add 1 to the last value's index
+# To extract from index 0 to 2, we need to slice from [0:2+1] or [0:3]
+
+# Extract the first three values from list_3
+print('first 3 values:', list_3[0:3])
+
+# Start from index 0 and extract values from each subsequent second position
+print('every second value:', list_3[0::2])
+
+# Start from index 1, end at index 3 and extract from each subsequent second position
+print('every second value from index 1 to 3:', list_3[1:4:2])
+
+
+

OUTPUT +

+
first 3 values: ['a', 'b', 'c']
+every second value: ['a', 'c', 'e', 'g']
+every second value from index 1 to 3: ['b', 'd']
+
+

Change list values + +

+
+

Data which can be modified in place is called mutable, while data +which cannot be modified is called immutable. Strings and numbers are +immutable in that when we want to change the value of a string or number +variable, we can only replace the old value with a completely new +value.

+
+

PYTHON +

+
string = 'abcde'
+string[0] = 'b' # Produces a type error as strings are immutable
+
+# TypeError: 'str' object does not support item assignment
+
+

In contrast, lists are mutable and we can modify them after they have +been created. We can change individual values, append new values, or +reorder the whole list through sorting.

+
+

PYTHON +

+
list_4 = ['apple', 'pear', 'plum']
+print('original list_4:', list_4)
+
+# Change the first value i.e. modify the list in place
+list_4[0] = 'banana'
+print('modified list_4:', list_4)
+
+# Add new value to list using the method .insert(index number, value)
+list_4.insert(1, 'apple') # Index 1 refers to the second position
+print('appended list_4:', list_4)
+
+
+

OUTPUT +

+
original list_4: ['apple', 'pear', 'plum']
+modified list_4: ['banana', 'pear', 'plum']
+appended list_4: ['banana', 'apple', 'pear', 'plum']
+
+
+

PYTHON +

+
# Sorting a list also modifies it in place
+list_5 = [2, 1, 3, 7]
+list_5.sort()
+print('list_5:', list_5)
+
+
+

OUTPUT +

+
list_5: [1, 2, 3, 7]
+
+
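The methods append(), remove() and pop(), listed earlier, also modify a list in place. This sketch uses a small made-up list.

```python
fruit = ['apple', 'pear']
fruit.append('plum')  # add 'plum' to the end of the list
print(fruit)          # ['apple', 'pear', 'plum']

fruit.remove('pear')  # remove the first value equal to 'pear'
print(fruit)          # ['apple', 'plum']

last = fruit.pop()    # remove the last value and return it
print(last, fruit)    # plum ['apple']
```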

However, be careful when modifying data in-place. If two variables +refer to the same list, and you modify the list value, it will change +for both variables!

+
+

PYTHON +

+
# When we assign list_5 to list_6, both list_6 and list_5 point to the
+# same list object; list_6 is not a copy of list_5.
+
+list_6 = list_5  
+print('list_5:', list_5)
+print('list_6:', list_6)
+
+# Change the first value in list_6 from 1 to 2 
+list_6[0] = 2 
+
+print('modified list_6:', list_6)
+print('unmodified list_5:', list_5)
+
+# Warning: list_5 and list_6 have both been modified in place!
+
+
+

OUTPUT +

+
list_5: [1, 2, 3, 7]
+list_6: [1, 2, 3, 7]
+modified list_6: [2, 2, 3, 7]
+unmodified list_5: [2, 2, 3, 7]
+
+

Because of this behaviour, code which modifies data in place should be handled with care. You can avoid this behaviour by explicitly creating a copy of the original list and modifying only the copy, which is why creating a copy of the original data object can be useful in Python.

+
+

PYTHON +

+
list_5 = [1, 2, 3, 7]
+list_7 = list_5.copy()  
+print('list_5:', list_5)
+print('list_7:', list_7)
+
+# As list_7 is a completely new object copied from list_5, modifying list_7 does
+# not affect list_5.  
+
+list_7[0] = 2 
+print('modified list_7:', list_7)
+print('unmodified list_5:', list_5)
+
+
+

OUTPUT +

+
list_5: [1, 2, 3, 7]
+list_7: [1, 2, 3, 7]
+modified list_7: [2, 2, 3, 7]
+unmodified list_5: [1, 2, 3, 7]
+
+
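There is one caveat. .copy() makes what Python calls a shallow copy: it creates a new outer list, but any lists stored inside it are shared with the original. This makes no difference for list_5, which only holds numbers. It does matter for the nested lists covered below. The sketch below contrasts .copy() with copy.deepcopy() from Python's standard copy module. The variable names are only illustrative.

```python
import copy

# A nested list: each value in the outer list is itself a list
original = [[1, 2], [3, 4]]

shallow = original.copy()       # new outer list, but the SAME inner lists
deep = copy.deepcopy(original)  # new outer list AND new copies of the inner lists

# Change a value inside the first inner list of the original
original[0][0] = 99

print('original:', original)  # [[99, 2], [3, 4]]
print('shallow:', shallow)    # [[99, 2], [3, 4]]
print('deep:', deep)          # [[1, 2], [3, 4]]
```

If a list contains other lists and you need a fully independent copy, copy.deepcopy() is the safer choice.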

Useful list functions + +

+
+

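There are many functions and methods that can be applied to lists, such as len(), max() and the index() method. Arithmetic operators, however, do not perform element-wise calculations on lists of numbers. Most of them, such as - and /, raise a TypeError. The two exceptions, + and *, act on the list as a whole rather than on its values.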

+

Note that + concatenates two lists into a single longer +list, rather than outputting the sum of two lists of numbers.

+
+

PYTHON +

+
list_8 = [1, 2, 3]
+list_9 = [4, 5, 6]
+
+list_8 + list_9 # This concatenates the lists and does not sum the two lists together
+
+
+

OUTPUT +

+
[1, 2, 3, 4, 5, 6]
+
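The sketch below tries a few of these built-in functions and methods on the same two lists. It also shows one way of adding two lists value by value: pairing the values with zip() inside a list comprehension, which is a compact loop that builds a new list. Loops are covered in a later episode.

```python
list_8 = [1, 2, 3]
list_9 = [4, 5, 6]

print(len(list_8))      # number of values: 3
print(max(list_9))      # largest value: 6
print(sum(list_8))      # total of the values: 6
print(list_9.index(5))  # position of the first 5: 1
print(list_8 * 2)       # * repeats the list: [1, 2, 3, 1, 2, 3]

# To add two lists value by value, pair the values up with zip()
elementwise_sum = [a + b for a, b in zip(list_8, list_9)]
print(elementwise_sum)  # [5, 7, 9]

# Other arithmetic operators, such as -, raise a TypeError for lists
try:
    list_9 - list_8
except TypeError as err:
    print('TypeError:', err)
```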
+

In your spare time after this workshop, you can search for different +list functions and methods and test them out yourselves.

+

Nested lists + +

+
+

We have previously mentioned that lists can be used to store other +Python object types, including lists. This means that we can create +nested lists in Python i.e. lists containing lists containing values. +This property is useful when we have a collection of values that we want +to access or transform as a subgroup.

+

To create a nested list, we also use [] or +list() to contain one or more lists of values of +interest.

+
+

PYTHON +

+
veg_stock = [
+    ['lettuce', 'lettuce', 'tomato', 'zucchini'],
+    ['lettuce', 'lettuce', 'carrot', 'zucchini'],
+    ['lettuce', 'basil', 'tomato', 'zucchini']
+    ]
+
+# Check that veg_stock is a list object
+print(type(veg_stock))
+
+# Check that the first value in veg_stock is itself a list
+print(veg_stock[0], 'has type', type(veg_stock[0]))  
+
+
+

OUTPUT +

+
<class 'list'>
+['lettuce', 'lettuce', 'tomato', 'zucchini'] has type <class 'list'>
+
+

To extract a sub-list from the veg_stock list object, we refer to its index as we would with any other value inside a list. For example, veg_stock[0] points to the first sub-list and veg_stock[1] points to the second sub-list within veg_stock.

+

To access an individual string value inside a sub-list, we make use +of a second index, which points to an individual value inside the +sub-list.

+
+

PYTHON +

+
print(veg_stock[0]) # Access the first sub-list 
+print(veg_stock[0][0]) # Access the first value in the first sub-list 
+
+print(type(veg_stock[0])) # The first value in veg_stock is a list
+print(type(veg_stock[0][0])) # The first value in the first list in veg_stock is a string
+
+
+

OUTPUT +

+
['lettuce', 'lettuce', 'tomato', 'zucchini']
+lettuce
+<class 'list'>
+<class 'str'>
+
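The same two-index notation also works for replacing values, and list methods can be called on a sub-list directly. The sketch below uses a copy of the veg_stock example. The replacement value 'tomato' was chosen only for illustration.

```python
veg_stock = [
    ['lettuce', 'lettuce', 'tomato', 'zucchini'],
    ['lettuce', 'lettuce', 'carrot', 'zucchini'],
    ['lettuce', 'basil', 'tomato', 'zucchini']
    ]

# Two indices can also be used to replace a value: the third value of the second sub-list
veg_stock[1][2] = 'tomato'
print(veg_stock[1])  # ['lettuce', 'lettuce', 'tomato', 'zucchini']

# List methods work on a sub-list too: count the lettuces in the first sub-list
print(veg_stock[0].count('lettuce'))  # 2

# Negative indices count from the end at both levels
print(veg_stock[-1][-1])  # zucchini
```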
+

In general, however, when we are analysing a large collection of +values, the best practice is to structure those values in columns and +rows as a tabular Pandas data frame object. This is covered in another +Carpentries Course called Python +for Social Sciences.

+

Lists are still incredibly versatile and useful when you have a +collection of values that need to be efficiently accessed or +transformed. For example, data frame column names are commonly extracted +and stored inside a list, so that the same transformation can then be +mapped across multiple columns.

+

Create and access dictionaries + +

+
+

A dictionary is a Python data type that is particularly suited for +enabling quick lookup operations on unstructured data sets.

+

A dictionary can be thought of as a collection in which every value is associated with a unique key (i.e. a self-defined index, typically a string or a number) and is retrieved by that key rather than by its position. A dictionary therefore contains key-value pairs with the format {key: value(s)}. Since Python 3.7, dictionaries remember the order in which keys were inserted, but unlike lists they cannot be sliced by position.

+

Dictionaries can be created by listing individual key-value pairs inside {} or by using dict().

+
+

PYTHON +

+
# A key-value pair can contain single or multiple values  
+# Keys are treated as case sensitive and unique
+# Multiple values are first stored inside a list  
+
+teams = {
+    'data science': ['Mei Ling', 'Paul', 'Gwen', 'Suresh'],
+    'user design': ['Amy', 'Linh', 'Sasha'],
+    'software dev': ['David', 'Prya'],
+    'comms': 'Taylor' 
+    } 
+
+

When using dict(), we need to indicate which key is associated with which value. There are three ways to do this: pass a list of (key, value) tuples, associate keys and values directly using =, or use zip(), which pairs up the elements of two iterables (e.g. a list of keys and a list of values) into (key, value) tuples.

+
+

PYTHON +

+
# To use dict(), key-value pairs can be stored inside tuples
+ds_emp_status = dict([
+        ('Mei Ling', 'full time'),
+        ('Paul', 'full time'),
+        ('Gwen', 'part time'),
+        ('Suresh', 'part time')
+    ])  
+
+# Key-value pairs can also be assigned by direct association  
+# With this approach, keys are written without quotes and must be valid Python names
+# (e.g. no spaces), but they are still stored as strings
+ud_emp_status = dict(
+    Amy = 'full time',
+    Linh = 'full time',
+    Sasha = 'casual' 
+    ) 
+
+# zip() pairs each key in the first list with the value at the same position in the second list
+sd_emp_status = dict(zip(
+    ['David', 'Prya'],
+    ['full time', 'full time']
+    ))
+
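To check that these approaches are interchangeable, the sketch below builds the same small dictionary in each of the ways described above and compares the results. It also converts the zip() result to a list so the (key, value) tuples it produces are visible. The variable names are hypothetical.

```python
# Four ways of building the same dictionary
via_tuples = dict([('David', 'full time'), ('Prya', 'full time')])
via_keywords = dict(David='full time', Prya='full time')
via_zip = dict(zip(['David', 'Prya'], ['full time', 'full time']))
via_braces = {'David': 'full time', 'Prya': 'full time'}

print(via_tuples == via_keywords == via_zip == via_braces)  # True

# Converting the zip() result to a list shows the (key, value) tuples it produces
print(list(zip(['David', 'Prya'], ['full time', 'full time'])))
# [('David', 'full time'), ('Prya', 'full time')]
```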
+

To access a specific value inside a dictionary, we need to specify +its key using []. This is similar to slicing or subsetting +a list by specifying its index using [].

+
+

PYTHON +

+
# Access the values associated with the key 'data science'
+print(teams['data science'])
+
+print('The object teams is of type', type(teams))
+print('The dict value', teams['data science'], 'is of type', type(teams['data science']))
+
+
+

OUTPUT +

+
['Mei Ling', 'Paul', 'Gwen', 'Suresh']
+The object teams is of type <class 'dict'>
+The dict value ['Mei Ling', 'Paul', 'Gwen', 'Suresh'] is of type <class 'list'>
+
+

We can also access a value from a dictionary using the +get() method.

+
+

PYTHON +

+
print(teams.get('user design'))
+
+# get() returns a default value when the key is not found (None unless we supply one)
+# This avoids raising a KeyError that would halt the analysis
+
+print(teams.get('data engineering', 'WARNING: key does not exist'))
+
+
+

OUTPUT +

+
['Amy', 'Linh', 'Sasha']
+WARNING: key does not exist
+
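To see the difference get() makes, the sketch below looks up a missing key in two ways. It uses a shortened copy of the teams dictionary, named teams_subset so the original teams is left unchanged. get() without a default returns None, while square brackets raise a KeyError, which try/except catches here.

```python
# A shortened copy of the teams dictionary, for illustration only
teams_subset = {
    'data science': ['Mei Ling', 'Paul', 'Gwen', 'Suresh'],
    'comms': 'Taylor'
    }

# Without a second argument, get() returns None for a missing key
print(teams_subset.get('data engineering'))  # None

# Square brackets raise a KeyError for a missing key
try:
    teams_subset['data engineering']
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'data engineering'
```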
+

We can also inspect the contents of a dictionary in the following ways:

+
    +
  • Check whether a key exists in a dictionary using the keyword +in +
  • +
  • Retrieve unique dictionary keys using dict.keys() +
  • +
  • Retrieve dictionary values using dict.values() +
  • +
  • Retrieve dictionary items using dict.items() +
  • +
+
+

PYTHON +

+
# Check whether a key exists in a dictionary 
+print('data science' in teams) 
+print('Data Science' in teams) # Keys are case sensitive  
+
+# Retrieve all dictionary keys  
+print(teams.keys())
+print(sd_emp_status.keys())
+
+# Retrieve all dictionary values  
+print(sd_emp_status.values())  
+
+# Retrieve all dictionary key-value pairs
+print(sd_emp_status.items())
+
+
+

OUTPUT +

+
True
+False
+dict_keys(['data science', 'user design', 'software dev', 'comms'])
+dict_keys(['David', 'Prya'])
+dict_values(['full time', 'full time'])
+dict_items([('David', 'full time'), ('Prya', 'full time')])
+
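The dict_keys, dict_values and dict_items objects above are views of the dictionary, not lists. The sketch below converts them with list() and unpacks the output of .items() in a for loop, which is covered in the next episode. It uses a separate variable, emp_status, so sd_emp_status is left as it is.

```python
emp_status = {'David': 'full time', 'Prya': 'full time'}

# dict_keys and dict_values objects can be converted into ordinary lists
print(list(emp_status.keys()))    # ['David', 'Prya']
print(list(emp_status.values()))  # ['full time', 'full time']

# .items() yields (key, value) tuples, which a for loop can unpack into two names
for name, status in emp_status.items():
    print(name, 'works', status)
```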
+

To add a new key-value pair to an existing dictionary, we can create +a new key and directly attach a new value to it using = or +alternatively use the method update().

+
+

PYTHON +

+
print('original dict items:', sd_emp_status.items())  
+
+# Add new key-value pair using direct assignment  
+sd_emp_status['Mohammad'] = 'full time'
+
+# Add new key-value pair using update({'key': 'value'})   
+sd_emp_status.update({'Carrie': 'part time'})
+
+print('updated dict items:', sd_emp_status.items())    
+
+
+

OUTPUT +

+
original dict items: dict_items([('David', 'full time'), ('Prya', 'full time')])
+updated dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
+('Mohammad', 'full time'), ('Carrie', 'part time')])
+
+

Because keys are unique, a dictionary cannot contain two keys with +the same name. This means that adding an item using a key that is +already present in the dictionary will cause the previous value to be +overwritten.

+
+

PYTHON +

+
print('original dict items:', sd_emp_status.items())  
+
+# As the key 'Carrie' already exists, its value will be overwritten
+sd_emp_status['Carrie'] = 'full time'
+print('updated dict items:', sd_emp_status.items())  
+
+
+

OUTPUT +

+
original dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
+('Mohammad', 'full time'), ('Carrie', 'part time')])
+updated dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
+('Mohammad', 'full time'), ('Carrie', 'full time')])
+
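update() follows the same overwriting rule. Because it accepts a whole dictionary, it can also add several pairs in one call. A minimal sketch, using a separate emp_status variable so sd_emp_status is not affected:

```python
emp_status = {'David': 'full time', 'Prya': 'full time'}

# update() can add several pairs at once; the existing key 'Prya' is overwritten
emp_status.update({'Mohammad': 'full time', 'Prya': 'part time'})
print(emp_status)
# {'David': 'full time', 'Prya': 'part time', 'Mohammad': 'full time'}
```

The overwritten key keeps its original position, and the new key is added at the end.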
+

To remove a key-value pair from an existing dictionary, we can use the del keyword or the method pop(). pop() also lets us supply a default value to return if we try to remove a key that does not exist, which prevents our code from raising an error that halts the analysis.

+
+

PYTHON +

+
print('original dict items:', sd_emp_status.items())
+
+# Delete dictionary keys using del and pop()
+del sd_emp_status['Mohammad']
+sd_emp_status.pop('Carrie')
+sd_emp_status.pop('Anuradha', 'WARNING: key does not exist') # Does not generate an error
+
+print('modified dict items:', sd_emp_status.items())  
+
+
+

OUTPUT +

+
original dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
+('Mohammad', 'full time'), ('Carrie', 'full time')])
+modified dict items: dict_items([('David', 'full time'), ('Prya', 'full time')])
+
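The two approaches differ in two ways. pop() returns the value it removes, so that value can be stored. del has no default option, so deleting a missing key raises a KeyError. A sketch with a separate, hypothetical emp_status dictionary:

```python
emp_status = {'David': 'full time', 'Prya': 'full time', 'Carrie': 'part time'}

# pop() removes the key AND returns its value, which we can store
removed_status = emp_status.pop('Carrie')
print(removed_status)  # part time

# del has no default option: deleting a missing key raises a KeyError
try:
    del emp_status['Anuradha']
except KeyError:
    print('Anuradha is not a key in emp_status')

print(emp_status)  # {'David': 'full time', 'Prya': 'full time'}
```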
+

Nested dictionaries + +

+
+

Similar to lists, dictionaries can be nested: the value in a key-value pair can itself be a dictionary, written using {}. Nested dictionaries are useful when we need to store data with a hierarchical structure. For example, JSON data is commonly used for transmitting data in web applications and often has a nested structure that can be stored using nested dictionaries in Python.

+
+

PYTHON +

+
# Individual dictionaries are enclosed in {} and separated by a comma
+nested_dict = {
+    'dict_1': { # First key is a dictionary of key-value pairs 
+        'key_1a': 'value_1a',
+        'key_1b': 'value_1b'
+                },
+    'dict_2': { # Second key is another dictionary of key-value pairs
+        'key_2a': 'value_2a',
+        'key_2b': 'value_2b'
+                }
+            }
+
+print(nested_dict)
+
+
+

OUTPUT +

+
{'dict_1': {'key_1a': 'value_1a', 'key_1b': 'value_1b'},
+ 'dict_2': {'key_2a': 'value_2a', 'key_2b': 'value_2b'}}
+
+

Similar to working with nested lists, to extract a value from a sub-dictionary, we specify both the main dictionary key and the sub-dictionary key using [].

+
+

PYTHON +

+
# Extract the value for key 2a in dict_2
+print('original value:', nested_dict['dict_2']['key_2a'])
+
+# Adding or updating a value can be done through the same approach
+nested_dict['dict_2']['key_2a'] = "modified_value_2a"  
+
+print('modified value:', nested_dict['dict_2']['key_2a'])
+
+
+

OUTPUT +

+
original value: value_2a
+modified value: modified_value_2a
+
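To connect this with the JSON example mentioned above, the sketch below uses json.loads() from Python's standard json module. It turns a small, made-up JSON string into nested dictionaries and lists, then reads values with the same chained [] notation. Chaining .get() with an empty dictionary as the first default is one way to avoid a KeyError when an outer key may be missing.

```python
import json

# A made-up JSON string, like one that might be returned by a web service
json_text = '{"team": {"name": "data science", "members": ["Mei Ling", "Paul"]}}'

# json.loads() converts JSON text into nested Python dictionaries and lists
record = json.loads(json_text)
print(type(record))                  # <class 'dict'>
print(record['team']['name'])        # data science
print(record['team']['members'][1])  # Paul

# Chaining .get() avoids a KeyError when an outer key might be missing
print(record.get('budget', {}).get('amount', 'not recorded'))  # not recorded
```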
+

Optional: converting lists and dictionaries to Pandas data +frames + +

+
+

Lists and dictionaries can be easily converted into a tabular Pandas +data frame format. This can be useful when you need to create a small +data set for unit testing purposes.

+
+

PYTHON +

+
# Import pandas library
+import pandas as pd
+
+# Create a dictionary with each key-value pair representing a data frame column
+data = {
+    'col_1': [3, 2, 1, 0],
+    'col_2': ['a', 'b', 'c', 'd']
+    }
+
+df = pd.DataFrame.from_dict(data) 
+
+print(df) # Outputs data as a tabular Pandas data frame   
+print(type(df))
+
+
+

OUTPUT +

+
   col_1 col_2
+0      3     a
+1      2     b
+2      1     c
+3      0     d
+<class 'pandas.core.frame.DataFrame'>
+
+
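The example above starts from a dictionary. A list of lists can be converted in a similar way. In the sketch below, each inner list becomes one row, and the column names are passed separately through the columns parameter of pd.DataFrame. It produces the same table as above. The variable names are illustrative.

```python
import pandas as pd

# Each inner list is one row of the data frame
rows = [
    [3, 'a'],
    [2, 'b'],
    [1, 'c'],
    [0, 'd']
    ]

df_from_list = pd.DataFrame(rows, columns=['col_1', 'col_2'])
print(df_from_list)
```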
+
+ +
+
+

Key Points +

+
+
    +
  • Lists can contain any Python object including other lists
  • +
  • Lists are ordered i.e. indexed and can therefore be sliced by index +number
  • +
  • Unlike strings and integers, the values inside a list can be +modified in place
  • +
  • A list which contains other lists is referred to as a nested +list
  • +
  • Dictionaries are defined using key-value pairs, and their values are accessed by key rather than by position
  • +
  • Dictionary keys are unique
  • +
  • A dictionary which contains other dictionaries is referred to as a +nested dictionary
  • +
  • Values inside nested lists and dictionaries can be accessed by an additional index or key
  • +
+
+
+
+

Content from Loops and Conditional Logic

+
+

Last updated on 2024-07-11

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I do the same operations on many different values?
  • +
  • How can my programs do different things based on data values?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • identify and create loops
  • +
  • use logical statements to allow for decision-based operations in +code
  • +
+
+
+
+
+
+

This episode contains two lessons:

+
    +
  1. Repeating Actions with +Loops
  2. +
  3. Making Choices with +Conditional Logic
  4. +
+

Repeating Actions with Loops + +

+
+

In the episode about visualizing data, we plot values of interest from our first inflammation dataset (inflammation-01.csv), and these plots reveal some suspicious features.

+
Line graphs showing average, maximum, and minimum inflammation across all patients over a 40-day period.

We have a dozen data sets right now and potentially more on the way +if Dr. Maverick can keep up their surprisingly fast clinical trial rate. +We want to create plots for all of our data sets with a single +statement. To do that, we’ll have to teach the computer how to repeat +things.

+

An example task that we might want to repeat is accessing numbers in +a list, which we will do by printing each number on a line of its +own.

+
+

PYTHON +

+
odds = [1, 3, 5, 7]
+
+

In Python, a list is basically an ordered +collection of elements, and every element has a unique number associated +with it — its index. This means that we can access elements in a list +using their indices. For example, we can get the first number in the +list odds, by using odds[0]. One way to print +each number is to use four print statements:

+
+

PYTHON +

+
print(odds[0])
+print(odds[1])
+print(odds[2])
+print(odds[3])
+
+
+

OUTPUT +

+
1
+3
+5
+7
+
+

This is a bad approach for three reasons:

+
    +
  1. Not scalable. Imagine you need to print a list +that has hundreds of elements. It might be easier to type them in +manually.

  2. +
  3. Difficult to maintain. If we want to decorate +each printed element with an asterisk or any other character, we would +have to change four lines of code. While this might not be a problem for +small lists, it would definitely be a problem for longer ones.

  4. +
  5. Fragile. If we use it with a list that has more +elements than what we initially envisioned, it will only display part of +the list’s elements. A shorter list, on the other hand, will cause an +error because it will be trying to display elements of the list that do +not exist.

  6. +
+
+

PYTHON +

+
odds = [1, 3, 5]
+print(odds[0])
+print(odds[1])
+print(odds[2])
+print(odds[3])
+
+
+

OUTPUT +

+
1
+3
+5
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+IndexError                                Traceback (most recent call last)
+<ipython-input-3-7974b6cdaf14> in <module>()
+      3 print(odds[1])
+      4 print(odds[2])
+----> 5 print(odds[3])
+
+IndexError: list index out of range
+
+

Here’s a better approach: a for +loop

+
+

PYTHON +

+
odds = [1, 3, 5, 7]
+for num in odds:
+    print(num)
+
+
+

OUTPUT +

+
1
+3
+5
+7
+
+

This is shorter — certainly shorter than something that prints every +number in a hundred-number list — and more robust as well:

+
+

PYTHON +

+
odds = [1, 3, 5, 7, 9, 11]
+for num in odds:
+    print(num)
+
+
+

OUTPUT +

+
1
+3
+5
+7
+9
+11
+
+

The improved version uses a for +loop to repeat an operation — in this case, printing — once for each +thing in a sequence. The general form of a loop is:

+
+

PYTHON +

+
for variable in collection:
+    # do things using variable, such as print
+
+

Using the odds example above, the loop might look like this:

+
Loop variable 'num' being assigned the value of each element in the list odds in turn and then being printed

where each number (num) in the variable +odds is looped through and printed one number after +another. The other numbers in the diagram denote which loop cycle the +number was printed in (1 being the first loop cycle, and 6 being the +final loop cycle).

+

We can call the loop +variable anything we like, but there must be a colon at the end of +the line starting the loop, and we must indent anything we want to run +inside the loop. Unlike many other languages, there is no command to +signify the end of the loop body (e.g., end for); +everything indented after the for statement belongs to the +loop.

+
+
+ +
+
+

What’s in a name? +

+
+

In the example above, the loop variable was given the name +num as a mnemonic; it is short for ‘number’. We can choose +any name we want for variables. We might just as easily have chosen the +name banana for the loop variable, as long as we use the +same name when we invoke the variable inside the loop:

+
+

PYTHON +

+
odds = [1, 3, 5, 7, 9, 11]
+for banana in odds:
+    print(banana)
+
+
+

OUTPUT +

+
1
+3
+5
+7
+9
+11
+
+

It is a good idea to choose variable names that are meaningful, +otherwise it would be more difficult to understand what the loop is +doing.

+
+
+
+

Here’s another loop that repeatedly updates a variable:

+
+

PYTHON +

+
length = 0
+names = ['Curie', 'Darwin', 'Turing']
+for value in names:
+    length = length + 1
+print('There are', length, 'names in the list.')
+
+
+

OUTPUT +

+
There are 3 names in the list.
+
+

It’s worth tracing the execution of this little program step by step. +Since there are three names in names, the statement on line +4 will be executed three times. The first time around, +length is zero (the value assigned to it on line 1) and +value is Curie. The statement adds 1 to the +old value of length, producing 1, and updates +length to refer to that new value. The next time around, +value is Darwin and length is 1, +so length is updated to be 2. After one more update, +length is 3; since there is nothing left in +names for Python to process, the loop finishes and the +print function on line 5 tells us our final answer.

+

Note that a loop variable is a variable that is being used to record progress in a loop. It still exists after the loop is over. A variable defined before the loop can also be re-used as the loop variable, in which case the loop overwrites its value:

+
+

PYTHON +

+
name = 'Rosalind'
+for name in ['Curie', 'Darwin', 'Turing']:
+    print(name)
+print('after the loop, name is', name)
+
+
+

OUTPUT +

+
Curie
+Darwin
+Turing
+after the loop, name is Turing
+
+

Note also that finding the length of an object is such a common +operation that Python actually has a built-in function to do it called +len:

+
+

PYTHON +

+
print(len([0, 1, 2, 3]))
+
+
+

OUTPUT +

+
4
+
+

len is much faster than any function we could write +ourselves, and much easier to read than a two-line loop; it will also +give us the length of many other data types we haven’t seen yet, so we +should always use it when we can.

+
+
+ +
+
+

From 1 to N +

+
+

Python has a built-in function called range that generates a sequence of numbers. range can accept 1, 2, or 3 parameters.

+
    +
  • If one parameter is given, range generates a sequence +of that length, starting at zero and incrementing by 1. For example, +range(3) produces the numbers 0, 1, 2.
  • +
  • If two parameters are given, range starts at the first +and ends just before the second, incrementing by one. For example, +range(2, 5) produces 2, 3, 4.
  • +
  • If range is given 3 parameters, it starts at the first +one, ends just before the second one, and increments by the third one. +For example, range(3, 10, 2) produces +3, 5, 7, 9.
  • +
+
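A quick way to check what a range produces is to convert it to a list. The sketch below does this for each of the three forms described above. It adds one extra case not mentioned in the list: a negative third parameter, which counts down.

```python
# list() collects all the numbers a range produces, so they can be printed at once
one_param = list(range(3))
two_params = list(range(2, 5))
three_params = list(range(3, 10, 2))
counting_down = list(range(5, 0, -1))  # a negative step counts down

print(one_param)      # [0, 1, 2]
print(two_params)     # [2, 3, 4]
print(three_params)   # [3, 5, 7, 9]
print(counting_down)  # [5, 4, 3, 2, 1]
```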

Using range, write a loop that prints the first 3 natural numbers:

+
+

OUTPUT +

+
1
+2
+3
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
for number in range(1, 4):
+    print(number)
+
+
+
+
+
+
+
+ +
+
+

Understanding the loops +

+
+

Given the following loop:

+
+

PYTHON +

+
word = 'oxygen'
+for letter in word:
+    print(letter)
+
+

How many times is the body of the loop executed?

+
    +
  • 3 times
  • +
  • 4 times
  • +
  • 5 times
  • +
  • 6 times
  • +
+
+
+
+
+
+ +
+
+

The body of the loop is executed 6 times, once for each of the six characters in 'oxygen'.

+
+
+
+
+
+
+ +
+
+

Computing Powers With Loops +

+
+

Exponentiation is built into Python:

+
+

PYTHON +

+
print(5 ** 3)
+
+
+

OUTPUT +

+
125
+
+

Write a loop that calculates the same result as 5 ** 3 +using multiplication (and without exponentiation).

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
result = 1
+for number in range(0, 3):
+    result = result * 5
+print(result)
+
+
+
+
+
+
+
+ +
+
+

Summing a List +

+
+

Write a loop that calculates the sum of the elements in a list by adding up each element and printing the final value, so that [124, 402, 36] prints 562.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
numbers = [124, 402, 36]
+summed = 0
+for num in numbers:
+    summed = summed + num
+print(summed)
+
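Writing this loop is a useful exercise. As with len, though, Python also has a built-in function, sum, that does the same job in a single call. The sketch below runs both and compares the results.

```python
numbers = [124, 402, 36]

# Loop version, as in the solution above
summed = 0
for num in numbers:
    summed = summed + num

# The built-in sum() function gives the same result in one call
print(summed, sum(numbers))  # 562 562
```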
+
+
+
+
+
+
+ +
+
+

Computing the Value of a Polynomial +

+
+

The built-in function enumerate takes a sequence (e.g., +a list) and generates a new sequence of the +same length. Each element of the new sequence is a pair composed of the +index (0, 1, 2,…) and the value from the original sequence:

+
+

PYTHON +

+
a_list = ['a', 'b', 'c']  # example list; any sequence works
+for idx, val in enumerate(a_list):
+    # Do something using idx and val, e.g. print them
+    print(idx, val)
+
+

The code above loops through a_list, assigning the index +to idx and the value to val.

+

Suppose you have encoded a polynomial as a list of coefficients in +the following way: the first element is the constant term, the second +element is the coefficient of the linear term, the third is the +coefficient of the quadratic term, etc.

+
+

PYTHON +

+
x = 5
+coefs = [2, 4, 3]
+y = coefs[0] * x**0 + coefs[1] * x**1 + coefs[2] * x**2
+print(y)
+
+
+

OUTPUT +

+
97
+
+

Write a loop using enumerate(coefs) which computes the +value y of any polynomial, given x and +coefs.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
y = 0
+for idx, coef in enumerate(coefs):
+    y = y + coef * x**idx
+print(y)
+
+
+
+
+
+

Making Choices with Conditional Logic + +

+
+

How can we use Python to automatically recognize different situations +we encounter with our data and take a different action for each? In this +lesson, we’ll learn how to write code that runs only when certain +conditions are true.

+
+

Conditionals +

+

We can ask Python to take different actions, depending on a +condition, with an if statement:

+
+

PYTHON +

+
num = 37
+if num > 100:
+    print('greater')
+else:
+    print('not greater')
+print('done')
+
+
+

OUTPUT +

+
not greater
+done
+
+

The second line of this code uses the keyword if to tell +Python that we want to make a choice. If the test that follows the +if statement is true, the body of the if +(i.e., the set of lines indented underneath it) is executed, and +“greater” is printed. If the test is false, the body of the +else is executed instead, and “not greater” is printed. +Only one or the other is ever executed before continuing on with program +execution to print “done”:

+
A flowchart diagram of the if-else construct that tests if variable num is greater than 100

Conditional +statements don’t have to include an else. If there +isn’t one, Python simply does nothing if the test is false:

+
+

PYTHON +

+
num = 53
+print('before conditional...')
+if num > 100:
+    print(num, 'is greater than 100')
+print('...after conditional')
+
+
+

OUTPUT +

+
before conditional...
+...after conditional
+
+

We can also chain several tests together using elif, +which is short for “else if”. The following Python code uses +elif to print the sign of a number.

+
+

PYTHON +

+
num = -3
+
+if num > 0:
+    print(num, 'is positive')
+elif num == 0:
+    print(num, 'is zero')
+else:
+    print(num, 'is negative')
+
+
+

OUTPUT +

+
-3 is negative
+
+

Note that to test for equality we use a double equals sign +== rather than a single equals sign = which is +used to assign values.

+
+
+ +
+
+

Comparing in Python +

+
+

Along with the > and == operators we +have already used for comparing values in our conditionals, there are a +few more options to know about:

+
    +
  • +>: greater than
  • +
  • +<: less than
  • +
  • +==: equal to
  • +
  • +!=: does not equal
  • +
  • +>=: greater than or equal to
  • +
  • +<=: less than or equal to
  • +
+
+
+
+

We can also combine tests using and and or. +and is only true if both parts are true:

+
+

PYTHON +

+
if (1 > 0) and (-1 >= 0):
+    print('both parts are true')
+else:
+    print('at least one part is false')
+
+
+

OUTPUT +

+
at least one part is false
+
+

while or is true if at least one part is true:

+
+

PYTHON +

+
if (1 < 0) or (1 >= 0):
+    print('at least one test is true')
+
+
+

OUTPUT +

+
at least one test is true
+
+
+
+ +
+
+

+True and False +

+
+

True and False are special words in Python called booleans, which represent truth values. An expression such as 1 < 0 evaluates to the value False, while -1 < 0 evaluates to True.
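A short check of this: comparisons produce values of type bool. Those values can be stored in a variable like any other and later used as the test of an if statement. The variable name is illustrative.

```python
# Comparisons evaluate to one of the two boolean values
print(1 < 0)        # False
print(-1 < 0)       # True
print(type(1 < 0))  # <class 'bool'>

# A boolean can be stored in a variable and then used as the test of an if statement
is_negative = -1 < 0
if is_negative:
    print('-1 is negative')
```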

+
+
+
+
+
+

Checking Our Data +

+

Now that we’ve seen how conditionals work, we can use them to check +for the suspicious features we saw in our inflammation data. We are +about to use functions provided by the numpy module again. +Therefore, if you’re working in a new Python session, make sure to load +the module with:

+
+

PYTHON +

+
import numpy
+
+

From the first couple of plots, we saw that maximum daily inflammation exhibits a strange behavior and rises by one unit a day. Wouldn’t it be a good idea to detect such behavior and report it as suspicious? Let’s do that! However, instead of checking every single day of the study, let’s merely check whether the maximum inflammation at the beginning (day 0) and in the middle (day 20) of the study equals the corresponding day number.

+
+

PYTHON +

+
max_inflammation_0 = numpy.amax(data, axis=0)[0]
+max_inflammation_20 = numpy.amax(data, axis=0)[20]
+
+if max_inflammation_0 == 0 and max_inflammation_20 == 20:
+    print('Suspicious looking maxima!')
+
+

We also saw a different problem in the third dataset; the minima per +day were all zero (looks like a healthy person snuck into our study). We +can also check for this with an elif condition:

+
+

PYTHON +

+
elif numpy.sum(numpy.amin(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+
+

And if neither of these conditions is true, we can use else to give the all-clear:

+
+

PYTHON +

+
else:
+    print('Seems OK!')
+
+

Let’s test that out:

+
+

PYTHON +

+
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+max_inflammation_0 = numpy.amax(data, axis=0)[0]
+max_inflammation_20 = numpy.amax(data, axis=0)[20]
+
+if max_inflammation_0 == 0 and max_inflammation_20 == 20:
+    print('Suspicious looking maxima!')
+elif numpy.sum(numpy.amin(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+else:
+    print('Seems OK!')
+
+
+

OUTPUT +

+
Suspicious looking maxima!
+
+
+

PYTHON +

+
data = numpy.loadtxt(fname='inflammation-03.csv', delimiter=',')
+
+max_inflammation_0 = numpy.amax(data, axis=0)[0]
+max_inflammation_20 = numpy.amax(data, axis=0)[20]
+
+if max_inflammation_0 == 0 and max_inflammation_20 == 20:
+    print('Suspicious looking maxima!')
+elif numpy.sum(numpy.amin(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+else:
+    print('Seems OK!')
+
+
+

OUTPUT +

+
Minima add up to zero!
+
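If the inflammation files are not at hand, the same checks can be tried on a small synthetic array. In the sketch below the data are entirely made up: 3 hypothetical patients (rows) over 21 days (columns), built so that the daily maximum equals the day number. With this data the first condition is met, so the elif and else branches are skipped.

```python
import numpy

# Hypothetical data: 3 patients (rows) by 21 days (columns)
days = numpy.arange(21)  # 0, 1, ..., 20
data = numpy.vstack([days, days // 2, numpy.zeros(21)])

max_inflammation_0 = numpy.amax(data, axis=0)[0]
max_inflammation_20 = numpy.amax(data, axis=0)[20]

if max_inflammation_0 == 0 and max_inflammation_20 == 20:
    print('Suspicious looking maxima!')
elif numpy.sum(numpy.amin(data, axis=0)) == 0:
    print('Minima add up to zero!')
else:
    print('Seems OK!')
```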
+

In this way, we have asked Python to do something different depending +on the condition of our data. Here we printed messages in all cases, but +we could also imagine not using the else catch-all so that +messages are only printed when something is wrong, freeing us from +having to manually examine every plot for features we’ve seen +before.

+
+
+ +
+
+

How Many Paths? +

+
+

Consider this code:

+
+

PYTHON +

+
if 4 > 5:
+    print('A')
+elif 4 == 5:
+    print('B')
+elif 4 < 5:
+    print('C')
+
+

Which of the following would be printed if you were to run this code? +Why did you pick this answer?

+
    +
  1. A
  2. +
  3. B
  4. +
  5. C
  6. +
  7. B and C
  8. +
+
+
+
+
+
+ +
+
+

C gets printed because the first two conditions, 4 > 5 and 4 == 5, are not true, but 4 < 5 is true. In this case, only one of these conditions can be true at a time, but in other scenarios multiple elif conditions could be met. In these scenarios, only the action associated with the first true condition will occur, starting from the top of the conditional section.

+
A flowchart diagram of a conditional section with multiple elif conditions and some possible outcomes.

This contrasts with the case of multiple if statements, +where every action can occur as long as their condition is met.

+
A flowchart diagram of a conditional section with multiple if statements and some possible outcomes.
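The difference between the two flowcharts can be checked directly. In the sketch below, both tests are true for num = 7. The if/elif chain runs only its first matching branch, while two separate if statements both run. The counter variables are hypothetical and exist only to record how many branches ran; += adds one to a variable, as explained in the In-Place Operators challenge.

```python
num = 7

# Chain of if/elif: only the first true branch runs
elif_branches_run = 0
if num > 5:
    elif_branches_run += 1
    print('elif chain: num > 5')
elif num > 0:
    elif_branches_run += 1
    print('elif chain: num > 0')

# Separate if statements: every true test runs its own body
if_branches_run = 0
if num > 5:
    if_branches_run += 1
    print('separate ifs: num > 5')
if num > 0:
    if_branches_run += 1
    print('separate ifs: num > 0')

print(elif_branches_run, if_branches_run)  # 1 2
```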
+
+
+
+
+
+
+ +
+
+

What Is Truth? +

+
+

True and False booleans are not the only values in Python that are true and false. In fact, any value can be used in an if or elif. After reading and running the code below, explain what the rule is for which values are considered true and which are considered false.

+
+

PYTHON +

+
if '':
+    print('empty string is true')
+if 'word':
+    print('word is true')
+if []:
+    print('empty list is true')
+if [1, 2, 3]:
+    print('non-empty list is true')
+if 0:
+    print('zero is true')
+if 1:
+    print('one is true')
+
+
+
+
+
+
+ +
+
+

That’s Not Not What I Meant +

+
+

Sometimes it is useful to check whether some condition is +not true. The Boolean operator not can do this +explicitly. After reading and running the code below, write some +if statements that use not to test the rule +that you formulated in the previous challenge.

+
+

PYTHON +

+
if not '':
+    print('empty string is not true')
+if not 'word':
+    print('word is not true')
+if not not True:
+    print('not not True is true')
+
+
+
+
+
+
+ +
+
+

Close Enough +

+
+

Write some conditions that print True if the variable +a is within 10% of the variable b and +False otherwise. Compare your implementation with your +partner’s. Do you get the same answer for all possible pairs of +numbers?

+
+
+
+
+
+ +
+
+

There is a built-in +function abs that returns the absolute value of a +number:

+
+

PYTHON +

+
print(abs(-12))
+
+
+

OUTPUT +

+
12
+
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
a = 5
+b = 5.1
+
+if abs(a - b) <= 0.1 * abs(b):
+    print('True')
+else:
+    print('False')
+
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
print(abs(a - b) <= 0.1 * abs(b))
+
+

This works because the Booleans True and +False have string representations which can be printed.

+
+
+
+
+
+
+ +
+
+

In-Place Operators +

+
+

Like C and many languages derived from it, Python provides in-place operators that work like this:

+
+

PYTHON +

+
x = 1  # original value
+x += 1 # add one to x, assigning result back to x
+x *= 3 # multiply x by 3
+print(x)
+
+
+

OUTPUT +

+
6
+
+
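In-place operators connect back to the earlier discussion of lists being modified in place. For a list, += extends the existing list object, so every variable referring to that list sees the change. Writing x = x + y instead builds a new list and leaves the other variables untouched. The variable names below are illustrative.

```python
# With a list, += modifies the existing list object in place
list_a = [1, 2]
alias_a = list_a  # both names refer to the same list
list_a += [3]
print(alias_a)    # [1, 2, 3]

# Whereas x = x + y builds a new list and rebinds only that name
list_b = [1, 2]
alias_b = list_b
list_b = list_b + [3]
print(alias_b)    # [1, 2]
print(list_b)     # [1, 2, 3]
```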

Write some code that sums the positive and negative numbers in a list +separately, using in-place operators. Do you think the result is more or +less readable than writing the same without in-place operators?

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
positive_sum = 0
+negative_sum = 0
+test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
+for num in test_list:
+    if num > 0:
+        positive_sum += num
+    elif num == 0:
+        pass
+    else:
+        negative_sum += num
+print(positive_sum, negative_sum)
+
+

Here pass means “don’t do anything”. In this particular +case, it’s not actually needed, since if num == 0 neither +sum needs to change, but it illustrates the use of elif and +pass.

+
+
+
+
+
+
+ +
+
+

Sorting a List Into Buckets +

+
+

In our data folder, large data sets are stored in files +whose names start with “inflammation-” and small data sets – in files +whose names start with “small-”. We also have some other files that we +do not care about at this point. We’d like to break all these files into +three lists called large_files, small_files, +and other_files, respectively.

+

Add code to the template below to do this. Note that the string +method startswith +returns True if and only if the string it is called on +starts with the string passed as an argument, that is:

+
+

PYTHON +

+
'String'.startswith('Str')
+
+
+

OUTPUT +

+
True
+
+

But

+
+

PYTHON +

+
'String'.startswith('str')
+
+
+

OUTPUT +

+
False
+
+
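Because `startswith` is case-sensitive, one way to ignore case is to convert the string to lower case first. `startswith` also accepts a tuple of prefixes and returns `True` if any of them match. A short sketch:

```python
print('String'.lower().startswith('str'))  # True
print('small-01.csv'.startswith(('inflammation-', 'small-')))  # True
print('myscript.py'.startswith(('inflammation-', 'small-')))  # False
```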

Use the following Python code as your starting point:

+
+

PYTHON +

+
filenames = ['inflammation-01.csv',
+         'myscript.py',
+         'inflammation-02.csv',
+         'small-01.csv',
+         'small-02.csv']
+large_files = []
+small_files = []
+other_files = []
+
+

Your solution should:

+
    +
  1. loop over the names of the files
  2. +
  3. figure out which group each filename belongs in
  4. +
  5. append the filename to that list
  6. +
+

In the end the three lists should be:

+
+

PYTHON +

+
large_files = ['inflammation-01.csv', 'inflammation-02.csv']
+small_files = ['small-01.csv', 'small-02.csv']
+other_files = ['myscript.py']
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
for filename in filenames:
+    if filename.startswith('inflammation-'):
+        large_files.append(filename)
+    elif filename.startswith('small-'):
+        small_files.append(filename)
+    else:
+        other_files.append(filename)
+
+print('large_files:', large_files)
+print('small_files:', small_files)
+print('other_files:', other_files)
+
+
+
+
+
+
+
+ +
+
+
    +
  1. Write a loop that counts the number of vowels in a character +string.
  2. +
  3. Test it on a few individual words and full sentences.
  4. +
  5. Once you are done, compare your solution to your neighbor’s. Did you +make the same decisions about how to handle the letter ‘y’ (which some +people think is a vowel, and some do not)?
  6. +
+
+

Solution +

+
vowels = 'aeiouAEIOU'
+sentence = 'Mary had a little lamb.'
+count = 0
+for char in sentence:
+    if char in vowels:
+        count += 1
+
+print('The number of vowels in this string is ' + str(count))
+
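One way to make the decision about ‘y’ explicit is the sketch below. It uses a hypothetical `include_y` variable (our own illustration, not part of the exercise) to control whether ‘y’ and ‘Y’ are counted as vowels.

```python
sentence = 'Mary had a little lamb.'
include_y = True  # hypothetical switch: set to False to treat 'y' as a consonant

vowels = 'aeiouAEIOU'
if include_y:
    vowels = vowels + 'yY'

count = 0
for char in sentence:
    if char in vowels:
        count += 1

print('The number of vowels in this string is ' + str(count))  # 7 when include_y is True
```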


+
+
+
+
+
+
+
+ +
+
+

Key Points +

+
+
    +
  • Use for variable in sequence to process the elements of +a sequence one at a time.
  • +
  • The body of a for loop must be indented.
  • +
  • Use len(thing) to determine the length of something +that contains other values.
  • +
  • Use if condition to start a conditional statement, +elif condition to provide additional tests, and +else to provide a default.
  • +
  • The bodies of the branches of conditional statements must be +indented.
  • +
  • Use == to test for equality.
  • +
  • +X and Y is only true if both X and +Y are true.
  • +
  • +X or Y is true if either X or +Y, or both, are true.
  • +
  • Zero, the empty string, and the empty list are considered false; all +other numbers, strings, and lists are considered true.
  • +
  • +True and False represent truth +values.
  • +
+
+
+
+
+

Content from Alternatives to Loops

+
+

Last updated on 2024-07-11

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I vectorize my loops?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • identify what vectorized operations are
  • +
  • perform basic vectorized operations
  • +
+
+
+
+
+
+

FIXME

+
+
+ +
+
+

Key Points +

+
+
    +
  • NULL
  • +
+
+
+

Content from Creating Functions

+
+

Last updated on 2024-07-11

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • What are functions, and how can I use them in Python?
  • +
  • How can I define new functions?
  • +
  • What’s the difference between defining and calling a function?
  • +
  • What happens when I call a function?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • identify what a function is
  • +
  • create new functions
  • +
  • Set default values for function parameters.
  • +
  • Explain why we should divide programs into small, single-purpose +functions.
  • +
+
+
+
+
+
+

At this point, we’ve seen that code can have Python make decisions about what it sees in our data. What if we want to convert some of our data, for example by taking a temperature in Fahrenheit and converting it to Celsius? We could write something like this to convert a single number:

+
+

PYTHON +

+
fahrenheit_val = 99
+celsius_val = ((fahrenheit_val - 32) * (5/9))
+
+

and for a second number we could just copy the lines and rename the variables:

+
+

PYTHON +

+
fahrenheit_val = 99
+celsius_val = ((fahrenheit_val - 32) * (5/9))
+
+fahrenheit_val2 = 43
+celsius_val2 = ((fahrenheit_val2 - 32) * (5/9))
+
+

But we would be in trouble as soon as we had to do this more than a couple of times. Cutting and pasting will make our code very long and very repetitive, very quickly. We’d like a way to package our code so that it is easier to reuse: a shorthand way of re-executing longer pieces of code. In Python we can use functions. Let’s start by defining a function fahr_to_celsius that converts temperatures from Fahrenheit to Celsius (shown here in two equivalent forms):

+
+

PYTHON +

+
def explicit_fahr_to_celsius(temp):
+    # Assign the converted value to a variable
+    converted = ((temp - 32) * (5/9))
+    # Return the value of the new variable
+    return converted
+
+def fahr_to_celsius(temp):
+    # Return the converted value directly, without storing it in a
+    # variable first. This does the same thing as explicit_fahr_to_celsius
+    # but is more concise; the explicit version spells out what the
+    # return statement sends back.
+    return ((temp - 32) * (5/9))
+
+
Labeled parts of a Python function definition

The function definition opens with the keyword def +followed by the name of the function (fahr_to_celsius) and +a parenthesized list of parameter names (temp). The body of the function — the statements +that are executed when it runs — is indented below the definition line. +The body concludes with a return keyword followed by the +return value.

+

When we call the function, the values we pass to it are assigned to +those variables so that we can use them inside the function. Inside the +function, we use a return +statement to send a result back to whoever asked for it.

+

Let’s try running our function.

+
+

PYTHON +

+
fahr_to_celsius(32)
+
+

This command calls our function with 32 as the input and returns the converted value, 0.0.
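The two definitions above compute the same expression, so they should always agree. The small check below repeats both definitions so that it runs on its own. The list of test temperatures is our own choice.

```python
def explicit_fahr_to_celsius(temp):
    converted = ((temp - 32) * (5/9))
    return converted

def fahr_to_celsius(temp):
    return ((temp - 32) * (5/9))

for temp_f in [32, 212, 99, 43, -40]:
    # Both versions evaluate the same arithmetic, so the results are identical
    assert explicit_fahr_to_celsius(temp_f) == fahr_to_celsius(temp_f)

print(fahr_to_celsius(32))  # 0.0
```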

+

In fact, calling our own function is no different from calling any +other function:

+
+

PYTHON +

+
print('freezing point of water:', fahr_to_celsius(32), 'C')
+print('boiling point of water:', fahr_to_celsius(212), 'C')
+
+
+

OUTPUT +

+
freezing point of water: 0.0 C
+boiling point of water: 100.0 C
+
+

We’ve successfully called the function that we defined, and we have +access to the value that we returned.

+

Composing Functions + +

+
+

Now that we’ve seen how to turn Fahrenheit into Celsius, we can also +write the function to turn Celsius into Kelvin:

+
+

PYTHON +

+
def celsius_to_kelvin(temp_c):
+    return temp_c + 273.15
+
+print('freezing point of water in Kelvin:', celsius_to_kelvin(0.))
+
+
+

OUTPUT +

+
freezing point of water in Kelvin: 273.15
+
+

What about converting Fahrenheit to Kelvin? We could write out the +formula, but we don’t need to. Instead, we can compose the two functions we have +already created:

+
+

PYTHON +

+
def fahr_to_kelvin(temp_f):
+    temp_c = fahr_to_celsius(temp_f)
+    temp_k = celsius_to_kelvin(temp_c)
+    return temp_k
+
+print('boiling point of water in Kelvin:', fahr_to_kelvin(212.0))
+
+
+

OUTPUT +

+
boiling point of water in Kelvin: 373.15
+
+

This is our first taste of how larger programs are built: we define basic operations, then combine them in ever-larger chunks to get the effect we want. Real-life functions will usually be larger than the ones shown here (typically half a dozen to a few dozen lines), but they shouldn’t ever be much longer than that, or the next person who reads them won’t be able to understand what’s going on.
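Function calls can also be nested directly, so that one function’s return value is passed straight into another without a temporary variable. The sketch below repeats the two helper definitions so that it runs on its own.

```python
def fahr_to_celsius(temp):
    return ((temp - 32) * (5/9))

def celsius_to_kelvin(temp_c):
    return temp_c + 273.15

# The inner call runs first; its result becomes the argument of the outer call
boiling_k = celsius_to_kelvin(fahr_to_celsius(212.0))
print('boiling point of water in Kelvin:', boiling_k)
```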

+

Variable Scope + +

+
+

In composing our temperature conversion functions, we created +variables inside of those functions, temp, +temp_c, temp_f, and temp_k. We +refer to these variables as local variables because they no +longer exist once the function is done executing. If we try to access +their values outside of the function, we will encounter an error:

+
+

PYTHON +

+
print('Again, temperature in Kelvin was:', temp_k)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-1-eed2471d229b> in <module>
+----> 1 print('Again, temperature in Kelvin was:', temp_k)
+
+NameError: name 'temp_k' is not defined
+
+

If you want to reuse the temperature in Kelvin after you have +calculated it with fahr_to_kelvin, you can store the result +of the function call in a variable:

+
+

PYTHON +

+
temp_kelvin = fahr_to_kelvin(212.0)
+print('temperature in Kelvin was:', temp_kelvin)
+
+
+

OUTPUT +

+
temperature in Kelvin was: 373.15
+
+

The variable temp_kelvin, being defined outside any +function, is said to be global.

+

Inside a function, one can read the value of such global +variables:

+
+

PYTHON +

+
def print_temperatures():
+    print('temperature in Fahrenheit was:', temp_fahr)
+    print('temperature in Kelvin was:', temp_kelvin)
+
+temp_fahr = 212.0
+temp_kelvin = fahr_to_kelvin(temp_fahr)
+
+print_temperatures()
+
+
+

OUTPUT +

+
temperature in Fahrenheit was: 212.0
+temperature in Kelvin was: 373.15
+
+

By giving our functions human-readable names, we can more easily read and understand what is happening in our code. Even better, if at some later date we want to use either of those pieces of code again, we can do so in a single line.
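Reading a global variable inside a function is different from assigning to one. If you assign to a name inside a function, Python creates a new local variable. That local variable hides (shadows) any global variable with the same name while the function runs, and the global keeps its value. A minimal sketch, using an illustrative function name of our own:

```python
temp_kelvin = 373.15  # global variable

def reset_temperature():
    temp_kelvin = 0.0  # creates a new local variable; the global is untouched
    return temp_kelvin

print(reset_temperature())  # 0.0
print(temp_kelvin)          # 373.15
```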

+

Testing and Documenting + +

+
+

Once we start putting things in functions so that we can re-use them, we need to start testing that those functions are working correctly. To see how to do this, let’s write a function to offset a dataset so that its mean value shifts to a user-defined value:

+
+

PYTHON +

+
import numpy
+
+def offset_mean(data, target_mean_value):
+    return (data - numpy.mean(data)) + target_mean_value
+
+

We could test this on our actual data, but since we don’t know what +the values ought to be, it will be hard to tell if the result was +correct. Instead, let’s use NumPy to create a matrix of 0’s and then +offset its values to have a mean value of 3:

+
+

PYTHON +

+
z = numpy.zeros((2,2))
+print(offset_mean(z, 3))
+
+
+

OUTPUT +

+
[[ 3.  3.]
+ [ 3.  3.]]
+
+

That looks right, so let’s try offset_mean on our real +data:

+
+

PYTHON +

+
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+print(offset_mean(data, 0))
+
+
+

OUTPUT +

+
[[-6.14875 -6.14875 -5.14875 ... -3.14875 -6.14875 -6.14875]
+ [-6.14875 -5.14875 -4.14875 ... -5.14875 -6.14875 -5.14875]
+ [-6.14875 -5.14875 -5.14875 ... -4.14875 -5.14875 -5.14875]
+ ...
+ [-6.14875 -5.14875 -5.14875 ... -5.14875 -5.14875 -5.14875]
+ [-6.14875 -6.14875 -6.14875 ... -6.14875 -4.14875 -6.14875]
+ [-6.14875 -6.14875 -5.14875 ... -5.14875 -5.14875 -6.14875]]
+
+

It’s hard to tell from the default output whether the result is +correct, but there are a few tests that we can run to reassure us:

+
+

PYTHON +

+
print('original min, mean, and max are:', numpy.amin(data), numpy.mean(data), numpy.amax(data))
+offset_data = offset_mean(data, 0)
+print('min, mean, and max of offset data are:',
+      numpy.amin(offset_data),
+      numpy.mean(offset_data),
+      numpy.amax(offset_data))
+
+
+

OUTPUT +

+
original min, mean, and max are: 0.0 6.14875 20.0
+min, mean, and max of offset data are: -6.14875 2.84217094304e-16 13.85125
+
+

That seems almost right: the original mean was about 6.1, so the lower bound from zero is now about -6.1. The mean of the offset data isn’t exactly zero, because of floating-point rounding error, but it’s very close. We can even go further and check that the standard deviation hasn’t changed:

+
+

PYTHON +

+
print('std dev before and after:', numpy.std(data), numpy.std(offset_data))
+
+
+

OUTPUT +

+
std dev before and after: 4.61383319712 4.61383319712
+
+

Those values look the same, but we probably wouldn’t notice if they +were different in the sixth decimal place. Let’s do this instead:

+
+

PYTHON +

+
print('difference in standard deviations before and after:',
+      numpy.std(data) - numpy.std(offset_data))
+
+
+

OUTPUT +

+
difference in standard deviations before and after: -3.5527136788e-15
+
+

Again, the difference is very small. It’s still possible that our +function is wrong, but it seems unlikely enough that we should probably +get back to doing our analysis.
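Rather than inspecting printed numbers by eye, we can write the same checks as assertions that stop the program if they fail. The sketch below uses `numpy.isclose`, which compares values within a small tolerance. The small test array is our own example, not the inflammation data.

```python
import numpy

def offset_mean(data, target_mean_value):
    return (data - numpy.mean(data)) + target_mean_value

test_data = numpy.array([[1.0, 2.0], [3.0, 10.0]])  # illustrative values
offset_data = offset_mean(test_data, 0)

# The mean should be (very close to) the target, allowing for rounding error
assert numpy.isclose(numpy.mean(offset_data), 0.0)
# Shifting every value by the same amount should not change the spread
assert numpy.isclose(numpy.std(test_data), numpy.std(offset_data))
print('all checks passed')
```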

+

Documentation + +

+
+

We have one more task first, though: we should write some documentation for our function +to remind ourselves later what it’s for and how to use it.

+

The usual way to put documentation in software is to add comments like this:

+
+

PYTHON +

+
# offset_mean(data, target_mean_value):
+# return a new array containing the original data with its mean offset to match the desired value.
+def offset_mean(data, target_mean_value):
+    return (data - numpy.mean(data)) + target_mean_value
+
+

There’s a better way, though. If the first thing in a function is a +string that isn’t assigned to a variable, that string is attached to the +function as its documentation:

+
+

PYTHON +

+
def offset_mean(data, target_mean_value):
+    """Return a new array containing the original data
+       with its mean offset to match the desired value."""
+    return (data - numpy.mean(data)) + target_mean_value
+
+

This is better because we can now ask Python’s built-in help system +to show us the documentation for the function:

+
+

PYTHON +

+
help(offset_mean)
+
+
+

OUTPUT +

+
Help on function offset_mean in module __main__:
+
+offset_mean(data, target_mean_value)
+    Return a new array containing the original data
+    with its mean offset to match the desired value.
+
+
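Python stores the docstring in the function’s `__doc__` attribute, and `help` displays that text after tidying its indentation. A small sketch:

```python
import numpy

def offset_mean(data, target_mean_value):
    """Return a new array containing the original data
       with its mean offset to match the desired value."""
    return (data - numpy.mean(data)) + target_mean_value

# The docstring is an ordinary string attached to the function object
print(offset_mean.__doc__)
```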

A string like this is called a docstring. We don’t need to use +triple quotes when we write one, but if we do, we can break the string +across multiple lines:

+
+

PYTHON +

+
def offset_mean(data, target_mean_value):
+    """Return a new array containing the original data
+       with its mean offset to match the desired value.
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3], 0)
+    array([-1.,  0.,  1.])
+    """
+    return (data - numpy.mean(data)) + target_mean_value
+
+help(offset_mean)
+
+
+

OUTPUT +

+
Help on function offset_mean in module __main__:
+
+offset_mean(data, target_mean_value)
+    Return a new array containing the original data
+       with its mean offset to match the desired value.
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3], 0)
+    array([-1.,  0.,  1.])
+
+

Defining Defaults + +

+
+

We have passed parameters to functions in two ways: directly, as in +type(data), and by name, as in +numpy.loadtxt(fname='something.csv', delimiter=','). In +fact, we can pass the filename to loadtxt without the +fname=:

+
+

PYTHON +

+
numpy.loadtxt('inflammation-01.csv', delimiter=',')
+
+
+

OUTPUT +

+
array([[ 0.,  0.,  1., ...,  3.,  0.,  0.],
+       [ 0.,  1.,  2., ...,  1.,  0.,  1.],
+       [ 0.,  1.,  1., ...,  2.,  1.,  1.],
+       ...,
+       [ 0.,  1.,  1., ...,  1.,  1.,  1.],
+       [ 0.,  0.,  0., ...,  0.,  2.,  0.],
+       [ 0.,  0.,  1., ...,  1.,  1.,  0.]])
+
+

but we still need to say delimiter=:

+
+

PYTHON +

+
numpy.loadtxt('inflammation-01.csv', ',')
+
+
+

ERROR +

+
Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+  File "/Users/username/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1041, in loadtxt
+    dtype = np.dtype(dtype)
+  File "/Users/username/anaconda3/lib/python3.6/site-packages/numpy/core/_internal.py", line 199, in _commastring
+    newitem = (dtype, eval(repeats))
+  File "<string>", line 1
+    ,
+    ^
+SyntaxError: unexpected EOF while parsing
+
+

To understand what’s going on, and make our own functions easier to +use, let’s re-define our offset_mean function like +this:

+
+

PYTHON +

+
def offset_mean(data, target_mean_value=0.0):
+    """Return a new array containing the original data
+       with its mean offset to match the desired value (0 by default).
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3])
+    array([-1.,  0.,  1.])
+    """
+    return (data - numpy.mean(data)) + target_mean_value
+
+

The key change is that the second parameter is now written +target_mean_value=0.0 instead of just +target_mean_value. If we call the function with two +arguments, it works as it did before:

+
+

PYTHON +

+
test_data = numpy.zeros((2, 2))
+print(offset_mean(test_data, 3))
+
+
+

OUTPUT +

+
[[ 3.  3.]
+ [ 3.  3.]]
+
+

But we can also now call it with just one argument, in which case target_mean_value is automatically assigned the default value of 0.0:

+
+

PYTHON +

+
more_data = 5 + numpy.zeros((2, 2))
+print('data before mean offset:')
+print(more_data)
+print('offset data:')
+print(offset_mean(more_data))
+
+
+

OUTPUT +

+
data before mean offset:
+[[ 5.  5.]
+ [ 5.  5.]]
+offset data:
+[[ 0.  0.]
+ [ 0.  0.]]
+
+

This is handy: if we usually want a function to work one way, but +occasionally need it to do something else, we can allow people to pass a +parameter when they need to but provide a default to make the normal +case easier. The example below shows how Python matches values to +parameters:

+
+

PYTHON +

+
def display(a=1, b=2, c=3):
+    print('a:', a, 'b:', b, 'c:', c)
+
+print('no parameters:')
+display()
+print('one parameter:')
+display(55)
+print('two parameters:')
+display(55, 66)
+
+
+

OUTPUT +

+
no parameters:
+a: 1 b: 2 c: 3
+one parameter:
+a: 55 b: 2 c: 3
+two parameters:
+a: 55 b: 66 c: 3
+
+

As this example shows, parameters are matched up from left to right, +and any that haven’t been given a value explicitly get their default +value. We can override this behavior by naming the value as we pass it +in:

+
+

PYTHON +

+
print('only setting the value of c')
+display(c=77)
+
+
+

OUTPUT +

+
only setting the value of c
+a: 1 b: 2 c: 77
+
+

With that in hand, let’s look at the help for +numpy.loadtxt:

+
+

PYTHON +

+
help(numpy.loadtxt)
+
+
+

OUTPUT +

+
Help on function loadtxt in module numpy.lib.npyio:
+
+loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes')
+    Load data from a text file.
+
+    Each row in the text file must have the same number of values.
+
+    Parameters
+    ----------
+...
+
+

There’s a lot of information here, but the most important part is the +first couple of lines:

+
+

OUTPUT +

+
loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes')
+
+

This tells us that loadtxt has one parameter called fname that doesn’t have a default value, and nine others that do. If we call the function like this:

+
+

PYTHON +

+
numpy.loadtxt('inflammation-01.csv', ',')
+
+

then the filename is assigned to fname (which is what we want), but the delimiter string ',' is assigned to dtype rather than delimiter, because dtype is the second parameter in the list. However, ',' isn’t a known dtype, so our code produced an error message when we tried to run it. When we call loadtxt we don’t have to provide fname= for the filename because it’s the first item in the list. If we want the ',' to be assigned to the parameter delimiter, however, we do have to provide delimiter=, because delimiter is not the second parameter in the list.
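To see the keyword form work without needing the inflammation file, the sketch below reads two rows of comma-separated text from an in-memory `io.StringIO` object. `numpy.loadtxt` accepts a file-like object like this in place of a filename. The sample text is made up for illustration.

```python
import io
import numpy

csv_text = io.StringIO('0,1,2\n3,4,5\n')  # stands in for a CSV file

# The first argument is matched to fname by position; delimiter is named explicitly
data = numpy.loadtxt(csv_text, delimiter=',')
print(data)
print(data.shape)  # (2, 3)
```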

+

Readable functions + +

+
+

Consider these two functions:

+
+

PYTHON +

+
import numpy
+
+def s(p):
+    a = 0
+    for v in p:
+        a += v
+    m = a / len(p)
+    d = 0
+    for v in p:
+        d += (v - m) * (v - m)
+    return numpy.sqrt(d / (len(p) - 1))
+
+def std_dev(sample):
+    sample_sum = 0
+    for value in sample:
+        sample_sum += value
+
+    sample_mean = sample_sum / len(sample)
+
+    sum_squared_devs = 0
+    for value in sample:
+        sum_squared_devs += (value - sample_mean) * (value - sample_mean)
+
+    return numpy.sqrt(sum_squared_devs / (len(sample) - 1))
+
+

The functions s and std_dev are +computationally equivalent (they both calculate the sample standard +deviation), but to a human reader, they look very different. You +probably found std_dev much easier to read and understand +than s.
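Both functions divide by `len(sample) - 1`, so they compute the sample standard deviation. NumPy can compute the same quantity with `numpy.std(..., ddof=1)`. By default, `numpy.std` uses `ddof=0` and divides by N, which gives the population standard deviation. The quick consistency check below uses sample values of our own choosing.

```python
import numpy

def std_dev(sample):
    sample_sum = 0
    for value in sample:
        sample_sum += value

    sample_mean = sample_sum / len(sample)

    sum_squared_devs = 0
    for value in sample:
        sum_squared_devs += (value - sample_mean) * (value - sample_mean)

    return numpy.sqrt(sum_squared_devs / (len(sample) - 1))

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values
print(std_dev(sample))
print(numpy.std(sample, ddof=1))  # same value as std_dev
print(numpy.std(sample))          # ddof=0: population standard deviation (2.0 here)
```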

+

As this example illustrates, both documentation and a programmer’s +coding style combine to determine how easy it is for others to +read and understand the programmer’s code. Choosing meaningful variable +names and using blank spaces to break the code into logical “chunks” are +helpful techniques for producing readable code. This is useful +not only for sharing code with others, but also for the original +programmer. If you need to revisit code that you wrote months ago and +haven’t thought about since then, you will appreciate the value of +readable code!

+
+
+ +
+
+

Combining Strings +

+
+

“Adding” two strings produces their concatenation: +'a' + 'b' is 'ab'. Write a function called +fence that takes two parameters called +original and wrapper and returns a new string +that has the wrapper character at the beginning and end of the original. +A call to your function should look like this:

+
+

PYTHON +

+
print(fence('name', '*'))
+
+
+

OUTPUT +

+
*name*
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def fence(original, wrapper):
+    return wrapper + original + wrapper
+
+
+
+
+
+
+
+ +
+
+

Return versus print +

+
+

Note that return and print are not interchangeable. print is a Python function that prints data to the screen. It enables us, the users, to see the data. A return statement, on the other hand, makes data visible to the program. Let’s have a look at the following function:

+
+

PYTHON +

+
def add(a, b):
+    print(a + b)
+
+

Question: What will we see if we execute the +following commands?

+
+

PYTHON +

+
A = add(7, 3)
+print(A)
+
+
+
+
+
+
+ +
+
+

Python will first execute the function add with a = 7 and b = 3, and will therefore print 10. However, because the function add does not contain a return statement, it returns None by default (None is Python’s way of representing “nothing”). Therefore, A will be assigned None, and the last line (print(A)) will print None. As a result, we will see:

+
+

OUTPUT +

+
10
+None
+
+
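If we want the sum to be available to the rest of the program, the function should return it instead of printing it. Here is a sketch of that version. The name `add_and_return` is our own.

```python
def add_and_return(a, b):
    return a + b

A = add_and_return(7, 3)
print(A)  # 10
```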
+
+
+
+
+
+ +
+
+

Selecting Characters From Strings +

+
+

If the variable s refers to a string, then +s[0] is the string’s first character and s[-1] +is its last. Write a function called outer that returns a +string made up of just the first and last characters of its input. A +call to your function should look like this:

+
+

PYTHON +

+
print(outer('helium'))
+
+
+

OUTPUT +

+
hm
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def outer(input_string):
+    return input_string[0] + input_string[-1]
+
+
+
+
+
+
+
+ +
+
+

Rescaling an Array +

+
+

Write a function rescale that takes an array as input +and returns a corresponding array of values scaled to lie in the range +0.0 to 1.0. (Hint: If L and H are the lowest +and highest values in the original array, then the replacement for a +value v should be (v-L) / (H-L).)

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def rescale(input_array):
+    L = numpy.amin(input_array)
+    H = numpy.amax(input_array)
+    output_array = (input_array - L) / (H - L)
+    return output_array
+
+
+
+
+
+
+
+ +
+
+

Testing and Documenting Your Function +

+
+

Run the commands help(numpy.arange) and +help(numpy.linspace) to see how to use these functions to +generate regularly-spaced values, then use those values to test your +rescale function. Once you’ve successfully tested your +function, add a docstring that explains what it does.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
import numpy
+
+def rescale(input_array):
+    """Takes an array as input, and returns a corresponding array scaled so
+    that 0 corresponds to the minimum and 1 to the maximum value of the input array.
+
+    Examples:
+    >>> rescale(numpy.arange(10.0))
+    array([ 0.        ,  0.11111111,  0.22222222,  0.33333333,  0.44444444,
+           0.55555556,  0.66666667,  0.77777778,  0.88888889,  1.        ])
+    >>> rescale(numpy.linspace(0, 100, 5))
+    array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])
+    """
+    L = numpy.amin(input_array)
+    H = numpy.amax(input_array)
+    output_array = (input_array - L) / (H - L)
+    return output_array
+
+
+
+
+
+
+
+ +
+
+

Defining Defaults +

+
+

Rewrite the rescale function so that it scales data to +lie between 0.0 and 1.0 by default, but will +allow the caller to specify lower and upper bounds if they want. Compare +your implementation to your neighbor’s: do the two functions always +behave the same way?

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def rescale(input_array, low_val=0.0, high_val=1.0):
+    """rescales input array values to lie between low_val and high_val"""
+    L = numpy.amin(input_array)
+    H = numpy.amax(input_array)
+    intermed_array = (input_array - L) / (H - L)
+    output_array = intermed_array * (high_val - low_val) + low_val
+    return output_array
+
+
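The sketch below tries both the default and custom bounds, and it also shows one case where implementations can differ. It repeats the solution’s rescale, and the test arrays are our own. When every value in the input is the same, H - L is zero, so the division produces nan values. NumPy emits a RuntimeWarning in this case rather than raising an error, and a neighbor’s version may handle it differently.

```python
import numpy

def rescale(input_array, low_val=0.0, high_val=1.0):
    """rescales input array values to lie between low_val and high_val"""
    L = numpy.amin(input_array)
    H = numpy.amax(input_array)
    intermed_array = (input_array - L) / (H - L)
    output_array = intermed_array * (high_val - low_val) + low_val
    return output_array

print(rescale(numpy.linspace(0, 100, 5)))             # default bounds: 0.0 to 1.0
print(rescale(numpy.linspace(0, 100, 5), -1.0, 1.0))  # custom bounds: -1.0 to 1.0

with numpy.errstate(invalid='ignore'):
    # A constant array has H == L, so every element becomes 0/0 = nan
    print(rescale(numpy.array([5.0, 5.0, 5.0])))
```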
+
+
+
+
+
+ +
+
+

Variables Inside and Outside Functions +

+
+

What does the following piece of code display when run — and why?

+
+

PYTHON +

+
f = 0
+k = 0
+
+def f2k(f):
+    k = ((f - 32) * (5.0 / 9.0)) + 273.15
+    return k
+
+print(f2k(8))
+print(f2k(41))
+print(f2k(32))
+
+print(k)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
259.81666666666666
+278.15
+273.15
+0
+
+

k is 0 because the k inside the function f2k doesn’t know about the k defined outside the function. When the f2k function is called, it creates a local variable k. The function returns the value of this local k, but it does not alter the k defined outside the function. Therefore the original value of k remains unchanged. Beware that a local k is created because the statements inside f2k assign a new value to it. If k were only read, it would simply retrieve the global k value.

+
+
+
+
+
+
+ +
+
+

Mixing Default and Non-Default Parameters +

+
+

Given the following code:

+
+

PYTHON +

+
def numbers(one, two=2, three, four=4):
+    n = str(one) + str(two) + str(three) + str(four)
+    return n
+
+print(numbers(1, three=3))
+
+

what do you expect will be printed? What is actually printed? What +rule do you think Python is following?

+
    +
  1. 1234
  2. +
  3. one2three4
  4. +
  5. 1239
  6. +
  7. SyntaxError
  8. +
+

Given that, what does the following piece of code display when +run?

+
+

PYTHON +

+
def func(a, b=3, c=6):
+    print('a: ', a, 'b: ', b, 'c:', c)
+
+func(-1, 2)
+
+
    +
  1. a: b: 3 c: 6
  2. +
  3. a: -1 b: 3 c: 6
  4. +
  5. a: -1 b: 2 c: 6
  6. +
  7. a: b: -1 c: 2
  8. +
+
+
+
+
+
+ +
+
+

Attempting to define the numbers function results in +4. SyntaxError. The defined parameters two and +four are given default values. Because one and +three are not given default values, they are required to be +included as arguments when the function is called and must be placed +before any parameters that have default values in the function +definition.

+

The given call to func displays +a: -1 b: 2 c: 6. -1 is assigned to the first parameter +a, 2 is assigned to the next parameter b, and +c is not passed a value, so it uses its default value +6.
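One way to make the numbers function valid is to move every parameter without a default ahead of those with defaults. Here is a sketch of that reordering. The output string still joins the values in the order one, two, three, four.

```python
def numbers(one, three, two=2, four=4):
    # Required parameters (no default) come first, then the defaulted ones
    n = str(one) + str(two) + str(three) + str(four)
    return n

print(numbers(1, three=3))  # 1234
print(numbers(1, 3, 9))     # 1934: the third positional value goes to two
```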

+
+
+
+
+
+
+ +
+
+

Readable Code +

+
+

Revise a function you wrote for one of the previous exercises to try +to make the code more readable. Then, collaborate with one of your +neighbors to critique each other’s functions and discuss how your +function implementations could be further improved to make them more +readable.

+
+
+
+
+
+ +
+
+

Key Points +

+
+
    +
  • Define a function using +def function_name(parameter).
  • +
  • The body of a function must be indented.
  • +
  • Call a function using function_name(value).
  • +
  • Numbers are stored as integers or floating-point numbers.
  • +
  • Variables defined within a function can only be seen and used within +the body of the function.
  • +
  • Variables created outside of any function are called global +variables.
  • +
  • Within a function, we can access global variables.
  • +
  • Variables created within a function override global variables if +their names match.
  • +
  • Use help(thing) to view help for something.
  • +
  • Put docstrings in functions to provide help for that function.
  • +
  • Specify default values for parameters when defining a function using +name=value in the parameter list.
  • +
  • Parameters can be passed by matching based on name, by position, or +by omitting them (in which case the default value is used).
  • +
  • Put code whose parameters change frequently in a function, then call +it with different parameter values to customize its behavior.
  • +
+
+
+
+

Content from Data Analysis

+
+

Last updated on 2024-07-11

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I process tabular data files in Python?
  • +
  • How can I do the same operations on many different files?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • read in data files to Python
  • +
  • perform common operations on tabular data
  • +
  • write code to perform the same operation on multiple files
  • +
+
+
+
+
+
+

FIXME

+
+
+ +
+
+

Key Points +

+
+
    +
  • NULL
  • +
+
+
+

Content from Visualizations

+
+

Last updated on 2024-07-11

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I visualize tabular data in Python?
  • +
  • How can I group several plots together?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • create graphs and other visualizations using tabular data
  • +
  • group plots together to make comparative visualizations
  • +
+
+
+
+
+
+

FIXME

+
+
+ +
+
+

Key Points +

+
+
    +
  • NULL
  • +
+
+
+

Content from Errors and Exceptions

+
+

Last updated on 2024-07-11

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How does Python report errors?
  • +
  • How can I handle errors in Python programs?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • identify different errors and correct bugs associated with them
  • +
+
+
+
+
+
+

Every programmer encounters errors, both those who are just +beginning, and those who have been programming for years. Encountering +errors and exceptions can be very frustrating at times, and can make +coding feel like a hopeless endeavour. However, understanding what the +different types of errors are and when you are likely to encounter them +can help a lot. Once you know why you get certain types of +errors, they become much easier to fix.

+

Errors in Python have a very specific form, called a traceback. Let’s examine one:

+
+

PYTHON +

+
# This code has an intentional error. You can type it directly or
+# use it for reference to understand the error message below.
+def favorite_ice_cream():
+    ice_creams = [
+        'chocolate',
+        'vanilla',
+        'strawberry'
+    ]
+    print(ice_creams[3])
+
+favorite_ice_cream()
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+IndexError                                Traceback (most recent call last)
+<ipython-input-1-70bd89baa4df> in <module>()
+      9     print(ice_creams[3])
+      10
+----> 11 favorite_ice_cream()
+
+<ipython-input-1-70bd89baa4df> in favorite_ice_cream()
+      7         'strawberry'
+      8     ]
+----> 9     print(ice_creams[3])
+      10
+      11 favorite_ice_cream()
+
+IndexError: list index out of range
+
+

This particular traceback has two levels. You can determine the number of levels by counting the arrows on the left-hand side. In this case:

+
    +
  1. The first shows code from the cell above, with an arrow pointing +to Line 11 (which is favorite_ice_cream()).

  2. +
  3. The second shows some code in the function +favorite_ice_cream, with an arrow pointing to Line 9 (which +is print(ice_creams[3])).

  4. +
+

The last level is the actual place where the error occurred. The other level(s) show what function the program executed to get to the next level down. So, in this case, the program first performed a function call to the function favorite_ice_cream. Inside this function, the program encountered an error on Line 9, when it tried to run the code print(ice_creams[3]).

+
+
+ +
+
+

Long Tracebacks +

+
+

Sometimes, you might see a traceback that is very long, perhaps even 20 levels deep! This can make it seem like something horrible happened, but the length of the error message does not reflect severity; rather, it indicates that your program called many functions before it encountered the error. Most of the time, the actual place where the error occurred is at the bottom-most level, so you can skip down the traceback to the bottom.
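To see that a long traceback simply mirrors a long chain of function calls, the sketch below nests three calls. The function names are hypothetical. It then uses the standard library's traceback.extract_tb to list one entry per level. The last entry is the bottom-most level, where the error was actually raised.

```python
import traceback

# Hypothetical helper functions: each one calls the next,
# and only the innermost one contains a mistake.
def load_table():
    return clean_table()

def clean_table():
    return pick_column()

def pick_column():
    columns = ['region', 'year', 'value']
    return columns[3]  # only indices 0, 1 and 2 exist

try:
    load_table()
except IndexError as err:
    # One entry per level of the traceback, from the outermost call down
    frames = traceback.extract_tb(err.__traceback__)

for frame in frames:
    print(frame.lineno, frame.name)

# The bottom-most level is where the error actually occurred
print('Error raised in:', frames[-1].name)
```

Each extra function in the chain adds one level to the traceback. The cause is still found at the bottom.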

+
+
+
+

So what error did the program actually encounter? In the last line of +the traceback, Python helpfully tells us the category or type of error +(in this case, it is an IndexError) and a more detailed +error message (in this case, it says “list index out of range”).
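The two parts of that last line can also be retrieved from the error object itself. The sketch below catches the error from the ice cream example instead of letting it stop the program. It then reads the error's type name and its message.

```python
ice_creams = ['chocolate', 'vanilla', 'strawberry']

try:
    print(ice_creams[3])
except Exception as err:
    # The two parts of the last line of a traceback:
    error_type = type(err).__name__  # the category of error
    error_message = str(err)         # the more detailed message

print(error_type)     # IndexError
print(error_message)  # list index out of range
```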

+

If you encounter an error and don’t know what it means, it is still important to read the traceback closely. That way, if you fix the error but encounter a new one, you can tell that the error changed. Additionally, sometimes knowing where the error occurred is enough to fix it, even if you don’t entirely understand the message.

+

If you do encounter an error you don’t recognize, try looking at the official documentation on errors. However, note that you may not always find the error there, as it is possible to create custom errors. In that case, hopefully the custom error message is informative enough to help you figure out what went wrong. Libraries like pandas and numpy define their own custom errors, but the procedure for understanding them is the same: go to the last line of the traceback and read the error type and message there. The documentation for these libraries will often provide the information you need about any functions you are using, and there are large communities of users for data libraries that can help as well!
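Custom errors are often built on top of a built-in error class. Finding that parent class tells you which entry in the official documentation is relevant. The sketch below defines a hypothetical InvalidRegionCodeError, invented purely for illustration and not part of any library. It then walks the class hierarchy (__mro__) to find the built-in classes it inherits from.

```python
# InvalidRegionCodeError is hypothetical, defined here only to illustrate
# how a custom error can be built on top of a built-in one.
class InvalidRegionCodeError(ValueError):
    """Raised when a region code is not one of the expected values."""

def check_region(code):
    valid_codes = {'N', 'S', 'E', 'W'}
    if code not in valid_codes:
        raise InvalidRegionCodeError(f'unknown region code: {code!r}')
    return code

try:
    check_region('X')
except Exception as err:
    print(type(err).__name__, '-', err)
    # Keep only the built-in classes in the hierarchy; these are the ones
    # described in the official documentation on errors
    builtin_bases = [cls.__name__ for cls in type(err).__mro__
                     if cls.__module__ == 'builtins']

print('Built on:', builtin_bases)  # ValueError comes first
```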

+
+
+ +
+
+

Reading Error Messages +

+
+

Read the Python code and the resulting traceback below, and answer +the following questions:

+
    +
  1. How many levels does the traceback have?
  2. What is the function name where the error occurred?
  3. On which line number in this function did the error occur?
  4. What is the type of error?
  5. What is the error message?
+
+

PYTHON +

+
# This code has an intentional error. Do not type it directly;
+# use it for reference to understand the error message below.
+def print_message(day):
+    messages = [
+        'Hello, world!',
+        'Today is Tuesday!',
+        'It is the middle of the week.',
+        'Today is Donnerstag in German!',
+        'Last day of the week!',
+        'Hooray for the weekend!',
+        'Aw, the weekend is almost over.'
+    ]
+    print(messages[day])
+
+def print_sunday_message():
+    print_message(7)
+
+print_sunday_message()
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+IndexError                                Traceback (most recent call last)
+<ipython-input-7-3ad455d81842> in <module>
+     16     print_message(7)
+     17
+---> 18 print_sunday_message()
+     19
+
+<ipython-input-7-3ad455d81842> in print_sunday_message()
+     14
+     15 def print_sunday_message():
+---> 16     print_message(7)
+     17
+     18 print_sunday_message()
+
+<ipython-input-7-3ad455d81842> in print_message(day)
+     11         'Aw, the weekend is almost over.'
+     12     ]
+---> 13     print(messages[day])
+     14
+     15 def print_sunday_message():
+
+IndexError: list index out of range
+
+
+
+
+
+
+ +
+
+
    +
  1. 3 levels
  2. print_message
  3. 13
  4. IndexError
  5. list index out of range. You can then infer that 7 is not a valid index for messages, which has seven items (indices 0 to 6).
+
+
+
+
+
+
+ +
+
+

Better errors on newer Pythons +

+
+

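Newer versions of Python have improved error printouts. For example, Python 3.10 added clearer SyntaxError messages and “Did you mean ...?” suggestions for misspelled names. Python 3.11 added markers that point at the exact expression that failed. If you are debugging errors, it is often helpful to use the latest Python version, even if your code must also support older versions of Python.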

+
+
+
+

Type Errors + +

+
+

One of the most common types of error in Python is the TypeError. These errors occur when you try to perform an operation on an object that does not support it. This happens easily when working with large datasets where values are expected to be of a particular type, such as strings or integers. If we write a function expecting integers, we will not get an error until the function reaches an operation that cannot handle the strings it was given instead. For example:

+
+

PYTHON +

+

+def our_function():
+    my_string = "Hello World"
+    letter = my_string["e"]
+
+our_function()
+
+
+

ERROR +

+
  File "<ipython-input-3-6bb841ea1423>", line 3
+    letter=my_string["e"]
+                       ^
+TypeError: string indices must be integers
+
+

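We get this error because we are trying to use an index to access part of our string, and string indices must be integers. Instead, we used a character and received a TypeError. This is fixed by replacing “e” with 1, the position of the letter e in “Hello World” (Python counts positions from 0, so “H” is at position 0).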
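If you know the character you want but not its position, str.index can look the position up. You can then use that integer to index the string, as the sketch below does.

```python
my_string = "Hello World"

# Positions in a string are integers counted from 0, so "H" is 0 and "e" is 1
position = my_string.index("e")  # find where "e" first appears
letter = my_string[position]     # index with an integer, not a string

print(position, letter)  # 1 e
```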

+

In the case of datasets, we often see type errors when a mathematical operation, such as taking a mean, is performed on a column that contains characters, either as a result of formatting or introduced through error. As a result, correcting the error can involve simply removing the characters from the strings using regular expressions or, if the characters have resulted in incorrect data, removing those observations from the dataset.
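The sketch below reproduces this situation in plain Python, without pandas, so that it stays self-contained. It assumes the stray characters are only thousands separators or footnote markers. The values and the helper names column_mean and to_number are hypothetical. With pandas, you would typically apply the same cleaning to a whole column at once.

```python
import re

# Hypothetical column of values read from a file: some numbers arrived as
# text, with a thousands separator or a footnote marker attached.
raw_values = [1200, '1,350', 980, '1100*']

def column_mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

type_error_raised = False
try:
    column_mean(raw_values)
except TypeError as err:
    # sum() cannot add an int and a str
    type_error_raised = True
    print('TypeError:', err)

def to_number(value):
    """Return value as a number, removing characters that are not part of one."""
    if isinstance(value, (int, float)):
        return value
    # Keep only digits, the decimal point and the minus sign
    return float(re.sub(r'[^0-9.\-]', '', value))

clean_values = [to_number(value) for value in raw_values]
print(clean_values)               # [1200, 1350.0, 980, 1100.0]
print(column_mean(clean_values))  # 1157.5
```

This blanket cleaning assumes that every stray character is formatting. A value such as 'n/a' would be reduced to an empty string, and float() would raise a ValueError. That error is a signal to review or remove the observation rather than clean it automatically.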

+

Syntax Errors + +

+
+

When you forget a colon at the end of a line, accidentally add one +space too many when indenting under an if statement, or +forget a parenthesis, you will encounter a syntax error. This means that +Python couldn’t figure out how to read your program. This is similar to +forgetting punctuation in English: for example, this text is difficult +to read there is no punctuation there is also no capitalization why is +this hard because you have to figure out where each sentence ends you +also have to figure out where each sentence begins to some extent it +might be ambiguous if there should be a sentence break or not

+

People can typically figure out what is meant by text with no +punctuation, but people are much smarter than computers. If Python +doesn’t know how to read the program, it will give up and inform you +with an error. For example:

+
+

PYTHON +

+
def some_function()
+    msg = 'hello, world!'
+    print(msg)
+     return msg
+
+
+

ERROR +

+
  File "<ipython-input-3-6bb841ea1423>", line 1
+    def some_function()
+                       ^
+SyntaxError: invalid syntax
+
+

Here, Python tells us that there is a SyntaxError on +line 1, and even puts a little arrow in the place where there is an +issue. In this case the problem is that the function definition is +missing a colon at the end.

+

Actually, the function above has two issues with syntax. If +we fix the problem with the colon, we see that there is also an +IndentationError, which means that the lines in the +function definition do not all have the same indentation:

+
+

PYTHON +

+
def some_function():
+    msg = 'hello, world!'
+    print(msg)
+     return msg
+
+
+

ERROR +

+
  File "<ipython-input-4-ae290e7659cb>", line 4
+    return msg
+    ^
+IndentationError: unexpected indent
+
+

Both SyntaxError and IndentationError +indicate a problem with the syntax of your program, but an +IndentationError is more specific: it always means +that there is a problem with how your code is indented.
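Because IndentationError is a subclass of SyntaxError, one handler can catch both. The sketch below uses the built-in compile() to check source code without running it, and reports which kind of syntax problem it finds. The two source strings are based on the broken some_function examples above. The helper name check_syntax is hypothetical.

```python
# Broken versions of some_function, based on the examples above
missing_colon = (
    "def some_function()\n"
    "    msg = 'hello, world!'\n"
    "    print(msg)\n"
    "    return msg\n"
)
bad_indent = (
    "def some_function():\n"
    "    msg = 'hello, world!'\n"
    "    print(msg)\n"
    "     return msg\n"
)

def check_syntax(source):
    """Compile (but do not run) source code and report any syntax problem."""
    try:
        compile(source, '<example>', 'exec')
    except SyntaxError as err:  # also catches IndentationError and TabError
        return type(err).__name__, err.lineno
    return None

print(check_syntax(missing_colon))  # ('SyntaxError', 1)
print(check_syntax(bad_indent))     # ('IndentationError', 4)
print(issubclass(IndentationError, SyntaxError))  # True
```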

+
+
+ +
+
+

Tabs and Spaces +

+
+

Some indentation errors are harder to spot than others. In particular, mixing spaces and tabs can be difficult to notice because both are whitespace. In the example below, the first two lines in the body of the function some_function are indented with tabs, while the third line is indented with spaces. If you’re working in a Jupyter notebook, be sure to copy and paste this example rather than trying to type it in manually, because Jupyter automatically replaces tabs with spaces.

+
+

PYTHON +

+
def some_function():
+	msg = 'hello, world!'
+	print(msg)
+        return msg
+
+

Visually, it is impossible to spot the error. Fortunately, Python does not allow you to mix tabs and spaces inconsistently, so it reports the problem:

+
+

ERROR +

+
  File "<ipython-input-5-653b36fbcd41>", line 4
+    return msg
+              ^
+TabError: inconsistent use of tabs and spaces in indentation
+
+
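One way to make invisible whitespace visible is to print each line with repr(), which shows a tab as \t. The sketch below writes the example function with explicit escape sequences, so Jupyter cannot silently convert the tabs. It then lists the lines that contain tabs and uses compile() to confirm the TabError without running the code.

```python
# The mixed-indentation function from above, written with escape sequences so
# the whitespace is explicit: "\t" is a tab, the last line uses eight spaces.
mixed = (
    "def some_function():\n"
    "\tmsg = 'hello, world!'\n"
    "\tprint(msg)\n"
    "        return msg\n"
)

# repr() makes invisible whitespace visible: tabs show up as \t
for number, line in enumerate(mixed.splitlines(), start=1):
    print(number, repr(line))

# Line numbers that contain a tab character
tab_lines = [number for number, line in enumerate(mixed.splitlines(), start=1)
             if '\t' in line]
print('Lines with tabs:', tab_lines)  # [2, 3]

try:
    compile(mixed, '<example>', 'exec')  # check the syntax without running it
except TabError as err:
    found = (type(err).__name__, err.lineno)
    print(found)  # ('TabError', 4)
```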
+
+
+

Variable Name Errors + +

+
+

Another very common type of error is called a NameError, +and occurs when you try to use a variable that does not exist. For +example:

+
+

PYTHON +

+
print(a)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-7-9d7b17ad5387> in <module>()
+----> 1 print(a)
+
+NameError: name 'a' is not defined
+
+

Variable name errors come with some of the most informative error +messages, which are usually of the form “name ‘the_variable_name’ is not +defined”.

+

Why does this error message occur? That’s a harder question to +answer, because it depends on what your code is supposed to do. However, +there are a few very common reasons why you might have an undefined +variable. The first is that you meant to use a string, but forgot to put quotes around +it:

+
+

PYTHON +

+
print(hello)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-8-9553ee03b645> in <module>()
+----> 1 print(hello)
+
+NameError: name 'hello' is not defined
+
+

The second reason is that you might be trying to use a variable that +does not yet exist. In the following example, count should +have been defined (e.g., with count = 0) before the for +loop:

+
+

PYTHON +

+
for number in range(10):
+    count = count + number
+print('The count is:', count)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-9-dd6a12d7ca5c> in <module>()
+      1 for number in range(10):
+----> 2     count = count + number
+      3 print('The count is:', count)
+
+NameError: name 'count' is not defined
+
+

Finally, the third possibility is that you made a typo when you were +writing your code. Let’s say we fixed the error above by adding the line +Count = 0 before the for loop. Frustratingly, this actually +does not fix the error. Remember that variables are case-sensitive, so the variable +count is different from Count. We still get +the same error, because we still have not defined +count:

+
+

PYTHON +

+
Count = 0
+for number in range(10):
+    count = count + number
+print('The count is:', count)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-10-d77d40059aea> in <module>()
+      1 Count = 0
+      2 for number in range(10):
+----> 3     count = count + number
+      4 print('The count is:', count)
+
+NameError: name 'count' is not defined
+
+
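Defining the variable with exactly the same spelling and capitalization as it is used, before the loop reads it, resolves all three versions of this error:

```python
count = 0  # define count (all lower case) before the loop uses it
for number in range(10):
    count = count + number
print('The count is:', count)  # The count is: 45
```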

Index Errors + +

+
+

Next up are errors having to do with containers (like lists and +strings) and the items within them. If you try to access an item in a +list or a string that does not exist, then you will get an error. This +makes sense: if you asked someone what day they would like to get +coffee, and they answered “caturday”, you might be a bit annoyed. Python +gets similarly annoyed if you try to ask it for an item that doesn’t +exist:

+
+

PYTHON +

+
letters = ['a', 'b', 'c']
+print('Letter #1 is', letters[0])
+print('Letter #2 is', letters[1])
+print('Letter #3 is', letters[2])
+print('Letter #4 is', letters[3])
+
+
+

OUTPUT +

+
Letter #1 is a
+Letter #2 is b
+Letter #3 is c
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+IndexError                                Traceback (most recent call last)
+<ipython-input-11-d817f55b7d6c> in <module>()
+      3 print('Letter #2 is', letters[1])
+      4 print('Letter #3 is', letters[2])
+----> 5 print('Letter #4 is', letters[3])
+
+IndexError: list index out of range
+
+

Here, Python is telling us that there is an IndexError +in our code, meaning we tried to access a list index that did not +exist.
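A list with n items has valid indices 0 to n - 1. Negative indices from -1 to -n count back from the end. The sketch below uses a hypothetical helper, get_letter, that checks an index against len() before using it. It returns None instead of raising an IndexError, which is one option when a missing position is expected rather than a bug.

```python
letters = ['a', 'b', 'c']

def get_letter(items, index):
    """Return items[index], or None if that position does not exist."""
    # Valid indices run from -len(items) to len(items) - 1
    if -len(items) <= index < len(items):
        return items[index]
    return None

print(len(letters))            # 3 items, so valid indices are 0, 1 and 2
print(get_letter(letters, 2))  # c
print(get_letter(letters, 3))  # None instead of an IndexError
print(letters[-1])             # negative indices count from the end: c
```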

+

File Errors + +

+
+

The last type of error we’ll cover today is also the most common when using Python with data: errors associated with reading and writing files. If you try to read a file that does not exist, you will receive a FileNotFoundError telling you so. If you attempt to write to a file that was opened read-only, Python 3 raises an io.UnsupportedOperation error. More generally, problems with input and output show up as OSErrors, which may appear as a more specific subclass; you can see the list in the Python docs. Most of these carry a UNIX errno code, which you can see in the error message (for example, [Errno 2] for a missing file).

+
+

PYTHON +

+
file_handle = open('myfile.txt', 'r')
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+FileNotFoundError                         Traceback (most recent call last)
+<ipython-input-14-f6e1ac4aee96> in <module>()
+----> 1 file_handle = open('myfile.txt', 'r')
+
+FileNotFoundError: [Errno 2] No such file or directory: 'myfile.txt'
+
+
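Since FileNotFoundError is a subclass of OSError, catching OSError also covers it. The error object records the errno code shown in the message. The sketch below builds a path inside a fresh temporary folder, so the file is guaranteed not to exist. It then checks the errno against the standard library's errno.ENOENT constant, which stands for “No such file or directory”.

```python
import errno
import os
import tempfile

# A path inside a brand-new temporary folder, so the file cannot exist yet
missing_path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')

try:
    with open(missing_path, 'r') as file_handle:
        contents = file_handle.read()
except OSError as err:  # FileNotFoundError is one of the subclasses of OSError
    caught_type = type(err)
    caught_errno = err.errno
    print(caught_type.__name__)          # FileNotFoundError
    print(caught_errno == errno.ENOENT)  # True
    print(err.strerror)                  # the human-readable description
```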

One reason for receiving this error is that you specified an +incorrect path to the file. For example, if I am currently in a folder +called myproject, and I have a file in +myproject/writing/myfile.txt, but I try to open +myfile.txt, this will fail. The correct path would be +writing/myfile.txt. It is also possible that the file name +or its path contains a typo. There may also be specific settings based +on your organization if you are using shared, networked, or cloud-based +drives. It is best to check with your IT administrators if you are still +encountering issues reading in a file after troubleshooting.
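Relative paths are resolved from the current working directory, which pathlib.Path.cwd() reports. The sketch below recreates the myproject/writing/myfile.txt layout from the example inside a temporary folder. It changes into myproject and checks both paths with Path.exists(). The working directory is changed only temporarily and restored afterwards.

```python
import os
import tempfile
from pathlib import Path

# Recreate the example layout, myproject/writing/myfile.txt, in a temporary folder
project = Path(tempfile.mkdtemp()) / 'myproject'
(project / 'writing').mkdir(parents=True)
(project / 'writing' / 'myfile.txt').write_text('some text\n')

original_dir = Path.cwd()
os.chdir(project)  # work from inside myproject, as in the text
try:
    print('Current folder:', Path.cwd().name)
    wrong_path_exists = Path('myfile.txt').exists()
    right_path_exists = Path('writing/myfile.txt').exists()
    print('myfile.txt exists?', wrong_path_exists)          # False
    print('writing/myfile.txt exists?', right_path_exists)  # True
finally:
    os.chdir(original_dir)  # always return to the starting folder
```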

+

A related issue can occur if you use the “write” flag instead of the “read” flag. Python will not give you an error if you try to open a file for writing when the file does not exist; it simply creates the file. Be careful, though: opening an existing file for writing erases its contents. If you meant to open a file for reading, but accidentally opened it for writing, and then try to read from it, you will get an UnsupportedOperation error telling you that the file was not opened for reading:

+
+

PYTHON +

+
file_handle = open('myfile.txt', 'w')
+file_handle.read()
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+UnsupportedOperation                      Traceback (most recent call last)
+<ipython-input-15-b846479bc61f> in <module>()
+      1 file_handle = open('myfile.txt', 'w')
+----> 2 file_handle.read()
+
+UnsupportedOperation: not readable
+
+
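The sketch below works on a file in a temporary folder, so no real data is overwritten. It shows that a file opened with mode 'w' can be written but not read. It then opens the file again with mode 'r' to read the data back. The file contents are made up for illustration.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')

# Mode 'w' creates the file (or empties an existing one) and allows writing only
with open(path, 'w') as file_handle:
    file_handle.write('region,value\nN,10\n')
    try:
        file_handle.read()
    except io.UnsupportedOperation as err:
        read_error = str(err)
        print('UnsupportedOperation:', read_error)  # not readable

# To read the data, open the file again in mode 'r'
with open(path, 'r') as file_handle:
    contents = file_handle.read()
print(contents)
```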

If you are getting a read or write error on a file or folder that you are able to open and/or edit with other programs, you may need to contact an IT administrator to check the permissions granted to you and any programs you are using.

+

These are the most common errors with files, though many others +exist. If you get an error that you’ve never seen before, searching the +Internet for that error type often reveals common reasons why you might +get that error.

+
+
+ +
+
+

Identifying Syntax Errors +

+
+
    +
  1. Read the code below, and (without running it) try to identify what the errors are.
  2. Run the code, and read the error message. Is it a SyntaxError or an IndentationError?
  3. Fix the error.
  4. Repeat steps 2 and 3, until you have fixed all the errors.
+
+

PYTHON +

+
def another_function
+  print('Syntax errors are annoying.')
+   print('But at least Python tells us about them!')
+  print('So they are usually not too hard to fix.')
+
+
+
+
+
+
+ +
+
+

SyntaxError for missing (): at end of first +line, IndentationError for mismatch between second and +third lines. A fixed version is:

+
+

PYTHON +

+
def another_function():
+    print('Syntax errors are annoying.')
+    print('But at least Python tells us about them!')
+    print('So they are usually not too hard to fix.')
+
+
+
+
+
+
+
+ +
+
+

Identifying Variable Name Errors +

+
+
    +
  1. Read the code below, and (without running it) try to identify what the errors are.
  2. Run the code, and read the error message. What type of NameError do you think this is? In other words, is it a string with no quotes, a misspelled variable, or a variable that should have been defined but was not?
  3. Fix the error.
  4. Repeat steps 2 and 3, until you have fixed all the errors.
+
+

PYTHON +

+
for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (Number % 3) == 0:
+        message = message + a
+    else:
+        message = message + 'b'
+print(message)
+
+
+
+
+
+
+ +
+
+

Three NameErrors: number is misspelled as Number, message is not defined before the loop, and a is not in quotes.

+

Fixed version:

+
+

PYTHON +

+
message = ''
+for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (number % 3) == 0:
+        message = message + 'a'
+    else:
+        message = message + 'b'
+print(message)
+
+
+
+
+
+
+
+ +
+
+

Identifying Index Errors +

+
+
    +
  1. Read the code below, and (without running it) try to identify what the errors are.
  2. Run the code, and read the error message. What type of error is it?
  3. Fix the error.
+
+

PYTHON +

+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[4])
+
+
+
+
+
+
+ +
+
+

IndexError; the last entry is seasons[3], so seasons[4] doesn’t make sense. A fixed version, which uses the negative index -1 to select the last entry, is:

+
+

PYTHON +

+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[-1])
+
+
+
+
+
+

A Final Note About Correcting Errors + +

+
+

There are many helpful answers online for most error messages; however, when working with official statistics, we also need to exercise some caution. Be wary of any answers that ask you to download a package from someone’s personal GitHub repository or other file sharing service. Try to identify the type of error and understand what the issue is before downloading anything claiming to fix it. If the error is the result of an issue with a version of a package, check whether that version has any known security vulnerabilities, and use a package manager to move between package versions.
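Before looking into version-specific fixes or vulnerabilities, it helps to know exactly which version is installed. The sketch below uses the standard library's importlib.metadata (Python 3.8 and later). The helper name installed_version is hypothetical and the package names are only examples. Changing versions would still be done through your package manager.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

for name in ['pandas', 'numpy']:
    print(name, installed_version(name))
```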

+
+
+ +
+
+

Key Points +

+
+
    +
  • NULL
  • +
+
+
+
+
+
+
+
+ + +
+ + +
+
+ +
+
+ + + + diff --git a/android-chrome-192x192.png b/android-chrome-192x192.png new file mode 100644 index 0000000..ed3c210 Binary files /dev/null and b/android-chrome-192x192.png differ diff --git a/android-chrome-512x512.png b/android-chrome-512x512.png new file mode 100644 index 0000000..c88d96c Binary files /dev/null and b/android-chrome-512x512.png differ diff --git a/apple-touch-icon.png b/apple-touch-icon.png new file mode 100644 index 0000000..8044fee Binary files /dev/null and b/apple-touch-icon.png differ diff --git a/assets/fonts/Mulish-Bold.ttf b/assets/fonts/Mulish-Bold.ttf new file mode 100644 index 0000000..1f522d4 Binary files /dev/null and b/assets/fonts/Mulish-Bold.ttf differ diff --git a/assets/fonts/Mulish-Bold.woff b/assets/fonts/Mulish-Bold.woff new file mode 100644 index 0000000..711448e Binary files /dev/null and b/assets/fonts/Mulish-Bold.woff differ diff --git a/assets/fonts/Mulish-ExtraBold.ttf b/assets/fonts/Mulish-ExtraBold.ttf new file mode 100644 index 0000000..62850ff Binary files /dev/null and b/assets/fonts/Mulish-ExtraBold.ttf differ diff --git a/assets/fonts/mulish-v5-latin-regular.eot b/assets/fonts/mulish-v5-latin-regular.eot new file mode 100644 index 0000000..423bcb1 Binary files /dev/null and b/assets/fonts/mulish-v5-latin-regular.eot differ diff --git a/assets/fonts/mulish-v5-latin-regular.svg b/assets/fonts/mulish-v5-latin-regular.svg new file mode 100644 index 0000000..70341f9 --- /dev/null +++ b/assets/fonts/mulish-v5-latin-regular.svg @@ -0,0 +1,305 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/assets/fonts/mulish-v5-latin-regular.ttf 
b/assets/fonts/mulish-v5-latin-regular.ttf new file mode 100644 index 0000000..541bb40 Binary files /dev/null and b/assets/fonts/mulish-v5-latin-regular.ttf differ diff --git a/assets/fonts/mulish-v5-latin-regular.woff b/assets/fonts/mulish-v5-latin-regular.woff new file mode 100644 index 0000000..700ec13 Binary files /dev/null and b/assets/fonts/mulish-v5-latin-regular.woff differ diff --git a/assets/fonts/mulish-v5-latin-regular.woff2 b/assets/fonts/mulish-v5-latin-regular.woff2 new file mode 100644 index 0000000..b244298 Binary files /dev/null and b/assets/fonts/mulish-v5-latin-regular.woff2 differ diff --git a/assets/fonts/mulish-variablefont_wght.woff b/assets/fonts/mulish-variablefont_wght.woff new file mode 100644 index 0000000..fc42538 Binary files /dev/null and b/assets/fonts/mulish-variablefont_wght.woff differ diff --git a/assets/fonts/mulish-variablefont_wght.woff2 b/assets/fonts/mulish-variablefont_wght.woff2 new file mode 100644 index 0000000..8a233c6 Binary files /dev/null and b/assets/fonts/mulish-variablefont_wght.woff2 differ diff --git a/assets/images/carpentries-logo-sm.svg b/assets/images/carpentries-logo-sm.svg new file mode 100644 index 0000000..da70d40 --- /dev/null +++ b/assets/images/carpentries-logo-sm.svg @@ -0,0 +1,7 @@ + + + + + + + \ No newline at end of file diff --git a/assets/images/carpentries-logo.svg b/assets/images/carpentries-logo.svg new file mode 100644 index 0000000..6cbe665 --- /dev/null +++ b/assets/images/carpentries-logo.svg @@ -0,0 +1,19 @@ + + + + + + + + + + + + + + + + + + + diff --git a/assets/images/data-logo-sm.svg b/assets/images/data-logo-sm.svg new file mode 100644 index 0000000..6d4019e --- /dev/null +++ b/assets/images/data-logo-sm.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/assets/images/data-logo.svg b/assets/images/data-logo.svg new file mode 100644 index 0000000..c594952 --- /dev/null +++ b/assets/images/data-logo.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git 
a/assets/images/dropdown-arrow.svg b/assets/images/dropdown-arrow.svg new file mode 100644 index 0000000..a12b04b --- /dev/null +++ b/assets/images/dropdown-arrow.svg @@ -0,0 +1,12 @@ + + + + +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Discussion

+

Last updated on 2024-07-11

+ + + +
+ +
+ + +

FIXME

+ + +
+
+ + +
+
+ + + diff --git a/docsearch.css b/docsearch.css new file mode 100644 index 0000000..e5f1fe1 --- /dev/null +++ b/docsearch.css @@ -0,0 +1,148 @@ +/* Docsearch -------------------------------------------------------------- */ +/* + Source: https://github.com/algolia/docsearch/ + License: MIT +*/ + +.algolia-autocomplete { + display: block; + -webkit-box-flex: 1; + -ms-flex: 1; + flex: 1 +} + +.algolia-autocomplete .ds-dropdown-menu { + width: 100%; + min-width: none; + max-width: none; + padding: .75rem 0; + background-color: #fff; + background-clip: padding-box; + border: 1px solid rgba(0, 0, 0, .1); + box-shadow: 0 .5rem 1rem rgba(0, 0, 0, .175); +} + +@media (min-width:768px) { + .algolia-autocomplete .ds-dropdown-menu { + width: 175% + } +} + +.algolia-autocomplete .ds-dropdown-menu::before { + display: none +} + +.algolia-autocomplete .ds-dropdown-menu [class^=ds-dataset-] { + padding: 0; + background-color: rgb(255,255,255); + border: 0; + max-height: 80vh; +} + +.algolia-autocomplete .ds-dropdown-menu .ds-suggestions { + margin-top: 0 +} + +.algolia-autocomplete .algolia-docsearch-suggestion { + padding: 0; + overflow: visible +} + +.algolia-autocomplete .algolia-docsearch-suggestion--category-header { + padding: .125rem 1rem; + margin-top: 0; + font-size: 1.3em; + font-weight: 500; + color: #00008B; + border-bottom: 0 +} + +.algolia-autocomplete .algolia-docsearch-suggestion--wrapper { + float: none; + padding-top: 0 +} + +.algolia-autocomplete .algolia-docsearch-suggestion--subcategory-column { + float: none; + width: auto; + padding: 0; + text-align: left +} + +.algolia-autocomplete .algolia-docsearch-suggestion--content { + float: none; + width: auto; + padding: 0 +} + +.algolia-autocomplete .algolia-docsearch-suggestion--content::before { + display: none +} + +.algolia-autocomplete .ds-suggestion:not(:first-child) .algolia-docsearch-suggestion--category-header { + padding-top: .75rem; + margin-top: .75rem; + border-top: 1px solid rgba(0, 0, 0, .1) +} + 
+.algolia-autocomplete .ds-suggestion .algolia-docsearch-suggestion--subcategory-column { + display: block; + padding: .1rem 1rem; + margin-bottom: 0.1; + font-size: 1.0em; + font-weight: 400 + /* display: none */ +} + +.algolia-autocomplete .algolia-docsearch-suggestion--title { + display: block; + padding: .25rem 1rem; + margin-bottom: 0; + font-size: 0.9em; + font-weight: 400 +} + +.algolia-autocomplete .algolia-docsearch-suggestion--text { + padding: 0 1rem .5rem; + margin-top: -.25rem; + font-size: 0.8em; + font-weight: 400; + line-height: 1.25 +} + +.algolia-autocomplete .algolia-docsearch-footer { + width: 110px; + height: 20px; + z-index: 3; + margin-top: 10.66667px; + float: right; + font-size: 0; + line-height: 0; +} + +.algolia-autocomplete .algolia-docsearch-footer--logo { + background-image: url("data:image/svg+xml;utf8,"); + background-repeat: no-repeat; + background-position: 50%; + background-size: 100%; + overflow: hidden; + text-indent: -9000px; + width: 100%; + height: 100%; + display: block; + transform: translate(-8px); +} + +.algolia-autocomplete .algolia-docsearch-suggestion--highlight { + color: #FF8C00; + background: rgba(232, 189, 54, 0.1) +} + + +.algolia-autocomplete .algolia-docsearch-suggestion--text .algolia-docsearch-suggestion--highlight { + box-shadow: inset 0 -2px 0 0 rgba(105, 105, 105, .5) +} + +.algolia-autocomplete .ds-suggestion.ds-cursor .algolia-docsearch-suggestion--content { + background-color: rgba(192, 192, 192, .15) +} diff --git a/docsearch.js b/docsearch.js new file mode 100644 index 0000000..b35504c --- /dev/null +++ b/docsearch.js @@ -0,0 +1,85 @@ +$(function() { + + // register a handler to move the focus to the search bar + // upon pressing shift + "/" (i.e. 
"?") + $(document).on('keydown', function(e) { + if (e.shiftKey && e.keyCode == 191) { + e.preventDefault(); + $("#search-input").focus(); + } + }); + + $(document).ready(function() { + // do keyword highlighting + /* modified from https://jsfiddle.net/julmot/bL6bb5oo/ */ + var mark = function() { + + var referrer = document.URL ; + var paramKey = "q" ; + + if (referrer.indexOf("?") !== -1) { + var qs = referrer.substr(referrer.indexOf('?') + 1); + var qs_noanchor = qs.split('#')[0]; + var qsa = qs_noanchor.split('&'); + var keyword = ""; + + for (var i = 0; i < qsa.length; i++) { + var currentParam = qsa[i].split('='); + + if (currentParam.length !== 2) { + continue; + } + + if (currentParam[0] == paramKey) { + keyword = decodeURIComponent(currentParam[1].replace(/\+/g, "%20")); + } + } + + if (keyword !== "") { + $(".contents").unmark({ + done: function() { + $(".contents").mark(keyword); + } + }); + } + } + }; + + mark(); + }); +}); + +/* Search term highlighting ------------------------------*/ + +function matchedWords(hit) { + var words = []; + + var hierarchy = hit._highlightResult.hierarchy; + // loop to fetch from lvl0, lvl1, etc. 
+ for (var idx in hierarchy) { + words = words.concat(hierarchy[idx].matchedWords); + } + + var content = hit._highlightResult.content; + if (content) { + words = words.concat(content.matchedWords); + } + + // return unique words + var words_uniq = [...new Set(words)]; + return words_uniq; +} + +function updateHitURL(hit) { + + var words = matchedWords(hit); + var url = ""; + + if (hit.anchor) { + url = hit.url_without_anchor + '?q=' + escape(words.join(" ")) + '#' + hit.anchor; + } else { + url = hit.url + '?q=' + escape(words.join(" ")); + } + + return url; +} diff --git a/favicon-16x16.png b/favicon-16x16.png new file mode 100644 index 0000000..d44f8ac Binary files /dev/null and b/favicon-16x16.png differ diff --git a/favicon-32x32.png b/favicon-32x32.png new file mode 100644 index 0000000..63441d4 Binary files /dev/null and b/favicon-32x32.png differ diff --git a/favicons/cp/apple-touch-icon-114x114.png b/favicons/cp/apple-touch-icon-114x114.png new file mode 100644 index 0000000..a60b758 Binary files /dev/null and b/favicons/cp/apple-touch-icon-114x114.png differ diff --git a/favicons/cp/apple-touch-icon-120x120.png b/favicons/cp/apple-touch-icon-120x120.png new file mode 100644 index 0000000..8f20a8f Binary files /dev/null and b/favicons/cp/apple-touch-icon-120x120.png differ diff --git a/favicons/cp/apple-touch-icon-144x144.png b/favicons/cp/apple-touch-icon-144x144.png new file mode 100644 index 0000000..4be151b Binary files /dev/null and b/favicons/cp/apple-touch-icon-144x144.png differ diff --git a/favicons/cp/apple-touch-icon-152x152.png b/favicons/cp/apple-touch-icon-152x152.png new file mode 100644 index 0000000..7d1d943 Binary files /dev/null and b/favicons/cp/apple-touch-icon-152x152.png differ diff --git a/favicons/cp/apple-touch-icon-57x57.png b/favicons/cp/apple-touch-icon-57x57.png new file mode 100644 index 0000000..92309ce Binary files /dev/null and b/favicons/cp/apple-touch-icon-57x57.png differ diff --git 
a/favicons/cp/apple-touch-icon-60x60.png b/favicons/cp/apple-touch-icon-60x60.png new file mode 100644 index 0000000..de8148e Binary files /dev/null and b/favicons/cp/apple-touch-icon-60x60.png differ diff --git a/favicons/cp/apple-touch-icon-72x72.png b/favicons/cp/apple-touch-icon-72x72.png new file mode 100644 index 0000000..81d7e3d Binary files /dev/null and b/favicons/cp/apple-touch-icon-72x72.png differ diff --git a/favicons/cp/apple-touch-icon-76x76.png b/favicons/cp/apple-touch-icon-76x76.png new file mode 100644 index 0000000..15bca5c Binary files /dev/null and b/favicons/cp/apple-touch-icon-76x76.png differ diff --git a/favicons/cp/favicon-128.png b/favicons/cp/favicon-128.png new file mode 100644 index 0000000..e612cdc Binary files /dev/null and b/favicons/cp/favicon-128.png differ diff --git a/favicons/cp/favicon-16x16.png b/favicons/cp/favicon-16x16.png new file mode 100644 index 0000000..65b3311 Binary files /dev/null and b/favicons/cp/favicon-16x16.png differ diff --git a/favicons/cp/favicon-196x196.png b/favicons/cp/favicon-196x196.png new file mode 100644 index 0000000..0da938b Binary files /dev/null and b/favicons/cp/favicon-196x196.png differ diff --git a/favicons/cp/favicon-32x32.png b/favicons/cp/favicon-32x32.png new file mode 100644 index 0000000..0c1442e Binary files /dev/null and b/favicons/cp/favicon-32x32.png differ diff --git a/favicons/cp/favicon-96x96.png b/favicons/cp/favicon-96x96.png new file mode 100644 index 0000000..bed74ec Binary files /dev/null and b/favicons/cp/favicon-96x96.png differ diff --git a/favicons/cp/favicon.ico b/favicons/cp/favicon.ico new file mode 100644 index 0000000..4f2f2f1 Binary files /dev/null and b/favicons/cp/favicon.ico differ diff --git a/favicons/cp/mstile-144x144.png b/favicons/cp/mstile-144x144.png new file mode 100644 index 0000000..4be151b Binary files /dev/null and b/favicons/cp/mstile-144x144.png differ diff --git a/favicons/cp/mstile-150x150.png b/favicons/cp/mstile-150x150.png new file mode 
100644 index 0000000..bf7ad5e Binary files /dev/null and b/favicons/cp/mstile-150x150.png differ diff --git a/favicons/cp/mstile-310x150.png b/favicons/cp/mstile-310x150.png new file mode 100644 index 0000000..6ac8048 Binary files /dev/null and b/favicons/cp/mstile-310x150.png differ diff --git a/favicons/cp/mstile-310x310.png b/favicons/cp/mstile-310x310.png new file mode 100644 index 0000000..b778147 Binary files /dev/null and b/favicons/cp/mstile-310x310.png differ diff --git a/favicons/cp/mstile-70x70.png b/favicons/cp/mstile-70x70.png new file mode 100644 index 0000000..e612cdc Binary files /dev/null and b/favicons/cp/mstile-70x70.png differ diff --git a/favicons/dc/apple-touch-icon-114x114.png b/favicons/dc/apple-touch-icon-114x114.png new file mode 100644 index 0000000..edafbda Binary files /dev/null and b/favicons/dc/apple-touch-icon-114x114.png differ diff --git a/favicons/dc/apple-touch-icon-120x120.png b/favicons/dc/apple-touch-icon-120x120.png new file mode 100644 index 0000000..ee145ec Binary files /dev/null and b/favicons/dc/apple-touch-icon-120x120.png differ diff --git a/favicons/dc/apple-touch-icon-144x144.png b/favicons/dc/apple-touch-icon-144x144.png new file mode 100644 index 0000000..bf50701 Binary files /dev/null and b/favicons/dc/apple-touch-icon-144x144.png differ diff --git a/favicons/dc/apple-touch-icon-152x152.png b/favicons/dc/apple-touch-icon-152x152.png new file mode 100644 index 0000000..bd596c8 Binary files /dev/null and b/favicons/dc/apple-touch-icon-152x152.png differ diff --git a/favicons/dc/apple-touch-icon-57x57.png b/favicons/dc/apple-touch-icon-57x57.png new file mode 100644 index 0000000..61c1527 Binary files /dev/null and b/favicons/dc/apple-touch-icon-57x57.png differ diff --git a/favicons/dc/apple-touch-icon-60x60.png b/favicons/dc/apple-touch-icon-60x60.png new file mode 100644 index 0000000..9daad36 Binary files /dev/null and b/favicons/dc/apple-touch-icon-60x60.png differ diff --git 
a/favicons/dc/apple-touch-icon-72x72.png b/favicons/dc/apple-touch-icon-72x72.png new file mode 100644 index 0000000..2069520 Binary files /dev/null and b/favicons/dc/apple-touch-icon-72x72.png differ diff --git a/favicons/dc/apple-touch-icon-76x76.png b/favicons/dc/apple-touch-icon-76x76.png new file mode 100644 index 0000000..3db01ca Binary files /dev/null and b/favicons/dc/apple-touch-icon-76x76.png differ diff --git a/favicons/dc/favicon-128.png b/favicons/dc/favicon-128.png new file mode 100644 index 0000000..9e3de2a Binary files /dev/null and b/favicons/dc/favicon-128.png differ diff --git a/favicons/dc/favicon-16x16.png b/favicons/dc/favicon-16x16.png new file mode 100644 index 0000000..4c9f9b8 Binary files /dev/null and b/favicons/dc/favicon-16x16.png differ diff --git a/favicons/dc/favicon-196x196.png b/favicons/dc/favicon-196x196.png new file mode 100644 index 0000000..588afc2 Binary files /dev/null and b/favicons/dc/favicon-196x196.png differ diff --git a/favicons/dc/favicon-32x32.png b/favicons/dc/favicon-32x32.png new file mode 100644 index 0000000..9c2ecbf Binary files /dev/null and b/favicons/dc/favicon-32x32.png differ diff --git a/favicons/dc/favicon-96x96.png b/favicons/dc/favicon-96x96.png new file mode 100644 index 0000000..ff13fc0 Binary files /dev/null and b/favicons/dc/favicon-96x96.png differ diff --git a/favicons/dc/favicon.ico b/favicons/dc/favicon.ico new file mode 100644 index 0000000..e4715f3 Binary files /dev/null and b/favicons/dc/favicon.ico differ diff --git a/favicons/dc/mstile-144x144.png b/favicons/dc/mstile-144x144.png new file mode 100644 index 0000000..bf50701 Binary files /dev/null and b/favicons/dc/mstile-144x144.png differ diff --git a/favicons/dc/mstile-150x150.png b/favicons/dc/mstile-150x150.png new file mode 100644 index 0000000..c5844cc Binary files /dev/null and b/favicons/dc/mstile-150x150.png differ diff --git a/favicons/dc/mstile-310x150.png b/favicons/dc/mstile-310x150.png new file mode 100644 index 
0000000..786813a Binary files /dev/null and b/favicons/dc/mstile-310x150.png differ diff --git a/favicons/dc/mstile-310x310.png b/favicons/dc/mstile-310x310.png new file mode 100644 index 0000000..9580653 Binary files /dev/null and b/favicons/dc/mstile-310x310.png differ diff --git a/favicons/dc/mstile-70x70.png b/favicons/dc/mstile-70x70.png new file mode 100644 index 0000000..9e3de2a Binary files /dev/null and b/favicons/dc/mstile-70x70.png differ diff --git a/favicons/lc/apple-touch-icon-114x114.png b/favicons/lc/apple-touch-icon-114x114.png new file mode 100644 index 0000000..6c83127 Binary files /dev/null and b/favicons/lc/apple-touch-icon-114x114.png differ diff --git a/favicons/lc/apple-touch-icon-120x120.png b/favicons/lc/apple-touch-icon-120x120.png new file mode 100644 index 0000000..8334648 Binary files /dev/null and b/favicons/lc/apple-touch-icon-120x120.png differ diff --git a/favicons/lc/apple-touch-icon-144x144.png b/favicons/lc/apple-touch-icon-144x144.png new file mode 100644 index 0000000..5f32151 Binary files /dev/null and b/favicons/lc/apple-touch-icon-144x144.png differ diff --git a/favicons/lc/apple-touch-icon-152x152.png b/favicons/lc/apple-touch-icon-152x152.png new file mode 100644 index 0000000..4e5c177 Binary files /dev/null and b/favicons/lc/apple-touch-icon-152x152.png differ diff --git a/favicons/lc/apple-touch-icon-57x57.png b/favicons/lc/apple-touch-icon-57x57.png new file mode 100644 index 0000000..61f9c9c Binary files /dev/null and b/favicons/lc/apple-touch-icon-57x57.png differ diff --git a/favicons/lc/apple-touch-icon-60x60.png b/favicons/lc/apple-touch-icon-60x60.png new file mode 100644 index 0000000..ccb5ada Binary files /dev/null and b/favicons/lc/apple-touch-icon-60x60.png differ diff --git a/favicons/lc/apple-touch-icon-72x72.png b/favicons/lc/apple-touch-icon-72x72.png new file mode 100644 index 0000000..517d459 Binary files /dev/null and b/favicons/lc/apple-touch-icon-72x72.png differ diff --git 
a/favicons/lc/apple-touch-icon-76x76.png b/favicons/lc/apple-touch-icon-76x76.png new file mode 100644 index 0000000..17454b3 Binary files /dev/null and b/favicons/lc/apple-touch-icon-76x76.png differ diff --git a/favicons/lc/favicon-128.png b/favicons/lc/favicon-128.png new file mode 100644 index 0000000..9d781c9 Binary files /dev/null and b/favicons/lc/favicon-128.png differ diff --git a/favicons/lc/favicon-16x16.png b/favicons/lc/favicon-16x16.png new file mode 100644 index 0000000..3c20abc Binary files /dev/null and b/favicons/lc/favicon-16x16.png differ diff --git a/favicons/lc/favicon-196x196.png b/favicons/lc/favicon-196x196.png new file mode 100644 index 0000000..46baaf8 Binary files /dev/null and b/favicons/lc/favicon-196x196.png differ diff --git a/favicons/lc/favicon-32x32.png b/favicons/lc/favicon-32x32.png new file mode 100644 index 0000000..ed6701e Binary files /dev/null and b/favicons/lc/favicon-32x32.png differ diff --git a/favicons/lc/favicon-96x96.png b/favicons/lc/favicon-96x96.png new file mode 100644 index 0000000..bc468c7 Binary files /dev/null and b/favicons/lc/favicon-96x96.png differ diff --git a/favicons/lc/favicon.ico b/favicons/lc/favicon.ico new file mode 100644 index 0000000..5c14e80 Binary files /dev/null and b/favicons/lc/favicon.ico differ diff --git a/favicons/lc/mstile-144x144.png b/favicons/lc/mstile-144x144.png new file mode 100644 index 0000000..5f32151 Binary files /dev/null and b/favicons/lc/mstile-144x144.png differ diff --git a/favicons/lc/mstile-150x150.png b/favicons/lc/mstile-150x150.png new file mode 100644 index 0000000..924953a Binary files /dev/null and b/favicons/lc/mstile-150x150.png differ diff --git a/favicons/lc/mstile-310x150.png b/favicons/lc/mstile-310x150.png new file mode 100644 index 0000000..e4dcda4 Binary files /dev/null and b/favicons/lc/mstile-310x150.png differ diff --git a/favicons/lc/mstile-310x310.png b/favicons/lc/mstile-310x310.png new file mode 100644 index 0000000..a12c876 Binary files 
/dev/null and b/favicons/lc/mstile-310x310.png differ diff --git a/favicons/lc/mstile-70x70.png b/favicons/lc/mstile-70x70.png new file mode 100644 index 0000000..9d781c9 Binary files /dev/null and b/favicons/lc/mstile-70x70.png differ diff --git a/favicons/swc/apple-touch-icon-114x114.png b/favicons/swc/apple-touch-icon-114x114.png new file mode 100644 index 0000000..e5125f8 Binary files /dev/null and b/favicons/swc/apple-touch-icon-114x114.png differ diff --git a/favicons/swc/apple-touch-icon-120x120.png b/favicons/swc/apple-touch-icon-120x120.png new file mode 100644 index 0000000..0f97a0a Binary files /dev/null and b/favicons/swc/apple-touch-icon-120x120.png differ diff --git a/favicons/swc/apple-touch-icon-144x144.png b/favicons/swc/apple-touch-icon-144x144.png new file mode 100644 index 0000000..7441446 Binary files /dev/null and b/favicons/swc/apple-touch-icon-144x144.png differ diff --git a/favicons/swc/apple-touch-icon-152x152.png b/favicons/swc/apple-touch-icon-152x152.png new file mode 100644 index 0000000..45cc338 Binary files /dev/null and b/favicons/swc/apple-touch-icon-152x152.png differ diff --git a/favicons/swc/apple-touch-icon-57x57.png b/favicons/swc/apple-touch-icon-57x57.png new file mode 100644 index 0000000..e180a4a Binary files /dev/null and b/favicons/swc/apple-touch-icon-57x57.png differ diff --git a/favicons/swc/apple-touch-icon-60x60.png b/favicons/swc/apple-touch-icon-60x60.png new file mode 100644 index 0000000..c96fd6c Binary files /dev/null and b/favicons/swc/apple-touch-icon-60x60.png differ diff --git a/favicons/swc/apple-touch-icon-72x72.png b/favicons/swc/apple-touch-icon-72x72.png new file mode 100644 index 0000000..aae014a Binary files /dev/null and b/favicons/swc/apple-touch-icon-72x72.png differ diff --git a/favicons/swc/apple-touch-icon-76x76.png b/favicons/swc/apple-touch-icon-76x76.png new file mode 100644 index 0000000..2167f94 Binary files /dev/null and b/favicons/swc/apple-touch-icon-76x76.png differ diff --git 
a/favicons/swc/favicon-128.png b/favicons/swc/favicon-128.png new file mode 100644 index 0000000..f61df62 Binary files /dev/null and b/favicons/swc/favicon-128.png differ diff --git a/favicons/swc/favicon-16x16.png b/favicons/swc/favicon-16x16.png new file mode 100644 index 0000000..2d20a40 Binary files /dev/null and b/favicons/swc/favicon-16x16.png differ diff --git a/favicons/swc/favicon-196x196.png b/favicons/swc/favicon-196x196.png new file mode 100644 index 0000000..2a20d3a Binary files /dev/null and b/favicons/swc/favicon-196x196.png differ diff --git a/favicons/swc/favicon-32x32.png b/favicons/swc/favicon-32x32.png new file mode 100644 index 0000000..f622b73 Binary files /dev/null and b/favicons/swc/favicon-32x32.png differ diff --git a/favicons/swc/favicon-96x96.png b/favicons/swc/favicon-96x96.png new file mode 100644 index 0000000..5e57f66 Binary files /dev/null and b/favicons/swc/favicon-96x96.png differ diff --git a/favicons/swc/favicon.ico b/favicons/swc/favicon.ico new file mode 100644 index 0000000..f771790 Binary files /dev/null and b/favicons/swc/favicon.ico differ diff --git a/favicons/swc/mstile-144x144.png b/favicons/swc/mstile-144x144.png new file mode 100644 index 0000000..7441446 Binary files /dev/null and b/favicons/swc/mstile-144x144.png differ diff --git a/favicons/swc/mstile-150x150.png b/favicons/swc/mstile-150x150.png new file mode 100644 index 0000000..d1594bc Binary files /dev/null and b/favicons/swc/mstile-150x150.png differ diff --git a/favicons/swc/mstile-310x150.png b/favicons/swc/mstile-310x150.png new file mode 100644 index 0000000..f7d58b2 Binary files /dev/null and b/favicons/swc/mstile-310x150.png differ diff --git a/favicons/swc/mstile-310x310.png b/favicons/swc/mstile-310x310.png new file mode 100644 index 0000000..b632b42 Binary files /dev/null and b/favicons/swc/mstile-310x310.png differ diff --git a/favicons/swc/mstile-70x70.png b/favicons/swc/mstile-70x70.png new file mode 100644 index 0000000..f61df62 Binary files 
/dev/null and b/favicons/swc/mstile-70x70.png differ diff --git a/fig/03-loop_2_0.png b/fig/03-loop_2_0.png new file mode 100644 index 0000000..0d62eb3 Binary files /dev/null and b/fig/03-loop_2_0.png differ diff --git a/fig/05-loops_image_num.png b/fig/05-loops_image_num.png new file mode 100644 index 0000000..088a0b2 Binary files /dev/null and b/fig/05-loops_image_num.png differ diff --git a/fig/python-else-if.png b/fig/python-else-if.png new file mode 100644 index 0000000..e324259 Binary files /dev/null and b/fig/python-else-if.png differ diff --git a/fig/python-flowchart-conditional.png b/fig/python-flowchart-conditional.png new file mode 100644 index 0000000..0163989 Binary files /dev/null and b/fig/python-flowchart-conditional.png differ diff --git a/fig/python-function.svg b/fig/python-function.svg new file mode 100644 index 0000000..fa15036 --- /dev/null +++ b/fig/python-function.svg @@ -0,0 +1,24 @@ + + + + + + + def fahr_to_celsius(temp): return ((temp - 32) * (5/9)) + + + + + + + + + def statement + name + parameter names + body + return statement + return value + + + \ No newline at end of file diff --git a/fig/python-multi-if.png b/fig/python-multi-if.png new file mode 100644 index 0000000..e75f9af Binary files /dev/null and b/fig/python-multi-if.png differ diff --git a/images.html b/images.html new file mode 100644 index 0000000..7b4e1d2 --- /dev/null +++ b/images.html @@ -0,0 +1,532 @@ + + + + + +Python for Official Statistics: All Images + + + + + + + + + + + + +
+ Python for Official Statistics +
+ +
+
+ + + + + + +
+
+ + +

Introduction

+

Python Fundamentals

+
+

Figure 1

+ +
Value of 65.0 with weight_kg label stuck on it

+

Figure 2

+ +
Value of 65.0 with weight_kg label stuck on it, and value of 143.0 with weight_lb label stuck on it

+

Figure 3

+ +
Value of 100.0 with label weight_kg stuck on it, and value of 143.0 with label weight_lb stuck on it

Data Transformation

+
+

Figure 1

+ +
'data' is a 3 by 3 NumPy array containing row 0: ['A', 'B', 'C'], row 1: ['D', 'E', 'F'], and row 2: ['G', 'H', 'I']. Starting in the upper left hand corner, data[0, 0] = 'A', data[0, 1] = 'B', data[0, 2] = 'C', data[1, 0] = 'D', data[1, 1] = 'E', data[1, 2] = 'F', data[2, 0] = 'G', data[2, 1] = 'H', and data[2, 2] = 'I', in the bottom right hand corner.

List and Dictionary Methods

+

Loops and Conditional Logic

+
+

Figure 1

+ +
Line graphs showing average, maximum, and minimum inflammation across all patients over a 40-day period.

+

Figure 2

+ +
Loop variable 'num' being assigned the value of each element in the list odds in turn and then being printed

+

Figure 3

+ +
A flowchart diagram of the if-else construct that tests if variable num is greater than 100

+

Figure 4

+ +
A flowchart diagram of a conditional section with multiple elif conditions and some possible outcomes.

+

Figure 5

+ +
A flowchart diagram of a conditional section with multiple if statements and some possible outcomes.

Alternatives to Loops

+

Creating Functions

+
+

Figure 1

+ +
Labeled parts of a Python function definition

Data Analysis

+

Visualizations

+

Errors and Exceptions

+
+
+
+
+ + +
+ + +
+ + + + + diff --git a/index.html b/index.html new file mode 100644 index 0000000..e2d7808 --- /dev/null +++ b/index.html @@ -0,0 +1,443 @@ + +Python for Official Statistics: Summary and Setup +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+

Summary and Setup

+ + +

Python for Official Statistics will teach participants the basics of +Python for its use in creating Official Statistics. Participants will +learn basic programming principles, and employ them in the manipulation +of data and data structures.

+
+
+ +
+
+

Prerequisites +

+
+

FIXME

+
+
+
+ + +

FIXME

+ +
+ + +
+
+ + + diff --git a/instructor-notes.html b/instructor-notes.html new file mode 100644 index 0000000..011bdbb --- /dev/null +++ b/instructor-notes.html @@ -0,0 +1,494 @@ + + + + + +Python for Official Statistics: Instructor Notes + + + + + + + + + + + + +
+ Python for Official Statistics +
+ +
+
+ + + + + + +
+
+

Instructor Notes

+ +

FIXME

+ +
+
+
+
+ + +
+ + +
+ + + + + diff --git a/instructor/01-introduction.html b/instructor/01-introduction.html new file mode 100644 index 0000000..a9a38c1 --- /dev/null +++ b/instructor/01-introduction.html @@ -0,0 +1,650 @@ + +Python for Official Statistics: Introduction +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Introduction

+

Last updated on 2024-07-11

+ + + +

Estimated time: 15 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • What is programming?
  • +
  • How do I document code?
  • +
  • How do I find reliable and safe resources or code online?
  • +
+
+
+
+
+
+

Objectives

+
  • Identify basic concepts in programming.
  • +
+
+
+
+
+

Programming in Python +

+

In the most general terms, programming is the process of writing instructions for a computer. In this course we will be using Python as the language to communicate with the computer.

+
+

Strictly speaking, Python is an interpreted language rather than a compiled language, meaning our code is not translated ahead of time into machine code that the computer runs directly. Instead, when we run Python code, our Python source code is first translated into bytecode, which is then executed by the Python virtual machine.
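In CPython, the reference implementation of Python, you can peek at this intermediate step yourself. The sketch below uses the standard-library `dis` module to disassemble a small, hypothetical function `add_one` into its bytecode instructions. The exact instructions printed vary between Python versions, so no output is shown here.

```python
import dis


# A hypothetical example function
def add_one(x):
    return x + 1


# Print the bytecode instructions CPython compiled add_one into
dis.dis(add_one)

# The raw bytecode is stored as bytes on the function's code object
print(type(add_one.__code__.co_code))
```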

+
+

Programming is a wide topic including a variety of techniques and +tools. In this course we’ll be focusing on programming for statistical +analysis.

+
+

IDEs

+

IDE stands for Integrated Development Environment. IDEs are where you will write, edit, and debug Python scripts, so choose one that you feel comfortable with and that includes the functionality you need. Some open-source IDEs for Python include JupyterLab and Visual Studio Code.

+
+
+

Packages

+

Packages, or libraries, are extensions to the Python language. They contain code, data, and documentation in a standardised collection format that can be installed by users, typically via a centralised software repository. A typical Python workflow will use base Python (the core operations and functions provided by your Python installation) as well as specialised data analysis and scientific packages like NumPy, SciPy and pandas.
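Because packages such as NumPy, SciPy and pandas are not part of base Python, they may or may not be installed in your environment. As a quick sketch (the helper name `check_packages` is our own, not part of any library), the standard-library function `importlib.util.find_spec` can report whether a package is available without actually importing it:

```python
import importlib.util


def check_packages(names):
    # Map each package name to True if Python can find it, False otherwise
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(["numpy", "scipy", "pandas"])
for name, installed in status.items():
    print(f"{name}: {'installed' if installed else 'not installed'}")
```

A package reported as not installed is typically installed from the Python Package Index with a tool such as pip.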

+
+

Best Practices +

+

Let’s review some basic concepts that every programmer should keep in mind.

+
+

Documentation

+

Have you ever returned to a task and tried to read a note that you +quickly scrawled for yourself the last time you were working on it? Have +you ever inherited a project from a colleague and found you have no idea +what remains to be done?

+

It can be very challenging to return to your own work or pick up a colleague’s, and this goes doubly for programming. Documentation is one way we can reduce the burden on our future selves and our colleagues.

+
+

Inline Documentation

+

For new programmers, inline documentation can be the most helpful. Inline documentation refers to writing comments on the same line as your code. For example, if we wrote a line of code to add 1 and 1, we might document it as follows:

+
+

PYTHON +

+
1+1         # adding the numbers 1 and 1 together.
+
+

Although this is a very simple line of code and it might seem like +overkill to document it in this way, these types of comments can be very +helpful in jogging your memory when returning to a project. Inline +comments can also help you to break multi-step programs into digestible +and readable pieces.
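As a small illustration of that last point, here is a short multi-step calculation (the weight value is made up) in which each inline comment labels one step:

```python
weight_kg = 65.0              # step 1: record the patient's weight in kilograms
weight_lb = 2.2 * weight_kg   # step 2: convert to pounds (about 2.2 lb per kg)
print(weight_lb)              # step 3: display the result (143.0)
```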

+
+
+

External Documentation

+

Sometimes you require more detail than you can comfortably fit in your inline documentation. In this case it can be helpful to create separate files to document your project. This type of documentation will typically focus on the goals, scope, and any special instructions relating to your project rather than the details of your code. The most common type of external documentation is a README file. It is best practice to create a basic README file for any project. A basic README should include:

+
  • a brief description of the project,
  • +
  • any special instructions for installation or use,
  • +
  • the authors and any references.
  • +

README files are plain text files, and it is best practice to save your README file as a README.md Markdown document. This file format is automatically recognised by code-hosting platforms like GitHub, so your README contents are displayed alongside your code repository.

+
+
+

DocStrings

+

In chapter 7: Creating Functions, we’ll learn about documentation specific to functions, known as docstrings.
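As a brief preview, a docstring is a string literal placed as the first statement in a function body. Python stores it on the function and displays it through the built-in help function. The function `kg_to_lb` below is a hypothetical example:

```python
def kg_to_lb(weight_kg):
    """Convert a weight from kilograms to pounds, using 2.2 lb per kg."""
    return 2.2 * weight_kg


# The docstring is stored on the function object...
print(kg_to_lb.__doc__)

# ...and is displayed by the built-in help function
help(kg_to_lb)
```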

+
+
+

Getting Help +

+

Later on, in chapter 10: Errors and Exceptions, we will cover errors in more detail. However, before we get there, it’s very likely you’ll need some assistance writing Python code.

+
+

Built-in Help

+

There is a help +function built into base Python. You can use it to investigate +built-in functions, data types, and more. For example, say we want to +know more about the print() function in Python:

+
+

PYTHON +

+
help(print)
+
+
+

OUTPUT +

+
Help on built-in function print in module builtins:
+
+print(...)
+    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
+
+    Prints the values to a stream, or to sys.stdout by default.
+    Optional keyword arguments:
+    file:  a file-like object (stream); defaults to the current sys.stdout.
+    sep:   string inserted between values, default a space.
+    end:   string appended after the last value, default a newline.
+-- More  --
+
+
+
+

Finding Resources online

+

Stack Overflow is a valuable +resource for programmers of all levels. It can be daunting to post your +own question! Fortunately, chances are someone else has already asked a +similar question!

+

The Official Python +Documentation is another great resource.

+

It can also be helpful to do a general search for a particular topic or error message. It’s very likely the first few results will be from Stack Overflow, followed by a few from official documentation, and then you may start seeing results from personal blogs or third parties. These third-party results can sometimes be valuable, but we should be cautious! Here are a few things to keep in mind when you are looking for online resources:

+
  1. Don’t download or install anything unless you are certain of what it +is and why you need it.
  2. +
  3. Don’t copy or run code unless you fully understand what it +does.
  4. +
  5. Python is an open-source language; official documentation and +resources will not be behind a paywall.
  6. +
  7. You may not find a resource or solution to fit your exact needs. Try +to be flexible and adapt online solutions to fit your needs.
  8. +
+
+ +
+
+

Key Points +

+
+
  • Python is an interpreted language.
  • +
  • Code is commonly developed inside an integrated development +environment.
  • +
  • A typical Python workflow uses base Python and additional Python +packages developed for statistical programming purposes.
  • +
  • In-line and external documentation helps ensure that your code is +readable.
  • +
  • You can find help through the built-in help function and external +resources.
  • +
+
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/02-python_fundamentals.html b/instructor/02-python_fundamentals.html new file mode 100644 index 0000000..43bf04c --- /dev/null +++ b/instructor/02-python_fundamentals.html @@ -0,0 +1,851 @@ + +Python for Official Statistics: Python Fundamentals +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Python Fundamentals

+

Last updated on 2024-07-11

+ + + +

Estimated time: 30 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • What basic data types can I work with in Python?
  • +
  • How can I create a new variable in Python?
  • +
  • How do I use a function?
  • +
  • Can I change the value associated with a variable after I create +it?
  • +
+
+
+
+
+
+

Objectives

+
  • Assign values to variables.
  • +
+
+
+
+
+

Variables +

+

Any Python interpreter can be used as a calculator:

+
+

PYTHON +

+
3 + 5 * 4
+
+
+

OUTPUT +

+
23
+
+

This is great but not very interesting. To do anything useful with +data, we need to assign its value to a variable. In Python, we +can assign a value to a variable, using the equals sign +=. For example, we can track the weight of a patient who +weighs 60 kilograms by assigning the value 60 to a variable +weight_kg:

+
+

PYTHON +

+
weight_kg = 60
+
+

From now on, whenever we use weight_kg, Python will +substitute the value we assigned to it. In layperson’s terms, a +variable is a name for a value.

+

In Python, variable names:

+
  • can include letters, digits, and underscores
  • +
  • cannot start with a digit
  • +
  • are case sensitive.
  • +

This means that, for example:

+
  • +weight0 is a valid variable name, whereas +0weight is not
  • +
  • +weight and Weight are different +variables
  • +
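These rules can be checked directly. In this sketch the helper `is_valid_variable_name` is hypothetical; it relies on the built-in string method `str.isidentifier` and the standard-library `keyword` module. It also enforces one rule not listed above: reserved keywords such as `for` cannot be used as variable names.

```python
import keyword


def is_valid_variable_name(name):
    # A valid name is an identifier (letters, digits, underscores, no leading
    # digit) that is not one of Python's reserved keywords
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["weight0", "0weight", "weight_kg", "for"]:
    print(candidate, is_valid_variable_name(candidate))

# Names are case sensitive: these are two separate variables
weight = 60
Weight = 75
print(weight, Weight)  # prints: 60 75
```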

Types of data +

+

Python knows various types of data. Three common ones are:

+
  • integer numbers
  • +
  • floating point numbers, and
  • +
  • strings.
  • +

In the example above, variable weight_kg has an integer +value of 60. If we want to more precisely track the weight +of our patient, we can use a floating point value by executing:

+
+

PYTHON +

+
weight_kg = 60.3
+
+

To create a string, we add single or double quotes around some text. +To identify and track a patient throughout our study, we can assign each +person a unique identifier by storing it in a string:

+
+

PYTHON +

+
patient_id = '001'
+
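The sketch below (the variable names are our own) shows that single and double quotes produce the same string. It also suggests one likely reason for storing an identifier such as 001 as a string rather than a number: converting it to an integer drops the leading zeros.

```python
# Single and double quotes create exactly the same kind of string
id_single = '001'
id_double = "001"
print(id_single == id_double)  # prints: True
print(type(id_single))         # prints: <class 'str'>

# Converting the identifier to a number would lose its leading zeros
print(int(id_single))          # prints: 1
```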
+

Using Variables in Python +

+

Once we have data stored with variable names, we can make use of it +in calculations. We may want to store our patient’s weight in pounds as +well as kilograms:

+
+

PYTHON +

+
weight_lb = 2.2 * weight_kg
+
+

We might decide to add a prefix to our patient identifier:

+
+

PYTHON +

+
patient_id = 'inflam_' + patient_id
+
+

Built-in Python functions +

+

To carry out common tasks with data and variables in Python, the +language provides us with several built-in functions. To display information to +the screen, we use the print function:

+
+

PYTHON +

+
print(weight_lb)
+print(patient_id)
+
+
+

OUTPUT +

+
132.66
+inflam_001
+
+

When we want to make use of a function, referred to as calling the +function, we follow its name by parentheses. The parentheses are +important: if you leave them off, the function doesn’t actually run! +Sometimes you will include values or variables inside the parentheses +for the function to use. In the case of print, we use the +parentheses to tell the function what value we want to display. We will +learn more about how functions work and how to create our own in later +episodes.
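To see the difference the parentheses make, compare the two uses of the built-in `len` function below. `len` returns the number of characters in a string; it is used here only for illustration and is not otherwise covered in this episode.

```python
patient_id = 'inflam_001'

# With parentheses, len is called and returns a value
n_characters = len(patient_id)
print(n_characters)  # prints: 10

# Without parentheses, we only refer to the function itself; nothing runs
print(len)           # prints: <built-in function len>
```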

+

We can display multiple things at once using only one +print call:

+
+

PYTHON +

+
print(patient_id, 'weight in kilograms:', weight_kg)
+
+
+

OUTPUT +

+
inflam_001 weight in kilograms: 60.3
+
+

We can also call a function inside of another function call. For example, +Python has a built-in function called type that tells you a +value’s data type:

+
+

PYTHON +

+
print(type(60.3))
+print(type(patient_id))
+
+
+

OUTPUT +

+
<class 'float'>
+<class 'str'>
+
+

Moreover, we can do arithmetic with variables right inside the +print function:

+
+

PYTHON +

+
print('weight in pounds:', 2.2 * weight_kg)
+
+
+

OUTPUT +

+
weight in pounds: 132.66
+
+

The above command, however, did not change the value of +weight_kg:

+
+

PYTHON +

+
print(weight_kg)
+
+
+

OUTPUT +

+
60.3
+
+

To change the value of the weight_kg variable, we have +to assign weight_kg a new value using the +equals = sign:

+
+

PYTHON +

+
weight_kg = 65.0
+print('weight in kilograms is now:', weight_kg)
+
+
+

OUTPUT +

+
weight in kilograms is now: 65.0
+
+
+
+ +
+
+

Variables as Sticky Notes +

+
+

A variable in Python is analogous to a sticky note with a name +written on it: assigning a value to a variable is like putting that +sticky note on a particular value.

+
Value of 65.0 with weight_kg label stuck on it

Using this analogy, we can investigate how assigning a value to one +variable does not change values of other, seemingly +related, variables. For example, let’s store the subject’s weight in +pounds in its own variable:

+
+

PYTHON +

+
# There are 2.2 pounds per kilogram
+weight_lb = 2.2 * weight_kg
+print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
+
+
+

OUTPUT +

+
weight in kilograms: 65.0 and in pounds: 143.0
+
+

Everything in a line of code following the ‘#’ symbol is a comment that is ignored by Python. +Comments allow programmers to leave explanatory notes for other +programmers or their future selves.

+
Value of 65.0 with weight_kg label stuck on it, and value of 143.0 with weight_lb label stuck on it

Similar to above, the expression 2.2 * weight_kg is +evaluated to 143.0, and then this value is assigned to the +variable weight_lb (i.e. the sticky note +weight_lb is placed on 143.0). At this point, +each variable is “stuck” to completely distinct and unrelated +values.

+

Let’s now change weight_kg:

+
+

PYTHON +

+
weight_kg = 100.0
+print('weight in kilograms is now:', weight_kg, 'and weight in pounds is still:', weight_lb)
+
+
+

OUTPUT +

+
weight in kilograms is now: 100.0 and weight in pounds is still: 143.0
+
+
Value of 100.0 with label weight_kg stuck on it, and value of 143.0 with label weight_lb stuck on it

Since weight_lb doesn’t “remember” where its value comes +from, it is not updated when we change weight_kg.

+
+
+
+
+
+ +
+
+

Check Your Understanding +

+
+

What values do the variables mass and age +have after each of the following statements? Test your answer by +executing the lines.

+
+

PYTHON +

+
mass = 47.5
+age = 122
+mass = mass * 2.0
+age = age - 20
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
`mass` holds a value of 47.5, `age` does not exist
+`mass` still holds a value of 47.5, `age` holds a value of 122
+`mass` now has a value of 95.0, `age`'s value is still 122
+`mass` still has a value of 95.0, `age` now holds 102
+
+
+
+
+
+
+
+ +
+
+

Sorting Out References +

+
+

Python allows you to assign multiple values to multiple variables in +one line by separating the variables and values with commas. What does +the following program print out?

+
+

PYTHON +

+
first, second = 'Grace', 'Hopper'
+third, fourth = second, first
+print(third, fourth)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
Hopper Grace
+
+
+
+
+
+
+
+ +
+
+

Seeing Data Types +

+
+

What are the data types of the following variables?

+
+

PYTHON +

+
planet = 'Earth'
+apples = 5
+distance = 10.5
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
print(type(planet))
+print(type(apples))
+print(type(distance))
+
+
+

OUTPUT +

+
<class 'str'>
+<class 'int'>
+<class 'float'>
+
+
+
+
+
+
+
+ +
+
+

Key Points +

+
+
  • Basic data types in Python include integers, strings, and +floating-point numbers.
  • +
  • Use variable = value to assign a value to a variable in +order to record it in memory.
  • +
  • Variables are created on demand whenever a value is assigned to +them.
  • +
  • Use print(something) to display the value of +something.
  • +
  • Use # some kind of explanation to add comments to +programs.
  • +
  • Built-in functions are always available to use.
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/03-data_transformation.html b/instructor/03-data_transformation.html new file mode 100644 index 0000000..2f61a80 --- /dev/null +++ b/instructor/03-data_transformation.html @@ -0,0 +1,865 @@ + +Python for Official Statistics: Data Transformation +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Data Transformation

+

Last updated on 2024-07-11

+ + + +

Estimated time: 60 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I process tabular data files in Python?
  • +
+
+
+
+
+
+

Objectives

+
  • Explain what a library is and what libraries are used for.
  • +
  • Import a Python library and use the functions it contains.
  • +
  • Read tabular data from a file into a program.
  • +
  • Select individual values and subsections from data.
  • +
  • Perform operations on arrays of data.
  • +
+
+
+
+
+

Words are useful, but what’s more useful are the sentences and +stories we build with them. Similarly, while a lot of powerful, general +tools are built into Python, specialized tools built up from these basic +units live in libraries that can be +called upon when needed.

+

Loading data into Python +

+

To begin processing the clinical trial inflammation data, we need to load it into Python. Python can work with many different file types. Text files can be opened in Python by using the base Python function open:

+
+

PYTHON +

+
open("filename.txt", "r")
+
+

where “r” opens the file in read-only mode; if you want to write to the file instead, you can use “w” (which replaces any existing contents of the file).
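In practice, `open` is usually combined with a `with` statement, which closes the file automatically when the block ends. The sketch below writes a small file with mode “w” and reads it back with mode “r”. The temporary-directory path and the file contents are made up for the example.

```python
import os
import tempfile

# Build a path in a fresh temporary directory so the example touches no real files
path = os.path.join(tempfile.mkdtemp(), "filename.txt")

# "w" opens the file for writing, replacing any existing contents
with open(path, "w") as f:
    f.write("patient_id,weight_kg\n")
    f.write("inflam_001,60.3\n")

# "r" opens the file read-only; leaving the with block closes the file
with open(path, "r") as f:
    contents = f.read()

print(contents)
```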

+

However, our patient data is in a CSV file, which is more commonly loaded by using a library. Python has hundreds of thousands of libraries to choose from to help carry out your work. Importing a library is like getting a piece of lab equipment out of a storage locker and setting it up on the bench. Libraries provide additional functionality to base Python, much like a new piece of equipment adds functionality to a lab space. Just like in the lab, importing too many libraries can sometimes complicate and slow down your programs, so we only import what we need for each program. There are a couple of common Python libraries for loading and working with data.

+

pandas +

+

The first library we will present is pandas, a Python library containing a set of functions and specialised data structures that have been designed to help Python programmers perform data analysis tasks in a structured way.

+

Most of the things that pandas can do can be done with basic Python, but the collected set of pandas functions and data structures makes data analysis tasks more consistent in terms of syntax and therefore aids readability.

+

Remember to write the library name with a lower-case ‘p’: the package name is all lower case, and Python is case sensitive.

+
+

Importing the pandas library

+

Importing the pandas library is done in exactly the same way as for +any other library. In almost all examples of Python code using the +pandas library, it will have been imported and given an alias of +pd. We will follow the same convention.

+
+

PYTHON +

+
import pandas as pd
+
+
+
+

Pandas data structures

+

There are two main data structures used by pandas: the Series and the Dataframe. A Series is broadly equivalent to a vector or a list, and a Dataframe is equivalent to a table. Each column in a pandas Dataframe is a pandas Series.
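As a sketch with made-up values (not the inflammation data), a Dataframe can be built from a dictionary whose keys become column names, and selecting a single column returns a Series:

```python
import pandas as pd

# Made-up example values: each dictionary key becomes a column name
df = pd.DataFrame({
    "patient": ["A", "B", "C"],
    "day_1": [0.0, 1.0, 2.0],
})

column = df["day_1"]    # selecting one column gives a Series
print(type(df))
print(type(column))
print(column.tolist())  # [0.0, 1.0, 2.0]
```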

+

We will mainly be looking at the Dataframe.

+

We can easily create a pandas Dataframe by reading a .csv file.

+
+
+

Reading a csv file

+

When we read a csv dataset in base Python, we did so by opening the dataset, reading and processing one record at a time, and then closing the dataset after we had read the last record. Reading datasets in this way is slow and places all of the responsibility for extracting individual data items from the records on the programmer.

+

The main advantage of this approach, however, is that you only have +to store one dataset record in memory at a time. This means that if you +have the time, you can process datasets of any size.
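For comparison, a minimal sketch of the record-at-a-time approach using the standard library csv module. The file name records.csv and its values are made up. Only the current record is held in memory while a running total is updated, and converting each item from a string is left to the programmer.

```python
import csv
import os
import tempfile

# Create a small hypothetical CSV file to read back
path = os.path.join(tempfile.mkdtemp(), "records.csv")
with open(path, "w", newline="") as file:
    file.write("id,value\n1,3.5\n2,4.0\n3,2.5\n")

total = 0.0
count = 0
with open(path, newline="") as file:
    reader = csv.reader(file)
    header = next(reader)          # the first record holds the column names
    for record in reader:          # each record is a list of strings
        total += float(record[1])  # the programmer converts each item
        count += 1

print(header)         # ['id', 'value']
print(total / count)  # 3.3333333333333335
```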

+

In pandas, csv files are read as complete datasets. You do not have to explicitly open and close the dataset. All of the dataset records are assembled into a Dataframe. If your dataset has column headers in the first record, then these can be used as the Dataframe column names. You can explicitly state this in the parameters to the call, but pandas is usually able to infer that there is a header row and use it automatically.
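A sketch of the whole-dataset approach with pandas. To keep it self-contained, the CSV text is held in memory with io.StringIO (read_csv accepts file-like objects as well as file names). The column names and values are made up. The second call passes header=0 to state explicitly that row 0 holds the column names.

```python
import io
import pandas as pd

csv_text = "id,value\n1,3.5\n2,4.0\n3,2.5\n"

# By default the first record is used as the column names
df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())  # ['id', 'value']
print(df.shape)             # (3, 2)

# Stating the header row explicitly gives the same result here
df_explicit = pd.read_csv(io.StringIO(csv_text), header=0)
print(df["value"].mean())
```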

+

To tell Python that we’d like to start using pandas, we need to import it:

+
+

PYTHON +

+
import pandas as pd
+
+

Often, libraries are given an alias or short-form name; in this case, pandas is given the alias “pd”. Aliases for common data analysis libraries include:

+
+

PYTHON +

+
import pandas as pd
+import numpy as np
+import matplotlib.pyplot as plt
+import seaborn as sns
+
+

Once we’ve imported the library, we can ask the library to read our +data file for us:

+
+

PYTHON +

+
pd.read_csv("filename.csv")
+
+

pandas is a commonly used library for working with and analysing +data. However, we will be working with a different package for the +remainder of this course. If you would like to learn more about data +manipulation and analysis using pandas, we recommend checking out Data Analysis and +Visualization with Python for Social Scientists.

+
+

numpy +

+

The second package that we will present is called NumPy, which stands for Numerical Python. In general, you should use this library when you want to do fancy things with lots of numbers, especially if you have matrices or arrays. Compared with pandas Dataframes, NumPy arrays are typically lighter weight and can perform better for purely numerical work, particularly when working with large datasets.
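A small sketch of what working with lots of numbers looks like in NumPy, using arbitrary values: arithmetic on a NumPy array is applied to every element at once, whereas the same operator on a plain Python list does something different (here, repetition).

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array(values_list)

print(values_list * 2)      # [1, 2, 3, 1, 2, 3]  (list repetition)
print(values_array * 2)     # element-wise multiplication: 2, 4, 6
print(values_array.mean())  # 2.0
```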

+

We will be using this package to work with our clinical trial +inflammation data.

+

To tell Python that we’d like to start using NumPy, we need to import it:

+
+

PYTHON +

+
import numpy as np
+
+

Now that we have imported the library, we can ask the library (by using the alias np) to read our data file for us:

+
+

PYTHON +

+
np.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+
+

OUTPUT +

+
array([[ 0.,  0.,  1., ...,  3.,  0.,  0.],
+       [ 0.,  1.,  2., ...,  1.,  0.,  1.],
+       [ 0.,  1.,  1., ...,  2.,  1.,  1.],
+       ...,
+       [ 0.,  1.,  1., ...,  1.,  1.,  1.],
+       [ 0.,  0.,  0., ...,  0.,  2.,  0.],
+       [ 0.,  0.,  1., ...,  1.,  1.,  0.]])
+
+

The expression np.loadtxt(...) is a function call that asks Python to run the function loadtxt, which belongs to the np library. In Python, dot notation is mostly used to access an object's attributes (properties) or to call its methods: object.attribute gives you the value of that attribute, and object.method() calls that method on the object.

+

As an example, John Smith is the John that belongs to the Smith +family. We could use the dot notation to write his name +smith.john, just as loadtxt is a function that +belongs to the np library.
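To make the two uses of dot notation concrete, here is a sketch using only the standard library: math.pi is an attribute (a value stored in the math module, used without parentheses), while math.sqrt() and the string method upper() are called with parentheses.

```python
import math

print(math.pi)        # attribute: a value, no parentheses
print(math.sqrt(16))  # function belonging to the math module: 4.0

name = "smith"
print(name.upper())   # method called on a string object: SMITH
```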

+

Here we pass np.loadtxt two parameters: the name of the file we want to read and the delimiter that separates values on a line. These both need to be character strings (or strings for short), so we put them in quotes.

+

Since we haven’t told it to do anything else with the function’s +output, the notebook displays it. +In this case, that output is the data we just loaded. By default, only a +few rows and columns are shown (with ... to omit elements +when displaying big arrays). Note that, to save space when displaying +NumPy arrays, Python does not show us trailing zeros, so +1.0 becomes 1..

+

Our call to np.loadtxt read our file but didn’t save the +data in memory. To do that, we need to assign the array to a variable. +In a similar manner to how we assign a single value to a variable, we +can also assign an array of values to a variable using the same syntax. +Let’s re-run np.loadtxt and save the returned data:

+
+

PYTHON +

+
data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+

This statement doesn’t produce any output because we’ve assigned the +output to the variable data. If we want to check that the +data have been loaded, we can print the variable’s value:

+
+

PYTHON +

+
print(data)
+
+
+

OUTPUT +

+
[[ 0.  0.  1. ...,  3.  0.  0.]
+ [ 0.  1.  2. ...,  1.  0.  1.]
+ [ 0.  1.  1. ...,  2.  1.  1.]
+ ...,
+ [ 0.  1.  1. ...,  1.  1.  1.]
+ [ 0.  0.  0. ...,  0.  2.  0.]
+ [ 0.  0.  1. ...,  1.  1.  0.]]
+
+

Now that the data are in memory, we can manipulate them. First, let’s +ask what type of thing +data refers to:

+
+

PYTHON +

+
print(type(data))
+
+
+

OUTPUT +

+
<class 'numpy.ndarray'>
+
+

The output tells us that data currently refers to an +N-dimensional array, the functionality for which is provided by the +NumPy library. These data correspond to arthritis patients’ +inflammation. The rows are the individual patients, and the columns are +their daily inflammation measurements.

+
+
+ +
+
+

Data Type +

+
+

A NumPy array contains one or more elements of the same type. The type function will only tell you that a variable is a NumPy array but won’t tell you the type of thing inside the array. We can find out the type of the data contained in the NumPy array from its dtype attribute:

+
+

PYTHON +

+
print(data.dtype)
+
+
+

OUTPUT +

+
float64
+
+

This tells us that the NumPy array’s elements are floating-point +numbers.
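A sketch of the same-type rule with made-up values: when integers and floats are mixed in one array, NumPy converts all elements to a common floating-point type, and astype() returns a converted copy of the array.

```python
import numpy as np

mixed = np.array([1, 2.5, 3])
print(mixed.dtype)          # float64: the integers were converted to floats

as_int = mixed.astype(int)  # a new array; fractional parts are truncated
print(as_int.tolist())      # [1, 2, 3]
```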

+
+
+
+

With the following command, we can see the array’s shape:

+
+

PYTHON +

+
print(data.shape)
+
+
+

OUTPUT +

+
(60, 40)
+
+

The output tells us that the data array variable +contains 60 rows and 40 columns. When we created the variable +data to store our arthritis data, we did not only create +the array; we also created information about the array, called members or attributes. This extra +information describes data in the same way an adjective +describes a noun. data.shape is an attribute of +data which describes the dimensions of data. +We use the same dotted notation for the attributes of variables that we +use for the functions in libraries because they have the same +part-and-whole relationship.
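Other attributes follow the same dotted pattern. A sketch using an array of zeros with the same shape as the inflammation data (the values are placeholders, not the real measurements):

```python
import numpy as np

placeholder = np.zeros((60, 40))  # 60 rows (patients) x 40 columns (days)

print(placeholder.shape)  # (60, 40)
print(placeholder.ndim)   # 2: number of dimensions
print(placeholder.size)   # 2400: total number of elements
print(placeholder.dtype)  # float64
```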

+

If we want to get a single number from the array, we must provide an +index in square brackets after the +variable name, just as we do in math when referring to an element of a +matrix. Our inflammation data has two dimensions, so we will need to use +two indices to refer to one specific value:

+
+

PYTHON +

+
print('first value in data:', data[0, 0])
+
+
+

OUTPUT +

+
first value in data: 0.0
+
+
+

PYTHON +

+
print('middle value in data:', data[29, 19])
+
+
+

OUTPUT +

+
middle value in data: 16.0
+
+

The expression data[29, 19] accesses the element at row +30, column 20. While this expression may not surprise you, +data[0, 0] might. Programming languages like Fortran, +MATLAB and R start counting at 1 because that’s what human beings have +done for thousands of years. Languages in the C family (including C++, +Java, Perl, and Python) count from 0 because it represents an offset +from the first value in the array (the second value is offset by one +index from the first value). This is closer to the way that computers +represent arrays (if you are interested in the historical reasons behind +counting indices from zero, you can read Mike +Hoye’s blog post). As a result, if we have an M×N array in Python, +its indices go from 0 to M-1 on the first axis and 0 to N-1 on the +second. It takes a bit of getting used to, but one way to remember the +rule is that the index is how many steps we have to take from the start +to get the item we want.

+
'data' is a 3 by 3 NumPy array containing row 0: ['A', 'B', 'C'], row 1: ['D', 'E', 'F'], and row 2: ['G', 'H', 'I']. Starting in the upper left-hand corner, data[0, 0] = 'A', data[0, 1] = 'B', data[0, 2] = 'C', data[1, 0] = 'D', data[1, 1] = 'E', data[1, 2] = 'F', data[2, 0] = 'G', data[2, 1] = 'H', and data[2, 2] = 'I' in the bottom right-hand corner.
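The 3 by 3 array described above can be rebuilt directly as a sketch. Negative indices, which count back from the end of an axis, are included as an extra illustration.

```python
import numpy as np

letters = np.array([
    ["A", "B", "C"],
    ["D", "E", "F"],
    ["G", "H", "I"],
])

print(letters[0, 0])    # A: row 0, column 0 (upper left)
print(letters[1, 2])    # F: row 1, column 2
print(letters[2, 2])    # I: the last index on each axis is 3 - 1 = 2
print(letters[-1, -1])  # I: -1 counts back from the end of each axis
```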
+
+ +
+
+

In the Corner +

+
+

What may also surprise you is that when Python displays an array, it +shows the element with index [0, 0] in the upper left +corner rather than the lower left. This is consistent with the way +mathematicians draw matrices but different from the Cartesian +coordinates. The indices are (row, column) instead of (column, row) for +the same reason, which can be confusing when plotting data.

+
+
+
+

Slicing data +

+

An index like [30, 20] selects a single element of an +array, but we can select whole sections as well. For example, we can +select the first ten days (columns) of values for the first four +patients (rows) like this:

+
+

PYTHON +

+
print(data[0:4, 0:10])
+
+
+

OUTPUT +

+
[[ 0.  0.  1.  3.  1.  2.  4.  7.  8.  3.]
+ [ 0.  1.  2.  1.  2.  1.  3.  2.  2.  6.]
+ [ 0.  1.  1.  3.  3.  2.  6.  2.  5.  9.]
+ [ 0.  0.  2.  0.  4.  2.  2.  1.  6.  7.]]
+
+

The slice 0:4 means, +“Start at index 0 and go up to, but not including, index 4”. Again, the +up-to-but-not-including takes a bit of getting used to, but the rule is +that the difference between the upper and lower bounds is the number of +values in the slice.
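A quick check of the upper-minus-lower-bound rule, using a placeholder array with the same shape as the data (np.arange simply numbers the elements 0, 1, 2 and so on):

```python
import numpy as np

placeholder = np.arange(60 * 40).reshape(60, 40)

block = placeholder[0:4, 0:10]
print(block.shape)                    # (4, 10): 4 - 0 rows, 10 - 0 columns
print(placeholder[5:10, 0:10].shape)  # (5, 10): 10 - 5 = 5 rows
```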

+

We don’t have to start slices at 0:

+
+

PYTHON +

+
print(data[5:10, 0:10])
+
+
+

OUTPUT +

+
[[ 0.  0.  1.  2.  2.  4.  2.  1.  6.  4.]
+ [ 0.  0.  2.  2.  4.  2.  2.  5.  5.  8.]
+ [ 0.  0.  1.  2.  3.  1.  2.  3.  5.  3.]
+ [ 0.  0.  0.  3.  1.  5.  6.  5.  5.  8.]
+ [ 0.  1.  1.  2.  1.  3.  5.  3.  5.  8.]]
+
+

We also don’t have to include the upper and lower bound on the slice. +If we don’t include the lower bound, Python uses 0 by default; if we +don’t include the upper, the slice runs to the end of the axis, and if +we don’t include either (i.e., if we use ‘:’ on its own), the slice +includes everything:

+
+

PYTHON +

+
small = data[:3, 36:]
+print('small is:')
+print(small)
+
+

The above example selects rows 0 through 2 and columns 36 through to +the end of the array.

+
+

OUTPUT +

+
small is:
+[[ 2.  3.  0.  0.]
+ [ 1.  1.  0.  1.]
+ [ 2.  2.  1.  1.]]
+
+
+
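The default bounds can be checked the same way on a placeholder array: leaving out a bound is equivalent to writing 0 or the length of that axis, and ':' on its own selects the whole axis.

```python
import numpy as np

placeholder = np.arange(60 * 40).reshape(60, 40)

# Omitted bounds default to 0 (lower) and the axis length (upper)
print(np.array_equal(placeholder[:3, 36:], placeholder[0:3, 36:40]))  # True

# ':' on its own selects every row
print(placeholder[:, 0].shape)  # (60,): every row of the first column
```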
+ + +
+
+ + + diff --git a/instructor/04-lists.html b/instructor/04-lists.html new file mode 100644 index 0000000..5ba60d6 --- /dev/null +++ b/instructor/04-lists.html @@ -0,0 +1,1107 @@ + +Python for Official Statistics: List and Dictionary Methods +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

List and Dictionary Methods

+

Last updated on 2024-07-11

+ + + +

Estimated time: 40 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I store many values together?
  • +
  • How can I create a list succinctly?
  • +
  • How can I efficiently access nested data?
  • +
+
+
+
+
+
+

Objectives

+
  • Identify and create lists and dictionaries
  • +
  • Understand the properties and behaviours of lists and +dictionaries
  • +
  • Access values in lists and dictionaries
  • +
  • Create and access values from nested lists and dictionaries
  • +
+
+
+
+
+

Values can also be stored in other Python data types such as lists, dictionaries, sets and tuples. Storing objects in a list is a fast and versatile way to apply transformations across a sequence of values. Storing objects in a dictionary as key-value pairs is useful for extracting specific values, i.e. performing lookup operations.

+

Create and access lists +

+

Lists have the following properties and behaviours:

+
  • A single list can store different primitive object types and even +other lists
  • +
  • Lists are ordered and have a 0-based index
  • +
  • New values can be added to a list using the methods append() or insert()
  • +
  • Values inside a list can be removed using the methods +remove() or pop() +
  • +
  • Two lists can be concatenated with the operator + +
  • +
  • Values inside a list can be conditionally iterated through
  • +
  • A list is mutable i.e. the values inside a list can be modified in +place
  • +

To create a list, values are contained within square brackets +i.e. [] and individually separated by commas. The function +list() can also be used to create a list of values from an +iterable object like a string, set or tuple.

+
+

PYTHON +

+
# Create a list of integers using []
+list_1 = [1, 3, 5, 7]
+print(list_1)
+
+
+

OUTPUT +

+
[1, 3, 5, 7]
+
+
+

PYTHON +

+
# Unlike atomic vectors in R, a list can contain multiple primitive object types
+list_2 = [1, "one", 1.0, True]
+print(list_2)
+
+
+

OUTPUT +

+
[1, 'one', 1.0, True]
+
+
+

PYTHON +

+
# You can also use list() on an iterable object to convert it into a list
+string = 'abcdefg'  
+list_3 = list(string)  
+print(list_3)
+
+
+

OUTPUT +

+
['a', 'b', 'c', 'd', 'e', 'f', 'g']
+
+

Because lists have a 0-based index, we can access individual values +by their list index position. For 0-based indexes, the first value +always starts at position 0 i.e. the first element has an index of 0. +Accessing multiple values by their index positions is also referred to +as slicing or subsetting a list.

+

Note that we can use negative numbers as indices in Python. When we +do so, the index -1 gives us the last element in the list, +-2 gives us the second to last element in the list, and so +on.

+
+

PYTHON +

+
# Extract individual values from list_3
+print('first value:', list_3[0])
+print('second value:', list_3[1])
+print('last value:', list_3[-1])
+
+
+

OUTPUT +

+
first value: a
+second value: b
+last value: g
+
+
+

PYTHON +

+
# A syntax quirk of slicing: the stop index is excluded, so add 1 to the index of the last value you want
+# To extract from index 0 to 2, we need to slice from [0:2+1] or [0:3]
+
+# Extract the first three values from list_3
+print('first 3 values:', list_3[0:3])
+
+# Start from index 0 and extract values from each subsequent second position
+print('every second value:', list_3[0::2])
+
+# Start from index 1, end at index 3 and extract from each subsequent second position
+print('every second value from index 1 to 3:', list_3[1:4:2])
+
+
+

OUTPUT +

+
first 3 values: ['a', 'b', 'c']
+every second value: ['a', 'c', 'e', 'g']
+every second value from index 1 to 3: ['b', 'd']
+
+

Change list values +

+

Data which can be modified in place is called mutable, while data +which cannot be modified is called immutable. Strings and numbers are +immutable in that when we want to change the value of a string or number +variable, we can only replace the old value with a completely new +value.

+
+

PYTHON +

+
string = 'abcde'
+string[0] = 'b' # Produces a type error as strings are immutable
+
+# TypeError: 'str' object does not support item assignment
+
+

In contrast, lists are mutable and we can modify them after they have +been created. We can change individual values, append new values, or +reorder the whole list through sorting.

+
+

PYTHON +

+
list_4 = ['apple', 'pear', 'plum']
+print('original list_4:', list_4)
+
+# Change the first value i.e. modify the list in place
+list_4[0] = 'banana'
+print('modified list_4:', list_4)
+
+# Add new value to list using the method .insert(index number, value)
+list_4.insert(1, 'apple') # Index 1 refers to the second position
+print('appended list_4:', list_4)
+
+
+

OUTPUT +

+
original list_4: ['apple', 'pear', 'plum']
+modified list_4: ['banana', 'pear', 'plum']
+appended list_4: ['banana', 'apple', 'pear', 'plum']
+
+
+

PYTHON +

+
# Sorting a list also modifies it in place
+list_5 = [2, 1, 3, 7]
+list_5.sort()
+print('list_5:', list_5)
+
+
+

OUTPUT +

+
list_5: [1, 2, 3, 7]
+
+

However, be careful when modifying data in-place. If two variables +refer to the same list, and you modify the list value, it will change +for both variables!

+
+

PYTHON +

+
# When we assign list_5 to list_6, both list_6 and list_5 point to the
+# same list object; list_6 is not a copy of list_5.
+
+list_6 = list_5  
+print('list_5:', list_5)
+print('list_6:', list_6)
+
+# Change the first value in list_6 from 1 to 2 
+list_6[0] = 2 
+
+print('modified list_6:', list_6)
+print('list_5 is also modified:', list_5)
+
+# Warning: list_5 and list_6 have both been modified in place!
+
+
+

OUTPUT +

+
list_5: [1, 2, 3, 7]
+list_6: [1, 2, 3, 7]
+modified list_6: [2, 2, 3, 7]
+list_5 is also modified: [2, 2, 3, 7]
+
+

Because of this behaviour, code which modifies data in place should be handled with care. You can avoid this behaviour by explicitly creating a copy of the original list and modifying only the copy. This is why creating a copy of the original data object can be useful in Python.

+
+

PYTHON +

+
list_5 = [1, 2, 3, 7]
+list_7 = list_5.copy()  
+print('list_5:', list_5)
+print('list_7:', list_7)
+
+# As list_7 is a completely new object copied from list_5, modifying list_7 does
+# not affect list_5.  
+
+list_7[0] = 2 
+print('modified list_7:', list_7)
+print('unmodified list_5:', list_5)
+
+
+

OUTPUT +

+
list_5: [1, 2, 3, 7]
+list_7: [1, 2, 3, 7]
+modified list_7: [2, 2, 3, 7]
+unmodified list_5: [1, 2, 3, 7]
+
+

Useful list functions +

+

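There are a lot of functions and methods which can be applied to lists, such as len(), max(), index() and so forth. However, arithmetic operators do not work element-wise on lists of numbers: operators such as - and / raise a TypeError, while + and * do something other than arithmetic (multiplying a list by a whole number repeats it).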

+

Note that + concatenates two lists into a single longer +list, rather than outputting the sum of two lists of numbers.

+
+

PYTHON +

+
list_8 = [1, 2, 3]
+list_9 = [4, 5, 6]
+
+list_8 + list_9 # This concatenates the lists and does not sum the two lists together
+
+
+

OUTPUT +

+
[1, 2, 3, 4, 5, 6]
+
+
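A short sketch of some of the functions and methods mentioned above, including remove() and pop() from the earlier list of list behaviours. The values are arbitrary.

```python
numbers = [4, 1, 3]

print(len(numbers))      # 3: number of elements
print(max(numbers))      # 4: largest value
print(numbers.index(3))  # 2: position of the first 3

numbers.append(1)        # add to the end -> [4, 1, 3, 1]
numbers.remove(1)        # remove the first 1 -> [4, 3, 1]
last = numbers.pop()     # remove and return the last value -> 1
print(numbers, last)     # [4, 3] 1

print([0] * 3)           # [0, 0, 0]: * repeats a list, it does not multiply values
```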

In your spare time after this workshop, you can search for different +list functions and methods and test them out yourselves.

+

Nested lists +

+

We have previously mentioned that lists can be used to store other +Python object types, including lists. This means that we can create +nested lists in Python i.e. lists containing lists containing values. +This property is useful when we have a collection of values that we want +to access or transform as a subgroup.

+

To create a nested list, we also use [] or +list() to contain one or more lists of values of +interest.

+
+

PYTHON +

+
veg_stock = [
+    ['lettuce', 'lettuce', 'tomato', 'zucchini'],
+    ['lettuce', 'lettuce', 'carrot', 'zucchini'],
+    ['lettuce', 'basil', 'tomato', 'zucchini']
+    ]
+
+# Check that veg_stock is a list object
+print(type(veg_stock))
+
+# Check that the first value in veg_stock is itself a list
+print(veg_stock[0], 'has type', type(veg_stock[0]))  
+
+
+

OUTPUT +

+
<class 'list'>
+['lettuce', 'lettuce', 'tomato', 'zucchini'] has type <class 'list'>
+
+

To extract a sub-list within the veg_stock list object, we refer to its index like we would with any other value inside a list, i.e. veg_stock[0] points to the first sub-list and veg_stock[1] points to the second sub-list within the veg_stock list.

+

To access an individual string value inside a sub-list, we make use +of a second index, which points to an individual value inside the +sub-list.

+
+

PYTHON +

+
print(veg_stock[0]) # Access the first sub-list 
+print(veg_stock[0][0]) # Access the first value in the first sub-list 
+
+print(type(veg_stock[0])) # The first value in veg_stock is a list
+print(type(veg_stock[0][0])) # The first value in the first list in veg_stock is a string
+
+
+

OUTPUT +

+
['lettuce', 'lettuce', 'tomato', 'zucchini']
+lettuce
+<class 'list'>
+<class 'str'>
+
+
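Because sub-lists are lists themselves, a second index can also be used to change a value in place. One caution related to copying: list.copy() copies only the outer list, so a copy of a nested list still shares its sub-lists, whereas copy.deepcopy() from the standard library copies the sub-lists too. A sketch with a smaller, made-up stock list:

```python
import copy

stock = [['lettuce', 'tomato'], ['carrot', 'basil']]

stock[1][0] = 'zucchini'     # change one value inside the second sub-list
print(stock)                 # [['lettuce', 'tomato'], ['zucchini', 'basil']]

shallow = stock.copy()       # new outer list, same sub-list objects
deep = copy.deepcopy(stock)  # new outer list and new sub-lists

stock[0][0] = 'kale'
print(shallow[0][0])         # kale: the shallow copy shares the sub-list
print(deep[0][0])            # lettuce: the deep copy does not
```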

In general, however, when we are analysing a large collection of +values, the best practice is to structure those values in columns and +rows as a tabular Pandas data frame object. This is covered in another +Carpentries Course called Python +for Social Sciences.

+

Lists are still incredibly versatile and useful when you have a +collection of values that need to be efficiently accessed or +transformed. For example, data frame column names are commonly extracted +and stored inside a list, so that the same transformation can then be +mapped across multiple columns.

+

Create and access dictionaries +

+

A dictionary is a Python data type that is particularly suited for +enabling quick lookup operations on unstructured data sets.

+

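A dictionary can be thought of as a collection where every item or value is associated with a unique key (i.e. a self-defined index of unique strings or numbers) rather than with a numbered position. The index values are called keys, and a dictionary contains key-value pairs with the format {key: value(s)}. (Since Python 3.7, dictionaries keep their keys in insertion order, but values are still looked up by key rather than by position.)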

+

Dictionaries can be created by listing individual key-value pairs inside {} or by using dict().

+
+

PYTHON +

+
# A key-value pair can contain single or multiple values  
+# Keys are treated as case sensitive and unique
+# Multiple values are first stored inside a list  
+
+teams = {
+    'data science': ['Mei Ling', 'Paul', 'Gwen', 'Suresh'],
+    'user design': ['Amy', 'Linh', 'Sasha'],
+    'software dev': ['David', 'Prya'],
+    'comms': 'Taylor' 
+    } 
+
+

When using dict(), we need to indicate which key is associated with which value. This can be done using tuples, by direct association (i.e. using =), or by using zip(), which pairs up corresponding elements of two iterables (such as a list of keys and a list of values) as tuples.

+
+

PYTHON +

+
# To use dict(), key-value pairs can be stored inside tuples
+ds_emp_status = dict([
+        ('Mei Ling', 'full time'),
+        ('Paul', 'full time'),
+        ('Gwen', 'part time'),
+        ('Suresh', 'part time')
+    ])  
+
+# Key-value pairs can also be assigned by direct association  
+# With this approach, keys are written as bare names (not wrapped in ''), so they must be valid Python names, e.g. no spaces
+ud_emp_status = dict(
+    Amy = 'full time',
+    Linh = 'full time',
+    Sasha = 'casual' 
+    ) 
+
+# zip() can also be used if each key has only one value  
+sd_emp_status = dict(zip(
+    ['David', 'Prya'],
+    ['full time', 'full time']
+    ))
+
+

To access a specific value inside a dictionary, we need to specify +its key using []. This is similar to slicing or subsetting +a list by specifying its index using [].

+
+

PYTHON +

+
# Access the values associated with the key 'data science'
+print(teams['data science'])
+
+print('The object teams is of type', type(teams))
+print('The dict value', teams['data science'], 'is of type', type(teams['data science']))
+
+
+

OUTPUT +

+
['Mei Ling', 'Paul', 'Gwen', 'Suresh']
+The object teams is of type <class 'dict'>
+The dict value ['Mei Ling', 'Paul', 'Gwen', 'Suresh'] is of type <class 'list'>
+
+

We can also access a value from a dictionary using the +get() method.

+
+

PYTHON +

+
print(teams.get('user design'))
+
+# get() also enables us to return an alternate string when the key is not found   
+# This prevents our code from returning an error message that halts the analysis
+
+print(teams.get('data engineering', 'WARNING: key does not exist'))
+
+
+

OUTPUT +

+
['Amy', 'Linh', 'Sasha']
+WARNING: key does not exist
+
+

To access data inside a dictionary, we can also perform the following +other actions:

+
  • Check whether a key exists in a dictionary using the keyword +in +
  • +
  • Retrieve unique dictionary keys using dict.keys() +
  • +
  • Retrieve dictionary values using dict.values() +
  • +
  • Retrieve dictionary items using dict.items() +
  • +
+

PYTHON +

+
# Check whether a key exists in a dictionary 
+print('data science' in teams) 
+print('Data Science' in teams) # Keys are case sensitive  
+
+# Retrieve all dictionary keys  
+print(teams.keys())
+print(sd_emp_status.keys())
+
+# Retrieve all dictionary values  
+print(sd_emp_status.values())  
+
+# Retrieve all dictionary key-value pairs
+print(sd_emp_status.items())
+
+
+

OUTPUT +

+
True
+False
+dict_keys(['data science', 'user design', 'software dev', 'comms'])
+dict_keys(['David', 'Prya'])
+dict_values(['full time', 'full time'])
+dict_items([('David', 'full time'), ('Prya', 'full time')])
+
+

To add a new key-value pair to an existing dictionary, we can create +a new key and directly attach a new value to it using = or +alternatively use the method update().

+
+

PYTHON +

+
print('original dict items:', sd_emp_status.items())  
+
+# Add new key-value pair using direct assignment  
+sd_emp_status['Mohammad'] = 'full time'
+
+# Add new key-value pair using update({'key': 'value'})   
+sd_emp_status.update({'Carrie': 'part time'})
+
+print('updated dict items:', sd_emp_status.items())    
+
+
+

OUTPUT +

+
original dict items: dict_items([('David', 'full time'), ('Prya', 'full time')])
+updated dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
+('Mohammad', 'full time'), ('Carrie', 'part time')])
+
+

Because keys are unique, a dictionary cannot contain two keys with +the same name. This means that adding an item using a key that is +already present in the dictionary will cause the previous value to be +overwritten.

+
+

PYTHON +

+
print('original dict items:', sd_emp_status.items())  
+
+# As the key 'Carrie' already exists, its value will be overwritten
+sd_emp_status['Carrie'] = 'full time'
+print('updated dict items:', sd_emp_status.items())  
+
+
+

OUTPUT +

+
original dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
+('Mohammad', 'full time'), ('Carrie', 'part time')])
+updated dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
+('Mohammad', 'full time'), ('Carrie', 'full time')])
+
+

To remove a key-value pair from an existing dictionary, we can use the del keyword or the method pop(). Using pop() also enables us to return a default value if we try to remove a non-existing key, which prevents our code from returning an error message that halts the analysis.

+
+

PYTHON +

+
print('original dict items:', sd_emp_status.items())
+
+# Delete dictionary keys using del and pop()
+del sd_emp_status['Mohammad']
+sd_emp_status.pop('Carrie')
+sd_emp_status.pop('Anuradha', 'WARNING: key does not exist') # Does not generate an error
+
+print('modified dict items:', sd_emp_status.items())  
+
+
+

OUTPUT +

+
original dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
+('Mohammad', 'full time'), ('Carrie', 'full time')])
+modified dict items: dict_items([('David', 'full time'), ('Prya', 'full time')])
+
+

Nested dictionaries +

+

Similar to lists, dictionaries can be nested as we can also store +dictionaries as values inside a key-value pair using {}. +Nested dictionaries are useful when we need to store unstructured data +in a complex structure. For example, JSON data is commonly used for +transmitting data in web applications and often exists in a nested +structure that can be stored using nested dictionaries in Python.

+
+

PYTHON +

+
# Individual dictionaries are enclosed in {} and separated by a comma
+nested_dict = {
+    'dict_1': { # First key is a dictionary of key-value pairs 
+        'key_1a': 'value_1a',
+        'key_1b': 'value_1b'
+                },
+    'dict_2': { # Second key is another dictionary of key-value pairs
+        'key_2a': 'value_2a',
+        'key_2b': 'value_2b'
+                }
+            }
+
+print(nested_dict)
+
+
+

OUTPUT +

+
{'dict_1': {'key_1a': 'value_1a', 'key_1b': 'value_1b'},
+ 'dict_2': {'key_2a': 'value_2a', 'key_2b': 'value_2b'}}
+
+

Similar to working with nested lists, to extract a value from a sub-dictionary, we specify both the main dictionary and sub-dictionary keys using [].

+
+

PYTHON +

+
# Extract the value for key 2a in dict_2
+print('original value:', nested_dict['dict_2']['key_2a'])
+
+# Adding or updating a value can be done through the same approach
+nested_dict['dict_2']['key_2a'] = "modified_value_2a"  
+
+print('modified value:', nested_dict['dict_2']['key_2a'])
+
+
+

OUTPUT +

+
original value: value_2a
+modified value: modified_value_2a
+
+
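Since JSON is mentioned above as a common source of nested data, here is a sketch using the standard library json module. The JSON text and its field names are made up for illustration.

```python
import json

# Hypothetical JSON text, e.g. as received from a web application
json_text = '{"team": {"name": "data science", "members": ["Mei Ling", "Paul"]}}'

record = json.loads(json_text)       # JSON objects become Python dictionaries
print(type(record))                  # <class 'dict'>
print(record['team']['name'])        # data science
print(record['team']['members'][0])  # Mei Ling: a list nested in a dictionary
```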

Optional: converting lists and dictionaries to Pandas data +frames +

+

Lists and dictionaries can be easily converted into a tabular Pandas +data frame format. This can be useful when you need to create a small +data set for unit testing purposes.

+
+

PYTHON +

+
# Import pandas library
+import pandas as pd
+
+# Create a dictionary with each key-value pair representing a data frame column
+data = {
+    'col_1': [3, 2, 1, 0],
+    'col_2': ['a', 'b', 'c', 'd']
+    }
+
+df = pd.DataFrame.from_dict(data) 
+
+print(df) # Outputs data as a tabular Pandas data frame   
+print(type(df))
+
+
+

OUTPUT +

+
   col_1 col_2
+0      3     a
+1      2     b
+2      1     c
+3      0     d
+<class 'pandas.core.frame.DataFrame'>
+
+
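The heading above also mentions lists. As a sketch with made-up values, a nested list can be converted row by row, with the column names supplied as a separate list:

```python
import pandas as pd

rows = [
    [3, 'a'],
    [2, 'b'],
]

df_rows = pd.DataFrame(rows, columns=['col_1', 'col_2'])  # each sub-list is one row
print(df_rows)
print(df_rows.shape)  # (2, 2)
```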
+
+ +
+
+

Key Points +

+
+
  • Lists can contain any Python object including other lists
  • +
  • Lists are ordered i.e. indexed and can therefore be sliced by index +number
  • +
  • Unlike strings and integers, the values inside a list can be +modified in place
  • +
  • A list which contains other lists is referred to as a nested +list
  • +
  • Dictionaries are defined using key-value pairs, and their values are accessed by key rather than by position
  • +
  • Dictionary keys are unique
  • +
  • A dictionary which contains other dictionaries is referred to as a +nested dictionary
  • +
  • Values inside nested lists and dictionaries can be accessed by an +additional index
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/05-loops.html b/instructor/05-loops.html new file mode 100644 index 0000000..cfd400f --- /dev/null +++ b/instructor/05-loops.html @@ -0,0 +1,1593 @@ + +Python for Official Statistics: Loops and Conditional Logic +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Loops and Conditional Logic

+

Last updated on 2024-07-11

+ + + +

Estimated time: 60 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I do the same operations on many different values?
  • +
  • How can my programs do different things based on data values?
  • +
+
+
+
+
+
+

Objectives

+
  • identify and create loops
  • +
  • use logical statements to allow for decision-based operations in +code
  • +
+
+
+
+
+

This episode contains two lessons:

+
  1. Repeating Actions with +Loops
  2. +
  3. Making Choices with +Conditional Logic
  4. +

Repeating Actions with Loops +

+

In the episode about visualizing data, we will see Python code that plots values of interest from our first inflammation dataset (inflammation-01.csv), which reveals some suspicious features.

+
Line graphs showing average, maximum, and minimum inflammation across all patients over a 40-day period.

We have a dozen data sets right now and potentially more on the way +if Dr. Maverick can keep up their surprisingly fast clinical trial rate. +We want to create plots for all of our data sets with a single +statement. To do that, we’ll have to teach the computer how to repeat +things.

+

An example task that we might want to repeat is accessing numbers in +a list, which we will do by printing each number on a line of its +own.

+
+

PYTHON +

+
odds = [1, 3, 5, 7]
+
+

In Python, a list is basically an ordered +collection of elements, and every element has a unique number associated +with it — its index. This means that we can access elements in a list +using their indices. For example, we can get the first number in the +list odds, by using odds[0]. One way to print +each number is to use four print statements:

+
+

PYTHON +

+
print(odds[0])
+print(odds[1])
+print(odds[2])
+print(odds[3])
+
+
+

OUTPUT +

+
1
+3
+5
+7
+
+

This is a bad approach for three reasons:

+
  1. Not scalable. Imagine you need to print a list that has hundreds of elements. Writing hundreds of print statements by hand would be tedious and error-prone.

  2. +
  3. Difficult to maintain. If we want to decorate each printed element with an asterisk or any other character, we would have to change four lines of code. While this might not be a problem for small lists, it would definitely be a problem for longer ones.

  4. +
  5. Fragile. If we use it with a list that has more elements than we initially envisioned, it will only display part of the list’s elements. A shorter list, on the other hand, will cause an error because the code will try to display elements of the list that do not exist.

  6. +
+

PYTHON +

+
odds = [1, 3, 5]
+print(odds[0])
+print(odds[1])
+print(odds[2])
+print(odds[3])
+
+
+

OUTPUT +

+
1
+3
+5
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+IndexError                                Traceback (most recent call last)
+<ipython-input-3-7974b6cdaf14> in <module>()
+      3 print(odds[1])
+      4 print(odds[2])
+----> 5 print(odds[3])
+
+IndexError: list index out of range
+
+

Here’s a better approach, a for loop:

+
+

PYTHON +

+
odds = [1, 3, 5, 7]
+for num in odds:
+    print(num)
+
+
+

OUTPUT +

+
1
+3
+5
+7
+
+

This is shorter (certainly shorter than something that prints every number in a hundred-number list) and more robust as well:

+
+

PYTHON +

+
odds = [1, 3, 5, 7, 9, 11]
+for num in odds:
+    print(num)
+
+
+

OUTPUT +

+
1
+3
+5
+7
+9
+11
+
+

The improved version uses a for loop to repeat an operation (in this case, printing) once for each thing in a sequence. The general form of a loop is:

+
+

PYTHON +

+
for variable in collection:
+    # do things using variable, such as print
+
+

Using the odds example above, the loop might look like this:

+
Loop variable 'num' being assigned the value of each element in the list odds in turn and then being printed

where each number (num) in the list odds is assigned in turn and printed, one number after another. The other numbers in the diagram denote the loop cycle in which each number was printed (1 being the first loop cycle, and 6 being the final loop cycle).

+

We can call the loop variable anything we like, but there must be a colon at the end of the line starting the loop, and we must indent anything we want to run inside the loop. Unlike many other languages, there is no command to signify the end of the loop body (e.g., end for); everything indented after the for statement belongs to the loop.

+
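To see where the loop body ends, here is a small sketch (the variable total is our own illustrative name). The indented line runs once per element. The unindented print runs only once, after the loop has finished:

```python
odds = [1, 3, 5, 7]
total = 0
for num in odds:
    # Indented, so part of the loop body: runs once per element
    total = total + num
# Not indented, so outside the loop: runs once, after the loop ends
print('total:', total)
```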
+
+ +
+
+

What’s in a name? +

+
+

In the example above, the loop variable was given the name num as a mnemonic; it is short for ‘number’. We can choose any name we want for variables. We might just as easily have chosen the name banana for the loop variable, as long as we use the same name when we invoke the variable inside the loop:

+
+

PYTHON +

+
odds = [1, 3, 5, 7, 9, 11]
+for banana in odds:
+    print(banana)
+
+
+

OUTPUT +

+
1
+3
+5
+7
+9
+11
+
+

It is a good idea to choose variable names that are meaningful; otherwise it is more difficult to understand what the loop is doing.

+
+
+
+

Here’s another loop that repeatedly updates a variable:

+
+

PYTHON +

+
length = 0
+names = ['Curie', 'Darwin', 'Turing']
+for value in names:
+    length = length + 1
+print('There are', length, 'names in the list.')
+
+
+

OUTPUT +

+
There are 3 names in the list.
+
+

It’s worth tracing the execution of this little program step by step. Since there are three names in names, the statement on line 4 will be executed three times. The first time around, length is zero (the value assigned to it on line 1) and value is Curie. The statement adds 1 to the old value of length, producing 1, and updates length to refer to that new value. The next time around, value is Darwin and length is 1, so length is updated to be 2. After one more update, length is 3; since there is nothing left in names for Python to process, the loop finishes and the print function on line 5 tells us our final answer.

+
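One way to check a trace like this is to make the loop report its own progress. The sketch below adds a print call inside the loop body. This call is our own addition and is not part of the original program. On each iteration it shows the current value and the updated count:

```python
length = 0
names = ['Curie', 'Darwin', 'Turing']
for value in names:
    length = length + 1
    # Report the state of both variables after each update
    print('value is', value, '-> length is now', length)
print('There are', length, 'names in the list.')
```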

Note that a loop variable is a variable that is being used to record progress in a loop. It still exists after the loop is over. We can also use a variable that was defined before the loop as a loop variable, in which case its old value is overwritten:

+
+

PYTHON +

+
name = 'Rosalind'
+for name in ['Curie', 'Darwin', 'Turing']:
+    print(name)
+print('after the loop, name is', name)
+
+
+

OUTPUT +

+
Curie
+Darwin
+Turing
+after the loop, name is Turing
+
+

Note also that finding the length of an object is such a common operation that Python actually has a built-in function to do it called len:

+
+

PYTHON +

+
print(len([0, 1, 2, 3]))
+
+
+

OUTPUT +

+
4
+
+

len is much faster than any function we could write ourselves, and much easier to read than a two-line loop; it will also give us the length of many other data types we haven’t seen yet, so we should always use it when we can.

+
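As a quick illustration of len working on types beyond lists, the sketch below applies it to a string, a tuple and a dictionary. The specific values are arbitrary examples:

```python
word_length = len('oxygen')          # number of characters: 6
tuple_length = len((1, 2, 3))        # number of items: 3
dict_length = len({'a': 1, 'b': 2})  # number of keys: 2
print(word_length, tuple_length, dict_length)
```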
+
+ +
+
+

From 1 to N +

+
+

Python has a built-in function called range that generates a sequence of numbers. range can accept 1, 2, or 3 parameters.

+
  • If one parameter is given, range generates a sequence of that length, starting at zero and incrementing by 1. For example, range(3) produces the numbers 0, 1, 2.
  • +
  • If two parameters are given, range starts at the first and ends just before the second, incrementing by one. For example, range(2, 5) produces 2, 3, 4.
  • +
  • If range is given 3 parameters, it starts at the first one, ends just before the second one, and increments by the third one. For example, range(3, 10, 2) produces 3, 5, 7, 9.
  • +
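Printing a range object directly only shows a summary such as range(0, 3), not the numbers themselves. Wrapping it in list makes the generated values visible. This sketch checks the three examples from the list above:

```python
one_param = list(range(3))            # start at 0, stop before 3
two_params = list(range(2, 5))        # start at 2, stop before 5
three_params = list(range(3, 10, 2))  # start at 3, stop before 10, step 2
print(range(3))  # shows range(0, 3) rather than the individual numbers
print(one_param, two_params, three_params)
```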

Write a loop that uses range to print the first 3 natural numbers:

+
+

OUTPUT +

+
1
+2
+3
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
for number in range(1, 4):
+    print(number)
+
+
+
+
+
+
+
+ +
+
+

Understanding the loops +

+
+

Given the following loop:

+
+

PYTHON +

+
word = 'oxygen'
+for letter in word:
+    print(letter)
+
+

How many times is the body of the loop executed?

+
  • 3 times
  • +
  • 4 times
  • +
  • 5 times
  • +
  • 6 times
  • +
+
+
+
+
+ +
+
+

The body of the loop is executed 6 times.

+
+
+
+
+
+
+ +
+
+

Computing Powers With Loops +

+
+

Exponentiation is built into Python:

+
+

PYTHON +

+
print(5 ** 3)
+
+
+

OUTPUT +

+
125
+
+

Write a loop that calculates the same result as 5 ** 3 using multiplication (and without exponentiation).

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
result = 1
+for number in range(0, 3):
+    result = result * 5
+print(result)
+
+
+
+
+
+
+
+ +
+
+

Summing a List +

+
+

Write a loop that calculates the sum of elements in a list by adding each element and printing the final value, so [124, 402, 36] prints 562.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
numbers = [124, 402, 36]
+summed = 0
+for num in numbers:
+    summed = summed + num
+print(summed)
+
+
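Python has a built-in function for this task, just as it has len for lengths. The sum function adds up the numbers in a list, so in practice the loop above can be replaced by a single call:

```python
numbers = [124, 402, 36]
total = sum(numbers)  # built-in: adds every element of the list
print(total)
```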
+
+
+
+
+
+ +
+
+

Computing the Value of a Polynomial +

+
+

The built-in function enumerate takes a sequence (e.g., a list) and generates a new sequence of the same length. Each element of the new sequence is a pair composed of the index (0, 1, 2,…) and the value from the original sequence:

+
+

PYTHON +

+
for idx, val in enumerate(a_list):
+    # Do something using idx and val
+
+

The code above loops through a_list, assigning the index to idx and the value to val.

+
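To see the pairs that enumerate produces, here is a concrete sketch that uses a made-up three-element list. The optional start argument changes the first index:

```python
a_list = ['a', 'b', 'c']
for idx, val in enumerate(a_list):
    print(idx, val)  # 0 a, then 1 b, then 2 c

# Collect the (index, value) pairs into lists to look at them directly
pairs = list(enumerate(a_list))
pairs_from_one = list(enumerate(a_list, start=1))
print(pairs)
print(pairs_from_one)
```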

Suppose you have encoded a polynomial as a list of coefficients in the following way: the first element is the constant term, the second element is the coefficient of the linear term, the third is the coefficient of the quadratic term, etc.

+
+

PYTHON +

+
x = 5
+coefs = [2, 4, 3]
+y = coefs[0] * x**0 + coefs[1] * x**1 + coefs[2] * x**2
+print(y)
+
+
+

OUTPUT +

+
97
+
+

Write a loop using enumerate(coefs) which computes the value y of any polynomial, given x and coefs.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
y = 0
+for idx, coef in enumerate(coefs):
+    y = y + coef * x**idx
+print(y)
+
+
+
+
+
+

Making Choices with Conditional Logic +

+

How can we use Python to automatically recognize different situations we encounter with our data and take a different action for each? In this lesson, we’ll learn how to write code that runs only when certain conditions are true.

+
+

Conditionals

+

We can ask Python to take different actions, depending on a condition, with an if statement:

+
+

PYTHON +

+
num = 37
+if num > 100:
+    print('greater')
+else:
+    print('not greater')
+print('done')
+
+
+

OUTPUT +

+
not greater
+done
+
+

The second line of this code uses the keyword if to tell Python that we want to make a choice. If the test that follows the if statement is true, the body of the if (i.e., the set of lines indented underneath it) is executed, and “greater” is printed. If the test is false, the body of the else is executed instead, and “not greater” is printed. Only one or the other is ever executed before continuing on with program execution to print “done”:

+
A flowchart diagram of the if-else construct that tests if variable num is greater than 100

Conditional statements don’t have to include an else. If there isn’t one, Python simply does nothing if the test is false:

+
+

PYTHON +

+
num = 53
+print('before conditional...')
+if num > 100:
+    print(num, 'is greater than 100')
+print('...after conditional')
+
+
+

OUTPUT +

+
before conditional...
+...after conditional
+
+

We can also chain several tests together using elif, which is short for “else if”. The following Python code uses elif to print the sign of a number.

+
+

PYTHON +

+
num = -3
+
+if num > 0:
+    print(num, 'is positive')
+elif num == 0:
+    print(num, 'is zero')
+else:
+    print(num, 'is negative')
+
+
+

OUTPUT +

+
-3 is negative
+
+

Note that to test for equality we use a double equals sign == rather than a single equals sign =, which is used to assign values.

+
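Here is a short sketch of the difference, using x as an arbitrary example variable. The = sign stores a value. The == operator compares two values and produces True or False. Writing if x = 5: is a syntax error in Python, which helps to catch this common mistake:

```python
x = 5             # assignment: x now refers to 5
is_five = x == 5  # comparison: evaluates to True
is_six = x == 6   # comparison: evaluates to False
print(is_five, is_six)
```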
+
+ +
+
+

Comparing in Python +

+
+

Along with the > and == operators we have already used for comparing values in our conditionals, there are a few more options to know about:

+
  • >: greater than
  • +
  • <: less than
  • +
  • ==: equal to
  • +
  • !=: does not equal
  • +
  • >=: greater than or equal to
  • +
  • <=: less than or equal to
  • +
+
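Each of these operators produces a truth value. Here is a quick sketch that applies them to two arbitrary numbers:

```python
a = 3
b = 7
print('a > b: ', a > b)    # False
print('a < b: ', a < b)    # True
print('a == b:', a == b)   # False
print('a != b:', a != b)   # True
print('a >= 3:', a >= 3)   # True: 3 is equal to 3
print('b <= 3:', b <= 3)   # False
```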
+
+

We can also combine tests using and and or. and is only true if both parts are true:

+
+

PYTHON +

+
if (1 > 0) and (-1 >= 0):
+    print('both parts are true')
+else:
+    print('at least one part is false')
+
+
+

OUTPUT +

+
at least one part is false
+
+

while or is true if at least one part is true:

+
+

PYTHON +

+
if (1 < 0) or (1 >= 0):
+    print('at least one test is true')
+
+
+

OUTPUT +

+
at least one test is true
+
+
+
+ +
+
+

True and False +

+
+

True and False are special words in Python called booleans, which represent truth values. A statement such as 1 < 0 returns the value False, while -1 < 0 returns the value True.

+
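Because comparisons produce these values, we can store a boolean in a variable and use it later in an if statement. Here is a minimal sketch (the name is_negative is our own):

```python
value = -1
is_negative = value < 0   # the comparison evaluates to True
print(is_negative)        # True
print(type(is_negative))  # <class 'bool'>
if is_negative:
    print(value, 'is negative')
```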
+
+
+
+
+

Checking Our Data

+

Now that we’ve seen how conditionals work, we can use them to check for the suspicious features we saw in our inflammation data. We are about to use functions provided by the numpy module again. Therefore, if you’re working in a new Python session, make sure to load the module with:

+
+

PYTHON +

+
import numpy
+
+

From the first couple of plots, we saw that maximum daily inflammation exhibits a strange behavior and rises by one unit a day. Wouldn’t it be a good idea to detect such behavior and report it as suspicious? Let’s do that! However, instead of checking every single day of the study, let’s merely check whether the maximum inflammation at the beginning (day 0) and in the middle (day 20) of the study equals the corresponding day number.

+
+

PYTHON +

+
max_inflammation_0 = numpy.amax(data, axis=0)[0]
+max_inflammation_20 = numpy.amax(data, axis=0)[20]
+
+if max_inflammation_0 == 0 and max_inflammation_20 == 20:
+    print('Suspicious looking maxima!')
+
+

We also saw a different problem in the third dataset; the minima per day were all zero (looks like a healthy person snuck into our study). We can also check for this with an elif condition:

+
+

PYTHON +

+
elif numpy.sum(numpy.amin(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+
+

And if neither of these conditions is true, we can use else to give the all-clear:

+
+

PYTHON +

+
else:
+    print('Seems OK!')
+
+

Let’s test that out:

+
+

PYTHON +

+
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+max_inflammation_0 = numpy.amax(data, axis=0)[0]
+max_inflammation_20 = numpy.amax(data, axis=0)[20]
+
+if max_inflammation_0 == 0 and max_inflammation_20 == 20:
+    print('Suspicious looking maxima!')
+elif numpy.sum(numpy.amin(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+else:
+    print('Seems OK!')
+
+
+

OUTPUT +

+
Suspicious looking maxima!
+
+
+

PYTHON +

+
data = numpy.loadtxt(fname='inflammation-03.csv', delimiter=',')
+
+max_inflammation_0 = numpy.amax(data, axis=0)[0]
+max_inflammation_20 = numpy.amax(data, axis=0)[20]
+
+if max_inflammation_0 == 0 and max_inflammation_20 == 20:
+    print('Suspicious looking maxima!')
+elif numpy.sum(numpy.amin(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+else:
+    print('Seems OK!')
+
+
+

OUTPUT +

+
Minima add up to zero!
+
+

In this way, we have asked Python to do something different depending on the condition of our data. Here we printed messages in all cases, but we could also imagine not using the else catch-all so that messages are only printed when something is wrong, freeing us from having to manually examine every plot for features we’ve seen before.

+
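You can try these checks without the inflammation files. The sketch below builds two small synthetic arrays with NumPy. These arrays are our own invention, not the course data. It then runs the same tests with no else branch, so a message appears only when something looks wrong:

```python
import numpy

# Synthetic data: 3 "patients" x 40 "days" where every value equals the
# day index, so the daily maxima rise by exactly one unit per day.
suspicious = numpy.tile(numpy.arange(40), (3, 1))
# Synthetic data where every value is 1: neither check should fire.
normal = numpy.ones((3, 40))

flagged = []
for name, data in [('suspicious', suspicious), ('normal', normal)]:
    max_inflammation_0 = numpy.amax(data, axis=0)[0]
    max_inflammation_20 = numpy.amax(data, axis=0)[20]
    if max_inflammation_0 == 0 and max_inflammation_20 == 20:
        print(name, ': Suspicious looking maxima!')
        flagged.append(name)
    elif numpy.sum(numpy.amin(data, axis=0)) == 0:
        print(name, ': Minima add up to zero!')
        flagged.append(name)
    # No else branch: a dataset that passes both checks prints nothing.
```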
+
+ +
+
+

How Many Paths? +

+
+

Consider this code:

+
+

PYTHON +

+
if 4 > 5:
+    print('A')
+elif 4 == 5:
+    print('B')
+elif 4 < 5:
+    print('C')
+
+

Which of the following would be printed if you were to run this code? Why did you pick this answer?

+
  1. A
  2. +
  3. B
  4. +
  5. C
  6. +
  7. B and C
  8. +
+
+
+
+
+ +
+
+

C gets printed because the first two conditions, 4 > 5 and 4 == 5, are not true, but 4 < 5 is true. In this case, only one of these conditions can be true at a time, but in other scenarios multiple elif conditions could be met. In these scenarios, only the action associated with the first true condition will occur, starting from the top of the conditional section.

+
A flowchart diagram of a conditional section with multiple elif conditions and some possible outcomes.

This contrasts with the case of multiple if statements, where every action whose condition is met will occur.

+
A flowchart diagram of a conditional section with multiple if statements and some possible outcomes.
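You can check the difference between the two flowcharts directly. In this sketch, num, chain and separate are illustrative names. The value of num satisfies both conditions. The if/elif chain records only the first match, while the separate if statements record both:

```python
num = 7

# if/elif chain: only the action of the first true condition runs
chain = []
if num > 0:
    chain.append('positive')
elif num > 5:
    chain.append('greater than 5')

# Separate if statements: every true condition's action runs
separate = []
if num > 0:
    separate.append('positive')
if num > 5:
    separate.append('greater than 5')

print('if/elif chain:', chain)
print('separate ifs: ', separate)
```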
+
+
+
+
+
+ +
+
+

What Is Truth? +

+
+

True and False booleans are not the only values in Python that are true and false. In fact, any value can be used in an if or elif. After reading and running the code below, explain what the rule is for which values are considered true and which are considered false.

+
+

PYTHON +

+
if '':
+    print('empty string is true')
+if 'word':
+    print('word is true')
+if []:
+    print('empty list is true')
+if [1, 2, 3]:
+    print('non-empty list is true')
+if 0:
+    print('zero is true')
+if 1:
+    print('one is true')
+
+
+
+
+
+
+ +
+
+

That’s Not Not What I Meant +

+
+

Sometimes it is useful to check whether some condition is not true. The Boolean operator not can do this explicitly. After reading and running the code below, write some if statements that use not to test the rule that you formulated in the previous challenge.

+
+

PYTHON +

+
if not '':
+    print('empty string is not true')
+if not 'word':
+    print('word is not true')
+if not not True:
+    print('not not True is true')
+
+
+
+
+
+
+ +
+
+

Close Enough +

+
+

Write some conditions that print True if the variable a is within 10% of the variable b and False otherwise. Compare your implementation with your partner’s. Do you get the same answer for all possible pairs of numbers?

+
+
+
+
+
+ +
+
+

There is a built-in function abs that returns the absolute value of a number:

+
+

PYTHON +

+
print(abs(-12))
+
+
+

OUTPUT +

+
12
+
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
a = 5
+b = 5.1
+
+if abs(a - b) <= 0.1 * abs(b):
+    print('True')
+else:
+    print('False')
+
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
print(abs(a - b) <= 0.1 * abs(b))
+
+

This works because the Booleans True and False have string representations which can be printed.

+
+
+
+
+
+
+ +
+
+

In-Place Operators +

+
+

Like most languages in the C family, Python provides in-place operators that work like this:

+
+

PYTHON +

+
x = 1  # original value
+x += 1 # add one to x, assigning result back to x
+x *= 3 # multiply x by 3
+print(x)
+
+
+

OUTPUT +

+
6
+
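The same shorthand exists for the other arithmetic operators, and += also works for joining strings. Here is a brief sketch with arbitrary example values:

```python
count = 10
count -= 4    # same as count = count - 4, so count is now 6
count //= 2   # same as count = count // 2, so count is now 3
greeting = 'Hello'
greeting += ', world'  # same as greeting = greeting + ', world'
print(count, greeting)
```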
+

Write some code that sums the positive and negative numbers in a list separately, using in-place operators. Do you think the result is more or less readable than writing the same without in-place operators?

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
positive_sum = 0
+negative_sum = 0
+test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
+for num in test_list:
+    if num > 0:
+        positive_sum += num
+    elif num == 0:
+        pass
+    else:
+        negative_sum += num
+print(positive_sum, negative_sum)
+
+

Here pass means “don’t do anything”. In this particular case, it’s not actually needed, since if num == 0 neither sum needs to change, but it illustrates the use of elif and pass.

+
+
+
+
+
+
+ +
+
+

Sorting a List Into Buckets +

+
+

In our data folder, large data sets are stored in files whose names start with “inflammation-” and small data sets in files whose names start with “small-”. We also have some other files that we do not care about at this point. We’d like to break all these files into three lists called large_files, small_files, and other_files, respectively.

+

Add code to the template below to do this. Note that the string method startswith returns True if and only if the string it is called on starts with the string passed as an argument, that is:

+
+

PYTHON +

+
'String'.startswith('Str')
+
+
+

OUTPUT +

+
True
+
+

But

+
+

PYTHON +

+
'String'.startswith('str')
+
+
+

OUTPUT +

+
False
+
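Because startswith is case-sensitive, one common approach when case should not matter is to lower-case the string first. The related string method endswith checks the end of a string instead. Here is a small sketch (the variable name is our own):

```python
name = 'String'
print(name.startswith('str'))           # False: case-sensitive comparison
print(name.lower().startswith('str'))   # True: compare the lower-cased text
print('small-01.csv'.endswith('.csv'))  # True
```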
+

Use the following Python code as your starting point:

+
+

PYTHON +

+
filenames = ['inflammation-01.csv',
+         'myscript.py',
+         'inflammation-02.csv',
+         'small-01.csv',
+         'small-02.csv']
+large_files = []
+small_files = []
+other_files = []
+
+

Your solution should:

+
  1. loop over the names of the files
  2. +
  3. figure out which group each filename belongs in
  4. +
  5. append the filename to that list
  6. +

In the end the three lists should be:

+
+

PYTHON +

+
large_files = ['inflammation-01.csv', 'inflammation-02.csv']
+small_files = ['small-01.csv', 'small-02.csv']
+other_files = ['myscript.py']
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
for filename in filenames:
+    if filename.startswith('inflammation-'):
+        large_files.append(filename)
+    elif filename.startswith('small-'):
+        small_files.append(filename)
+    else:
+        other_files.append(filename)
+
+print('large_files:', large_files)
+print('small_files:', small_files)
+print('other_files:', other_files)
+
+
+
+
+
+
+
+ +
+
+
  1. Write a loop that counts the number of vowels in a character string.
  2. +
  3. Test it on a few individual words and full sentences.
  4. +
  5. Once you are done, compare your solution to your neighbor’s. Did you make the same decisions about how to handle the letter ‘y’ (which some people think is a vowel, and some do not)?
  6. +
+

Solution

+
vowels = 'aeiouAEIOU'
+sentence = 'Mary had a little lamb.'
+count = 0
+for char in sentence:
+    if char in vowels:
+        count += 1
+
+print('The number of vowels in this string is ' + str(count))
+


+
+
+
+
+
+
+
+ +
+
+

Key Points +

+
+
  • Use for variable in sequence to process the elements of a sequence one at a time.
  • +
  • The body of a for loop must be indented.
  • +
  • Use len(thing) to determine the length of something that contains other values.
  • +
  • Use if condition to start a conditional statement, elif condition to provide additional tests, and else to provide a default.
  • +
  • The bodies of the branches of conditional statements must be indented.
  • +
  • Use == to test for equality.
  • +
  • X and Y is only true if both X and Y are true.
  • +
  • X or Y is true if either X or Y, or both, are true.
  • +
  • Zero, the empty string, and the empty list are considered false; all other numbers, strings, and lists are considered true.
  • +
  • True and False represent truth values.
  • +
+
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/06-alternative_loops.html b/instructor/06-alternative_loops.html new file mode 100644 index 0000000..09acb1d --- /dev/null +++ b/instructor/06-alternative_loops.html @@ -0,0 +1,491 @@ + +Python for Official Statistics: Alternatives to Loops +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Alternatives to Loops

+

Last updated on 2024-07-11

+ + + +

Estimated time: 30 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I vectorize my loops?
  • +
+
+
+
+
+
+

Objectives

+
  • identify what vectorized operations are
  • +
  • perform basic vectorized operations
  • +
+
+
+
+
+

FIXME

+
+
+ +
+
+

Key Points +

+
+
  • NULL
  • +
+
+
+ + + +
+
+ + +
+
+ + + diff --git a/instructor/07-functions.html b/instructor/07-functions.html new file mode 100644 index 0000000..41a44d3 --- /dev/null +++ b/instructor/07-functions.html @@ -0,0 +1,1506 @@ + +Python for Official Statistics: Creating Functions +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Creating Functions

+

Last updated on 2024-07-11

+ + + +

Estimated time: 40 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • What are functions, and how can I use them in Python?
  • +
  • How can I define new functions?
  • +
  • What’s the difference between defining and calling a function?
  • +
  • What happens when I call a function?
  • +
+
+
+
+
+
+

Objectives

+
  • identify what a function is
  • +
  • create new functions
  • +
  • Set default values for function parameters.
  • +
  • Explain why we should divide programs into small, single-purpose functions.
  • +
+
+
+
+
+

At this point, we’ve seen that code can have Python make decisions about what it sees in our data. What if we want to convert some of our data, like taking a temperature in Fahrenheit and converting it to Celsius? We could write something like this for converting a single number:

+
+

PYTHON +

+
fahrenheit_val = 99
+celsius_val = ((fahrenheit_val - 32) * (5/9))
+
+

and for a second number we could just copy the line and rename the variables:

+
+

PYTHON +

+
fahrenheit_val = 99
+celsius_val = ((fahrenheit_val - 32) * (5/9))
+
+fahrenheit_val2 = 43
+celsius_val2 = ((fahrenheit_val2 - 32) * (5/9))
+
+

But we would be in trouble as soon as we had to do this more than a couple of times. Cutting and pasting is going to make our code very long and very repetitive, very quickly. We’d like a way to package our code so that it is easier to reuse, a shorthand way of re-executing longer pieces of code. In Python we can use ‘functions’. Let’s start by defining a function fahr_to_celsius that converts temperatures from Fahrenheit to Celsius, along with a more explicit variant, explicit_fahr_to_celsius:

+
+

PYTHON +

+
def explicit_fahr_to_celsius(temp):
+    # Assign the converted value to a variable
+    converted = ((temp - 32) * (5/9))
+    # Return the value of the new variable
+    return converted
+
+def fahr_to_celsius(temp):
+    # Return the converted value directly with the return
+    # statement, without creating a new variable. This code does
+    # the same thing as the previous function; the previous version
+    # is more explicit about how the return statement works.
+    return ((temp - 32) * (5/9))
+
+
Labeled parts of a Python function definition

The function definition opens with the keyword def followed by the name of the function (fahr_to_celsius) and a parenthesized list of parameter names (temp). The body of the function (the statements that are executed when it runs) is indented below the definition line. The body concludes with a return keyword followed by the return value.

+

When we call the function, the values we pass to it are assigned to those variables so that we can use them inside the function. Inside the function, we use a return statement to send a result back to whoever asked for it.

+

Let’s try running our function.

+
+

PYTHON +

+
fahr_to_celsius(32)
+
+

This command should call our function, using 32 as the input, and return the function value.

+

In fact, calling our own function is no different from calling any other function:

+
+

PYTHON +

+
print('freezing point of water:', fahr_to_celsius(32), 'C')
+print('boiling point of water:', fahr_to_celsius(212), 'C')
+
+
+

OUTPUT +

+
freezing point of water: 0.0 C
+boiling point of water: 100.0 C
+
+

We’ve successfully called the function that we defined, and we have access to the value that we returned.

+
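Returning a value is different from printing it. In this sketch, print_celsius is a hypothetical variant that prints the conversion instead of returning it. A function with no return statement gives back the special value None, so its call leaves nothing useful to store:

```python
def fahr_to_celsius(temp):
    return ((temp - 32) * (5/9))

def print_celsius(temp):
    # Hypothetical variant: prints the result but has no return statement
    print((temp - 32) * (5/9))

returned = fahr_to_celsius(32)  # the value 0.0 is sent back to the caller
printed = print_celsius(32)     # displays 0.0, but sends nothing back
print(returned)                 # 0.0
print(printed)                  # None
```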

Composing Functions +

+

Now that we’ve seen how to turn Fahrenheit into Celsius, we can also write the function to turn Celsius into Kelvin:

+
+

PYTHON +

+
def celsius_to_kelvin(temp_c):
+    return temp_c + 273.15
+
+print('freezing point of water in Kelvin:', celsius_to_kelvin(0.))
+
+
+

OUTPUT +

+
freezing point of water in Kelvin: 273.15
+
+

What about converting Fahrenheit to Kelvin? We could write out the formula, but we don’t need to. Instead, we can compose the two functions we have already created:

+
+

PYTHON +

+
def fahr_to_kelvin(temp_f):
+    temp_c = fahr_to_celsius(temp_f)
+    temp_k = celsius_to_kelvin(temp_c)
+    return temp_k
+
+print('boiling point of water in Kelvin:', fahr_to_kelvin(212.0))
+
+
+

OUTPUT +

+
boiling point of water in Kelvin: 373.15
+
+

This is our first taste of how larger programs are built: we define basic operations, then combine them in ever-larger chunks to get the effect we want. Real-life functions will usually be larger than the ones shown here (typically half a dozen to a few dozen lines), but they shouldn’t ever be much longer than that, or the next person who reads them won’t be able to understand what’s going on.

+

Variable Scope +

+

In composing our temperature conversion functions, we created variables inside of those functions: temp, temp_c, temp_f, and temp_k. We refer to these variables as local variables because they no longer exist once the function is done executing. If we try to access their values outside of the function, we will encounter an error:

+
+

PYTHON +

+
print('Again, temperature in Kelvin was:', temp_k)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-1-eed2471d229b> in <module>
+----> 1 print('Again, temperature in Kelvin was:', temp_k)
+
+NameError: name 'temp_k' is not defined
+
+

If you want to reuse the temperature in Kelvin after you have calculated it with fahr_to_kelvin, you can store the result of the function call in a variable:

+
+

PYTHON +

+
temp_kelvin = fahr_to_kelvin(212.0)
+print('temperature in Kelvin was:', temp_kelvin)
+
+
+

OUTPUT +

+
temperature in Kelvin was: 373.15
+
+

The variable temp_kelvin, being defined outside any function, is said to be global.

+

Inside a function, one can read the value of such global variables:

+
+

PYTHON +

+
def print_temperatures():
+    print('temperature in Fahrenheit was:', temp_fahr)
+    print('temperature in Kelvin was:', temp_kelvin)
+
+temp_fahr = 212.0
+temp_kelvin = fahr_to_kelvin(temp_fahr)
+
+print_temperatures()
+
+
+

OUTPUT +

+
temperature in Fahrenheit was: 212.0
+temperature in Kelvin was: 373.15
+
+
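Reading a global variable inside a function works, but assigning to a name inside a function behaves differently. The assignment creates a new local variable, and the global variable with the same name is left unchanged. Here is a minimal sketch with a hypothetical reset_temperature function:

```python
temp_kelvin = 373.15  # global variable

def reset_temperature():
    # This assignment creates a new *local* temp_kelvin;
    # the global variable with the same name is not changed.
    temp_kelvin = 0.0
    return temp_kelvin

local_value = reset_temperature()
print('inside the function:', local_value)   # 0.0
print('outside the function:', temp_kelvin)  # 373.15
```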

By giving our functions human-readable names, we can more easily read and understand what our code is doing. Even better, if at some later date we want to use either of those pieces of code again, we can do so in a single line.

+

Testing and Documenting +

+

Once we start putting things in functions so that we can re-use them, we need to start testing that those functions are working correctly. To see how to do this, let’s write a function to offset a dataset so that its mean value shifts to a user-defined value:

+
+

PYTHON +

+
def offset_mean(data, target_mean_value):
+    return (data - numpy.mean(data)) + target_mean_value
+
+

We could test this on our actual data, but since we don’t know what the values ought to be, it will be hard to tell if the result was correct. Instead, let’s use NumPy to create a matrix of zeros and then offset its values to have a mean value of 3:

+
+

PYTHON +

+
z = numpy.zeros((2,2))
+print(offset_mean(z, 3))
+
+
+

OUTPUT +

+
[[ 3.  3.]
+ [ 3.  3.]]
+
+

That looks right, so let’s try offset_mean on our real data:

+
+

PYTHON +

+
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+print(offset_mean(data, 0))
+
+
+

OUTPUT +

+
[[-6.14875 -6.14875 -5.14875 ... -3.14875 -6.14875 -6.14875]
+ [-6.14875 -5.14875 -4.14875 ... -5.14875 -6.14875 -5.14875]
+ [-6.14875 -5.14875 -5.14875 ... -4.14875 -5.14875 -5.14875]
+ ...
+ [-6.14875 -5.14875 -5.14875 ... -5.14875 -5.14875 -5.14875]
+ [-6.14875 -6.14875 -6.14875 ... -6.14875 -4.14875 -6.14875]
+ [-6.14875 -6.14875 -5.14875 ... -5.14875 -5.14875 -6.14875]]
+
+

It’s hard to tell from the default output whether the result is correct, but there are a few tests that we can run to reassure us:

+
+

PYTHON +

+
print('original min, mean, and max are:', numpy.amin(data), numpy.mean(data), numpy.amax(data))
+offset_data = offset_mean(data, 0)
+print('min, mean, and max of offset data are:',
+      numpy.amin(offset_data),
+      numpy.mean(offset_data),
+      numpy.amax(offset_data))
+
+
+

OUTPUT +

+
original min, mean, and max are: 0.0 6.14875 20.0
+min, mean, and max of offset data are: -6.14875 2.84217094304e-16 13.85125
+
+

That seems almost right: the original mean was about 6.1, so the lower bound from zero is now about -6.1. The mean of the offset data isn’t quite zero (we’ll explore why not in the challenges), but it’s pretty close. We can even go further and check that the standard deviation hasn’t changed:

+
+

PYTHON +

+
print('std dev before and after:', numpy.std(data), numpy.std(offset_data))
+
+
+

OUTPUT +

+
std dev before and after: 4.61383319712 4.61383319712
+
+

Those values look the same, but we probably wouldn’t notice if they were different in the sixth decimal place. Let’s do this instead:

+
+

PYTHON +

+
print('difference in standard deviations before and after:',
+      numpy.std(data) - numpy.std(offset_data))
+
+
+

OUTPUT +

+
difference in standard deviations before and after: -3.5527136788e-15
+
+

Again, the difference is very small. It’s still possible that our function is wrong, but it seems unlikely enough that we should probably get back to doing our analysis.

+
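Instead of checking printed numbers by eye, we can write the same checks as assert statements that use numpy.isclose. This function compares values within a small default tolerance instead of requiring exact equality. The sketch below uses a small made-up array, so it runs without the inflammation files:

```python
import numpy

def offset_mean(data, target_mean_value):
    return (data - numpy.mean(data)) + target_mean_value

# Small made-up array standing in for real data
data = numpy.array([[0.0, 1.0, 2.0],
                    [3.0, 5.0, 8.0]])
offset_data = offset_mean(data, 0)

# assert stops the program with an AssertionError if a check fails
assert numpy.isclose(numpy.mean(offset_data), 0)
assert numpy.isclose(numpy.std(data), numpy.std(offset_data))
print('all checks passed')
```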

Documentation +

+

We have one more task first, though: we should write some documentation for our function to remind ourselves later what it’s for and how to use it.

+

The usual way to put documentation in software is to add comments like this:

+
+

PYTHON +

+
# offset_mean(data, target_mean_value):
+# return a new array containing the original data with its mean offset to match the desired value.
+def offset_mean(data, target_mean_value):
+    return (data - numpy.mean(data)) + target_mean_value
+
+

There’s a better way, though. If the first thing in a function is a string that isn’t assigned to a variable, that string is attached to the function as its documentation:

+
+

PYTHON +

+
def offset_mean(data, target_mean_value):
+    """Return a new array containing the original data
+       with its mean offset to match the desired value."""
+    return (data - numpy.mean(data)) + target_mean_value
+
+

This is better because we can now ask Python’s built-in help system +to show us the documentation for the function:

+
+

PYTHON +

+
help(offset_mean)
+
+
+

OUTPUT +

+
Help on function offset_mean in module __main__:
+
+offset_mean(data, target_mean_value)
+    Return a new array containing the original data
+    with its mean offset to match the desired value.
+
+
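`help()` is not the only way to reach a docstring: Python stores it on the function's `__doc__` attribute, and `inspect.getdoc` returns it with the common indentation removed. This sketch uses a plain-list version of `offset_mean` (an assumption made only so the example runs without NumPy).

```python
import inspect

def offset_mean(data, target_mean_value):
    """Return a new array containing the original data
       with its mean offset to match the desired value."""
    # Plain-list stand-in for the NumPy version used in the lesson.
    mean = sum(data) / len(data)
    return [value - mean + target_mean_value for value in data]

# The stored docstring (Python 3.13+ already strips common indentation here).
raw_doc = offset_mean.__doc__
# The docstring with indentation cleaned up, as help() displays it.
clean_doc = inspect.getdoc(offset_mean)
print(repr(raw_doc))
print(clean_doc)
```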

A string like this is called a docstring. We don’t need to use +triple quotes when we write one, but if we do, we can break the string +across multiple lines:

+
+

PYTHON +

+
def offset_mean(data, target_mean_value):
+    """Return a new array containing the original data
+       with its mean offset to match the desired value.
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3], 0)
+    array([-1.,  0.,  1.])
+    """
+    return (data - numpy.mean(data)) + target_mean_value
+
+help(offset_mean)
+
+
+

OUTPUT +

+
Help on function offset_mean in module __main__:
+
+offset_mean(data, target_mean_value)
+    Return a new array containing the original data
+       with its mean offset to match the desired value.
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3], 0)
+    array([-1.,  0.,  1.])
+
+
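The `>>>` lines in the Examples section follow the format of Python's `doctest` module, so the examples can double as tests. Below is a minimal sketch that runs the docstring example of a plain-list version of `offset_mean` (an assumption so it runs without NumPy; with the NumPy version the expected line would be the `array(...)` shown above).

```python
import doctest

def offset_mean(data, target_mean_value):
    """Return a new list with the mean of data offset to target_mean_value.

    Examples
    --------
    >>> offset_mean([1, 2, 3], 0)
    [-1.0, 0.0, 1.0]
    """
    mean = sum(data) / len(data)
    return [value - mean + target_mean_value for value in data]

# Collect the >>> examples from the docstring and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(offset_mean, 'offset_mean',
                        globs={'offset_mean': offset_mean}):
    runner.run(test)

# Reports how many examples failed and how many were attempted.
results = runner.summarize(verbose=False)
print(results)
```

Any example whose actual output differs from the text in the docstring is reported as a failure, which is a cheap way to keep documentation and code in step.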

Defining Defaults +

+

We have passed parameters to functions in two ways: directly, as in +type(data), and by name, as in +numpy.loadtxt(fname='something.csv', delimiter=','). In +fact, we can pass the filename to loadtxt without the +fname=:

+
+

PYTHON +

+
numpy.loadtxt('inflammation-01.csv', delimiter=',')
+
+
+

OUTPUT +

+
array([[ 0.,  0.,  1., ...,  3.,  0.,  0.],
+       [ 0.,  1.,  2., ...,  1.,  0.,  1.],
+       [ 0.,  1.,  1., ...,  2.,  1.,  1.],
+       ...,
+       [ 0.,  1.,  1., ...,  1.,  1.,  1.],
+       [ 0.,  0.,  0., ...,  0.,  2.,  0.],
+       [ 0.,  0.,  1., ...,  1.,  1.,  0.]])
+
+

but we still need to say delimiter=:

+
+

PYTHON +

+
numpy.loadtxt('inflammation-01.csv', ',')
+
+
+

ERROR +

+
Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+  File "/Users/username/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1041, in loadtxt
+    dtype = np.dtype(dtype)
+  File "/Users/username/anaconda3/lib/python3.6/site-packages/numpy/core/_internal.py", line 199, in _commastring
+    newitem = (dtype, eval(repeats))
+  File "<string>", line 1
+    ,
+    ^
+SyntaxError: unexpected EOF while parsing
+
+

To understand what’s going on, and make our own functions easier to +use, let’s re-define our offset_mean function like +this:

+
+

PYTHON +

+
def offset_mean(data, target_mean_value=0.0):
+    """Return a new array containing the original data
+       with its mean offset to match the desired value (0 by default).
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3])
+    array([-1.,  0.,  1.])
+    """
+    return (data - numpy.mean(data)) + target_mean_value
+
+

The key change is that the second parameter is now written +target_mean_value=0.0 instead of just +target_mean_value. If we call the function with two +arguments, it works as it did before:

+
+

PYTHON +

+
test_data = numpy.zeros((2, 2))
+print(offset_mean(test_data, 3))
+
+
+

OUTPUT +

+
[[ 3.  3.]
+ [ 3.  3.]]
+
+

But we can also now call it with just one parameter, in which case +target_mean_value is automatically assigned the default value of 0.0:

+
+

PYTHON +

+
more_data = 5 + numpy.zeros((2, 2))
+print('data before mean offset:')
+print(more_data)
+print('offset data:')
+print(offset_mean(more_data))
+
+
+

OUTPUT +

+
data before mean offset:
+[[ 5.  5.]
+ [ 5.  5.]]
+offset data:
+[[ 0.  0.]
+ [ 0.  0.]]
+
+

This is handy: if we usually want a function to work one way, but +occasionally need it to do something else, we can allow people to pass a +parameter when they need to but provide a default to make the normal +case easier. The example below shows how Python matches values to +parameters:

+
+

PYTHON +

+
def display(a=1, b=2, c=3):
+    print('a:', a, 'b:', b, 'c:', c)
+
+print('no parameters:')
+display()
+print('one parameter:')
+display(55)
+print('two parameters:')
+display(55, 66)
+
+
+

OUTPUT +

+
no parameters:
+a: 1 b: 2 c: 3
+one parameter:
+a: 55 b: 2 c: 3
+two parameters:
+a: 55 b: 66 c: 3
+
+

As this example shows, parameters are matched up from left to right, +and any that haven’t been given a value explicitly get their default +value. We can override this behavior by naming the value as we pass it +in:

+
+

PYTHON +

+
print('only setting the value of c')
+display(c=77)
+
+
+

OUTPUT +

+
only setting the value of c
+a: 1 b: 2 c: 77
+
+
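Positional and named arguments can also be mixed in one call, as long as the positional ones come first. The sketch below reuses `display` with a `return` line added (not part of the original function) so that the values it received can be checked.

```python
def display(a=1, b=2, c=3):
    print('a:', a, 'b:', b, 'c:', c)
    return a, b, c   # added only so the example can be checked

mixed = display(55, c=77)        # a by position, c by name, b keeps its default
reordered = display(c=77, a=55)  # named arguments can appear in any order

# A positional argument after a named one is rejected when the code is compiled.
positional_after_named_rejected = False
try:
    compile("display(a=55, 66)", "<example>", "eval")
except SyntaxError as err:
    positional_after_named_rejected = True
    print('SyntaxError:', err.msg)
```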

With that in hand, let’s look at the help for +numpy.loadtxt:

+
+

PYTHON +

+
help(numpy.loadtxt)
+
+
+

OUTPUT +

+
Help on function loadtxt in module numpy.lib.npyio:
+
+loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes')
+    Load data from a text file.
+
+    Each row in the text file must have the same number of values.
+
+    Parameters
+    ----------
+...
+
+

There’s a lot of information here, but the most important part is the +first couple of lines:

+
+

OUTPUT +

+
loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes')
+
+

This tells us that loadtxt has one parameter called +fname that doesn’t have a default value, and nine others +that do. If we call the function like this:

+
+

PYTHON +

+
numpy.loadtxt('inflammation-01.csv', ',')
+
+

then the filename is assigned to fname (which is what we +want), but the delimiter string ',' is assigned to +dtype rather than delimiter, because +dtype is the second parameter in the list. However, +',' isn’t a known dtype, so our code produced +an error message when we tried to run it. When we call +loadtxt we don’t have to provide fname= for +the filename because it’s the first item in the list, but if we want the +',' to be assigned to the parameter delimiter, +we do have to write delimiter= for that argument, +since delimiter is not the second parameter in +the list.
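The same matching rule can be seen without reading any file. The sketch below uses a hypothetical stand-in function, `load`, whose first four parameters imitate the order in the `loadtxt` signature above; it simply reports where each argument ended up.

```python
def load(fname, dtype=float, comments='#', delimiter=None):
    """Hypothetical stand-in: report which parameter each argument landed in."""
    return {'fname': fname, 'dtype': dtype,
            'comments': comments, 'delimiter': delimiter}

positional = load('inflammation-01.csv', ',')           # ',' lands in dtype
named = load('inflammation-01.csv', delimiter=',')      # ',' lands in delimiter
print(positional)
print(named)
```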

+

Readable functions +

+

Consider these two functions:

+
+

PYTHON +

+
def s(p):
+    a = 0
+    for v in p:
+        a += v
+    m = a / len(p)
+    d = 0
+    for v in p:
+        d += (v - m) * (v - m)
+    return numpy.sqrt(d / (len(p) - 1))
+
+def std_dev(sample):
+    sample_sum = 0
+    for value in sample:
+        sample_sum += value
+
+    sample_mean = sample_sum / len(sample)
+
+    sum_squared_devs = 0
+    for value in sample:
+        sum_squared_devs += (value - sample_mean) * (value - sample_mean)
+
+    return numpy.sqrt(sum_squared_devs / (len(sample) - 1))
+
+

The functions s and std_dev are +computationally equivalent (they both calculate the sample standard +deviation), but to a human reader, they look very different. You +probably found std_dev much easier to read and understand +than s.
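One way to check the claim that both functions compute the sample standard deviation is to run them side by side with `statistics.stdev` from the standard library, which also divides by n - 1. In this sketch `numpy.sqrt` is swapped for `math.sqrt` (an assumption, so no NumPy import is needed), and the sample values are hypothetical.

```python
import math
import statistics

def s(p):
    a = 0
    for v in p:
        a += v
    m = a / len(p)
    d = 0
    for v in p:
        d += (v - m) * (v - m)
    return math.sqrt(d / (len(p) - 1))

def std_dev(sample):
    sample_sum = 0
    for value in sample:
        sample_sum += value

    sample_mean = sample_sum / len(sample)

    sum_squared_devs = 0
    for value in sample:
        sum_squared_devs += (value - sample_mean) * (value - sample_mean)

    return math.sqrt(sum_squared_devs / (len(sample) - 1))

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical sample
print(s(sample), std_dev(sample), statistics.stdev(sample))
```

The two hand-written versions perform the same operations in the same order, so they agree exactly; only the names differ.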

+

As this example illustrates, both documentation and a programmer’s +coding style combine to determine how easy it is for others to +read and understand the programmer’s code. Choosing meaningful variable +names and using blank spaces to break the code into logical “chunks” are +helpful techniques for producing readable code. This is useful +not only for sharing code with others, but also for the original +programmer. If you need to revisit code that you wrote months ago and +haven’t thought about since then, you will appreciate the value of +readable code!

+
+
+ +
+
+

Combining Strings +

+
+

“Adding” two strings produces their concatenation: +'a' + 'b' is 'ab'. Write a function called +fence that takes two parameters called +original and wrapper and returns a new string +that has the wrapper character at the beginning and end of the original. +A call to your function should look like this:

+
+

PYTHON +

+
print(fence('name', '*'))
+
+
+

OUTPUT +

+
*name*
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def fence(original, wrapper):
+    return wrapper + original + wrapper
+
+
+
+
+
+
+
+ +
+
+

Return versus print +

+
+

Note that return and print are not +interchangeable. print is a Python function that +prints data to the screen. It enables us, the users, to see +the data. The return statement, on the other hand, makes data +available to the rest of the program. Let’s have a look at the following function:

+
+

PYTHON +

+
def add(a, b):
+    print(a + b)
+
+

Question: What will we see if we execute the +following commands?

+
+

PYTHON +

+
A = add(7, 3)
+print(A)
+
+
+
+
+
+
+ +
+
+

Python will first execute the function add with +a = 7 and b = 3, and, therefore, print +10. However, because the function add does not +have a line that starts with return (no return +“statement”), it will, by default, return nothing, which in the Python +world is called None. Therefore, A will be +assigned the value None and the last line (print(A)) +will print None. As a result, we will see:

+
+

OUTPUT +

+
10
+None
+
+
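If we want A to hold the sum, the function has to return it rather than print it. A sketch of that version of add:

```python
def add(a, b):
    # Hand the result back to the caller instead of printing it.
    return a + b

A = add(7, 3)
print(A)
```

Now nothing is printed inside the function, and print(A) shows 10.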
+
+
+
+
+
+ +
+
+

Selecting Characters From Strings +

+
+

If the variable s refers to a string, then +s[0] is the string’s first character and s[-1] +is its last. Write a function called outer that returns a +string made up of just the first and last characters of its input. A +call to your function should look like this:

+
+

PYTHON +

+
print(outer('helium'))
+
+
+

OUTPUT +

+
hm
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def outer(input_string):
+    return input_string[0] + input_string[-1]
+
+
+
+
+
+
+
+ +
+
+

Rescaling an Array +

+
+

Write a function rescale that takes an array as input +and returns a corresponding array of values scaled to lie in the range +0.0 to 1.0. (Hint: If L and H are the lowest +and highest values in the original array, then the replacement for a +value v should be (v-L) / (H-L).)

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def rescale(input_array):
+    L = numpy.amin(input_array)
+    H = numpy.amax(input_array)
+    output_array = (input_array - L) / (H - L)
+    return output_array
+
+
+
+
+
+
+
+ +
+
+

Testing and Documenting Your Function +

+
+

Run the commands help(numpy.arange) and +help(numpy.linspace) to see how to use these functions to +generate regularly-spaced values, then use those values to test your +rescale function. Once you’ve successfully tested your +function, add a docstring that explains what it does.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
"""Takes an array as input, and returns a corresponding array scaled so
+that 0 corresponds to the minimum and 1 to the maximum value of the input array.
+
+Examples:
+>>> rescale(numpy.arange(10.0))
+array([ 0.        ,  0.11111111,  0.22222222,  0.33333333,  0.44444444,
+       0.55555556,  0.66666667,  0.77777778,  0.88888889,  1.        ])
+>>> rescale(numpy.linspace(0, 100, 5))
+array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])
+"""
+
+
+
+
+
+
+
+ +
+
+

Defining Defaults +

+
+

Rewrite the rescale function so that it scales data to +lie between 0.0 and 1.0 by default, but will +allow the caller to specify lower and upper bounds if they want. Compare +your implementation to your neighbor’s: do the two functions always +behave the same way?

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def rescale(input_array, low_val=0.0, high_val=1.0):
+    """rescales input array values to lie between low_val and high_val"""
+    L = numpy.amin(input_array)
+    H = numpy.amax(input_array)
+    intermed_array = (input_array - L) / (H - L)
+    output_array = intermed_array * (high_val - low_val) + low_val
+    return output_array
+
+
+
+
+
+
+
+ +
+
+

Variables Inside and Outside Functions +

+
+

What does the following piece of code display when run — and why?

+
+

PYTHON +

+
f = 0
+k = 0
+
+def f2k(f):
+    k = ((f - 32) * (5.0 / 9.0)) + 273.15
+    return k
+
+print(f2k(8))
+print(f2k(41))
+print(f2k(32))
+
+print(k)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
259.81666666666666
+278.15
+273.15
+0
+
+

k is 0 because the k inside the function +f2k doesn’t know about the k defined outside +the function. When the f2k function is called, it creates a +local variable +k. The function returns the value of its local k, but it does not +alter the k defined outside the function. Therefore the original +value of the global k remains unchanged. Beware that a local +k is created because statements inside f2k +assign a new value to it. If k were only +read, it would simply retrieve the global k +value.
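The difference between reading and assigning a name can be seen in a small sketch with two hypothetical functions: one only reads k, the other assigns to it.

```python
k = 0   # global k

def read_k():
    # k is only read here, so Python finds the global k
    return k

def assign_k():
    # assigning to k makes k a new local variable inside assign_k
    k = 273.15
    return k

print(read_k())
print(assign_k())
print(k)   # the global k is untouched
```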

+
+
+
+
+
+
+ +
+
+

Mixing Default and Non-Default Parameters +

+
+

Given the following code:

+
+

PYTHON +

+
def numbers(one, two=2, three, four=4):
+    n = str(one) + str(two) + str(three) + str(four)
+    return n
+
+print(numbers(1, three=3))
+
+

what do you expect will be printed? What is actually printed? What +rule do you think Python is following?

+
  1. 1234
  2. +
  3. one2three4
  4. +
  5. 1239
  6. +
  7. SyntaxError
  8. +

Given that, what does the following piece of code display when +run?

+
+

PYTHON +

+
def func(a, b=3, c=6):
+    print('a: ', a, 'b: ', b, 'c:', c)
+
+func(-1, 2)
+
+
  1. a: b: 3 c: 6
  2. +
  3. a: -1 b: 3 c: 6
  4. +
  5. a: -1 b: 2 c: 6
  6. +
  7. a: b: -1 c: 2
  8. +
+
+
+
+
+ +
+
+

Attempting to define the numbers function results in +4. SyntaxError. The defined parameters two and +four are given default values. Because one and +three are not given default values, they are required to be +included as arguments when the function is called and must be placed +before any parameters that have default values in the function +definition.

+

The given call to func displays +a: -1 b: 2 c: 6. -1 is assigned to the first parameter +a, 2 is assigned to the next parameter b, and +c is not passed a value, so it uses its default value +6.
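Both outcomes can be checked directly. The sketch below asks Python to compile the original definition (which fails), then shows one possible fix: moving every parameter without a default in front of those with defaults.

```python
source = "def numbers(one, two=2, three, four=4):\n    pass\n"
try:
    compile(source, "<example>", "exec")
    definition_ok = True
except SyntaxError as err:
    definition_ok = False
    print('SyntaxError:', err.msg)

# One possible fix: parameters without defaults come first.
def numbers(one, three, two=2, four=4):
    n = str(one) + str(two) + str(three) + str(four)
    return n

print(numbers(1, three=3))
```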

+
+
+
+
+
+
+ +
+
+

Readable Code +

+
+

Revise a function you wrote for one of the previous exercises to try +to make the code more readable. Then, collaborate with one of your +neighbors to critique each other’s functions and discuss how your +function implementations could be further improved to make them more +readable.

+
+
+
+
+
+ +
+
+

Key Points +

+
+
  • Define a function using +def function_name(parameter).
  • +
  • The body of a function must be indented.
  • +
  • Call a function using function_name(value).
  • +
  • Numbers are stored as integers or floating-point numbers.
  • +
  • Variables defined within a function can only be seen and used within +the body of the function.
  • +
  • Variables created outside of any function are called global +variables.
  • +
  • Within a function, we can access global variables.
  • +
  • Variables created within a function override global variables if +their names match.
  • +
  • Use help(thing) to view help for something.
  • +
  • Put docstrings in functions to provide help for that function.
  • +
  • Specify default values for parameters when defining a function using +name=value in the parameter list.
  • +
  • Parameters can be passed by matching based on name, by position, or +by omitting them (in which case the default value is used).
  • +
  • Put code whose parameters change frequently in a function, then call +it with different parameter values to customize its behavior.
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/08-data_analysis.html b/instructor/08-data_analysis.html new file mode 100644 index 0000000..3421e46 --- /dev/null +++ b/instructor/08-data_analysis.html @@ -0,0 +1,493 @@ + +Python for Official Statistics: Data Analysis +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Data Analysis

+

Last updated on 2024-07-11

+ + + +

Estimated time: 60 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I process tabular data files in Python?
  • +
  • How can I do the same operations on many different files?
  • +
+
+
+
+
+
+

Objectives

+
  • read in data files to Python
  • +
  • perform common operations on tabular data
  • +
  • write code to perform the same operation on multiple files
  • +
+
+
+
+
+

FIXME

+
+
+ +
+
+

Key Points +

+
+
  • NULL
  • +
+
+
+ + + +
+
+ + +
+
+ + + diff --git a/instructor/09-visualizations.html b/instructor/09-visualizations.html new file mode 100644 index 0000000..24c0b98 --- /dev/null +++ b/instructor/09-visualizations.html @@ -0,0 +1,492 @@ + +Python for Official Statistics: Visualizations +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Visualizations

+

Last updated on 2024-07-11

+ + + +

Estimated time: 60 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How can I visualize tabular data in Python?
  • +
  • How can I group several plots together?
  • +
+
+
+
+
+
+

Objectives

+
  • create graphs and other visualizations using tabular data
  • +
  • group plots together to make comparative visualizations
  • +
+
+
+
+
+

FIXME

+
+
+ +
+
+

Key Points +

+
+
  • NULL
  • +
+
+
+ + + +
+
+ + +
+
+ + + diff --git a/instructor/10-errors_exceptions.html b/instructor/10-errors_exceptions.html new file mode 100644 index 0000000..26bcb19 --- /dev/null +++ b/instructor/10-errors_exceptions.html @@ -0,0 +1,1186 @@ + +Python for Official Statistics: Errors and Exceptions +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Errors and Exceptions

+

Last updated on 2024-07-11

+ + + +

Estimated time: 50 minutes

+ +
+ +
+ + + +
+

Overview

+
+
+
+
+

Questions

+
  • How does Python report errors?
  • +
  • How can I handle errors in Python programs?
  • +
+
+
+
+
+
+

Objectives

+
  • identify different errors and correct bugs associated with them
  • +
+
+
+
+
+

Every programmer encounters errors, both those who are just +beginning, and those who have been programming for years. Encountering +errors and exceptions can be very frustrating at times, and can make +coding feel like a hopeless endeavour. However, understanding what the +different types of errors are and when you are likely to encounter them +can help a lot. Once you know why you get certain types of +errors, they become much easier to fix.

+

Errors in Python have a very specific form, called a traceback. Let’s examine one:

+
+

PYTHON +

+
# This code has an intentional error. You can type it directly or
+# use it for reference to understand the error message below.
+def favorite_ice_cream():
+    ice_creams = [
+        'chocolate',
+        'vanilla',
+        'strawberry'
+    ]
+    print(ice_creams[3])
+
+favorite_ice_cream()
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+IndexError                                Traceback (most recent call last)
+<ipython-input-1-70bd89baa4df> in <module>()
+      9     print(ice_creams[3])
+      10
+----> 11 favorite_ice_cream()
+
+<ipython-input-1-70bd89baa4df> in favorite_ice_cream()
+      7         'strawberry'
+      8     ]
+----> 9     print(ice_creams[3])
+      10
+      11 favorite_ice_cream()
+
+IndexError: list index out of range
+
+

This particular traceback has two levels. You can determine the +number of levels by looking for the number of arrows on the left-hand +side. In this case:

+
  1. The first shows code from the cell above, with an arrow pointing +to Line 11 (which is favorite_ice_cream()).

  2. +
  3. The second shows some code in the function +favorite_ice_cream, with an arrow pointing to Line 9 (which +is print(ice_creams[3])).

  4. +

The last level is the actual place where the error occurred. The +other level(s) show what function the program executed to get to the +next level down. So, in this case, the program first called the function +favorite_ice_cream. Inside this function, the program +encountered an error on Line 9, when it tried to run the code +print(ice_creams[3]).
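Python's `traceback` module can list the same levels programmatically, which helps when counting the levels of a long traceback. A sketch that catches the IndexError from the example above and prints one line per level, outermost first:

```python
import traceback

def favorite_ice_cream():
    ice_creams = [
        'chocolate',
        'vanilla',
        'strawberry'
    ]
    print(ice_creams[3])

caught = None
levels = []
try:
    favorite_ice_cream()
except IndexError as err:
    caught = err
    # One entry per level, outermost call first, like the printed traceback.
    levels = traceback.extract_tb(err.__traceback__)

for frame in levels:
    print(frame.name, 'line', frame.lineno)
print(type(caught).__name__ + ':', caught)
```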

+
+
+ +
+
+

Long Tracebacks +

+
+

Sometimes, you might see a traceback that is very long -- sometimes +they might even be 20 levels deep! This can make it seem like something +horrible happened, but the length of the error message does not reflect +severity; rather, it indicates that your program called many functions +before it encountered the error. Most of the time, the actual place +where the error occurred is at the bottom-most level, so you can skip +down the traceback to the bottom.

+
+
+
+

So what error did the program actually encounter? In the last line of +the traceback, Python helpfully tells us the category or type of error +(in this case, it is an IndexError) and a more detailed +error message (in this case, it says “list index out of range”).

+

If you encounter an error and don’t know what it means, it is still +important to read the traceback closely. That way, if you fix the error, +but encounter a new one, you can tell that the error changed. +Additionally, sometimes knowing where the error occurred is +enough to fix it, even if you don’t entirely understand the message.

+

If you do encounter an error you don’t recognize, try looking at the +official +documentation on errors. However, note that you may not always be +able to find the error there, as it is possible to create custom errors. +In that case, hopefully the custom error message is informative enough +to help you figure out what went wrong. Libraries like pandas and numpy +define their own custom errors, but the procedure to figure them out is the +same: read the error type and message on the last line of the traceback, then work +upwards to see which of your own lines triggered it. The documentation for these libraries will often provide +the information you need about any functions you are using. These libraries also have large +communities of users who can help!

+
+
+ +
+
+

Reading Error Messages +

+
+

Read the Python code and the resulting traceback below, and answer +the following questions:

+
  1. How many levels does the traceback have?
  2. +
  3. What is the function name where the error occurred?
  4. +
  5. On which line number in this function did the error occur?
  6. +
  7. What is the type of error?
  8. +
  9. What is the error message?
  10. +
+

PYTHON +

+
# This code has an intentional error. Do not type it directly;
+# use it for reference to understand the error message below.
+def print_message(day):
+    messages = [
+        'Hello, world!',
+        'Today is Tuesday!',
+        'It is the middle of the week.',
+        'Today is Donnerstag in German!',
+        'Last day of the week!',
+        'Hooray for the weekend!',
+        'Aw, the weekend is almost over.'
+    ]
+    print(messages[day])
+
+def print_sunday_message():
+    print_message(7)
+
+print_sunday_message()
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+IndexError                                Traceback (most recent call last)
+<ipython-input-7-3ad455d81842> in <module>
+     16     print_message(7)
+     17
+---> 18 print_sunday_message()
+     19
+
+<ipython-input-7-3ad455d81842> in print_sunday_message()
+     14
+     15 def print_sunday_message():
+---> 16     print_message(7)
+     17
+     18 print_sunday_message()
+
+<ipython-input-7-3ad455d81842> in print_message(day)
+     11         'Aw, the weekend is almost over.'
+     12     ]
+---> 13     print(messages[day])
+     14
+     15 def print_sunday_message():
+
+IndexError: list index out of range
+
+
+
+
+
+
+ +
+
+
  1. 3 levels
  2. +
  3. print_message
  4. +
  5. 13
  6. +
  7. IndexError
  8. +
+list index out of range. You can then infer that +7 is not the right index to use with +messages.
  10. +
+
+
+
+
+
+ +
+
+

Better errors on newer Pythons +

+
+

Newer versions of Python have improved error printouts. If you are +debugging errors, it is often helpful to use the latest Python version, +even if you support older versions of Python.

+
+
+
+

Type Errors +

+

One of the most common types of errors in Python is the type +error. These errors occur when you try to perform an operation on +an object in Python whose type does not support it. This happens easily when +working with large datasets in which each column is expected to hold one type of value, such as +strings or integers. If we write a function expecting integers and it is given a string, +we will not get an error until Python reaches an operation that cannot +handle strings. For example:

+
+

PYTHON +

+

+def our_function():
+    my_string = "Hello World"
+    letter = my_string["e"]
+    return letter
+
+our_function()
+
+
+

ERROR +

+
  File "<ipython-input-3-6bb841ea1423>", line 3
+    letter=my_string["e"]
+                       ^
+TypeError: string indices must be integers
+
+

We get this error because we are trying to use an index to access +part of our string, and an index must be an integer. Instead, we entered a +character and received a type error. This is fixed by replacing “e” with +1, the position of “e” in “Hello World” (Python starts counting at 0).
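A sketch of the fix: index with an integer, or, if we only know the character we are after, ask the string for its position with `str.index`.

```python
my_string = "Hello World"

letter = my_string[1]              # integer position 1 holds 'e'
position = my_string.index("e")    # position of the first 'e' in the string
print(letter, position)
```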

+

In the case of datasets, we often see type errors when a mathematical +operation, such as taking a mean, is performed on a column that contains +characters, either as a result of formatting or introduced through +error. As a result, correcting the error can involve simply removing the +characters from the strings using regular expressions, or if the +characters have resulted in incorrect data, removing those observations +from the dataset.
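As a minimal sketch of that clean-up, assume a hypothetical column of values read in as text, with a thousands separator, a unit and a missing-value marker. The standard library's `re` module can strip the non-numeric characters, and values that cannot be repaired are dropped. Whether stripping characters is safe depends on the data: removing "kg" only makes sense if every value uses that unit. With pandas, `pandas.to_numeric(..., errors='coerce')` is a common alternative that turns unparseable values into NaN.

```python
import re

# Hypothetical column: numbers stored as text with stray characters.
raw_column = ['12', '7', '1,204', '35kg', 'n/a']

try:
    total = sum(raw_column)        # int + str is not defined
except TypeError as err:
    print('TypeError:', err)

def clean_value(text):
    """Keep only digits, '.' and '-'; return None if nothing numeric is left."""
    numeric_text = re.sub(r'[^0-9.\-]', '', text)
    return float(numeric_text) if numeric_text else None

cleaned = [clean_value(value) for value in raw_column]
# Drop the observations that could not be repaired.
valid = [value for value in cleaned if value is not None]
mean_value = sum(valid) / len(valid)
print(cleaned)
print('mean:', mean_value)
```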

+

Syntax Errors +

+

When you forget a colon at the end of a line, accidentally add one +space too many when indenting under an if statement, or +forget a parenthesis, you will encounter a syntax error. This means that +Python couldn’t figure out how to read your program. This is similar to +forgetting punctuation in English: for example, this text is difficult +to read there is no punctuation there is also no capitalization why is +this hard because you have to figure out where each sentence ends you +also have to figure out where each sentence begins to some extent it +might be ambiguous if there should be a sentence break or not

+

People can typically figure out what is meant by text with no +punctuation, but people are much smarter than computers. If Python +doesn’t know how to read the program, it will give up and inform you +with an error. For example:

+
+

PYTHON +

+
def some_function()
+    msg = 'hello, world!'
+    print(msg)
+     return msg
+
+
+

ERROR +

+
  File "<ipython-input-3-6bb841ea1423>", line 1
+    def some_function()
+                       ^
+SyntaxError: invalid syntax
+
+

Here, Python tells us that there is a SyntaxError on +line 1, and even puts a little arrow in the place where there is an +issue. In this case the problem is that the function definition is +missing a colon at the end.

+

Actually, the function above has two issues with syntax. If +we fix the problem with the colon, we see that there is also an +IndentationError, which means that the lines in the +function definition do not all have the same indentation:

+
+

PYTHON +

+
def some_function():
+    msg = 'hello, world!'
+    print(msg)
+     return msg
+
+
+

ERROR +

+
  File "<ipython-input-4-ae290e7659cb>", line 4
+    return msg
+    ^
+IndentationError: unexpected indent
+
+

Both SyntaxError and IndentationError +indicate a problem with the syntax of your program, but an +IndentationError is more specific: it always means +that there is a problem with how your code is indented.
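Because IndentationError is a subclass of SyntaxError, code that catches SyntaxError also catches indentation problems. The sketch below compiles the colon-fixed but still mis-indented function from above as a string, reports what went wrong, and then defines the corrected function.

```python
broken = (
    "def some_function():\n"
    "    msg = 'hello, world!'\n"
    "    print(msg)\n"
    "     return msg\n"
)

error_name = None
error_line = None
try:
    compile(broken, "<example>", "exec")
except SyntaxError as err:       # IndentationError is a subclass of SyntaxError
    error_name = type(err).__name__
    error_line = err.lineno
print(error_name, 'on line', error_line)
print(issubclass(IndentationError, SyntaxError))

# Fixed version: colon added and every body line shares one indentation level.
def some_function():
    msg = 'hello, world!'
    print(msg)
    return msg

some_function()
```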

+
+
+ +
+
+

Tabs and Spaces +

+
+

Some indentation errors are harder to spot than others. In +particular, mixing spaces and tabs can be difficult to spot because they +are both whitespace. In the +example below, the first two lines in the body of the function +some_function are indented with tabs, while the third line +is indented with spaces. If you’re working in a Jupyter notebook, be sure to copy +and paste this example rather than typing it in manually, because +Jupyter automatically replaces tabs with spaces.

+
+

PYTHON +

+
def some_function():
+	msg = 'hello, world!'
+	print(msg)
+        return msg
+
+

Visually it is impossible to spot the error. Fortunately, Python does +not allow you to mix tabs and spaces.

+
+

ERROR +

+
  File "<ipython-input-5-653b36fbcd41>", line 4
+    return msg
+              ^
+TabError: inconsistent use of tabs and spaces in indentation
+
+
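Although tabs and spaces look the same on screen, `repr()` makes the difference visible. This sketch holds the same function as a string, prints each body line with `repr()`, and compiles it to show the resulting TabError, which is itself a kind of IndentationError.

```python
mixed = (
    "def some_function():\n"
    "\tmsg = 'hello, world!'\n"
    "\tprint(msg)\n"
    "        return msg\n"
)

# repr() shows tabs as '\t', so they can be told apart from spaces.
for line in mixed.splitlines()[1:]:
    print(repr(line))

tab_error_line = None
try:
    compile(mixed, "<example>", "exec")
except TabError as err:
    tab_error_line = err.lineno
print('TabError on line', tab_error_line)
print(issubclass(TabError, IndentationError))
```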
+
+
+

Variable Name Errors +

+

Another very common type of error is called a NameError, +and occurs when you try to use a variable that does not exist. For +example:

+
+

PYTHON +

+
print(a)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-7-9d7b17ad5387> in <module>()
+----> 1 print(a)
+
+NameError: name 'a' is not defined
+
+

Variable name errors come with some of the most informative error +messages, which are usually of the form “name ‘the_variable_name’ is not +defined”.

+

Why does this error message occur? That’s a harder question to +answer, because it depends on what your code is supposed to do. However, +there are a few very common reasons why you might have an undefined +variable. The first is that you meant to use a string, but forgot to put quotes around +it:

+
+

PYTHON +

+
print(hello)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-8-9553ee03b645> in <module>()
+----> 1 print(hello)
+
+NameError: name 'hello' is not defined
+
+

The second reason is that you might be trying to use a variable that +does not yet exist. In the following example, count should +have been defined (e.g., with count = 0) before the for +loop:

+
+

PYTHON +

+
for number in range(10):
+    count = count + number
+print('The count is:', count)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-9-dd6a12d7ca5c> in <module>()
+      1 for number in range(10):
+----> 2     count = count + number
+      3 print('The count is:', count)
+
+NameError: name 'count' is not defined
+
+

Finally, the third possibility is that you made a typo when you were +writing your code. Let’s say we fixed the error above by adding the line +Count = 0 before the for loop. Frustratingly, this actually +does not fix the error. Remember that variables are case-sensitive, so the variable +count is different from Count. We still get +the same error, because we still have not defined +count:

+
+

PYTHON +

+
Count = 0
+for number in range(10):
+    count = count + number
+print('The count is:', count)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-10-d77d40059aea> in <module>()
+      1 Count = 0
+      2 for number in range(10):
+----> 3     count = count + number
+      4 print('The count is:', count)
+
+NameError: name 'count' is not defined
+
+
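For completeness, here is the loop with count defined, and spelled consistently, before it is used; the NameError goes away:

```python
count = 0                      # same spelling as the name used in the loop
for number in range(10):
    count = count + number
print('The count is:', count)
```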

Index Errors +

+

Next up are errors having to do with containers (like lists and +strings) and the items within them. If you try to access an item in a +list or a string that does not exist, then you will get an error. This +makes sense: if you asked someone what day they would like to get +coffee, and they answered “caturday”, you might be a bit annoyed. Python +gets similarly annoyed if you try to ask it for an item that doesn’t +exist:

+
+

PYTHON +

+
letters = ['a', 'b', 'c']
+print('Letter #1 is', letters[0])
+print('Letter #2 is', letters[1])
+print('Letter #3 is', letters[2])
+print('Letter #4 is', letters[3])
+
+
+

OUTPUT +

+
Letter #1 is a
+Letter #2 is b
+Letter #3 is c
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+IndexError                                Traceback (most recent call last)
+<ipython-input-11-d817f55b7d6c> in <module>()
+      3 print('Letter #2 is', letters[1])
+      4 print('Letter #3 is', letters[2])
+----> 5 print('Letter #4 is', letters[3])
+
+IndexError: list index out of range
+
+

Here, Python is telling us that there is an IndexError +in our code, meaning we tried to access a list index that did not +exist.
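One way to avoid this error is to compare the index you want with `len()`. Valid positive indices run from `0` up to `len(letters) - 1`. The sketch below also uses `try`/`except` to catch the `IndexError`, purely so that we can look at its message without stopping the program.

```python
letters = ['a', 'b', 'c']

# Valid positive indices run from 0 to len(letters) - 1.
last_index = len(letters) - 1
print('Last valid index:', last_index)

# Catching the error lets us inspect its message instead of crashing.
try:
    letters[3]
except IndexError as err:
    message = str(err)
print('IndexError message:', message)
```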

+

File Errors +

+

The last type of error we’ll cover today is the most common when using Python with data: errors associated with reading and writing files. If you try to read a file that does not exist, you will receive a FileNotFoundError telling you so. If you attempt to write to a file that was opened read-only, Python 3 raises an io.UnsupportedOperation error. More generally, problems with input and output manifest as OSErrors, which may show up as a more specific subclass; you can see the list in the Python docs. Many of these subclasses correspond to a UNIX errno value, which you can see in the error message (for example, [Errno 2] below).

+
+

PYTHON +

+
file_handle = open('myfile.txt', 'r')
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+FileNotFoundError                         Traceback (most recent call last)
+<ipython-input-14-f6e1ac4aee96> in <module>()
+----> 1 file_handle = open('myfile.txt', 'r')
+
+FileNotFoundError: [Errno 2] No such file or directory: 'myfile.txt'
+
+
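Because FileNotFoundError is a subclass of OSError, the error object carries the `errno` value and the file name that could not be opened. The minimal sketch below assumes, as in the example above, that there is no `myfile.txt` in the current folder. It catches the error and looks up the symbolic errno name with the standard-library `errno` module.

```python
import errno

caught_errno = None
try:
    file_handle = open('myfile.txt', 'r')
    file_handle.close()
except FileNotFoundError as err:
    # FileNotFoundError is one of the more specific subclasses of OSError.
    caught_errno = err.errno
    print('Could not open', err.filename)
    print('errno', err.errno, 'is', errno.errorcode[err.errno])
```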

One reason for receiving this error is that you specified an +incorrect path to the file. For example, if I am currently in a folder +called myproject, and I have a file in +myproject/writing/myfile.txt, but I try to open +myfile.txt, this will fail. The correct path would be +writing/myfile.txt. It is also possible that the file name +or its path contains a typo. There may also be specific settings based +on your organization if you are using shared, networked, or cloud-based +drives. It is best to check with your IT administrators if you are still +encountering issues reading in a file after troubleshooting.
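Relative paths such as `writing/myfile.txt` are resolved from Python's current working directory, so a quick check of where Python is "standing" often explains this error. Here is a sketch using the standard-library `pathlib` module. It reuses the folder and file names from the example above and assumes that Python is started from inside `myproject`.

```python
from pathlib import Path

# Relative paths are resolved from the current working directory.
print('Current working directory:', Path.cwd())

# The file from the example, myproject/writing/myfile.txt,
# written relative to the myproject folder.
data_path = Path('writing') / 'myfile.txt'

if data_path.exists():
    print('Found:', data_path.resolve())
else:
    print('Not found:', data_path.resolve())
```

Printing the resolved (absolute) path shows exactly where Python looked, which makes a missing folder or a typo easier to spot.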

+

A related issue can occur if you use the “write” flag instead of the “read” flag. Python will not give you an error if you try to open a file for writing when the file does not exist; it simply creates the file. Be careful, though: opening an existing file with the “w” flag erases its contents. If you meant to open a file for reading, but accidentally opened it for writing, and then try to read from it, you will get an UnsupportedOperation error telling you that the file was not opened for reading:

+
+

PYTHON +

+
file_handle = open('myfile.txt', 'w')
+file_handle.read()
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+UnsupportedOperation                      Traceback (most recent call last)
+<ipython-input-15-b846479bc61f> in <module>()
+      1 file_handle = open('myfile.txt', 'w')
+----> 2 file_handle.read()
+
+UnsupportedOperation: not readable
+
+
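The sketch below reproduces this mistake safely inside a temporary folder (created with the standard-library `tempfile` module), so no real file is overwritten. It then shows that reopening the same file with `'r'` is what reading actually requires.

```python
import io
import os
import tempfile

# Use a temporary folder so that no real file gets overwritten.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'myfile.txt')

    # Opened with 'w': Python creates the (empty) file, but it is not readable.
    with open(path, 'w') as file_handle:
        try:
            file_handle.read()
        except io.UnsupportedOperation as err:
            problem = str(err)

    # Reopening the same file with 'r' allows reading.
    with open(path, 'r') as file_handle:
        contents = file_handle.read()

print('Problem with "w":', problem)
print('Contents read with "r":', repr(contents))
```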

If you are getting a read or write error on a file or folder that you are able to open and/or edit with other programs, you may need to contact an IT administrator to check the permissions granted to you and to any programs you are using.

+

These are the most common errors with files, though many others +exist. If you get an error that you’ve never seen before, searching the +Internet for that error type often reveals common reasons why you might +get that error.

+
+
+ +
+
+

Identifying Syntax Errors +

+
+
  1. Read the code below, and (without running it) try to identify what +the errors are.
  2. +
  3. Run the code, and read the error message. Is it a +SyntaxError or an IndentationError?
  4. +
  5. Fix the error.
  6. +
  7. Repeat steps 2 and 3, until you have fixed all the errors.
  8. +
+

PYTHON +

+
def another_function
+  print('Syntax errors are annoying.')
+   print('But at least Python tells us about them!')
+  print('So they are usually not too hard to fix.')
+
+
+
+
+
+
+ +
+
+

A SyntaxError for the missing (): at the end of the first line, and an IndentationError for the mismatched indentation between the second and third lines. A fixed version is:

+
+

PYTHON +

+
def another_function():
+    print('Syntax errors are annoying.')
+    print('But at least Python tells us about them!')
+    print('So they are usually not too hard to fix.')
+
+
+
+
+
+
+
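If you want to check code for these problems without running it, the built-in `compile()` function parses source text and raises the same errors. The helper below, `check_syntax`, is a hypothetical name used only for this sketch. Because IndentationError is a subclass of SyntaxError, one `except SyntaxError` clause catches both.

```python
def check_syntax(source):
    """Return (error type name, line number) for a syntax problem, or None."""
    try:
        # compile() parses the code without running it.
        compile(source, '<exercise>', 'exec')
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return type(err).__name__, err.lineno
    return None

missing_parentheses = (
    "def another_function\n"
    "    print('Syntax errors are annoying.')\n"
)
mismatched_indentation = (
    "def another_function():\n"
    "  print('Syntax errors are annoying.')\n"
    "   print('But at least Python tells us about them!')\n"
)
fixed = (
    "def another_function():\n"
    "    print('Syntax errors are annoying.')\n"
)

print(check_syntax(missing_parentheses))
print(check_syntax(mismatched_indentation))
print(check_syntax(fixed))
```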
+ +
+
+

Identifying Variable Name Errors +

+
+
  1. Read the code below, and (without running it) try to identify what +the errors are.
  2. +
  3. Run the code, and read the error message. What type of +NameError do you think this is? In other words, is it a +string with no quotes, a misspelled variable, or a variable that should +have been defined but was not?
  4. +
  5. Fix the error.
  6. +
  7. Repeat steps 2 and 3, until you have fixed all the errors.
  8. +
+

PYTHON +

+
for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (Number % 3) == 0:
+        message = message + a
+    else:
+        message = message + 'b'
+print(message)
+
+
+
+
+
+
+ +
+
+

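Three NameErrors: Number is misspelled (it should be number), message is used before it has been defined, and a is not in quotes. Python stops at the first error it reaches, so you will meet these one at a time as you fix them.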

+

Fixed version:

+
+

PYTHON +

+
message = ''
+for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (number % 3) == 0:
+        message = message + 'a'
+    else:
+        message = message + 'b'
+print(message)
+
+
+
+
+
+
+
+ +
+
+

Identifying Index Errors +

+
+
  1. Read the code below, and (without running it) try to identify what +the errors are.
  2. +
  3. Run the code, and read the error message. What type of error is +it?
  4. +
  5. Fix the error.
  6. +
+

PYTHON +

+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is', seasons[4])
+
+
+
+
+
+
+ +
+
+

IndexError; the last entry is seasons[3], so seasons[4] does not exist. A fixed version, which uses the negative index -1 to select the last entry, is:

+
+

PYTHON +

+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is', seasons[-1])
+
+
+
+
+
+
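Negative indices count backwards from the end of a list. This sketch confirms that `seasons[-1]` and `seasons[len(seasons) - 1]` refer to the same item.

```python
seasons = ['Spring', 'Summer', 'Fall', 'Winter']

# Negative indices count backwards from the end: -1 is the last item,
# -2 the second-to-last, and -len(seasons) the first.
last_by_position = seasons[len(seasons) - 1]
last_by_negative_index = seasons[-1]

print(last_by_position, last_by_negative_index)
print(seasons[-len(seasons)])
```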

A Final Note About Correcting Errors +

+

There are many very helpful answers online for common error messages. However, when working with official statistics, we also need to exercise some caution. Be wary of any answers that ask you to download a package from someone’s personal GitHub repository or other file-sharing service. Try to identify the type of error and understand what the issue is before downloading anything that claims to fix it. If the error is caused by a particular version of a package, check whether that version has any known security vulnerabilities, and use a package manager to move between package versions.
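Before changing versions, you first need to know which version is actually installed. Here is a minimal sketch using `importlib.metadata` from the standard library (Python 3.8 or later). The helper name `installed_version` is hypothetical, and the package names are only examples.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version of `package_name`, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

for name in ['numpy', 'pandas']:
    print(name, installed_version(name))
```

You can then compare the reported version against the package's release notes or published security advisories.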

+
+
+ +
+
+

Key Points +

+
+
  • NULL
  • +
+
+
+
+
+ + +
+
+ + + diff --git a/instructor/404.html b/instructor/404.html new file mode 100644 index 0000000..c1d20b6 --- /dev/null +++ b/instructor/404.html @@ -0,0 +1,445 @@ + +Python for Official Statistics: Page not found +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Page not found

+ +

Our apologies! +

+

We cannot seem to find the page you are looking for. Here are some +tips that may help:

+
  1. try going back to the previous +page or
  2. +
  3. navigate to any other page using the navigation bar on the +left.
  4. +
  5. if the URL ends with /index.html, try removing +that.
  6. +
  7. head over to the home page of this +lesson +
  8. +

If you came here from a link in this lesson, please contact the +lesson maintainers using the links at the foot of this page.

+
+
+ + +
+
+ + + diff --git a/instructor/CODE_OF_CONDUCT.html b/instructor/CODE_OF_CONDUCT.html new file mode 100644 index 0000000..6cc6dee --- /dev/null +++ b/instructor/CODE_OF_CONDUCT.html @@ -0,0 +1,458 @@ + +Python for Official Statistics: Contributor Code of Conduct +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Contributor Code of Conduct

+

Last updated on 2024-07-11 | + + Edit this page

+ + + + + +
+ +
+ + + +

As contributors and maintainers of this project, we pledge to follow The Carpentries Code of Conduct.

+

Instances of abusive, harassing, or otherwise unacceptable behavior +may be reported by following our reporting +guidelines.

+ + + +
+
+ + +
+
+ + + diff --git a/instructor/LICENSE.html b/instructor/LICENSE.html new file mode 100644 index 0000000..df9a220 --- /dev/null +++ b/instructor/LICENSE.html @@ -0,0 +1,509 @@ + +Python for Official Statistics: Licenses +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Licenses

+

Last updated on 2024-07-11 | + + Edit this page

+ + + + + +
+ +
+ + + +

Instructional Material +

+

All Carpentries (Software Carpentry, Data Carpentry, and Library +Carpentry) instructional material is made available under the Creative Commons +Attribution license. The following is a human-readable summary of +(and not a substitute for) the full legal +text of the CC BY 4.0 license.

+

You are free:

+
  • to Share—copy and redistribute the material in any +medium or format
  • +
  • to Adapt—remix, transform, and build upon the +material
  • +

for any purpose, even commercially.

+

The licensor cannot revoke these freedoms as long as you follow the +license terms.

+

Under the following terms:

+
  • Attribution—You must give appropriate credit +(mentioning that your work is derived from work that is Copyright (c) +The Carpentries and, where practical, linking to https://carpentries.org/), provide a link to the +license, and indicate if changes were made. You may do so in any +reasonable manner, but not in any way that suggests the licensor +endorses you or your use.

  • +
  • No additional restrictions—You may not apply +legal terms or technological measures that legally restrict others from +doing anything the license permits. With the understanding +that:

  • +

Notices:

+
  • You do not have to comply with the license for elements of the +material in the public domain or where your use is permitted by an +applicable exception or limitation.
  • +
  • No warranties are given. The license may not give you all of the +permissions necessary for your intended use. For example, other rights +such as publicity, privacy, or moral rights may limit how you use the +material.
  • +

Software +

+

Except where otherwise noted, the example programs and other software +provided by The Carpentries are made available under the OSI-approved MIT +license.

+

Permission is hereby granted, free of charge, to any person obtaining +a copy of this software and associated documentation files (the +“Software”), to deal in the Software without restriction, including +without limitation the rights to use, copy, modify, merge, publish, +distribute, sublicense, and/or sell copies of the Software, and to +permit persons to whom the Software is furnished to do so, subject to +the following conditions:

+

The above copyright notice and this permission notice shall be +included in all copies or substantial portions of the Software.

+

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. +IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY +CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, +TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE +SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

+

Trademark +

+

“The Carpentries”, “Software Carpentry”, “Data Carpentry”, and +“Library Carpentry” and their respective logos are registered trademarks +of Community Initiatives.

+
+
+ + +
+
+ + + diff --git a/instructor/aio.html b/instructor/aio.html new file mode 100644 index 0000000..a58ca57 --- /dev/null +++ b/instructor/aio.html @@ -0,0 +1,5383 @@ + + + + + +Python for Official Statistics: All in One View + + + + + + + + + + + + +
+ Python for Official Statistics +
+ +
+
+ + + + + + +
+
+ + +

Content from Introduction

+
+

Last updated on 2024-07-11 | + + Edit this page

+

Estimated time: 15 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • What is programming?
  • +
  • How do I document code?
  • +
  • How do I find reliable and safe resources or code online?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • identify basic concepts in programming
  • +
+
+
+
+
+
+

Programming in Python + +

+
+

In most general terms, programming is the process of writing +instructions for a computer. In this course we will be using Python as +the language to communicate with the computer.

+
+

Strictly speaking, Python is an interpreted language, rather than a +compiled language, meaning we are not communicating directly with the +computer when we use Python. When we run Python code, our Python source +code is first translated into byte code, which is then executed by the +Python virtual machine.
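You can look at this byte code yourself with the standard-library `dis` module. The function `add_one` is just an illustration, and the exact instructions printed vary between Python versions.

```python
import dis

def add_one(x):
    return x + 1

# Print the byte code instructions that the Python virtual machine runs.
dis.dis(add_one)

# The compiled byte code itself is stored on the function as raw bytes.
print(type(add_one.__code__.co_code))

# The same instructions are also available as objects.
opnames = [instruction.opname for instruction in dis.get_instructions(add_one)]
```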

+
+

Programming is a wide topic including a variety of techniques and +tools. In this course we’ll be focusing on programming for statistical +analysis.

+
+

IDEs +

+

IDE stands for Integrated Development Environment. IDEs are where you will write, edit, and debug Python scripts, so choose one that you feel comfortable with and that includes the functionality you need. Some open-source IDEs for Python include JupyterLab and Visual Studio Code.

+
+
+

Packages +

+

Packages, or libraries, are extensions to the Python language. They contain code, data, and documentation in a standardised collection format that can be installed by users, typically via a centralised software repository such as the Python Package Index (PyPI). A typical Python workflow will use base Python (the core operations and functions provided by your Python installation) as well as specialised data analysis and scientific packages like NumPy, SciPy and pandas.
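Base Python already ships with a standard library of modules, such as `statistics`, that need no separate installation. Packages like NumPy build on this foundation. The weights below are hypothetical example values.

```python
import statistics

# Hypothetical example weights in kilograms.
weights_kg = [60.3, 72.1, 55.8, 80.0]

# `statistics` is part of the standard library, so no install is needed.
print('Mean:', statistics.mean(weights_kg))
print('Median:', statistics.median(weights_kg))
```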

+
+

Best Practices + +

+
+

Let’s review some basic concepts that every programmer should keep in mind.

+
+

Documentation +

+

Have you ever returned to a task and tried to read a note that you +quickly scrawled for yourself the last time you were working on it? Have +you ever inherited a project from a colleague and found you have no idea +what remains to be done?

+

It can be very challenging to return to your own work or to a colleague’s, and this goes doubly for programming. Documentation is one way we can reduce the burden on our future selves and our colleagues.

+
+

Inline Documentation +

+

For new programmers, inline documentation can be the most helpful. Inline documentation refers to writing comments on the same line as your code. For example, if we wrote a line of code to sum 1+1, we might document it as follows:

+
+

PYTHON +

+
1+1         # adding the numbers 1 and 1 together.
+
+

Although this is a very simple line of code and it might seem like +overkill to document it in this way, these types of comments can be very +helpful in jogging your memory when returning to a project. Inline +comments can also help you to break multi-step programs into digestible +and readable pieces.
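As an illustration of breaking a multi-step calculation into readable pieces, here is a short sketch in which each step carries its own inline comment. The weight value is an example borrowed from a later episode.

```python
weight_kg = 60.3                    # patient weight as recorded, in kilograms
weight_lb = 2.2 * weight_kg         # convert to pounds (about 2.2 lb per kg)
weight_lb = round(weight_lb, 1)     # keep one decimal place for reporting
print(weight_lb)
```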

+
+
+

External Documentation +

+

Sometimes you need more detail than you can comfortably fit in your inline documentation. In this case it can be helpful to create separate files to document your project. This type of documentation typically focuses on the goals, scope, and any special instructions relating to your project rather than the details of your code. The most common type of external documentation is a README file. It is best practice to create a basic README file for any project. A basic README should include:

+
    +
  • a brief description of the project,
  • +
  • any special instructions for installation or use,
  • +
  • the authors and any references.
  • +
+

README files are plain text files, and it is best practice to save your README file as a README.md Markdown document. This file format is automatically recognised by code hosting platforms like GitHub, so your README contents are displayed alongside your code repository.

+
+
+

DocStrings +

+

In chapter 7: functions we’ll learn +about documentation specific to functions known as DocStrings.
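As a brief preview, a docstring is a string placed on the first line of a function body. Python stores it on the function, and `help()` displays it. The function `kg_to_lb` is a hypothetical example.

```python
def kg_to_lb(weight_kg):
    """Convert a weight from kilograms to pounds.

    Uses the approximation of 2.2 pounds per kilogram.
    """
    return 2.2 * weight_kg

# The docstring is stored on the function; help(kg_to_lb) displays it too.
print(kg_to_lb.__doc__)
print(kg_to_lb(65.0))
```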

+
+
+

Getting Help + +

+
+

Later on, in chapter 10: Errors +and Exceptions we will cover errors in more detail. However, before +we get there it’s very likely you’ll need some assistance writing Python +code.

+
+

Built-in Help +

+

There is a help +function built into base Python. You can use it to investigate +built-in functions, data types, and more. For example, say we want to +know more about the print() function in Python:

+
+

PYTHON +

+
help(print)
+
+
+

OUTPUT +

+
Help on built-in function print in module builtins:
+
+print(...)
+    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
+
+    Prints the values to a stream, or to sys.stdout by default.
+    Optional keyword arguments:
+    file:  a file-like object (stream); defaults to the current sys.stdout.
+    sep:   string inserted between values, default a space.
+    end:   string appended after the last value, default a newline.
+-- More  --
+
+
+
+

Finding Resources online +

+

Stack Overflow is a valuable +resource for programmers of all levels. It can be daunting to post your +own question! Fortunately, chances are someone else has already asked a +similar question!

+

The Official Python +Documentation is another great resource.

+

It can also be helpful to do a general search for a particular topic or error message. It’s very likely the first few results will be from Stack Overflow, followed by a few from official documentation, and then you may start seeing results from personal blogs or third parties. These third-party results can sometimes be valuable, but we should be cautious! Here are a few things to keep in mind when you are looking for online resources:

+
    +
  1. Don’t download or install anything unless you are certain of what it +is and why you need it.
  2. +
  3. Don’t copy or run code unless you fully understand what it +does.
  4. +
  5. Python is an open-source language; official documentation and +resources will not be behind a paywall.
  6. +
  7. You may not find a resource or solution to fit your exact needs. Try +to be flexible and adapt online solutions to fit your needs.
  8. +
+
+
+ +
+
+

Key Points +

+
+
    +
  • Python is an interpreted language.
  • +
  • Code is commonly developed inside an integrated development +environment.
  • +
  • A typical Python workflow uses base Python and additional Python +packages developed for statistical programming purposes.
  • +
  • In-line and external documentation helps ensure that your code is +readable.
  • +
  • You can find help through the built-in help function and external +resources.
  • +
+
+
+
+
+

Content from Python Fundamentals

+
+

Last updated on 2024-07-11 | + + Edit this page

+

Estimated time: 30 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • What basic data types can I work with in Python?
  • +
  • How can I create a new variable in Python?
  • +
  • How do I use a function?
  • +
  • Can I change the value associated with a variable after I create +it?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Assign values to variables.
  • +
+
+
+
+
+
+

Variables + +

+
+

Any Python interpreter can be used as a calculator:

+
+

PYTHON +

+
3 + 5 * 4
+
+
+

OUTPUT +

+
23
+
+

This is great but not very interesting. To do anything useful with +data, we need to assign its value to a variable. In Python, we +can assign a value to a variable, using the equals sign +=. For example, we can track the weight of a patient who +weighs 60 kilograms by assigning the value 60 to a variable +weight_kg:

+
+

PYTHON +

+
weight_kg = 60
+
+

From now on, whenever we use weight_kg, Python will +substitute the value we assigned to it. In layperson’s terms, a +variable is a name for a value.

+

In Python, variable names:

+
    +
  • can include letters, digits, and underscores
  • +
  • cannot start with a digit
  • +
  • are case sensitive.
  • +
+

This means that, for example:

+
    +
  • +weight0 is a valid variable name, whereas +0weight is not
  • +
  • +weight and Weight are different +variables
  • +
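You can check these naming rules directly. The string method `str.isidentifier()` applies the letters, digits and underscores rules. Python's reserved keywords, such as `for`, are also off limits, and the standard-library `keyword` module can detect them. The candidate names below are examples.

```python
import keyword

candidate_names = ['weight0', '0weight', 'weight_kg', 'weight-kg', 'for']

is_valid = {}
for name in candidate_names:
    # isidentifier() checks the letters/digits/underscores rules;
    # keywords such as 'for' are reserved, so they cannot be used either.
    is_valid[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, '->', 'valid' if is_valid[name] else 'not valid')

# Case sensitivity: these are two separate variables.
weight = 60
Weight = 75
print(weight, Weight)
```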

Types of data + +

+
+

Python knows various types of data. Three common ones are:

+
    +
  • integer numbers
  • +
  • floating point numbers, and
  • +
  • strings.
  • +
+

In the example above, variable weight_kg has an integer +value of 60. If we want to more precisely track the weight +of our patient, we can use a floating point value by executing:

+
+

PYTHON +

+
weight_kg = 60.3
+
+

To create a string, we add single or double quotes around some text. +To identify and track a patient throughout our study, we can assign each +person a unique identifier by storing it in a string:

+
+

PYTHON +

+
patient_id = '001'
+
+

Using Variables in Python + +

+
+

Once we have data stored with variable names, we can make use of it +in calculations. We may want to store our patient’s weight in pounds as +well as kilograms:

+
+

PYTHON +

+
weight_lb = 2.2 * weight_kg
+
+

We might decide to add a prefix to our patient identifier:

+
+

PYTHON +

+
patient_id = 'inflam_' + patient_id
+
+

Built-in Python functions + +

+
+

To carry out common tasks with data and variables in Python, the +language provides us with several built-in functions. To display information to +the screen, we use the print function:

+
+

PYTHON +

+
print(weight_lb)
+print(patient_id)
+
+
+

OUTPUT +

+
132.66
+inflam_001
+
+

When we want to make use of a function, referred to as calling the +function, we follow its name by parentheses. The parentheses are +important: if you leave them off, the function doesn’t actually run! +Sometimes you will include values or variables inside the parentheses +for the function to use. In the case of print, we use the +parentheses to tell the function what value we want to display. We will +learn more about how functions work and how to create our own in later +episodes.
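To see the difference the parentheses make, compare referring to the built-in `len` function with actually calling it.

```python
# Without parentheses: just a reference to the function; nothing runs.
without_call = len
print(without_call)

# With parentheses: the function runs on the value we pass in.
with_call = len('inflam_001')
print(with_call)
```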

+

We can display multiple things at once using only one +print call:

+
+

PYTHON +

+
print(patient_id, 'weight in kilograms:', weight_kg)
+
+
+

OUTPUT +

+
inflam_001 weight in kilograms: 60.3
+
+

We can also call a function inside of another function call. For example, +Python has a built-in function called type that tells you a +value’s data type:

+
+

PYTHON +

+
print(type(60.3))
+print(type(patient_id))
+
+
+

OUTPUT +

+
<class 'float'>
+<class 'str'>
+
+

Moreover, we can do arithmetic with variables right inside the +print function:

+
+

PYTHON +

+
print('weight in pounds:', 2.2 * weight_kg)
+
+
+

OUTPUT +

+
weight in pounds: 132.66
+
+

The above command, however, did not change the value of +weight_kg:

+
+

PYTHON +

+
print(weight_kg)
+
+
+

OUTPUT +

+
60.3
+
+

To change the value of the weight_kg variable, we have +to assign weight_kg a new value using the +equals = sign:

+
+

PYTHON +

+
weight_kg = 65.0
+print('weight in kilograms is now:', weight_kg)
+
+
+

OUTPUT +

+
weight in kilograms is now: 65.0
+
+
+
+ +
+
+

Variables as Sticky Notes +

+
+

A variable in Python is analogous to a sticky note with a name +written on it: assigning a value to a variable is like putting that +sticky note on a particular value.

+
Value of 65.0 with weight_kg label stuck on it

Using this analogy, we can investigate how assigning a value to one +variable does not change values of other, seemingly +related, variables. For example, let’s store the subject’s weight in +pounds in its own variable:

+
+

PYTHON +

+
# There are 2.2 pounds per kilogram
+weight_lb = 2.2 * weight_kg
+print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
+
+
+

OUTPUT +

+
weight in kilograms: 65.0 and in pounds: 143.0
+
+

Everything in a line of code following the ‘#’ symbol is a comment that is ignored by Python. +Comments allow programmers to leave explanatory notes for other +programmers or their future selves.

+
Value of 65.0 with weight_kg label stuck on it, and value of 143.0 with weight_lb label stuck on it

Similar to above, the expression 2.2 * weight_kg is +evaluated to 143.0, and then this value is assigned to the +variable weight_lb (i.e. the sticky note +weight_lb is placed on 143.0). At this point, +each variable is “stuck” to completely distinct and unrelated +values.

+

Let’s now change weight_kg:

+
+

PYTHON +

+
weight_kg = 100.0
+print('weight in kilograms is now:', weight_kg, 'and weight in pounds is still:', weight_lb)
+
+
+

OUTPUT +

+
weight in kilograms is now: 100.0 and weight in pounds is still: 143.0
+
+
Value of 100.0 with label weight_kg stuck on it, and value of 143.0 with label weight_lb stuck on it

Since weight_lb doesn’t “remember” where its value comes +from, it is not updated when we change weight_kg.

+
+
+
+
+
+ +
+
+

Check Your Understanding +

+
+

What values do the variables mass and age +have after each of the following statements? Test your answer by +executing the lines.

+
+

PYTHON +

+
mass = 47.5
+age = 122
+mass = mass * 2.0
+age = age - 20
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
`mass` holds a value of 47.5, `age` does not exist
+`mass` still holds a value of 47.5, `age` holds a value of 122
+`mass` now has a value of 95.0, `age`'s value is still 122
+`mass` still has a value of 95.0, `age` now holds 102
+
+
+
+
+
+
+
+ +
+
+

Sorting Out References +

+
+

Python allows you to assign multiple values to multiple variables in +one line by separating the variables and values with commas. What does +the following program print out?

+
+

PYTHON +

+
first, second = 'Grace', 'Hopper'
+third, fourth = second, first
+print(third, fourth)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
Hopper Grace
+
+
+
+
+
+
+
+ +
+
+

Seeing Data Types +

+
+

What are the data types of the following variables?

+
+

PYTHON +

+
planet = 'Earth'
+apples = 5
+distance = 10.5
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
print(type(planet))
+print(type(apples))
+print(type(distance))
+
+
+

OUTPUT +

+
<class 'str'>
+<class 'int'>
+<class 'float'>
+
+
+
+
+
+
+
+ +
+
+

Key Points +

+
+
    +
  • Basic data types in Python include integers, strings, and +floating-point numbers.
  • +
  • Use variable = value to assign a value to a variable in +order to record it in memory.
  • +
  • Variables are created on demand whenever a value is assigned to +them.
  • +
  • Use print(something) to display the value of +something.
  • +
  • Use # some kind of explanation to add comments to +programs.
  • +
  • Built-in functions are always available to use.
  • +
+
+
+
+

Content from Data Transformation

+
+

Last updated on 2024-07-11 | + + Edit this page

+

Estimated time: 60 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I process tabular data files in Python?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Explain what a library is and what libraries are used for.
  • +
  • Import a Python library and use the functions it contains.
  • +
  • Read tabular data from a file into a program.
  • +
  • Select individual values and subsections from data.
  • +
  • Perform operations on arrays of data.
  • +
+
+
+
+
+
+

Words are useful, but what’s more useful are the sentences and +stories we build with them. Similarly, while a lot of powerful, general +tools are built into Python, specialized tools built up from these basic +units live in libraries that can be +called upon when needed.

+

Loading data into Python + +

+
+

To begin processing the clinical trial inflammation data, we need to +load it into Python. Python can work with many different file types. +Text files can be loaded into Python by using the base Python +function

+
+

PYTHON +

+
open("filename.txt", "r")
+
+

where “r” means read-only; if you want to write to the file instead, you can use “w” (note that “w” overwrites any existing contents of the file).
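In practice, `open` is usually combined with a `with` block, which closes the file automatically when the block ends. The sketch below writes and then reads a small text file inside a temporary folder, so no real `filename.txt` is touched. The file contents are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'filename.txt')

    # "w": open for writing (creates the file, or overwrites an existing one)
    with open(path, 'w') as file_handle:
        file_handle.write('line one\nline two\n')

    # "r": open read-only and process the file one line at a time
    lines = []
    with open(path, 'r') as file_handle:
        for line in file_handle:
            lines.append(line.rstrip('\n'))

    # Leaving the with-block closed the file automatically.
    print(file_handle.closed)

print(lines)
```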

+

However, our patient data is in a .csv file, which is more commonly loaded using a library. Python has hundreds of thousands of libraries to choose from to help carry out your work. Importing a library is like getting a piece of lab equipment out of a storage locker and setting it up on the bench. Libraries provide additional functionality to the basic Python package, much like a new piece of equipment adds functionality to a lab space. Just like in the lab, importing too many libraries can sometimes complicate and slow down your programs, so we only import what we need for each program. There are a couple of common Python libraries for loading (and working with) data.

+

pandas + +

+
+

The first library we will present is called pandas. pandas is a Python library containing a set of functions and specialised data structures that have been designed to help Python programmers perform data analysis tasks in a structured way.

+

Most of the things that pandas can do can be done with basic Python, but the collected set of pandas functions and data structures makes data analysis tasks more consistent in terms of syntax and therefore aids readability.

+

Remember to write the library name with a lower-case ‘p’, because both the package name and Python itself are case sensitive.

+
+

Importing the pandas library +

+

Importing the pandas library is done in exactly the same way as for +any other library. In almost all examples of Python code using the +pandas library, it will have been imported and given an alias of +pd. We will follow the same convention.

+
+

PYTHON +

+
import pandas as pd
+
+
+
+

Pandas data structures +

+

There are two main data structures used by pandas: the Series and the Dataframe. A Series broadly equates to a vector or a list, and a Dataframe is equivalent to a table. Each column in a pandas Dataframe is a pandas Series.

+

We will mainly be looking at the Dataframe.

+

We can easily create a pandas Dataframe by reading a .csv file.

+
+
+

Reading a csv file +

+

When we read a csv dataset in base Python, we did so by opening the dataset, reading and processing one record at a time, and then closing the dataset after we had read the last record. Reading datasets in this way is slow and places all of the responsibility for extracting individual data items from the records on the programmer.

+

The main advantage of this approach, however, is that you only have +to store one dataset record in memory at a time. This means that if you +have the time, you can process datasets of any size.
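For reference, the record-at-a-time approach looks roughly like this with the standard-library `csv` module. Here `io.StringIO` stands in for an open file, and the patient data is made up for illustration.

```python
import csv
import io

# io.StringIO behaves like an open file; the data is made up for illustration.
csv_file = io.StringIO("patient_id,weight_kg\n001,60.3\n002,72.1\n003,55.8\n")

reader = csv.reader(csv_file)
header = next(reader)          # the first record holds the column names

total = 0.0
count = 0
for record in reader:          # only one record is held in memory at a time
    total += float(record[1])  # each item arrives as a string
    count += 1

print(header)
print('Mean weight:', total / count)
```

Note how the programmer has to pick out each item (`record[1]`) and convert it from a string, which is the extra responsibility described above.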

+

In pandas, csv files are read as complete datasets. You do not have to explicitly open and close the dataset. All of the dataset records are assembled into a Dataframe. If your dataset has column headers in the first record, then these can be used as the Dataframe column names. You can explicitly state this in the parameters to the call, but pandas is usually able to infer that there is a header row and use it automatically.

+

To tell Python that we’d like to start using pandas, we need to import it:

+
+

PYTHON +

+
import pandas as pd
+
+

Often, libraries are given an alias or a short form name, in this +case pandas is given the alias “pd”. Aliases for common data analysis +libraries include:

+
+

PYTHON +

+
import pandas as pd
+import numpy as np
+import matplotlib.pyplot as plt
+import seaborn as sns
+
+

Once we’ve imported the library, we can ask the library to read our +data file for us:

+
+

PYTHON +

+
pd.read_csv("filename.csv")
+
+
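Here is a self-contained sketch of the same call, with `io.StringIO` standing in for `"filename.csv"` and made-up patient data. The first record becomes the column names, and each column of the resulting Dataframe is a Series. The `dtype` argument is an extra choice for this example: it keeps the IDs as text, because otherwise `'001'` would be read as the number 1.

```python
import io
import pandas as pd

# io.StringIO stands in for "filename.csv"; the data is made up for illustration.
csv_text = "patient_id,weight_kg\n001,60.3\n002,72.1\n003,55.8\n"

# The header row is inferred; dtype keeps patient IDs as strings.
patients = pd.read_csv(io.StringIO(csv_text), dtype={'patient_id': str})

print(patients)
print(type(patients['weight_kg']))   # each column is a Series
print(patients['weight_kg'].mean())
```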

pandas is a commonly used library for working with and analysing +data. However, we will be working with a different package for the +remainder of this course. If you would like to learn more about data +manipulation and analysis using pandas, we recommend checking out Data Analysis and +Visualization with Python for Social Scientists.

+
+

numpy + +

+
+

The second package that we will present is called NumPy, which stands for Numerical Python. In general, you should use this library when you want to do complex operations on lots of numbers, especially if you have matrices or arrays. NumPy arrays are typically lighter weight than pandas Dataframes and can perform better, particularly when working with large numerical datasets.

+

We will be using this package to work with our clinical trial +inflammation data.

+

To tell Python that we’d like to start using NumPy, we need to import it:

+
+

PYTHON +

+
import numpy as np
+
+

Now that we have imported the library, we can ask the library (by using the alias np) to read our data file for us:

+
+

PYTHON +

+
np.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+
+

OUTPUT +

+
array([[ 0.,  0.,  1., ...,  3.,  0.,  0.],
+       [ 0.,  1.,  2., ...,  1.,  0.,  1.],
+       [ 0.,  1.,  1., ...,  2.,  1.,  1.],
+       ...,
+       [ 0.,  1.,  1., ...,  1.,  1.,  1.],
+       [ 0.,  0.,  0., ...,  0.,  2.,  0.],
+       [ 0.,  0.,  1., ...,  1.,  1.,  0.]])
+
+

The expression np.loadtxt(...) is a function call that asks Python to run the function loadtxt which belongs to the np library. In Python, dot notation is used mostly to access an object's attributes (properties) or to call its methods: object.property gives the value of that property, and object_name.method() calls method on object_name.

+

As an example, John Smith is the John that belongs to the Smith +family. We could use the dot notation to write his name +smith.john, just as loadtxt is a function that +belongs to the np library.

+

Here we passed np.loadtxt two parameters (it accepts several more optional ones): the name of the file we want to read and the delimiter that separates values on a line. These both need to be character strings (or strings for short), so we put them in quotes.

+

Since we haven’t told it to do anything else with the function’s +output, the notebook displays it. +In this case, that output is the data we just loaded. By default, only a +few rows and columns are shown (with ... to omit elements +when displaying big arrays). Note that, to save space when displaying +NumPy arrays, Python does not show us trailing zeros, so +1.0 becomes 1..

+

Our call to np.loadtxt read our file but didn’t save the +data in memory. To do that, we need to assign the array to a variable. +In a similar manner to how we assign a single value to a variable, we +can also assign an array of values to a variable using the same syntax. +Let’s re-run np.loadtxt and save the returned data:

+
+

PYTHON +

+
data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+

This statement doesn’t produce any output because we’ve assigned the +output to the variable data. If we want to check that the +data have been loaded, we can print the variable’s value:

+
+

PYTHON +

+
print(data)
+
+
+

OUTPUT +

+
[[ 0.  0.  1. ...,  3.  0.  0.]
+ [ 0.  1.  2. ...,  1.  0.  1.]
+ [ 0.  1.  1. ...,  2.  1.  1.]
+ ...,
+ [ 0.  1.  1. ...,  1.  1.  1.]
+ [ 0.  0.  0. ...,  0.  2.  0.]
+ [ 0.  0.  1. ...,  1.  1.  0.]]
+
+

Now that the data are in memory, we can manipulate them. First, let’s +ask what type of thing +data refers to:

+
+

PYTHON +

+
print(type(data))
+
+
+

OUTPUT +

+
<class 'numpy.ndarray'>
+
+

The output tells us that data currently refers to an +N-dimensional array, the functionality for which is provided by the +NumPy library. These data correspond to arthritis patients’ +inflammation. The rows are the individual patients, and the columns are +their daily inflammation measurements.
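To experiment with `np.loadtxt` without the inflammation file, the sketch below passes it a small in-memory text stream (`fname` accepts file-like objects as well as file names). The three rows of values are made up for illustration:

```python
import io

import numpy as np

# Made-up comma-separated values standing in for inflammation-01.csv
sample = io.StringIO("0,0,1\n0,1,2\n0,1,1\n")

# delimiter=',' tells loadtxt that commas separate the values on each line
sample_data = np.loadtxt(fname=sample, delimiter=',')
print(type(sample_data))
print(sample_data)
```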

+
+
+ +
+
+

Data Type +

+
+

A NumPy array contains one or more elements of the same type. The type function will only tell you that a variable is a NumPy array; it won't tell you the type of the elements inside the array. We can find out the type of the data contained in the NumPy array with its dtype attribute:

+
+

PYTHON +

+
print(data.dtype)
+
+
+

OUTPUT +

+
float64
+
+

This tells us that the NumPy array’s elements are floating-point +numbers.

+
+
+
+
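A short sketch of how NumPy chooses a single element type when an array is created from a Python list; the example values are arbitrary. The exact integer width (for example 64-bit) can depend on the platform, so the sketch checks only the dtype's kind for the integer case:

```python
import numpy as np

# Integers only: NumPy picks an integer dtype (its width may vary by platform)
ints = np.array([1, 2, 3])
# Mixing integers and floats: every element is stored as float64
mixed = np.array([1, 2.5, 3])

print(ints.dtype.kind)  # kind 'i' means signed integer
print(mixed.dtype)
print(mixed)
```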

With the following command, we can see the array’s shape:

+
+

PYTHON +

+
print(data.shape)
+
+
+

OUTPUT +

+
(60, 40)
+
+

The output tells us that the data array variable +contains 60 rows and 40 columns. When we created the variable +data to store our arthritis data, we did not only create +the array; we also created information about the array, called members or attributes. This extra +information describes data in the same way an adjective +describes a noun. data.shape is an attribute of +data which describes the dimensions of data. +We use the same dotted notation for the attributes of variables that we +use for the functions in libraries because they have the same +part-and-whole relationship.
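Besides `shape`, NumPy arrays carry other attributes describing their size. The sketch below uses a hypothetical array of zeros with the same shape as the inflammation data:

```python
import numpy as np

# A hypothetical array with the same shape as the inflammation data
example = np.zeros((60, 40))

print(example.shape)  # (rows, columns)
print(example.ndim)   # number of dimensions
print(example.size)   # total number of elements, 60 * 40
```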

+

If we want to get a single number from the array, we must provide an +index in square brackets after the +variable name, just as we do in math when referring to an element of a +matrix. Our inflammation data has two dimensions, so we will need to use +two indices to refer to one specific value:

+
+

PYTHON +

+
print('first value in data:', data[0, 0])
+
+
+

OUTPUT +

+
first value in data: 0.0
+
+
+

PYTHON +

+
print('middle value in data:', data[29, 19])
+
+
+

OUTPUT +

+
middle value in data: 16.0
+
+

The expression data[29, 19] accesses the element at row +30, column 20. While this expression may not surprise you, +data[0, 0] might. Programming languages like Fortran, +MATLAB and R start counting at 1 because that’s what human beings have +done for thousands of years. Languages in the C family (including C++, +Java, Perl, and Python) count from 0 because it represents an offset +from the first value in the array (the second value is offset by one +index from the first value). This is closer to the way that computers +represent arrays (if you are interested in the historical reasons behind +counting indices from zero, you can read Mike +Hoye’s blog post). As a result, if we have an M×N array in Python, +its indices go from 0 to M-1 on the first axis and 0 to N-1 on the +second. It takes a bit of getting used to, but one way to remember the +rule is that the index is how many steps we have to take from the start +to get the item we want.

+
'data' is a 3 by 3 NumPy array containing row 0: ['A', 'B', 'C'], row 1: ['D', 'E', 'F'], and row 2: ['G', 'H', 'I']. Starting in the upper left hand corner, data[0, 0] = 'A', data[0, 1] = 'B', data[0, 2] = 'C', data[1, 0] = 'D', data[1, 1] = 'E', data[1, 2] = 'F', data[2, 0] = 'G', data[2, 1] = 'H', and data[2, 2] = 'I' in the bottom right hand corner.
+
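The zero-based rule can be checked on a small array. This sketch uses a hypothetical 3-by-4 array of the numbers 0 to 11; as with Python lists, negative indices count back from the end of each axis:

```python
import numpy as np

# A small 3 x 4 array holding the numbers 0 to 11, row by row
grid = np.arange(12).reshape(3, 4)
n_rows, n_cols = grid.shape

# Valid indices run from 0 to n_rows - 1 and from 0 to n_cols - 1
print(grid[0, 0])                    # first element
print(grid[n_rows - 1, n_cols - 1])  # last element
print(grid[-1, -1])                  # the same last element, counted from the end
```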
+ +
+
+

In the Corner +

+
+

What may also surprise you is that when Python displays an array, it shows the element with index [0, 0] in the upper left corner rather than the lower left. This is consistent with the way mathematicians draw matrices but different from Cartesian coordinates. The indices are (row, column) instead of (column, row) for the same reason, which can be confusing when plotting data.

+
+
+
+

Slicing data + +

+
+

An index like [30, 20] selects a single element of an +array, but we can select whole sections as well. For example, we can +select the first ten days (columns) of values for the first four +patients (rows) like this:

+
+

PYTHON +

+
print(data[0:4, 0:10])
+
+
+

OUTPUT +

+
[[ 0.  0.  1.  3.  1.  2.  4.  7.  8.  3.]
+ [ 0.  1.  2.  1.  2.  1.  3.  2.  2.  6.]
+ [ 0.  1.  1.  3.  3.  2.  6.  2.  5.  9.]
+ [ 0.  0.  2.  0.  4.  2.  2.  1.  6.  7.]]
+
+

The slice 0:4 means, +“Start at index 0 and go up to, but not including, index 4”. Again, the +up-to-but-not-including takes a bit of getting used to, but the rule is +that the difference between the upper and lower bounds is the number of +values in the slice.

+

We don’t have to start slices at 0:

+
+

PYTHON +

+
print(data[5:10, 0:10])
+
+
+

OUTPUT +

+
[[ 0.  0.  1.  2.  2.  4.  2.  1.  6.  4.]
+ [ 0.  0.  2.  2.  4.  2.  2.  5.  5.  8.]
+ [ 0.  0.  1.  2.  3.  1.  2.  3.  5.  3.]
+ [ 0.  0.  0.  3.  1.  5.  6.  5.  5.  8.]
+ [ 0.  1.  1.  2.  1.  3.  5.  3.  5.  8.]]
+
+

We also don’t have to include the upper and lower bound on the slice. +If we don’t include the lower bound, Python uses 0 by default; if we +don’t include the upper, the slice runs to the end of the axis, and if +we don’t include either (i.e., if we use ‘:’ on its own), the slice +includes everything:

+
+

PYTHON +

+
small = data[:3, 36:]
+print('small is:')
+print(small)
+
+

The above example selects rows 0 through 2 and columns 36 through to +the end of the array.

+
+

OUTPUT +

+
small is:
+[[ 2.  3.  0.  0.]
+ [ 1.  1.  0.  1.]
+ [ 2.  2.  1.  1.]]
+
+
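A sketch of both slicing rules, using a hypothetical 60-by-40 array of consecutive numbers in place of the inflammation data: the number of rows selected equals the difference between the slice bounds, and omitted bounds default to the start or end of the axis.

```python
import numpy as np

# Hypothetical stand-in for the 60 x 40 inflammation data
grid = np.arange(60 * 40).reshape(60, 40)

start, stop = 0, 4
rows = grid[start:stop, :]
# The number of rows selected equals stop - start
print(rows.shape[0], stop - start)

# Omitted bounds: rows 0 to 2, columns 36 to the end (39)
corner = grid[:3, 36:]
print(corner.shape)
```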

Content from List and Dictionary Methods

+
+

Last updated on 2024-07-11

+

Estimated time: 40 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I store many values together?
  • +
  • How can I create a list succinctly?
  • +
  • How can I efficiently access nested data?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • Identify and create lists and dictionaries
  • +
  • Understand the properties and behaviours of lists and +dictionaries
  • +
  • Access values in lists and dictionaries
  • +
  • Create and access values from nested lists and dictionaries
  • +
+
+
+
+
+
+

Values can also be stored in other Python data types such as lists, dictionaries, sets and tuples. Storing objects in a list is a fast and versatile way to apply transformations across a sequence of values. Storing objects in a dictionary as key-value pairs is useful for extracting specific values, i.e. performing lookup operations.

+

Create and access lists + +

+
+

Lists have the following properties and behaviours:

+
    +
  • A single list can store different primitive object types and even +other lists
  • +
  • Lists are ordered and have a 0-based index
  • +
  • Lists can be appended to using the methods append() or +insert() +
  • +
  • Values inside a list can be removed using the methods +remove() or pop() +
  • +
  • Two lists can be concatenated with the operator + +
  • +
  • Values inside a list can be conditionally iterated through
  • +
  • A list is mutable i.e. the values inside a list can be modified in +place
  • +
+

To create a list, values are contained within square brackets +i.e. [] and individually separated by commas. The function +list() can also be used to create a list of values from an +iterable object like a string, set or tuple.

+
+

PYTHON +

+
# Create a list of integers using []
+list_1 = [1, 3, 5, 7]
+print(list_1)
+
+
+

OUTPUT +

+
[1, 3, 5, 7]
+
+
+

PYTHON +

+
# Unlike atomic vectors in R, a list can contain multiple primitive object types
+list_2 = [1, "one", 1.0, True]
+print(list_2)
+
+
+

OUTPUT +

+
[1, 'one', 1.0, True]
+
+
+

PYTHON +

+
# You can also use list() on an iterable object to convert it into a list
+string = 'abcdefg'  
+list_3 = list(string)  
+print(list_3)
+
+
+

OUTPUT +

+
['a', 'b', 'c', 'd', 'e', 'f', 'g']
+
+

Because lists have a 0-based index, we can access individual values +by their list index position. For 0-based indexes, the first value +always starts at position 0 i.e. the first element has an index of 0. +Accessing multiple values by their index positions is also referred to +as slicing or subsetting a list.

+

Note that we can use negative numbers as indices in Python. When we +do so, the index -1 gives us the last element in the list, +-2 gives us the second to last element in the list, and so +on.

+
+

PYTHON +

+
# Extract individual values from list_3
+print('first value:', list_3[0])
+print('second value:', list_3[1])
+print('last value:', list_3[-1])
+
+
+

OUTPUT +

+
first value: a
+second value: b
+last value: g
+
+
+

PYTHON +

+
# Slices stop just before the end index, so add 1 to the index of the last value you want
+# To extract from index 0 to 2, we need to slice from [0:2+1] or [0:3]
+
+# Extract the first three values from list_3
+print('first 3 values:', list_3[0:3])
+
+# Start from index 0 and extract values from each subsequent second position
+print('every second value:', list_3[0::2])
+
+# Start from index 1, end at index 3 and extract from each subsequent second position
+print('every second value from index 1 to 3:', list_3[1:4:2])
+
+
+

OUTPUT +

+
first 3 values: ['a', 'b', 'c']
+every second value: ['a', 'c', 'e', 'g']
+every second value from index 1 to 3: ['b', 'd']
+
+

Change list values + +

+
+

Data which can be modified in place is called mutable, while data +which cannot be modified is called immutable. Strings and numbers are +immutable in that when we want to change the value of a string or number +variable, we can only replace the old value with a completely new +value.

+
+

PYTHON +

+
string = 'abcde'
+string[0] = 'b' # Produces a type error as strings are immutable
+
+# TypeError: 'str' object does not support item assignment
+
+
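Because strings are immutable, the usual workaround is to build a new string and rebind the variable to it. A minimal sketch using the same example string:

```python
string = 'abcde'

# Strings cannot be changed in place, but we can build a new string
# from pieces of the old one and rebind the variable to it
string = 'b' + string[1:]
print(string)
```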

In contrast, lists are mutable and we can modify them after they have +been created. We can change individual values, append new values, or +reorder the whole list through sorting.

+
+

PYTHON +

+
list_4 = ['apple', 'pear', 'plum']
+print('original list_4:', list_4)
+
+# Change the first value i.e. modify the list in place
+list_4[0] = 'banana'
+print('modified list_4:', list_4)
+
+# Add new value to list using the method .insert(index number, value)
+list_4.insert(1, 'apple') # Index 1 refers to the second position
+print('appended list_4:', list_4)
+
+
+

OUTPUT +

+
original list_4: ['apple', 'pear', 'plum']
+modified list_4: ['banana', 'pear', 'plum']
+appended list_4: ['banana', 'apple', 'pear', 'plum']
+
+
+

PYTHON +

+
# Sorting a list also modifies it in place
+list_5 = [2, 1, 3, 7]
+list_5.sort()
+print('list_5:', list_5)
+
+
+

OUTPUT +

+
list_5: [1, 2, 3, 7]
+
+

However, be careful when modifying data in-place. If two variables +refer to the same list, and you modify the list value, it will change +for both variables!

+
+

PYTHON +

+
# When we assign list_6 to list_5, it means both list_6 and list_5 point to the
+# same list object, not that list_6 is a copy of list_5.  
+
+list_6 = list_5  
+print('list_5:', list_5)
+print('list_6:', list_6)
+
+# Change the first value in list_6 from 1 to 2 
+list_6[0] = 2 
+
+print('modified list_6:', list_6)
+print('unmodified list_5:', list_5)
+
+# Warning: list_5 and list_6 have both been modified in place!
+
+
+

OUTPUT +

+
list_5: [1, 2, 3, 7]
+list_6: [1, 2, 3, 7]
+modified list_6: [2, 2, 3, 7]
+unmodified list_5: [2, 2, 3, 7]
+
+

Because of this behaviour, code which modifies data in place should be handled with care. You can avoid this behaviour by explicitly creating a copy of the original list and modifying only the copy, which is why making a copy of the original data object can be useful in Python.

+
+

PYTHON +

+
list_5 = [1, 2, 3, 7]
+list_7 = list_5.copy()  
+print('list_5:', list_5)
+print('list_7:', list_7)
+
+# As list_7 is a completely new object copied from list_5, modifying list_7 does
+# not affect list_5.  
+
+list_7[0] = 2 
+print('modified list_7:', list_7)
+print('unmodified list_5:', list_5)
+
+
+

OUTPUT +

+
list_5: [1, 2, 3, 7]
+list_7: [1, 2, 3, 7]
+modified list_7: [2, 2, 3, 7]
+unmodified list_5: [1, 2, 3, 7]
+
+
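Copying with `.copy()` is one of several equivalent options. The sketch below compares them with `==` (same values) and `is` (same object). Note that all three are shallow copies: if the list contains other lists, those inner lists are still shared between the original and the copy.

```python
list_5 = [1, 2, 3, 7]

# Three equivalent ways to make a new (shallow) copy of a list
copy_a = list_5.copy()
copy_b = list(list_5)
copy_c = list_5[:]

alias = list_5  # not a copy: just another name for the same object

print(copy_a == list_5, copy_a is list_5)  # same values, different object
print(alias is list_5)                     # same object
```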

Useful list functions + +

+
+

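There are many functions and methods which can be applied to lists, such as len(), max(), index() and so forth. Mathematical operators, however, do not perform arithmetic on the values inside a list. Most of them (such as - and /) raise a TypeError; the exceptions are +, and * combined with an integer, which repeats a list.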

+

Note that + concatenates two lists into a single longer +list, rather than outputting the sum of two lists of numbers.

+
+

PYTHON +

+
list_8 = [1, 2, 3]
+list_9 = [4, 5, 6]
+
+list_8 + list_9 # This concatenates the lists and does not sum the two lists together
+
+
+

OUTPUT +

+
[1, 2, 3, 4, 5, 6]
+
+
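If you do want to add two lists of numbers position by position, one option is a list comprehension over `zip()`, which pairs up values by position. A minimal sketch using the same two lists, which also shows that `*` with an integer repeats a list:

```python
list_8 = [1, 2, 3]
list_9 = [4, 5, 6]

# Pair up values by position with zip(), then add each pair
element_sums = [a + b for a, b in zip(list_8, list_9)]
print(element_sums)

# * with an integer repeats a list rather than multiplying its values
repeated = list_8 * 2
print(repeated)
```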

In your spare time after this workshop, you can search for different +list functions and methods and test them out yourselves.

+

Nested lists + +

+
+

We have previously mentioned that lists can be used to store other +Python object types, including lists. This means that we can create +nested lists in Python i.e. lists containing lists containing values. +This property is useful when we have a collection of values that we want +to access or transform as a subgroup.

+

To create a nested list, we also use [] or +list() to contain one or more lists of values of +interest.

+
+

PYTHON +

+
veg_stock = [
+    ['lettuce', 'lettuce', 'tomato', 'zucchini'],
+    ['lettuce', 'lettuce', 'carrot', 'zucchini'],
+    ['lettuce', 'basil', 'tomato', 'zucchini']
+    ]
+
+# Check that veg_stock is a list object
+print(type(veg_stock))
+
+# Check that the first value in veg_stock is itself a list
+print(veg_stock[0], 'has type', type(veg_stock[0]))  
+
+
+

OUTPUT +

+
<class 'list'>
+['lettuce', 'lettuce', 'tomato', 'zucchini'] has type <class 'list'>
+
+

To extract a sub-list within the veg_stock list object, we refer to its index like we would with any other value inside a list, i.e. veg_stock[0] points to the first sub-list and veg_stock[1] points to the second sub-list within veg_stock.

+

To access an individual string value inside a sub-list, we make use +of a second index, which points to an individual value inside the +sub-list.

+
+

PYTHON +

+
print(veg_stock[0]) # Access the first sub-list 
+print(veg_stock[0][0]) # Access the first value in the first sub-list 
+
+print(type(veg_stock[0])) # The first value in veg_stock is a list
+print(type(veg_stock[0][0])) # The first value in the first list in veg_stock is a string
+
+
+

OUTPUT +

+
['lettuce', 'lettuce', 'tomato', 'zucchini']
+lettuce
+<class 'list'>
+<class 'str'>
+
+
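The two-index pattern extends naturally to working across all sub-lists. A short sketch reusing `veg_stock` that reads one value by row and position, then counts one vegetable across every sub-list with the list method `count()`:

```python
veg_stock = [
    ['lettuce', 'lettuce', 'tomato', 'zucchini'],
    ['lettuce', 'lettuce', 'carrot', 'zucchini'],
    ['lettuce', 'basil', 'tomato', 'zucchini']
    ]

# The first index picks the sub-list, the second picks the value inside it
third_in_second = veg_stock[1][2]
print(third_in_second)

# Count one item across every sub-list
lettuce_count = sum(row.count('lettuce') for row in veg_stock)
print(lettuce_count)
```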

In general, however, when we are analysing a large collection of +values, the best practice is to structure those values in columns and +rows as a tabular Pandas data frame object. This is covered in another +Carpentries Course called Python +for Social Sciences.

+

Lists are still incredibly versatile and useful when you have a +collection of values that need to be efficiently accessed or +transformed. For example, data frame column names are commonly extracted +and stored inside a list, so that the same transformation can then be +mapped across multiple columns.
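As a minimal illustration of mapping one transformation across a list, the sketch below tidies a list of hypothetical column names with a list comprehension. The names are made up; in practice they might come from a data frame's columns.

```python
# Hypothetical column names, as might be extracted from a data frame
column_names = ['Age Group', 'Region Code', 'Total Income']

# Apply the same transformation to every name in one line
clean_names = [name.lower().replace(' ', '_') for name in column_names]
print(clean_names)
```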

+

Create and access dictionaries + +

+
+

A dictionary is a Python data type that is particularly suited for +enabling quick lookup operations on unstructured data sets.

+

A dictionary can be thought of as a collection in which every item or value is associated with a unique key (i.e. a self-defined index of unique strings or numbers). Values are looked up by their key rather than by their position, although since Python 3.7 dictionaries do remember the order in which keys were inserted. A dictionary contains key-value pairs with the format {key: value(s)}.

+

Dictionaries can be created by listing individual key-value pairs inside {} or using dict().

+
+

PYTHON +

+
# A key-value pair can contain single or multiple values  
+# Keys are treated as case sensitive and unique
+# Multiple values are first stored inside a list  
+
+teams = {
+    'data science': ['Mei Ling', 'Paul', 'Gwen', 'Suresh'],
+    'user design': ['Amy', 'Linh', 'Sasha'],
+    'software dev': ['David', 'Prya'],
+    'comms': 'Taylor' 
+    } 
+
+

When using dict(), we need to indicate which key is associated with which value. This can be done using a list of (key, value) tuples, using keyword arguments with =, or using zip(), which pairs up items from two iterables by position to produce (key, value) tuples.

+
+

PYTHON +

+
# To use dict(), key-value pairs can be stored inside tuples
+ds_emp_status = dict([
+        ('Mei Ling', 'full time'),
+        ('Paul', 'full time'),
+        ('Gwen', 'part time'),
+        ('Suresh', 'part time')
+    ])  
+
+# Key-value pairs can also be assigned by direct association  
+# Keys are written without quotes (they must be valid Python identifiers) and stored as strings
+ud_emp_status = dict(
+    Amy = 'full time',
+    Linh = 'full time',
+    Sasha = 'casual' 
+    ) 
+
+# zip() can also be used if each key has only one value  
+sd_emp_status = dict(zip(
+    ['David', 'Prya'],
+    ['full time', 'full time']
+    ))
+
+
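The three `dict()` approaches produce the same dictionary when given the same pairs. A short sketch, reusing two of the names from the example above, that checks this and shows the tuples `zip()` generates:

```python
# The same mapping built three ways
from_tuples = dict([('David', 'full time'), ('Prya', 'full time')])
from_keywords = dict(David='full time', Prya='full time')
from_zip = dict(zip(['David', 'Prya'], ['full time', 'full time']))

# Dictionaries compare equal when they hold the same key-value pairs
all_equal = from_tuples == from_keywords == from_zip
print(all_equal)

# zip() pairs items by position; list() shows the pairs it produces
pairs = list(zip(['David', 'Prya'], ['full time', 'full time']))
print(pairs)
```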

To access a specific value inside a dictionary, we need to specify +its key using []. This is similar to slicing or subsetting +a list by specifying its index using [].

+
+

PYTHON +

+
# Access the values associated with the key 'data science'
+print(teams['data science'])
+
+print('The object teams is of type', type(teams))
+print('The dict value', teams['data science'], 'is of type', type(teams['data science']))
+
+
+

OUTPUT +

+
['Mei Ling', 'Paul', 'Gwen', 'Suresh']
+The object teams is of type <class 'dict'>
+The dict value ['Mei Ling', 'Paul', 'Gwen', 'Suresh'] is of type <class 'list'>
+
+

We can also access a value from a dictionary using the +get() method.

+
+

PYTHON +

+
print(teams.get('user design'))
+
+# get() also enables us to return an alternate string when the key is not found   
+# This prevents our code from returning an error message that halts the analysis
+
+print(teams.get('data engineering', 'WARNING: key does not exist'))
+
+
+

OUTPUT +

+
['Amy', 'Linh', 'Sasha']
+WARNING: key does not exist
+
+

To access data inside a dictionary, we can also perform the following +other actions:

+
    +
  • Check whether a key exists in a dictionary using the keyword +in +
  • +
  • Retrieve unique dictionary keys using dict.keys() +
  • +
  • Retrieve dictionary values using dict.values() +
  • +
  • Retrieve dictionary items using dict.items() +
  • +
+
+

PYTHON +

+
# Check whether a key exists in a dictionary 
+print('data science' in teams) 
+print('Data Science' in teams) # Keys are case sensitive  
+
+# Retrieve all dictionary keys  
+print(teams.keys())
+print(sd_emp_status.keys())
+
+# Retrieve all dictionary values  
+print(sd_emp_status.values())  
+
+# Retrieve all dictionary key-value pairs
+print(sd_emp_status.items())
+
+
+

OUTPUT +

+
True
+False
+dict_keys(['data science', 'user design', 'software dev', 'comms'])
+dict_keys(['David', 'Prya'])
+dict_values(['full time', 'full time'])
+dict_items([('David', 'full time'), ('Prya', 'full time')])
+
+
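`dict.items()` is especially handy in a loop or list comprehension, because each item unpacks into a key and a value. A minimal sketch with a small, illustrative employment-status dictionary that picks out the full-time staff:

```python
emp_status = {'David': 'full time', 'Prya': 'full time', 'Carrie': 'part time'}

# Each item from dict.items() unpacks into a (key, value) pair
full_time = [name for name, status in emp_status.items() if status == 'full time']
print(full_time)
```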

To add a new key-value pair to an existing dictionary, we can create +a new key and directly attach a new value to it using = or +alternatively use the method update().

+
+

PYTHON +

+
print('original dict items:', sd_emp_status.items())  
+
+# Add new key-value pair using direct assignment  
+sd_emp_status['Mohammad'] = 'full time'
+
+# Add new key-value pair using update({'key': 'value'})   
+sd_emp_status.update({'Carrie': 'part time'})
+
+print('updated dict items:', sd_emp_status.items())    
+
+
+

OUTPUT +

+
original dict items: dict_items([('David', 'full time'), ('Prya', 'full time')])
+updated dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
+('Mohammad', 'full time'), ('Carrie', 'part time')])
+
+

Because keys are unique, a dictionary cannot contain two keys with +the same name. This means that adding an item using a key that is +already present in the dictionary will cause the previous value to be +overwritten.

+
+

PYTHON +

+
print('original dict items:', sd_emp_status.items())  
+
+# As the key 'Carrie' already exists, its value will be overwritten
+sd_emp_status['Carrie'] = 'full time'
+print('updated dict items:', sd_emp_status.items())  
+
+
+

OUTPUT +

+
original dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
+('Mohammad', 'full time'), ('Carrie', 'part time')])
+updated dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
+('Mohammad', 'full time'), ('Carrie', 'full time')])
+
+
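If you want to add a value without overwriting an existing one, check for the key first. The sketch below shows an explicit `in` check and the built-in `dict.setdefault()` method, which inserts a value only when the key is missing; the dictionary contents are illustrative:

```python
emp_status = {'David': 'full time', 'Carrie': 'part time'}

# Only add the key if it is not already present, so nothing is overwritten
if 'Carrie' not in emp_status:
    emp_status['Carrie'] = 'full time'

# setdefault() performs the same check: it inserts only when the key is missing
emp_status.setdefault('Mohammad', 'full time')
emp_status.setdefault('David', 'casual')  # existing value is kept

print(emp_status)
```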

To remove a key-value pair from an existing dictionary, we can use the del keyword or the method pop(). Using pop() also lets us supply a default value to return if we try to remove a key that does not exist, which prevents our code from raising an error that halts the analysis.

+
+

PYTHON +

+
print('original dict items:', sd_emp_status.items())
+
+# Delete dictionary keys using del and pop()
+del sd_emp_status['Mohammad']
+sd_emp_status.pop('Carrie')
+sd_emp_status.pop('Anuradha', 'WARNING: key does not exist') # Does not generate an error
+
+print('modified dict items:', sd_emp_status.items())  
+
+
+

OUTPUT +

+
original dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
+('Mohammad', 'full time'), ('Carrie', 'full time')])
+modified dict items: dict_items([('David', 'full time'), ('Prya', 'full time')])
+
+
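`pop()` also returns the value it removes, which is useful if you still need it. A minimal sketch with an illustrative dictionary:

```python
emp_status = {'David': 'full time', 'Carrie': 'part time'}

# pop() removes the key and returns its value
removed = emp_status.pop('Carrie')
# With a second argument, pop() returns that default instead of raising KeyError
missing = emp_status.pop('Anuradha', 'WARNING: key does not exist')

print(removed)
print(missing)
print(emp_status)
```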

Nested dictionaries + +

+
+

Similar to lists, dictionaries can be nested as we can also store +dictionaries as values inside a key-value pair using {}. +Nested dictionaries are useful when we need to store unstructured data +in a complex structure. For example, JSON data is commonly used for +transmitting data in web applications and often exists in a nested +structure that can be stored using nested dictionaries in Python.

+
+

PYTHON +

+
# Individual dictionaries are enclosed in {} and separated by a comma
+nested_dict = {
+    'dict_1': { # First key is a dictionary of key-value pairs 
+        'key_1a': 'value_1a',
+        'key_1b': 'value_1b'
+                },
+    'dict_2': { # Second key is another dictionary of key-value pairs
+        'key_2a': 'value_2a',
+        'key_2b': 'value_2b'
+                }
+            }
+
+print(nested_dict)
+
+
+

OUTPUT +

+
{'dict_1': {'key_1a': 'value_1a', 'key_1b': 'value_1b'},
+ 'dict_2': {'key_2a': 'value_2a', 'key_2b': 'value_2b'}}
+
+

Similar to working with nested lists, to extract a value from a sub-dictionary, we specify both the main dictionary key and the sub-dictionary key using [].

+
+

PYTHON +

+
# Extract the value for key 2a in dict_2
+print('original value:', nested_dict['dict_2']['key_2a'])
+
+# Adding or updating a value can be done through the same approach
+nested_dict['dict_2']['key_2a'] = "modified_value_2a"  
+
+print('modified value:', nested_dict['dict_2']['key_2a'])
+
+
+

OUTPUT +

+
original value: value_2a
+modified value: modified_value_2a
+
+
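Indexing a nested dictionary with `[]` raises a `KeyError` if any key along the way is missing. One option, sketched below using the same `nested_dict`, is to chain `get()` calls with an empty dictionary as the fallback. `dict_3` and `key_3a` are hypothetical keys added for illustration.

```python
nested_dict = {
    'dict_1': {'key_1a': 'value_1a', 'key_1b': 'value_1b'},
    'dict_2': {'key_2a': 'value_2a', 'key_2b': 'value_2b'}
    }

# An empty dict as the outer fallback means a missing key at either level
# returns the default instead of raising KeyError
found = nested_dict.get('dict_1', {}).get('key_1b', 'not found')
not_found = nested_dict.get('dict_3', {}).get('key_3a', 'not found')
print(found)
print(not_found)

# Add a new sub-dictionary by assigning to a new top-level key
nested_dict['dict_3'] = {'key_3a': 'value_3a'}
print(nested_dict['dict_3']['key_3a'])
```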

Optional: converting lists and dictionaries to Pandas data +frames + +

+
+

Lists and dictionaries can be easily converted into a tabular Pandas +data frame format. This can be useful when you need to create a small +data set for unit testing purposes.

+
+

PYTHON +

+
# Import pandas library
+import pandas as pd
+
+# Create a dictionary with each key-value pair representing a data frame column
+data = {
+    'col_1': [3, 2, 1, 0],
+    'col_2': ['a', 'b', 'c', 'd']
+    }
+
+df = pd.DataFrame.from_dict(data) 
+
+print(df) # Outputs data as a tabular Pandas data frame   
+print(type(df))
+
+
+

OUTPUT +

+
   col_1 col_2
+0      3     a
+1      2     b
+2      1     c
+3      0     d
+<class 'pandas.core.frame.DataFrame'>
+
+
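Lists can be turned into data frames in a similar way. A short sketch, assuming pandas is installed, that builds the same two-row table from a list of dictionaries (one per row) and from a nested list with separately supplied column names:

```python
import pandas as pd

# Each dictionary is one row; its keys become column names
records = [
    {'col_1': 3, 'col_2': 'a'},
    {'col_1': 2, 'col_2': 'b'}
    ]
df_rows = pd.DataFrame(records)

# A nested list also works if the column names are supplied separately
df_lists = pd.DataFrame([[3, 'a'], [2, 'b']], columns=['col_1', 'col_2'])

print(df_rows)
print(df_lists.columns.tolist())
```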
+
+ +
+
+

Key Points +

+
+
    +
  • Lists can contain any Python object including other lists
  • +
  • Lists are ordered i.e. indexed and can therefore be sliced by index +number
  • +
  • Unlike strings and integers, the values inside a list can be +modified in place
  • +
  • A list which contains other lists is referred to as a nested +list
  • +
  • Dictionaries store values as key-value pairs and are accessed by key rather than by position
  • +
  • Dictionary keys are unique
  • +
  • A dictionary which contains other dictionaries is referred to as a +nested dictionary
  • +
  • Values inside nested lists and dictionaries can be accessed by an +additional index
  • +
+
+
+
+

Content from Loops and Conditional Logic

+
+

Last updated on 2024-07-11

+

Estimated time: 60 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I do the same operations on many different values?
  • +
  • How can my programs do different things based on data values?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • identify and create loops
  • +
  • use logical statements to allow for decision-based operations in +code
  • +
+
+
+
+
+
+

This episode contains two lessons:

+
    +
  1. Repeating Actions with +Loops
  2. +
  3. Making Choices with +Conditional Logic
  4. +
+

Repeating Actions with Loops + +

+
+

In the episode about visualizing data, we will see Python code that plots values of interest from our first inflammation dataset (inflammation-01.csv); these plots reveal some suspicious features.

+
Line graphs showing average, maximum, and minimum inflammation across all patients over a 40-day period.

We have a dozen data sets right now and potentially more on the way +if Dr. Maverick can keep up their surprisingly fast clinical trial rate. +We want to create plots for all of our data sets with a single +statement. To do that, we’ll have to teach the computer how to repeat +things.

+

An example task that we might want to repeat is accessing numbers in +a list, which we will do by printing each number on a line of its +own.

+
+

PYTHON +

+
odds = [1, 3, 5, 7]
+
+

In Python, a list is basically an ordered +collection of elements, and every element has a unique number associated +with it — its index. This means that we can access elements in a list +using their indices. For example, we can get the first number in the +list odds, by using odds[0]. One way to print +each number is to use four print statements:

+
+

PYTHON +

+
print(odds[0])
+print(odds[1])
+print(odds[2])
+print(odds[3])
+
+
+

OUTPUT +

+
1
+3
+5
+7
+
+

This is a bad approach for three reasons:

+
    +
  1. Not scalable. Imagine you need to print a list +that has hundreds of elements. It might be easier to type them in +manually.

  2. +
  3. Difficult to maintain. If we want to decorate +each printed element with an asterisk or any other character, we would +have to change four lines of code. While this might not be a problem for +small lists, it would definitely be a problem for longer ones.

  4. +
  5. Fragile. If we use it with a list that has more +elements than what we initially envisioned, it will only display part of +the list’s elements. A shorter list, on the other hand, will cause an +error because it will be trying to display elements of the list that do +not exist.

  6. +
+
+

PYTHON +

+
odds = [1, 3, 5]
+print(odds[0])
+print(odds[1])
+print(odds[2])
+print(odds[3])
+
+
+

OUTPUT +

+
1
+3
+5
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+IndexError                                Traceback (most recent call last)
+<ipython-input-3-7974b6cdaf14> in <module>()
+      3 print(odds[1])
+      4 print(odds[2])
+----> 5 print(odds[3])
+
+IndexError: list index out of range
+
+

Here’s a better approach: a for +loop

+
+

PYTHON +

+
odds = [1, 3, 5, 7]
+for num in odds:
+    print(num)
+
+
+

OUTPUT +

+
1
+3
+5
+7
+
+

This is shorter — certainly shorter than something that prints every +number in a hundred-number list — and more robust as well:

+
+

PYTHON +

+
odds = [1, 3, 5, 7, 9, 11]
+for num in odds:
+    print(num)
+
+
+

OUTPUT +

+
1
+3
+5
+7
+9
+11
+
+

The improved version uses a for +loop to repeat an operation — in this case, printing — once for each +thing in a sequence. The general form of a loop is:

+
+

PYTHON +

+
for variable in collection:
+    # do things using variable, such as print
+
+

Using the odds example above, the loop might look like this:

+
Loop variable 'num' being assigned the value of each element in the list odds in turn and then being printed

where each number (num) in the variable +odds is looped through and printed one number after +another. The other numbers in the diagram denote which loop cycle the +number was printed in (1 being the first loop cycle, and 6 being the +final loop cycle).

+

We can call the loop +variable anything we like, but there must be a colon at the end of +the line starting the loop, and we must indent anything we want to run +inside the loop. Unlike many other languages, there is no command to +signify the end of the loop body (e.g., end for); +everything indented after the for statement belongs to the +loop.
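The maintainability point from earlier is easy to see with a loop: decorating every element means changing a single line, however long the list is. A minimal sketch that collects each decorated value in a list as well as printing it (the asterisk decoration is arbitrary):

```python
odds = [1, 3, 5, 7, 9, 11]

decorated = []
for num in odds:
    # This one indented line controls how every element is shown
    decorated.append('*' + str(num) + '*')
    print(decorated[-1])
```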

+
+
+ +
+
+

What’s in a name? +

+
+

In the example above, the loop variable was given the name +num as a mnemonic; it is short for ‘number’. We can choose +any name we want for variables. We might just as easily have chosen the +name banana for the loop variable, as long as we use the +same name when we invoke the variable inside the loop:

+
+

PYTHON +

+
odds = [1, 3, 5, 7, 9, 11]
+for banana in odds:
+    print(banana)
+
+
+

OUTPUT +

+
1
+3
+5
+7
+9
+11
+
+

It is a good idea to choose variable names that are meaningful, +otherwise it would be more difficult to understand what the loop is +doing.

+
+
+
+

Here’s another loop that repeatedly updates a variable:

+
+

PYTHON +

+
length = 0
+names = ['Curie', 'Darwin', 'Turing']
+for value in names:
+    length = length + 1
+print('There are', length, 'names in the list.')
+
+
+

OUTPUT +

+
There are 3 names in the list.
+
+

It’s worth tracing the execution of this little program step by step. +Since there are three names in names, the statement on line +4 will be executed three times. The first time around, +length is zero (the value assigned to it on line 1) and +value is Curie. The statement adds 1 to the +old value of length, producing 1, and updates +length to refer to that new value. The next time around, +value is Darwin and length is 1, +so length is updated to be 2. After one more update, +length is 3; since there is nothing left in +names for Python to process, the loop finishes and the +print function on line 5 tells us our final answer.
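One way to check a trace like this is to let Python print it. This sketch adds a single print call inside the loop body (an addition for illustration, not part of the original example). On each pass it reports the current value and the updated count:

```python
length = 0
names = ['Curie', 'Darwin', 'Turing']
for value in names:
    length = length + 1
    # Report the state of both variables at the end of each pass
    print('value is', value, 'and length is now', length)
print('There are', length, 'names in the list.')
```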

+

Note that a loop variable is a variable that is being used to record progress in a loop. It still exists after the loop is over. A variable defined before the loop can also be re-used as a loop variable, in which case its old value is overwritten:

+
+

PYTHON +

+
name = 'Rosalind'
+for name in ['Curie', 'Darwin', 'Turing']:
+    print(name)
+print('after the loop, name is', name)
+
+
+

OUTPUT +

+
Curie
+Darwin
+Turing
+after the loop, name is Turing
+
+

Note also that finding the length of an object is such a common +operation that Python actually has a built-in function to do it called +len:

+
+

PYTHON +

+
print(len([0, 1, 2, 3]))
+
+
+

OUTPUT +

+
4
+
+

len is usually much faster than any equivalent function we could write ourselves, and much easier to read than a two-line loop. It also gives us the length of many other data types we haven’t seen yet, so we should use it whenever we can.
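For example, len also counts the characters in a string, the items in a tuple, and the keys in a dictionary. Tuples and dictionaries are not covered in this lesson; they appear here only to illustrate the point:

```python
word_length = len('oxygen')         # number of characters in a string
tuple_length = len((1, 2, 3))       # number of items in a tuple
dict_length = len({'a': 1, 'b': 2}) # number of keys in a dictionary
print(word_length, tuple_length, dict_length)  # 6 3 2
```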

+
+
+ +
+
+

From 1 to N +

+
+

Python has a built-in function called range that generates a sequence of numbers. range can accept 1, 2, or 3 parameters.

+
    +
  • If one parameter is given, range generates a sequence +of that length, starting at zero and incrementing by 1. For example, +range(3) produces the numbers 0, 1, 2.
  • +
  • If two parameters are given, range starts at the first +and ends just before the second, incrementing by one. For example, +range(2, 5) produces 2, 3, 4.
  • +
  • If range is given 3 parameters, it starts at the first +one, ends just before the second one, and increments by the third one. +For example, range(3, 10, 2) produces +3, 5, 7, 9.
  • +
+
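range produces its numbers on demand rather than building a list, so printing range(3) directly only shows range(0, 3). Wrapping it in list() is a convenient way to see the numbers each form generates:

```python
print(range(3))                     # range(0, 3)
one_param = list(range(3))          # start at 0, stop before 3
two_params = list(range(2, 5))      # start at 2, stop before 5
three_params = list(range(3, 10, 2))  # start at 3, stop before 10, step by 2
print(one_param)     # [0, 1, 2]
print(two_params)    # [2, 3, 4]
print(three_params)  # [3, 5, 7, 9]
```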

Using range, write a loop that prints the first 3 natural numbers:

+
+

OUTPUT +

+
1
+2
+3
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
for number in range(1, 4):
+    print(number)
+
+
+
+
+
+
+
+ +
+
+

Understanding the loops +

+
+

Given the following loop:

+
+

PYTHON +

+
word = 'oxygen'
+for letter in word:
+    print(letter)
+
+

How many times is the body of the loop executed?

+
    +
  • 3 times
  • +
  • 4 times
  • +
  • 5 times
  • +
  • 6 times
  • +
+
+
+
+
+
+ +
+
+

The body of the loop is executed 6 times, once for each character in the string 'oxygen'.

+
+
+
+
+
+
+ +
+
+

Computing Powers With Loops +

+
+

Exponentiation is built into Python:

+
+

PYTHON +

+
print(5 ** 3)
+
+
+

OUTPUT +

+
125
+
+

Write a loop that calculates the same result as 5 ** 3 +using multiplication (and without exponentiation).

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
result = 1
+for number in range(0, 3):
+    result = result * 5
+print(result)
+
+
+
+
+
+
+
+ +
+
+

Summing a List +

+
+

Write a loop that calculates the sum of the elements in a list by adding each element in turn and then printing the final value, so that [124, 402, 36] prints 562.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
numbers = [124, 402, 36]
+summed = 0
+for num in numbers:
+    summed = summed + num
+print(summed)
+
+
+
+
+
+
+
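As with len, Python provides a built-in function, sum, that adds up the elements of a list of numbers. In everyday code the loop above can therefore be replaced by a single call:

```python
numbers = [124, 402, 36]
total = sum(numbers)  # adds every element of the list
print(total)          # 562
```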
+ +
+
+

Computing the Value of a Polynomial +

+
+

The built-in function enumerate takes a sequence (e.g., +a list) and generates a new sequence of the +same length. Each element of the new sequence is a pair composed of the +index (0, 1, 2,…) and the value from the original sequence:

+
+

PYTHON +

+
for idx, val in enumerate(a_list):
+    # Do something using idx and val
+
+

The code above loops through a_list, assigning the index +to idx and the value to val.
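To see the pairs that enumerate produces, here is the same pattern applied to a concrete list (the names are just example data):

```python
names = ['Curie', 'Darwin', 'Turing']
for idx, val in enumerate(names):
    print(idx, val)  # prints 0 Curie, then 1 Darwin, then 2 Turing

# Converting to a list shows all the (index, value) pairs at once
pairs = list(enumerate(names))
print(pairs)  # [(0, 'Curie'), (1, 'Darwin'), (2, 'Turing')]
```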

+

Suppose you have encoded a polynomial as a list of coefficients in +the following way: the first element is the constant term, the second +element is the coefficient of the linear term, the third is the +coefficient of the quadratic term, etc.

+
+

PYTHON +

+
x = 5
+coefs = [2, 4, 3]
+y = coefs[0] * x**0 + coefs[1] * x**1 + coefs[2] * x**2
+print(y)
+
+
+

OUTPUT +

+
97
+
+

Write a loop using enumerate(coefs) which computes the +value y of any polynomial, given x and +coefs.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
y = 0
+for idx, coef in enumerate(coefs):
+    y = y + coef * x**idx
+print(y)
+
+
+
+
+
+

Making Choices with Conditional Logic + +

+
+

How can we use Python to automatically recognize different situations +we encounter with our data and take a different action for each? In this +lesson, we’ll learn how to write code that runs only when certain +conditions are true.

+
+

Conditionals +

+

We can ask Python to take different actions, depending on a +condition, with an if statement:

+
+

PYTHON +

+
num = 37
+if num > 100:
+    print('greater')
+else:
+    print('not greater')
+print('done')
+
+
+

OUTPUT +

+
not greater
+done
+
+

The second line of this code uses the keyword if to tell +Python that we want to make a choice. If the test that follows the +if statement is true, the body of the if +(i.e., the set of lines indented underneath it) is executed, and +“greater” is printed. If the test is false, the body of the +else is executed instead, and “not greater” is printed. +Only one or the other is ever executed before continuing on with program +execution to print “done”:

+
A flowchart diagram of the if-else construct that tests if variable num is greater than 100

Conditional +statements don’t have to include an else. If there +isn’t one, Python simply does nothing if the test is false:

+
+

PYTHON +

+
num = 53
+print('before conditional...')
+if num > 100:
+    print(num, 'is greater than 100')
+print('...after conditional')
+
+
+

OUTPUT +

+
before conditional...
+...after conditional
+
+

We can also chain several tests together using elif, +which is short for “else if”. The following Python code uses +elif to print the sign of a number.

+
+

PYTHON +

+
num = -3
+
+if num > 0:
+    print(num, 'is positive')
+elif num == 0:
+    print(num, 'is zero')
+else:
+    print(num, 'is negative')
+
+
+

OUTPUT +

+
-3 is negative
+
+

Note that to test for equality we use a double equals sign +== rather than a single equals sign = which is +used to assign values.

+
+
+ +
+
+

Comparing in Python +

+
+

Along with the > and == operators we +have already used for comparing values in our conditionals, there are a +few more options to know about:

+
    +
  • +>: greater than
  • +
  • +<: less than
  • +
  • +==: equal to
  • +
  • +!=: does not equal
  • +
  • +>=: greater than or equal to
  • +
  • +<=: less than or equal to
  • +
+
+
+
+

We can also combine tests using and and or. +and is only true if both parts are true:

+
+

PYTHON +

+
if (1 > 0) and (-1 >= 0):
+    print('both parts are true')
+else:
+    print('at least one part is false')
+
+
+

OUTPUT +

+
at least one part is false
+
+

while or is true if at least one part is true:

+
+

PYTHON +

+
if (1 < 0) or (1 >= 0):
+    print('at least one test is true')
+
+
+

OUTPUT +

+
at least one test is true
+
+
+
+ +
+
+

+True and False +

+
+

True and False are special words in Python called booleans, which represent truth values. An expression such as 1 < 0 evaluates to the value False, while -1 < 0 evaluates to True.
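Because a comparison produces a value, the result can be stored in a variable and used later. Here is a small sketch; the variable name is_negative is chosen only for illustration:

```python
is_negative = -1 < 0
print(is_negative)        # True
print(type(is_negative))  # <class 'bool'>

# A stored boolean can be used directly as the test of an if statement
if is_negative:
    print('the number is negative')
```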

+
+
+
+
+
+

Checking Our Data +

+

Now that we’ve seen how conditionals work, we can use them to check +for the suspicious features we saw in our inflammation data. We are +about to use functions provided by the numpy module again. +Therefore, if you’re working in a new Python session, make sure to load +the module with:

+
+

PYTHON +

+
import numpy
+
+

From the first couple of plots, we saw that maximum daily inflammation exhibits a strange behavior and rises by one unit a day. Wouldn’t it be a good idea to detect such behavior and report it as suspicious? Let’s do that! However, instead of checking every single day of the study, let’s merely check whether the maximum inflammation at the beginning (day 0) and in the middle (day 20) of the study equals the corresponding day number.

+
+

PYTHON +

+
max_inflammation_0 = numpy.amax(data, axis=0)[0]
+max_inflammation_20 = numpy.amax(data, axis=0)[20]
+
+if max_inflammation_0 == 0 and max_inflammation_20 == 20:
+    print('Suspicious looking maxima!')
+
+

We also saw a different problem in the third dataset; the minima per +day were all zero (looks like a healthy person snuck into our study). We +can also check for this with an elif condition:

+
+

PYTHON +

+
elif numpy.sum(numpy.amin(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+
+

And if neither of these conditions is true, we can use else to give the all-clear:

+
+

PYTHON +

+
else:
+    print('Seems OK!')
+
+

Let’s test that out:

+
+

PYTHON +

+
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+max_inflammation_0 = numpy.amax(data, axis=0)[0]
+max_inflammation_20 = numpy.amax(data, axis=0)[20]
+
+if max_inflammation_0 == 0 and max_inflammation_20 == 20:
+    print('Suspicious looking maxima!')
+elif numpy.sum(numpy.amin(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+else:
+    print('Seems OK!')
+
+
+

OUTPUT +

+
Suspicious looking maxima!
+
+
+

PYTHON +

+
data = numpy.loadtxt(fname='inflammation-03.csv', delimiter=',')
+
+max_inflammation_0 = numpy.amax(data, axis=0)[0]
+max_inflammation_20 = numpy.amax(data, axis=0)[20]
+
+if max_inflammation_0 == 0 and max_inflammation_20 == 20:
+    print('Suspicious looking maxima!')
+elif numpy.sum(numpy.amin(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+else:
+    print('Seems OK!')
+
+
+

OUTPUT +

+
Minima add up to zero!
+
+

In this way, we have asked Python to do something different depending +on the condition of our data. Here we printed messages in all cases, but +we could also imagine not using the else catch-all so that +messages are only printed when something is wrong, freeing us from +having to manually examine every plot for features we’ve seen +before.

+
+
+ +
+
+

How Many Paths? +

+
+

Consider this code:

+
+

PYTHON +

+
if 4 > 5:
+    print('A')
+elif 4 == 5:
+    print('B')
+elif 4 < 5:
+    print('C')
+
+

Which of the following would be printed if you were to run this code? +Why did you pick this answer?

+
    +
  1. A
  2. +
  3. B
  4. +
  5. C
  6. +
  7. B and C
  8. +
+
+
+
+
+
+ +
+
+

C gets printed because the first two conditions, 4 > 5 and 4 == 5, are not true, but 4 < 5 is true. In this case only one of these conditions can be true at a time, but in other scenarios several conditions in an if/elif chain could be met. In those scenarios, only the action associated with the first true condition will occur, checking from the top of the conditional section.

+
A flowchart diagram of a conditional section with multiple elif conditions and some possible outcomes.

This contrasts with the case of multiple if statements, +where every action can occur as long as their condition is met.

+
A flowchart diagram of a conditional section with multiple if statements and some possible outcomes.
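The difference between the two flowcharts can be seen in a short sketch. Here num = 15 satisfies both num > 10 and num > 5; the value and thresholds are arbitrary. The elif chain records only the first match, while the separate if statements record both:

```python
num = 15
elif_messages = []
if_messages = []

# With elif, only the first true branch runs
if num > 10:
    elif_messages.append('greater than 10')
elif num > 5:
    elif_messages.append('greater than 5')  # skipped, although num > 5 is also true

# With separate if statements, every true test runs
if num > 10:
    if_messages.append('greater than 10')
if num > 5:
    if_messages.append('greater than 5')

print('elif chain:', elif_messages)    # ['greater than 10']
print('separate ifs:', if_messages)    # ['greater than 10', 'greater than 5']
```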
+
+
+
+
+
+
+ +
+
+

What Is Truth? +

+
+

True and False booleans are not the only values in Python that are true and false. In fact, any value can be used in an if or elif. After reading and running the code below, explain the rule for which values are considered true and which are considered false.

+
+

PYTHON +

+
if '':
+    print('empty string is true')
+if 'word':
+    print('word is true')
+if []:
+    print('empty list is true')
+if [1, 2, 3]:
+    print('non-empty list is true')
+if 0:
+    print('zero is true')
+if 1:
+    print('one is true')
+
+
+
+
+
+
+ +
+
+

That’s Not Not What I Meant +

+
+

Sometimes it is useful to check whether some condition is +not true. The Boolean operator not can do this +explicitly. After reading and running the code below, write some +if statements that use not to test the rule +that you formulated in the previous challenge.

+
+

PYTHON +

+
if not '':
+    print('empty string is not true')
+if not 'word':
+    print('word is not true')
+if not not True:
+    print('not not True is true')
+
+
+
+
+
+
+ +
+
+

Close Enough +

+
+

Write some conditions that print True if the variable +a is within 10% of the variable b and +False otherwise. Compare your implementation with your +partner’s. Do you get the same answer for all possible pairs of +numbers?

+
+
+
+
+
+ +
+
+

There is a built-in +function abs that returns the absolute value of a +number:

+
+

PYTHON +

+
print(abs(-12))
+
+
+

OUTPUT +

+
12
+
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
a = 5
+b = 5.1
+
+if abs(a - b) <= 0.1 * abs(b):
+    print('True')
+else:
+    print('False')
+
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
print(abs(a - b) <= 0.1 * abs(b))
+
+

This works because the Booleans True and +False have string representations which can be printed.
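The solution above measures the difference relative to b. Swapping a and b can therefore change the answer, which is one reason partners may disagree. The standard-library function math.isclose, when given rel_tol, compares against the larger of the two magnitudes, so its result does not depend on argument order. Here is a small sketch, with values chosen only to show the difference:

```python
import math

a = 100
b = 90.5

# Relative to b: 9.5 <= 9.05 is False
print(abs(a - b) <= 0.1 * abs(b))
# Relative to a: 9.5 <= 10.0 is True
print(abs(a - b) <= 0.1 * abs(a))
# math.isclose uses max(abs(a), abs(b)), so swapping the arguments gives the same answer
print(math.isclose(a, b, rel_tol=0.1), math.isclose(b, a, rel_tol=0.1))  # True True
```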

+
+
+
+
+
+
+ +
+
+

In-Place Operators +

+
+

Python, like C and many languages derived from it, provides in-place operators that work like this:

+
+

PYTHON +

+
x = 1  # original value
+x += 1 # add one to x, assigning result back to x
+x *= 3 # multiply x by 3
+print(x)
+
+
+

OUTPUT +

+
6
+
+

Write some code that sums the positive and negative numbers in a list +separately, using in-place operators. Do you think the result is more or +less readable than writing the same without in-place operators?

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
positive_sum = 0
+negative_sum = 0
+test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
+for num in test_list:
+    if num > 0:
+        positive_sum += num
+    elif num == 0:
+        pass
+    else:
+        negative_sum += num
+print(positive_sum, negative_sum)
+
+

Here pass means “don’t do anything”. In this particular +case, it’s not actually needed, since if num == 0 neither +sum needs to change, but it illustrates the use of elif and +pass.
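For comparison, here is one way to write the same computation without in-place operators. Because zero leaves both sums unchanged, it also drops the pass branch. Which version reads better is partly a matter of taste:

```python
positive_sum = 0
negative_sum = 0
test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
for num in test_list:
    if num > 0:
        positive_sum = positive_sum + num
    elif num < 0:
        negative_sum = negative_sum + num
    # zero matches neither test, so neither sum changes
print(positive_sum, negative_sum)  # 21 -14
```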

+
+
+
+
+
+
+ +
+
+

Sorting a List Into Buckets +

+
+

In our data folder, large data sets are stored in files whose names start with “inflammation-”, and small data sets are stored in files whose names start with “small-”. We also have some other files that we do not care about at this point. We’d like to sort all these files into three lists called large_files, small_files, and other_files, respectively.

+

Add code to the template below to do this. Note that the string +method startswith +returns True if and only if the string it is called on +starts with the string passed as an argument, that is:

+
+

PYTHON +

+
'String'.startswith('Str')
+
+
+

OUTPUT +

+
True
+
+

But

+
+

PYTHON +

+
'String'.startswith('str')
+
+
+

OUTPUT +

+
False
+
+

Use the following Python code as your starting point:

+
+

PYTHON +

+
filenames = ['inflammation-01.csv',
+         'myscript.py',
+         'inflammation-02.csv',
+         'small-01.csv',
+         'small-02.csv']
+large_files = []
+small_files = []
+other_files = []
+
+

Your solution should:

+
    +
  1. loop over the names of the files
  2. +
  3. figure out which group each filename belongs in
  4. +
  5. append the filename to that list
  6. +
+

In the end the three lists should be:

+
+

PYTHON +

+
large_files = ['inflammation-01.csv', 'inflammation-02.csv']
+small_files = ['small-01.csv', 'small-02.csv']
+other_files = ['myscript.py']
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
for filename in filenames:
+    if filename.startswith('inflammation-'):
+        large_files.append(filename)
+    elif filename.startswith('small-'):
+        small_files.append(filename)
+    else:
+        other_files.append(filename)
+
+print('large_files:', large_files)
+print('small_files:', small_files)
+print('other_files:', other_files)
+
+
+
+
+
+
+
+ +
+
+
    +
  1. Write a loop that counts the number of vowels in a character +string.
  2. +
  3. Test it on a few individual words and full sentences.
  4. +
  5. Once you are done, compare your solution to your neighbor’s. Did you +make the same decisions about how to handle the letter ‘y’ (which some +people think is a vowel, and some do not)?
  6. +
+
+

Solution +

+
vowels = 'aeiouAEIOU'
+sentence = 'Mary had a little lamb.'
+count = 0
+for char in sentence:
+    if char in vowels:
+        count += 1
+
+print('The number of vowels in this string is ' + str(count))
+


+
+
+
+
+
+
+
+ +
+
+

Key Points +

+
+
    +
  • Use for variable in sequence to process the elements of +a sequence one at a time.
  • +
  • The body of a for loop must be indented.
  • +
  • Use len(thing) to determine the length of something +that contains other values.
  • +
  • Use if condition to start a conditional statement, +elif condition to provide additional tests, and +else to provide a default.
  • +
  • The bodies of the branches of conditional statements must be +indented.
  • +
  • Use == to test for equality.
  • +
  • +X and Y is only true if both X and +Y are true.
  • +
  • +X or Y is true if either X or +Y, or both, are true.
  • +
  • Zero, the empty string, and the empty list are considered false; all +other numbers, strings, and lists are considered true.
  • +
  • +True and False represent truth +values.
  • +
+
+
+
+
+

Content from Alternatives to Loops

+
+

Last updated on 2024-07-11

+

Estimated time: 30 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I vectorize my loops?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • identify what vectorized operations are
  • +
  • perform basic vectorized operations
  • +
+
+
+
+
+
+

FIXME

+
+
+ +
+
+

Key Points +

+
+
    +
  • NULL
  • +
+
+
+

Content from Creating Functions

+
+

Last updated on 2024-07-11

+

Estimated time: 40 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • What are functions, and how can I use them in Python?
  • +
  • How can I define new functions?
  • +
  • What’s the difference between defining and calling a function?
  • +
  • What happens when I call a function?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • identify what a function is
  • +
  • create new functions
  • +
  • Set default values for function parameters.
  • +
  • Explain why we should divide programs into small, single-purpose +functions.
  • +
+
+
+
+
+
+

At this point, we’ve seen that code can have Python make decisions about what it sees in our data. What if we want to convert some of our data, like taking a temperature in Fahrenheit and converting it to Celsius? We could write something like this for converting a single number:

+
+

PYTHON +

+
fahrenheit_val = 99
+celsius_val = ((fahrenheit_val - 32) * (5/9))
+
+

and for a second number we could just copy the line and rename the variables:

+
+

PYTHON +

+
fahrenheit_val = 99
+celsius_val = ((fahrenheit_val - 32) * (5/9))
+
+fahrenheit_val2 = 43
+celsius_val2 = ((fahrenheit_val2 - 32) * (5/9))
+
+

But we would be in trouble as soon as we had to do this more than a couple of times. Cutting and pasting is going to make our code very long and very repetitive, very quickly. We’d like a way to package our code so that it is easier to reuse, a shorthand way of re-executing longer pieces of code. In Python we can use ‘functions’. Let’s start by defining a function fahr_to_celsius that converts temperatures from Fahrenheit to Celsius. The code below shows two equivalent versions: explicit_fahr_to_celsius stores the result in a variable before returning it, while fahr_to_celsius returns the expression directly:

+
+

PYTHON +

+
def explicit_fahr_to_celsius(temp):
+    # Assign the converted value to a variable
+    converted = ((temp - 32) * (5/9))
+    # Return the value of the new variable
+    return converted
+
+def fahr_to_celsius(temp):
+    # Return the converted value directly, without first storing it
+    # in a variable. This does the same thing as
+    # explicit_fahr_to_celsius, but more concisely; the explicit
+    # version makes it easier to see what the return statement sends back.
+    return ((temp - 32) * (5/9))
+
+
Labeled parts of a Python function definition

The function definition opens with the keyword def +followed by the name of the function (fahr_to_celsius) and +a parenthesized list of parameter names (temp). The body of the function — the statements +that are executed when it runs — is indented below the definition line. +The body concludes with a return keyword followed by the +return value.

+

When we call the function, the values we pass to it are assigned to its parameters so that we can use them inside the function. Inside the function, we use a return statement to send a result back to whoever asked for it.
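If a function has no return statement, calling it still produces a value: the special value None. The function name below, fahr_to_celsius_no_return, is hypothetical and exists only for this sketch:

```python
def fahr_to_celsius_no_return(temp):
    # The result is computed but never sent back to the caller
    converted = (temp - 32) * (5 / 9)

result = fahr_to_celsius_no_return(212)
print(result)  # None
```

Forgetting the return statement is a common reason for a function "returning nothing".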

+

Let’s try running our function.

+
+

PYTHON +

+
fahr_to_celsius(32)
+
+

This calls our function with 32 as the input and returns the converted value, 0.0. In an interactive session such as a Jupyter notebook, the returned value is displayed automatically.

+

In fact, calling our own function is no different from calling any +other function:

+
+

PYTHON +

+
print('freezing point of water:', fahr_to_celsius(32), 'C')
+print('boiling point of water:', fahr_to_celsius(212), 'C')
+
+
+

OUTPUT +

+
freezing point of water: 0.0 C
+boiling point of water: 100.0 C
+
+

We’ve successfully called the function that we defined, and we have +access to the value that we returned.

+

Composing Functions + +

+
+

Now that we’ve seen how to turn Fahrenheit into Celsius, we can also +write the function to turn Celsius into Kelvin:

+
+

PYTHON +

+
def celsius_to_kelvin(temp_c):
+    return temp_c + 273.15
+
+print('freezing point of water in Kelvin:', celsius_to_kelvin(0.))
+
+
+

OUTPUT +

+
freezing point of water in Kelvin: 273.15
+
+

What about converting Fahrenheit to Kelvin? We could write out the +formula, but we don’t need to. Instead, we can compose the two functions we have +already created:

+
+

PYTHON +

+
def fahr_to_kelvin(temp_f):
+    temp_c = fahr_to_celsius(temp_f)
+    temp_k = celsius_to_kelvin(temp_c)
+    return temp_k
+
+print('boiling point of water in Kelvin:', fahr_to_kelvin(212.0))
+
+
+

OUTPUT +

+
boiling point of water in Kelvin: 373.15
+
+

This is our first taste of how larger programs are built: we define basic operations, then combine them in ever-larger chunks to get the effect we want. Real-life functions will usually be larger than the ones shown here (typically half a dozen to a few dozen lines), but they shouldn’t ever be much longer than that, or the next person who reads them won’t be able to understand what’s going on.
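Because each function returns a value, the calls can also be nested directly, without intermediate variables. The inner call runs first and its result is passed to the outer one. This sketch repeats the two definitions so that it runs on its own:

```python
def fahr_to_celsius(temp):
    return (temp - 32) * (5 / 9)

def celsius_to_kelvin(temp_c):
    return temp_c + 273.15

# fahr_to_celsius runs first; its result becomes the argument of celsius_to_kelvin
print(celsius_to_kelvin(fahr_to_celsius(212.0)))  # 373.15
```

Nesting is compact, but for longer chains the named intermediate variables used in fahr_to_kelvin are often easier to read and debug.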

+

Variable Scope + +

+
+

In composing our temperature conversion functions, we created +variables inside of those functions, temp, +temp_c, temp_f, and temp_k. We +refer to these variables as local variables because they no +longer exist once the function is done executing. If we try to access +their values outside of the function, we will encounter an error:

+
+

PYTHON +

+
print('Again, temperature in Kelvin was:', temp_k)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-1-eed2471d229b> in <module>
+----> 1 print('Again, temperature in Kelvin was:', temp_k)
+
+NameError: name 'temp_k' is not defined
+
+

If you want to reuse the temperature in Kelvin after you have +calculated it with fahr_to_kelvin, you can store the result +of the function call in a variable:

+
+

PYTHON +

+
temp_kelvin = fahr_to_kelvin(212.0)
+print('temperature in Kelvin was:', temp_kelvin)
+
+
+

OUTPUT +

+
temperature in Kelvin was: 373.15
+
+

The variable temp_kelvin, being defined outside any +function, is said to be global.

+

Inside a function, one can read the value of such global +variables:

+
+

PYTHON +

+
def print_temperatures():
+    print('temperature in Fahrenheit was:', temp_fahr)
+    print('temperature in Kelvin was:', temp_kelvin)
+
+temp_fahr = 212.0
+temp_kelvin = fahr_to_kelvin(temp_fahr)
+
+print_temperatures()
+
+
+

OUTPUT +

+
temperature in Fahrenheit was: 212.0
+temperature in Kelvin was: 373.15
+
+
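Reading a global variable inside a function works, but assigning to the same name inside a function creates a new local variable instead of changing the global one. The function name set_temperature below is hypothetical:

```python
temp_fahr = 212.0

def set_temperature():
    # This assignment creates a local temp_fahr; the global one is untouched
    temp_fahr = 32.0
    print('inside the function:', temp_fahr)   # 32.0

set_temperature()
print('outside the function:', temp_fahr)      # 212.0
```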

By giving our functions human-readable names, we can more easily read and understand what our code is doing. Even better, if at some later date we want to use either of those pieces of code again, we can do so in a single line.

+

Testing and Documenting + +

+
+

Once we start putting things in functions so that we can re-use them, we need to start testing that those functions are working correctly. To see how to do this, let’s write a function to offset a dataset so that its mean value shifts to a user-defined value:

+
+

PYTHON +

+
+import numpy
+
+def offset_mean(data, target_mean_value):
+    return (data - numpy.mean(data)) + target_mean_value
+
+

We could test this on our actual data, but since we don’t know what the values ought to be, it will be hard to tell if the result was correct. Instead, let’s use NumPy to create a matrix of zeros and then offset its values to have a mean value of 3:

+
+

PYTHON +

+
z = numpy.zeros((2,2))
+print(offset_mean(z, 3))
+
+
+

OUTPUT +

+
[[ 3.  3.]
+ [ 3.  3.]]
+
+

That looks right, so let’s try offset_mean on our real +data:

+
+

PYTHON +

+
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+print(offset_mean(data, 0))
+
+
+

OUTPUT +

+
[[-6.14875 -6.14875 -5.14875 ... -3.14875 -6.14875 -6.14875]
+ [-6.14875 -5.14875 -4.14875 ... -5.14875 -6.14875 -5.14875]
+ [-6.14875 -5.14875 -5.14875 ... -4.14875 -5.14875 -5.14875]
+ ...
+ [-6.14875 -5.14875 -5.14875 ... -5.14875 -5.14875 -5.14875]
+ [-6.14875 -6.14875 -6.14875 ... -6.14875 -4.14875 -6.14875]
+ [-6.14875 -6.14875 -5.14875 ... -5.14875 -5.14875 -6.14875]]
+
+

It’s hard to tell from the default output whether the result is +correct, but there are a few tests that we can run to reassure us:

+
+

PYTHON +

+
print('original min, mean, and max are:', numpy.amin(data), numpy.mean(data), numpy.amax(data))
+offset_data = offset_mean(data, 0)
+print('min, mean, and max of offset data are:',
+      numpy.amin(offset_data),
+      numpy.mean(offset_data),
+      numpy.amax(offset_data))
+
+
+

OUTPUT +

+
original min, mean, and max are: 0.0 6.14875 20.0
+min, mean, and max of offset data are: -6.14875 2.84217094304e-16 13.85125
+
+

That seems almost right: the original mean was about 6.1, so the +lower bound from zero is now about -6.1. The mean of the offset data +isn’t quite zero — we’ll explore why not in the challenges — but it’s +pretty close. We can even go further and check that the standard +deviation hasn’t changed:

+
+

PYTHON +

+
print('std dev before and after:', numpy.std(data), numpy.std(offset_data))
+
+
+

OUTPUT +

+
std dev before and after: 4.61383319712 4.61383319712
+
+

Those values look the same, but we probably wouldn’t notice if they +were different in the sixth decimal place. Let’s do this instead:

+
+

PYTHON +

+
print('difference in standard deviations before and after:',
+      numpy.std(data) - numpy.std(offset_data))
+
+
+

OUTPUT +

+
difference in standard deviations before and after: -3.5527136788e-15
+
+

Again, the difference is very small. It’s still possible that our +function is wrong, but it seems unlikely enough that we should probably +get back to doing our analysis.
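Checks like these can be turned into assert statements. An assert stops with an error if a result is wrong, so the checks can be re-run whenever the function changes. The sketch below repeats the offset_mean definition so that it runs on its own, and uses a small made-up array whose answer can be worked out by hand. It uses numpy.allclose and numpy.isclose because, as we saw, floating-point results are rarely exactly equal:

```python
import numpy

def offset_mean(data, target_mean_value):
    return (data - numpy.mean(data)) + target_mean_value

# A small array whose correct answer we can work out by hand
test_data = numpy.array([[1.0, 2.0], [3.0, 4.0]])   # mean is 2.5
result = offset_mean(test_data, 0)

assert numpy.allclose(result, [[-1.5, -0.5], [0.5, 1.5]])   # each value shifted by -2.5
assert numpy.isclose(numpy.mean(result), 0.0)                # new mean is (close to) zero
assert numpy.isclose(numpy.std(result), numpy.std(test_data))  # spread is unchanged
print('all checks passed')
```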

+

Documentation + +

+
+

We have one more task first, though: we should write some documentation for our function +to remind ourselves later what it’s for and how to use it.

+

The usual way to put documentation in software is to add comments like this:

+
+

PYTHON +

+
# offset_mean(data, target_mean_value):
+# return a new array containing the original data with its mean offset to match the desired value.
+def offset_mean(data, target_mean_value):
+    return (data - numpy.mean(data)) + target_mean_value
+
+

There’s a better way, though. If the first thing in a function is a +string that isn’t assigned to a variable, that string is attached to the +function as its documentation:

+
+

PYTHON +

+
def offset_mean(data, target_mean_value):
+    """Return a new array containing the original data
+       with its mean offset to match the desired value."""
+    return (data - numpy.mean(data)) + target_mean_value
+
+

This is better because we can now ask Python’s built-in help system +to show us the documentation for the function:

+
+

PYTHON +

+
help(offset_mean)
+
+
+

OUTPUT +

+
Help on function offset_mean in module __main__:
+
+offset_mean(data, target_mean_value)
+    Return a new array containing the original data
+    with its mean offset to match the desired value.
+
+

A string like this is called a docstring. We don’t need to use +triple quotes when we write one, but if we do, we can break the string +across multiple lines:

+
+

PYTHON +

+
def offset_mean(data, target_mean_value):
+    """Return a new array containing the original data
+       with its mean offset to match the desired value.
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3], 0)
+    array([-1.,  0.,  1.])
+    """
+    return (data - numpy.mean(data)) + target_mean_value
+
+help(offset_mean)
+
+
+

OUTPUT +

+
Help on function offset_mean in module __main__:
+
+offset_mean(data, target_mean_value)
+    Return a new array containing the original data
+       with its mean offset to match the desired value.
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3], 0)
+    array([-1.,  0.,  1.])
+
+
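The docstring is stored on the function object in its __doc__ attribute, which is what help displays after tidying the indentation. Here is a short sketch with a hypothetical function, double:

```python
def double(x):
    """Return x multiplied by two."""
    return 2 * x

print(double.__doc__)  # Return x multiplied by two.
```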

Defining Defaults + +

+
+

We have passed parameters to functions in two ways: directly, as in +type(data), and by name, as in +numpy.loadtxt(fname='something.csv', delimiter=','). In +fact, we can pass the filename to loadtxt without the +fname=:

+
+

PYTHON +

+
numpy.loadtxt('inflammation-01.csv', delimiter=',')
+
+
+

OUTPUT +

+
array([[ 0.,  0.,  1., ...,  3.,  0.,  0.],
+       [ 0.,  1.,  2., ...,  1.,  0.,  1.],
+       [ 0.,  1.,  1., ...,  2.,  1.,  1.],
+       ...,
+       [ 0.,  1.,  1., ...,  1.,  1.,  1.],
+       [ 0.,  0.,  0., ...,  0.,  2.,  0.],
+       [ 0.,  0.,  1., ...,  1.,  1.,  0.]])
+
+

but we still need to say delimiter=:

+
+

PYTHON +

+
numpy.loadtxt('inflammation-01.csv', ',')
+
+
+

ERROR +

+
Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+  File "/Users/username/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1041, in loadtxt
+    dtype = np.dtype(dtype)
+  File "/Users/username/anaconda3/lib/python3.6/site-packages/numpy/core/_internal.py", line 199, in _commastring
+    newitem = (dtype, eval(repeats))
+  File "<string>", line 1
+    ,
+    ^
+SyntaxError: unexpected EOF while parsing
+
+

To understand what’s going on, and make our own functions easier to +use, let’s re-define our offset_mean function like +this:

+
+

PYTHON +

+
def offset_mean(data, target_mean_value=0.0):
+    """Return a new array containing the original data
+       with its mean offset to match the desired value (0 by default).
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3])
+    array([-1.,  0.,  1.])
+    """
+    return (data - numpy.mean(data)) + target_mean_value
+
+

The key change is that the second parameter is now written +target_mean_value=0.0 instead of just +target_mean_value. If we call the function with two +arguments, it works as it did before:

+
+

PYTHON +

+
test_data = numpy.zeros((2, 2))
+print(offset_mean(test_data, 3))
+
+
+

OUTPUT +

+
[[ 3.  3.]
+ [ 3.  3.]]
+
+

But we can also now call it with just one argument, in which case target_mean_value is automatically assigned the default value of 0.0:

+
+

PYTHON +

+
more_data = 5 + numpy.zeros((2, 2))
+print('data before mean offset:')
+print(more_data)
+print('offset data:')
+print(offset_mean(more_data))
+
+
+

OUTPUT +

+
data before mean offset:
+[[ 5.  5.]
+ [ 5.  5.]]
+offset data:
+[[ 0.  0.]
+ [ 0.  0.]]
+
+

This is handy: if we usually want a function to work one way, but +occasionally need it to do something else, we can allow people to pass a +parameter when they need to but provide a default to make the normal +case easier. The example below shows how Python matches values to +parameters:

+
+

PYTHON +

+
def display(a=1, b=2, c=3):
+    print('a:', a, 'b:', b, 'c:', c)
+
+print('no parameters:')
+display()
+print('one parameter:')
+display(55)
+print('two parameters:')
+display(55, 66)
+
+
+

OUTPUT +

+
no parameters:
+a: 1 b: 2 c: 3
+one parameter:
+a: 55 b: 2 c: 3
+two parameters:
+a: 55 b: 66 c: 3
+
+

As this example shows, parameters are matched up from left to right, +and any that haven’t been given a value explicitly get their default +value. We can override this behavior by naming the value as we pass it +in:

+
+

PYTHON +

+
print('only setting the value of c')
+display(c=77)
+
+
+

OUTPUT +

+
only setting the value of c
+a: 1 b: 2 c: 77
+
+
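Positional and named arguments can also be mixed in a single call, provided the positional ones come first. A short sketch reusing the display function above:

```python
def display(a=1, b=2, c=3):
    print('a:', a, 'b:', b, 'c:', c)

# 55 is matched to a by position, c is set by name, and b keeps its default.
display(55, c=77)    # a: 55 b: 2 c: 77

# Named arguments can be given in any order.
display(c=9, a=8)    # a: 8 b: 2 c: 9
```

Writing a positional argument after a named one, as in display(c=77, 55), is rejected by Python with a SyntaxError.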

With that in hand, let’s look at the help for +numpy.loadtxt:

+
+

PYTHON +

+
help(numpy.loadtxt)
+
+
+

OUTPUT +

+
Help on function loadtxt in module numpy.lib.npyio:
+
+loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes')
+    Load data from a text file.
+
+    Each row in the text file must have the same number of values.
+
+    Parameters
+    ----------
+...
+
+

There’s a lot of information here, but the most important part is the +first couple of lines:

+
+

OUTPUT +

+
loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes')
+
+

This tells us that loadtxt has one parameter called fname that doesn’t have a default value, and nine others that do. If we call the function like this:

+
+

PYTHON +

+
numpy.loadtxt('inflammation-01.csv', ',')
+
+

then the filename is assigned to fname (which is what we want), but the delimiter string ',' is assigned to dtype rather than delimiter, because dtype is the second parameter in the list. However, ',' isn’t a known dtype, so our code produced an error message when we tried to run it. When we call loadtxt we don’t have to write fname= for the filename because it’s the first item in the list, but if we want ',' to be assigned to the parameter delimiter, we do have to write delimiter= for the second argument, since delimiter is not the second parameter in the list.
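A minimal sketch of the corrected call, passing the delimiter by name. So that it runs without the inflammation-01.csv file, it reads two made-up rows from an in-memory text buffer (io.StringIO), which numpy.loadtxt accepts in place of a file name; with the real data you would pass 'inflammation-01.csv' as the first argument instead.

```python
import io

import numpy

# Two made-up rows standing in for inflammation-01.csv.
csv_text = io.StringIO("0,1,2\n3,4,5\n")

# fname can be passed by position because it is the first parameter;
# delimiter must be named, otherwise ',' would be bound to dtype.
data = numpy.loadtxt(csv_text, delimiter=',')
print(data.shape)  # (2, 3)
print(data)
```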

+

Readable functions + +

+
+

Consider these two functions:

+
+

PYTHON +

+
def s(p):
+    a = 0
+    for v in p:
+        a += v
+    m = a / len(p)
+    d = 0
+    for v in p:
+        d += (v - m) * (v - m)
+    return numpy.sqrt(d / (len(p) - 1))
+
+def std_dev(sample):
+    sample_sum = 0
+    for value in sample:
+        sample_sum += value
+
+    sample_mean = sample_sum / len(sample)
+
+    sum_squared_devs = 0
+    for value in sample:
+        sum_squared_devs += (value - sample_mean) * (value - sample_mean)
+
+    return numpy.sqrt(sum_squared_devs / (len(sample) - 1))
+
+

The functions s and std_dev are +computationally equivalent (they both calculate the sample standard +deviation), but to a human reader, they look very different. You +probably found std_dev much easier to read and understand +than s.
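One way to check that s and std_dev really compute the same thing is to compare both against library functions for the sample standard deviation: statistics.stdev from the standard library, and numpy.std with ddof=1 (so that it divides by n - 1, as these functions do, rather than by n). This is a sketch; the sample values are arbitrary.

```python
import statistics

import numpy

def s(p):
    a = 0
    for v in p:
        a += v
    m = a / len(p)
    d = 0
    for v in p:
        d += (v - m) * (v - m)
    return numpy.sqrt(d / (len(p) - 1))

def std_dev(sample):
    sample_sum = 0
    for value in sample:
        sample_sum += value

    sample_mean = sample_sum / len(sample)

    sum_squared_devs = 0
    for value in sample:
        sum_squared_devs += (value - sample_mean) * (value - sample_mean)

    return numpy.sqrt(sum_squared_devs / (len(sample) - 1))

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # arbitrary example values

results = [s(sample), std_dev(sample),
           statistics.stdev(sample), numpy.std(sample, ddof=1)]
print(results)
# All four agree to floating-point precision.
print(all(numpy.isclose(value, results[0]) for value in results))  # True
```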

+

As this example illustrates, both documentation and a programmer’s +coding style combine to determine how easy it is for others to +read and understand the programmer’s code. Choosing meaningful variable +names and using blank spaces to break the code into logical “chunks” are +helpful techniques for producing readable code. This is useful +not only for sharing code with others, but also for the original +programmer. If you need to revisit code that you wrote months ago and +haven’t thought about since then, you will appreciate the value of +readable code!

+
+
+ +
+
+

Combining Strings +

+
+

“Adding” two strings produces their concatenation: +'a' + 'b' is 'ab'. Write a function called +fence that takes two parameters called +original and wrapper and returns a new string +that has the wrapper character at the beginning and end of the original. +A call to your function should look like this:

+
+

PYTHON +

+
print(fence('name', '*'))
+
+
+

OUTPUT +

+
*name*
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def fence(original, wrapper):
+    return wrapper + original + wrapper
+
+
+
+
+
+
+
+ +
+
+

Return versus print +

+
+

Note that return and print are not interchangeable. print is a Python function that prints data to the screen; it lets us, the users, see the data. A return statement, on the other hand, makes data available to the program that called the function. Let’s have a look at the following function:

+
+

PYTHON +

+
def add(a, b):
+    print(a + b)
+
+

Question: What will we see if we execute the +following commands?

+
+

PYTHON +

+
A = add(7, 3)
+print(A)
+
+
+
+
+
+
+ +
+
+

Python will first execute the function add with a = 7 and b = 3, and therefore print 10. However, because the function add has no return statement, it returns Python’s special value None by default. Therefore, A will be assigned None, and the last line (print(A)) will print None. As a result, we will see:

+
+

OUTPUT +

+
10
+None
+
+
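For contrast, a sketch of the same addition written with return instead of print (add_and_return is a name chosen for illustration). Nothing is displayed when the function runs; the caller receives the value and decides what to do with it.

```python
def add_and_return(a, b):
    return a + b

A = add_and_return(7, 3)   # nothing is printed here
print(A)                   # 10
print(A * 2)               # 20: the returned value can be reused
```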
+
+
+
+
+
+ +
+
+

Selecting Characters From Strings +

+
+

If the variable s refers to a string, then +s[0] is the string’s first character and s[-1] +is its last. Write a function called outer that returns a +string made up of just the first and last characters of its input. A +call to your function should look like this:

+
+

PYTHON +

+
print(outer('helium'))
+
+
+

OUTPUT +

+
hm
+
+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def outer(input_string):
+    return input_string[0] + input_string[-1]
+
+
+
+
+
+
+
+ +
+
+

Rescaling an Array +

+
+

Write a function rescale that takes an array as input +and returns a corresponding array of values scaled to lie in the range +0.0 to 1.0. (Hint: If L and H are the lowest +and highest values in the original array, then the replacement for a +value v should be (v-L) / (H-L).)

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def rescale(input_array):
+    L = numpy.amin(input_array)
+    H = numpy.amax(input_array)
+    output_array = (input_array - L) / (H - L)
+    return output_array
+
+
+
+
+
+
+
+ +
+
+

Testing and Documenting Your Function +

+
+

Run the commands help(numpy.arange) and +help(numpy.linspace) to see how to use these functions to +generate regularly-spaced values, then use those values to test your +rescale function. Once you’ve successfully tested your +function, add a docstring that explains what it does.

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
"""Takes an array as input, and returns a corresponding array scaled so
+that 0 corresponds to the minimum and 1 to the maximum value of the input array.
+
+Examples:
+>>> rescale(numpy.arange(10.0))
+array([ 0.        ,  0.11111111,  0.22222222,  0.33333333,  0.44444444,
+       0.55555556,  0.66666667,  0.77777778,  0.88888889,  1.        ])
+>>> rescale(numpy.linspace(0, 100, 5))
+array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])
+"""
+
+
+
+
+
+
+
+ +
+
+

Defining Defaults +

+
+

Rewrite the rescale function so that it scales data to +lie between 0.0 and 1.0 by default, but will +allow the caller to specify lower and upper bounds if they want. Compare +your implementation to your neighbor’s: do the two functions always +behave the same way?

+
+
+
+
+
+ +
+
+
+

PYTHON +

+
def rescale(input_array, low_val=0.0, high_val=1.0):
+    """rescales input array values to lie between low_val and high_val"""
+    L = numpy.amin(input_array)
+    H = numpy.amax(input_array)
+    intermed_array = (input_array - L) / (H - L)
+    output_array = intermed_array * (high_val - low_val) + low_val
+    return output_array
+
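A quick usage sketch for this solution, with arbitrary input values: called with only the array, rescale uses the defaults 0.0 and 1.0; passing low_val and high_val, by position or by name, changes the target range.

```python
import numpy

def rescale(input_array, low_val=0.0, high_val=1.0):
    """rescales input array values to lie between low_val and high_val"""
    L = numpy.amin(input_array)
    H = numpy.amax(input_array)
    intermed_array = (input_array - L) / (H - L)
    output_array = intermed_array * (high_val - low_val) + low_val
    return output_array

values = numpy.array([10.0, 15.0, 20.0])
print(rescale(values))                                  # defaults: 0.0 to 1.0
print(rescale(values, -1.0, 1.0))                       # bounds by position
print(rescale(values, low_val=100.0, high_val=200.0))   # bounds by name
```

As written, this version produces nan values (with a NumPy RuntimeWarning) when every value in the array is the same, because H - L is then zero; that is one case worth comparing against your neighbor’s version.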
+
+
+
+
+
+
+ +
+
+

Variables Inside and Outside Functions +

+
+

What does the following piece of code display when run — and why?

+
+

PYTHON +

+
f = 0
+k = 0
+
+def f2k(f):
+    k = ((f - 32) * (5.0 / 9.0)) + 273.15
+    return k
+
+print(f2k(8))
+print(f2k(41))
+print(f2k(32))
+
+print(k)
+
+
+
+
+
+
+ +
+
+
+

OUTPUT +

+
259.81666666666666
+278.15
+273.15
+0
+
+

k is 0 because the k inside the function f2k doesn’t know about the k defined outside the function. When the f2k function is called, it creates a local variable k. The function returns the value of this local k, but it does not alter the k outside the function. Therefore the original value of the global k remains unchanged. Beware that a local k is created only because statements inside f2k assign a new value to it. If k were only read, Python would simply retrieve the value of the global k.
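A small sketch of the rule described above, with hypothetical function names: a function that only reads k sees the global value, while a function that assigns to k gets its own local k and leaves the global one alone.

```python
k = 0

def read_k():
    # k is only read here, so Python looks it up in the global scope.
    return k + 1

def assign_k():
    # Assigning to k makes k local to this function;
    # the global k is not touched.
    k = 99
    return k

print(read_k())    # 1
print(assign_k())  # 99
print(k)           # 0
```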

+
+
+
+
+
+
+ +
+
+

Mixing Default and Non-Default Parameters +

+
+

Given the following code:

+
+

PYTHON +

+
def numbers(one, two=2, three, four=4):
+    n = str(one) + str(two) + str(three) + str(four)
+    return n
+
+print(numbers(1, three=3))
+
+

what do you expect will be printed? What is actually printed? What +rule do you think Python is following?

+
    +
  1. 1234
  2. +
  3. one2three4
  4. +
  5. 1239
  6. +
  7. SyntaxError
  8. +
+

Given that, what does the following piece of code display when +run?

+
+

PYTHON +

+
def func(a, b=3, c=6):
+    print('a: ', a, 'b: ', b, 'c:', c)
+
+func(-1, 2)
+
+
    +
  1. a: b: 3 c: 6
  2. +
  3. a: -1 b: 3 c: 6
  4. +
  5. a: -1 b: 2 c: 6
  6. +
  7. a: b: -1 c: 2
  8. +
+
+
+
+
+
+ +
+
+

Attempting to define the numbers function results in +4. SyntaxError. The defined parameters two and +four are given default values. Because one and +three are not given default values, they are required to be +included as arguments when the function is called and must be placed +before any parameters that have default values in the function +definition.

+

The given call to func displays +a: -1 b: 2 c: 6. -1 is assigned to the first parameter +a, 2 is assigned to the next parameter b, and +c is not passed a value, so it uses its default value +6.
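A sketch that makes the rule visible. Compiling the original numbers definition with the built-in compile() raises SyntaxError (compile is used here only so the error can be caught without stopping the script), while moving the required parameter three ahead of the parameters with defaults is accepted, and the original call then returns 1234.

```python
bad_definition = (
    "def numbers(one, two=2, three, four=4):\n"
    "    return str(one) + str(two) + str(three) + str(four)\n"
)

definition_rejected = False
try:
    compile(bad_definition, "<numbers>", "exec")
except SyntaxError as error:
    definition_rejected = True
    print("SyntaxError:", error.msg)

# Required parameters first, then parameters with default values.
def numbers(one, three, two=2, four=4):
    n = str(one) + str(two) + str(three) + str(four)
    return n

print(numbers(1, three=3))  # 1234
```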

+
+
+
+
+
+
+ +
+
+

Readable Code +

+
+

Revise a function you wrote for one of the previous exercises to try +to make the code more readable. Then, collaborate with one of your +neighbors to critique each other’s functions and discuss how your +function implementations could be further improved to make them more +readable.

+
+
+
+
+
+ +
+
+

Key Points +

+
+
    +
  • Define a function using +def function_name(parameter).
  • +
  • The body of a function must be indented.
  • +
  • Call a function using function_name(value).
  • +
  • Numbers are stored as integers or floating-point numbers.
  • +
  • Variables defined within a function can only be seen and used within +the body of the function.
  • +
  • Variables created outside of any function are called global +variables.
  • +
  • Within a function, we can access global variables.
  • +
  • Variables created within a function override global variables if +their names match.
  • +
  • Use help(thing) to view help for something.
  • +
  • Put docstrings in functions to provide help for that function.
  • +
  • Specify default values for parameters when defining a function using +name=value in the parameter list.
  • +
  • Parameters can be passed by matching based on name, by position, or +by omitting them (in which case the default value is used).
  • +
  • Put code whose parameters change frequently in a function, then call +it with different parameter values to customize its behavior.
  • +
+
+
+
+

Content from Data Analysis

+
+

Last updated on 2024-07-11

+

Estimated time: 60 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I process tabular data files in Python?
  • +
  • How can I do the same operations on many different files?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • read in data files to Python
  • +
  • perform common operations on tabular data
  • +
  • write code to perform the same operation on multiple files
  • +
+
+
+
+
+
+

FIXME

+
+
+ +
+
+

Key Points +

+
+
    +
  • NULL
  • +
+
+
+

Content from Visualizations

+
+

Last updated on 2024-07-11

+

Estimated time: 60 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How can I visualize tabular data in Python?
  • +
  • How can I group several plots together?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • create graphs and other visualizations using tabular data
  • +
  • group plots together to make comparative visualizations
  • +
+
+
+
+
+
+

FIXME

+
+
+ +
+
+

Key Points +

+
+
    +
  • NULL
  • +
+
+
+

Content from Errors and Exceptions

+
+

Last updated on 2024-07-11

+

Estimated time: 50 minutes

+
+ +
+
+

Overview

+
+
+
+
+

Questions

+
    +
  • How does Python report errors?
  • +
  • How can I handle errors in Python programs?
  • +
+
+
+
+
+
+
+

Objectives

+
    +
  • identify different errors and correct bugs associated with them
  • +
+
+
+
+
+
+

Every programmer encounters errors, both those who are just +beginning, and those who have been programming for years. Encountering +errors and exceptions can be very frustrating at times, and can make +coding feel like a hopeless endeavour. However, understanding what the +different types of errors are and when you are likely to encounter them +can help a lot. Once you know why you get certain types of +errors, they become much easier to fix.

+

Errors in Python have a very specific form, called a traceback. Let’s examine one:

+
+

PYTHON +

+
# This code has an intentional error. You can type it directly or
+# use it for reference to understand the error message below.
+def favorite_ice_cream():
+    ice_creams = [
+        'chocolate',
+        'vanilla',
+        'strawberry'
+    ]
+    print(ice_creams[3])
+
+favorite_ice_cream()
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+IndexError                                Traceback (most recent call last)
+<ipython-input-1-70bd89baa4df> in <module>()
+      9     print(ice_creams[3])
+      10
+----> 11 favorite_ice_cream()
+
+<ipython-input-1-70bd89baa4df> in favorite_ice_cream()
+      7         'strawberry'
+      8     ]
+----> 9     print(ice_creams[3])
+      10
+      11 favorite_ice_cream()
+
+IndexError: list index out of range
+
+

This particular traceback has two levels. You can determine the number of levels by counting the arrows on the left-hand side. In this case:

+
    +
  1. The first shows code from the cell above, with an arrow pointing +to Line 11 (which is favorite_ice_cream()).

  2. +
  3. The second shows some code in the function +favorite_ice_cream, with an arrow pointing to Line 9 (which +is print(ice_creams[3])).

  4. +
+

The last level is the actual place where the error occurred. The other level(s) show which function the program executed to get to the next level down. So, in this case, the program first performed a function call to the function favorite_ice_cream. Inside this function, the program encountered an error on Line 9, when it tried to run the code print(ice_creams[3]).
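The levels of a traceback can also be inspected from code. The sketch below uses the standard-library traceback module to list each level (function name and line number) for the favorite_ice_cream example; the last entry is the function in which the error occurred. The IndexError is caught only so that the listing can be printed.

```python
import traceback

def favorite_ice_cream():
    ice_creams = [
        'chocolate',
        'vanilla',
        'strawberry'
    ]
    print(ice_creams[3])

try:
    favorite_ice_cream()
except IndexError as error:
    levels = traceback.extract_tb(error.__traceback__)
    for level in levels:
        print(level.name, 'line', level.lineno)

# The final level is the function in which the error was raised.
print(levels[-1].name)  # favorite_ice_cream
```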

+
+
+ +
+
+

Long Tracebacks +

+
+

Sometimes, you might see a traceback that is very long, perhaps even 20 levels deep! This can make it seem like something horrible happened, but the length of the error message does not reflect severity; rather, it indicates that your program called many functions before it encountered the error. Most of the time, the actual place where the error occurred is at the bottom-most level, so you can skip down the traceback to the bottom.

+
+
+
+

So what error did the program actually encounter? In the last line of +the traceback, Python helpfully tells us the category or type of error +(in this case, it is an IndexError) and a more detailed +error message (in this case, it says “list index out of range”).

+

If you encounter an error and don’t know what it means, it is still +important to read the traceback closely. That way, if you fix the error, +but encounter a new one, you can tell that the error changed. +Additionally, sometimes knowing where the error occurred is +enough to fix it, even if you don’t entirely understand the message.

+

If you do encounter an error you don’t recognize, try looking at the official documentation on errors. However, note that you may not always be able to find the error there, as it is possible to create custom errors. In that case, hopefully the custom error message is informative enough to help you figure out what went wrong. Libraries like pandas and NumPy define their own custom errors, but the procedure for working them out is the same: go to the last line of the traceback, and read the error type and message there. The documentation for these libraries will often provide the information you need about any functions you are using, and these libraries also have large user communities that can help.

+
+
+ +
+
+

Reading Error Messages +

+
+

Read the Python code and the resulting traceback below, and answer +the following questions:

+
    +
  1. How many levels does the traceback have?
  2. +
  3. What is the function name where the error occurred?
  4. +
  5. On which line number in this function did the error occur?
  6. +
  7. What is the type of error?
  8. +
  9. What is the error message?
  10. +
+
+

PYTHON +

+
# This code has an intentional error. Do not type it directly;
+# use it for reference to understand the error message below.
+def print_message(day):
+    messages = [
+        'Hello, world!',
+        'Today is Tuesday!',
+        'It is the middle of the week.',
+        'Today is Donnerstag in German!',
+        'Last day of the week!',
+        'Hooray for the weekend!',
+        'Aw, the weekend is almost over.'
+    ]
+    print(messages[day])
+
+def print_sunday_message():
+    print_message(7)
+
+print_sunday_message()
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+IndexError                                Traceback (most recent call last)
+<ipython-input-7-3ad455d81842> in <module>
+     16     print_message(7)
+     17
+---> 18 print_sunday_message()
+     19
+
+<ipython-input-7-3ad455d81842> in print_sunday_message()
+     14
+     15 def print_sunday_message():
+---> 16     print_message(7)
+     17
+     18 print_sunday_message()
+
+<ipython-input-7-3ad455d81842> in print_message(day)
+     11         'Aw, the weekend is almost over.'
+     12     ]
+---> 13     print(messages[day])
+     14
+     15 def print_sunday_message():
+
+IndexError: list index out of range
+
+
+
+
+
+
+ +
+
+
    +
  1. 3 levels
  2. +
  3. print_message
  4. +
  5. 13
  6. +
  7. IndexError
  8. +
  9. list index out of range. You can then infer that 7 is not a valid index for messages: the list has seven items, so its indices run from 0 to 6.
  10. +
+
+
+
+
+
+
+ +
+
+

Better errors on newer Pythons +

+
+

Newer versions of Python have improved error printouts. If you are +debugging errors, it is often helpful to use the latest Python version, +even if you support older versions of Python.

+
+
+
+

Type Errors + +

+
+

One of the most common types of errors in Python is the type error. These errors occur when you try to perform an operation on a Python object that does not support it. This happens easily when working with large datasets in which values are expected to be of a particular type, such as strings or integers. If we write a function expecting integers, we will not get an error until we reach an operation that cannot handle strings. For example:

+
+

PYTHON +

+
def our_function():
+    my_string = "Hello World"
+    letter = my_string["e"]
+    return letter
+
+our_function()
+
+
+

ERROR +

+
  File "<ipython-input-3-6bb841ea1423>", line 3
+    letter=my_string["e"]
+                       ^
+TypeError: string indices must be integers
+
+

We get this error because indexing a string to access part of it requires an integer position. Instead, we used the character "e" as the index and received a type error. If the goal is to get the letter e, this is fixed by replacing "e" with 1, its position in "Hello World" (counting from 0).
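Rather than counting positions by hand, str.index can look up where a character first appears; the result is an integer that can then be used as the index. A small sketch:

```python
my_string = "Hello World"

# str.index returns the position of the first "e" as an integer.
position = my_string.index("e")
letter = my_string[position]
print(position, letter)  # 1 e
```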

+

In the case of datasets, we often see type errors when a mathematical +operation, such as taking a mean, is performed on a column that contains +characters, either as a result of formatting or introduced through +error. As a result, correcting the error can involve simply removing the +characters from the strings using regular expressions, or if the +characters have resulted in incorrect data, removing those observations +from the dataset.
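A sketch of the cleaning step described above, using only the standard library. The column values, the missing-value code 'n/a', and the rule of keeping only digits, '.' and '-' are all assumptions for illustration; real data needs its own rules, and dropping or recoding observations is a decision to document, not just a coding fix.

```python
import re
import statistics

# Hypothetical column read from a file: numbers stored as text, some with
# thousands separators, a footnote marker, or a missing-value code.
raw_values = ["1,204", "987", "1,530*", "n/a", "2,011"]

# Arithmetic on the raw strings fails with a TypeError.
try:
    total = sum(raw_values)
except TypeError as error:
    print("TypeError:", error)

def clean_number(text):
    """Keep only digits, '.' and '-'; return None if nothing numeric is left."""
    digits = re.sub(r"[^0-9.\-]", "", text)
    return float(digits) if digits else None

cleaned = [clean_number(value) for value in raw_values]
valid = [value for value in cleaned if value is not None]
print(cleaned)                 # [1204.0, 987.0, 1530.0, None, 2011.0]
print(statistics.mean(valid))  # 1433.0
```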

+

Syntax Errors + +

+
+

When you forget a colon at the end of a line, accidentally add one +space too many when indenting under an if statement, or +forget a parenthesis, you will encounter a syntax error. This means that +Python couldn’t figure out how to read your program. This is similar to +forgetting punctuation in English: for example, this text is difficult +to read there is no punctuation there is also no capitalization why is +this hard because you have to figure out where each sentence ends you +also have to figure out where each sentence begins to some extent it +might be ambiguous if there should be a sentence break or not

+

People can typically figure out what is meant by text with no +punctuation, but people are much smarter than computers. If Python +doesn’t know how to read the program, it will give up and inform you +with an error. For example:

+
+

PYTHON +

+
def some_function()
+    msg = 'hello, world!'
+    print(msg)
+     return msg
+
+
+

ERROR +

+
  File "<ipython-input-3-6bb841ea1423>", line 1
+    def some_function()
+                       ^
+SyntaxError: invalid syntax
+
+

Here, Python tells us that there is a SyntaxError on +line 1, and even puts a little arrow in the place where there is an +issue. In this case the problem is that the function definition is +missing a colon at the end.

+

Actually, the function above has two issues with syntax. If +we fix the problem with the colon, we see that there is also an +IndentationError, which means that the lines in the +function definition do not all have the same indentation:

+
+

PYTHON +

+
def some_function():
+    msg = 'hello, world!'
+    print(msg)
+     return msg
+
+
+

ERROR +

+
  File "<ipython-input-4-ae290e7659cb>", line 4
+    return msg
+    ^
+IndentationError: unexpected indent
+
+

Both SyntaxError and IndentationError +indicate a problem with the syntax of your program, but an +IndentationError is more specific: it always means +that there is a problem with how your code is indented.
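IndentationError is in fact a subclass of SyntaxError (and TabError, covered next, is a subclass of IndentationError), which is why the two are so closely related. A sketch that checks the example above with the built-in compile(), so that the error can be caught and inspected instead of stopping the program:

```python
source = (
    "def some_function():\n"
    "    msg = 'hello, world!'\n"
    "    print(msg)\n"
    "     return msg\n"
)

try:
    compile(source, "<some_function>", "exec")
except SyntaxError as error:
    # IndentationError is caught here too, because it is a kind of SyntaxError.
    error_name = type(error).__name__
    print(error_name, "on line", error.lineno)  # IndentationError on line 4

print(issubclass(IndentationError, SyntaxError))  # True
print(issubclass(TabError, IndentationError))     # True
```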

+
+
+ +
+
+

Tabs and Spaces +

+
+

Some indentation errors are harder to spot than others. In particular, mixing spaces and tabs can be difficult to spot because they are both whitespace. In the example below, the first two lines in the body of the function some_function are indented with tabs, while the third line is indented with spaces. If you’re working in a Jupyter notebook, be sure to copy and paste this example rather than trying to type it in manually, because Jupyter automatically replaces tabs with spaces.

+
+

PYTHON +

+
def some_function():
+	msg = 'hello, world!'
+	print(msg)
+        return msg
+
+

Visually it is impossible to spot the error. Fortunately, Python does +not allow you to mix tabs and spaces.

+
+

ERROR +

+
  File "<ipython-input-5-653b36fbcd41>", line 4
+    return msg
+              ^
+TabError: inconsistent use of tabs and spaces in indentation
+
+
+
+
+

Variable Name Errors + +

+
+

Another very common type of error is called a NameError, +and occurs when you try to use a variable that does not exist. For +example:

+
+

PYTHON +

+
print(a)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-7-9d7b17ad5387> in <module>()
+----> 1 print(a)
+
+NameError: name 'a' is not defined
+
+

Variable name errors come with some of the most informative error +messages, which are usually of the form “name ‘the_variable_name’ is not +defined”.

+

Why does this error message occur? That’s a harder question to +answer, because it depends on what your code is supposed to do. However, +there are a few very common reasons why you might have an undefined +variable. The first is that you meant to use a string, but forgot to put quotes around +it:

+
+

PYTHON +

+
print(hello)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-8-9553ee03b645> in <module>()
+----> 1 print(hello)
+
+NameError: name 'hello' is not defined
+
+

The second reason is that you might be trying to use a variable that +does not yet exist. In the following example, count should +have been defined (e.g., with count = 0) before the for +loop:

+
+

PYTHON +

+
for number in range(10):
+    count = count + number
+print('The count is:', count)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-9-dd6a12d7ca5c> in <module>()
+      1 for number in range(10):
+----> 2     count = count + number
+      3 print('The count is:', count)
+
+NameError: name 'count' is not defined
+
+

Finally, the third possibility is that you made a typo when you were +writing your code. Let’s say we fixed the error above by adding the line +Count = 0 before the for loop. Frustratingly, this actually +does not fix the error. Remember that variables are case-sensitive, so the variable +count is different from Count. We still get +the same error, because we still have not defined +count:

+
+

PYTHON +

+
Count = 0
+for number in range(10):
+    count = count + number
+print('The count is:', count)
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+NameError                                 Traceback (most recent call last)
+<ipython-input-10-d77d40059aea> in <module>()
+      1 Count = 0
+      2 for number in range(10):
+----> 3     count = count + number
+      4 print('The count is:', count)
+
+NameError: name 'count' is not defined
+
+
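Defining count (lowercase, exactly as the loop uses it) before the loop fixes the error:

```python
count = 0
for number in range(10):
    count = count + number
print('The count is:', count)  # The count is: 45
```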

Index Errors + +

+
+

Next up are errors having to do with containers (like lists and +strings) and the items within them. If you try to access an item in a +list or a string that does not exist, then you will get an error. This +makes sense: if you asked someone what day they would like to get +coffee, and they answered “caturday”, you might be a bit annoyed. Python +gets similarly annoyed if you try to ask it for an item that doesn’t +exist:

+
+

PYTHON +

+
letters = ['a', 'b', 'c']
+print('Letter #1 is', letters[0])
+print('Letter #2 is', letters[1])
+print('Letter #3 is', letters[2])
+print('Letter #4 is', letters[3])
+
+
+

OUTPUT +

+
Letter #1 is a
+Letter #2 is b
+Letter #3 is c
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+IndexError                                Traceback (most recent call last)
+<ipython-input-11-d817f55b7d6c> in <module>()
+      3 print('Letter #2 is', letters[1])
+      4 print('Letter #3 is', letters[2])
+----> 5 print('Letter #4 is', letters[3])
+
+IndexError: list index out of range
+
+

Here, Python is telling us that there is an IndexError +in our code, meaning we tried to access a list index that did not +exist.
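When an index might legitimately be out of range (for example, a position read from data), one option is to catch the IndexError with try/except and substitute a fallback. A minimal sketch using the letters list above; letter_at is a hypothetical helper name, and returning None is just one possible fallback.

```python
letters = ['a', 'b', 'c']

def letter_at(items, position):
    """Return items[position], or None if that position does not exist."""
    try:
        return items[position]
    except IndexError:
        return None

# Valid non-negative indices run from 0 to len(letters) - 1.
print(len(letters))           # 3
print(letter_at(letters, 2))  # c
print(letter_at(letters, 3))  # None
```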

+

File Errors + +

+
+

The last type of error we’ll cover today is the most common when using Python with data: errors associated with reading and writing files. If you try to read a file that does not exist, you will receive a FileNotFoundError telling you so. If you attempt to write to a file that was opened read-only, Python 3 raises an UnsupportedOperation error (io.UnsupportedOperation). More generally, problems with input and output manifest as OSErrors, which may show up as a more specific subclass; you can see the list in the Python docs. Each of these subclasses corresponds to one or more UNIX errno codes, and the code is shown in the error message.

+
+

PYTHON +

+
file_handle = open('myfile.txt', 'r')
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+FileNotFoundError                         Traceback (most recent call last)
+<ipython-input-14-f6e1ac4aee96> in <module>()
+----> 1 file_handle = open('myfile.txt', 'r')
+
+FileNotFoundError: [Errno 2] No such file or directory: 'myfile.txt'
+
+

One reason for receiving this error is that you specified an +incorrect path to the file. For example, if I am currently in a folder +called myproject, and I have a file in +myproject/writing/myfile.txt, but I try to open +myfile.txt, this will fail. The correct path would be +writing/myfile.txt. It is also possible that the file name +or its path contains a typo. There may also be specific settings based +on your organization if you are using shared, networked, or cloud-based +drives. It is best to check with your IT administrators if you are still +encountering issues reading in a file after troubleshooting.
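One way to make a missing file easier to diagnose is to catch FileNotFoundError and report which path Python tried and which folder it was looking from. A minimal sketch; read_text_or_none is a hypothetical helper name, and writing/myfile.txt is the example path from the paragraph above.

```python
import os

def read_text_or_none(path):
    """Return the contents of the text file at path, or None if it is missing."""
    try:
        with open(path, 'r') as file_handle:
            return file_handle.read()
    except FileNotFoundError as error:
        # errno 2 (ENOENT) means "No such file or directory".
        print(f"Could not find {error.filename!r} (errno {error.errno}).")
        print("Current folder:", os.getcwd())
        return None

contents = read_text_or_none('writing/myfile.txt')
```

Relative paths such as writing/myfile.txt are resolved from the current folder, so printing os.getcwd() is often the quickest way to see why a path does not match.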

+

A related issue can occur if you use the “write” flag instead of the “read” flag. Python will not give you an error if you try to open a file for writing when the file does not exist; it simply creates the file. (If the file does exist, opening it for writing erases its contents.) However, if you meant to open a file for reading, but accidentally opened it for writing, and then try to read from it, you will get an UnsupportedOperation error telling you that the file was not opened for reading:

+
+

PYTHON +

+
file_handle = open('myfile.txt', 'w')
+file_handle.read()
+
+
+

ERROR +

+
---------------------------------------------------------------------------
+UnsupportedOperation                      Traceback (most recent call last)
+<ipython-input-15-b846479bc61f> in <module>()
+      1 file_handle = open('myfile.txt', 'w')
+----> 2 file_handle.read()
+
+UnsupportedOperation: not readable
+
+
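A sketch of the same mistake and its fix, run inside a temporary folder (created with tempfile) so that no real file is overwritten. The exception’s full name is io.UnsupportedOperation, which is how it can be caught:

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'myfile.txt')

    # 'w' creates the file (or empties an existing one) and is write-only.
    with open(path, 'w') as file_handle:
        file_handle.write('hello\n')
        try:
            file_handle.read()
        except io.UnsupportedOperation as error:
            print('UnsupportedOperation:', error)  # UnsupportedOperation: not readable

    # Reopen the file with 'r' to read what was written.
    with open(path, 'r') as file_handle:
        contents = file_handle.read()

print(contents)
```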

If you are getting a read or write error on a file or folder that you are able to open and/or edit with other programs, you may need to contact an IT administrator to check the permissions granted to you and to any programs you are using.

+

These are the most common errors with files, though many others +exist. If you get an error that you’ve never seen before, searching the +Internet for that error type often reveals common reasons why you might +get that error.

+
+
+ +
+
+

Identifying Syntax Errors +

+
+
    +
  1. Read the code below, and (without running it) try to identify what +the errors are.
  2. +
  3. Run the code, and read the error message. Is it a +SyntaxError or an IndentationError?
  4. +
  5. Fix the error.
  6. +
  7. Repeat steps 2 and 3, until you have fixed all the errors.
  8. +
+
+

PYTHON +

+
def another_function
+  print('Syntax errors are annoying.')
+   print('But at least Python tells us about them!')
+  print('So they are usually not too hard to fix.')
+
+
+
+
+
+
+ +
+
+

SyntaxError for missing (): at end of first +line, IndentationError for mismatch between second and +third lines. A fixed version is:

+
+

PYTHON +

+
def another_function():
+    print('Syntax errors are annoying.')
+    print('But at least Python tells us about them!')
+    print('So they are usually not too hard to fix.')
+
+
+
+
+
+
+
+ +
+
+

Identifying Variable Name Errors +

+
+
    +
  1. Read the code below, and (without running it) try to identify what +the errors are.
  2. +
  3. Run the code, and read the error message. What type of +NameError do you think this is? In other words, is it a +string with no quotes, a misspelled variable, or a variable that should +have been defined but was not?
  4. +
  5. Fix the error.
  6. +
  7. Repeat steps 2 and 3, until you have fixed all the errors.
  8. +
+
+

PYTHON +

+
for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (Number % 3) == 0:
+        message = message + a
+    else:
+        message = message + 'b'
+print(message)
+
+
+
+
+
+
+ +
+
+

Three NameErrors: Number is misspelled (it should be number), message is used before it is defined, and a is missing quotes.

+

Fixed version:

+
+

PYTHON +

+
message = ''
+for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (number % 3) == 0:
+        message = message + 'a'
+    else:
+        message = message + 'b'
+print(message)
+
+
+
+
+
+
+
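The third kind of NameError, a variable that should have been defined but was not, is easiest to see in isolation. The minimal sketch below uses a hypothetical counter, `count_a`. It shows the message Python gives, then the usual fix: assign a starting value before the loop uses the variable, the same idea as `message = ''` in the fixed version above.

```python
# count_a is a hypothetical variable used only for this illustration
try:
    count_a = count_a + 1   # count_a has not been assigned yet
except NameError as err:
    error_message = str(err)
    print('NameError:', error_message)   # NameError: name 'count_a' is not defined

# Fix: give the variable a starting value before the loop uses it
count_a = 0
for number in range(10):
    if number % 3 == 0:
        count_a = count_a + 1
print(count_a)  # 4 (the multiples of 3 below 10 are 0, 3, 6 and 9)
```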
+ +
+
+

Identifying Index Errors +

+
+
    +
  1. Read the code below, and (without running it) try to identify what the errors are.
  2. Run the code, and read the error message. What type of error is it?
  3. Fix the error.
+
+

PYTHON +

+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[4])
+
+
+
+
+
+
+ +
+
+

IndexError: the list has four entries, indexed 0 to 3, so the last entry is seasons[3] and seasons[4] does not exist. A fixed version, using the negative index -1 to refer to the last entry, is:

+
+

PYTHON +

+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[-1])
+
+
+
+
+
+
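The short sketch below shows two equivalent ways to reach the last entry, and the message Python gives for an index one past the end. It reuses the `seasons` list from the exercise.

```python
seasons = ['Spring', 'Summer', 'Fall', 'Winter']

last_index = len(seasons) - 1     # 3, because indexing starts at 0
print(seasons[last_index])        # Winter
print(seasons[-1])                # Winter: -1 counts back from the end

try:
    print(seasons[4])             # one past the last valid index
except IndexError as err:
    index_message = str(err)
    print('IndexError:', index_message)   # IndexError: list index out of range
```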

A Final Note About Correcting Errors + +

+
+

Many error messages have helpful answers online, but when working with official statistics we also need to exercise some caution. Be wary of any answer that asks you to download a package from someone’s personal GitHub repository or another file-sharing service. Identify the type of error and understand what the issue is before downloading anything that claims to fix it. If the error is caused by a particular version of a package, check whether that version has any known security vulnerabilities, and use a package manager to move between package versions.
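Before you look up known issues for a package, it helps to confirm which version is actually installed. One option is the standard library's `importlib.metadata` module (Python 3.8 and later). This is a minimal sketch: the package names are only examples, and `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of package_name as a string, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Example package names; replace them with the package named in the error message
for name in ['pandas', 'numpy']:
    print(name, installed_version(name))
```

You can then check that version string against the package's release notes or a vulnerability advisory. Changing versions is still best done through a package manager such as pip or conda, rather than by downloading files by hand.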

+
+
+ +
+
+

Key Points +

+
+
    +
  • NULL
  • +
+
+
+
+
+
+
+
+ + +
+ + +
+ + + + + diff --git a/instructor/discuss.html b/instructor/discuss.html new file mode 100644 index 0000000..86f8727 --- /dev/null +++ b/instructor/discuss.html @@ -0,0 +1,451 @@ + +Python for Official Statistics: Discussion +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Discussion

+

Last updated on 2024-07-11 | + + Edit this page

+ + + + + +
+ +
+ + +

FIXME

+ + +
+
+ + +
+
+ + + diff --git a/instructor/images.html b/instructor/images.html new file mode 100644 index 0000000..d223bc9 --- /dev/null +++ b/instructor/images.html @@ -0,0 +1,534 @@ + + + + + +Python for Official Statistics: All Images + + + + + + + + + + + + +
+ Python for Official Statistics +
+ +
+
+ + + + + + +
+
+ + +

Introduction

+

Python Fundamentals

+
+

Figure 1

+ +
Value of 65.0 with weight_kg label stuck on it

+

Figure 2

+ +
Value of 65.0 with weight_kg label stuck on it, and value of 143.0 with weight_lb label stuck on it

+

Figure 3

+ +
Value of 100.0 with label weight_kg stuck on it, and value of 143.0 with label weight_lb stuck on it

Data Transformation

+
+

Figure 1

+ +
'data' is a 3 by 3 numpy array containing row 0: ['A', 'B', 'C'], row 1: ['D', 'E', 'F'], and row 2: ['G', 'H', 'I']. Starting in the upper left hand corner, data[0, 0] = 'A', data[0, 1] = 'B', data[0, 2] = 'C', data[1, 0] = 'D', data[1, 1] = 'E', data[1, 2] = 'F', data[2, 0] = 'G', data[2, 1] = 'H', and data[2, 2] = 'I', in the bottom right hand corner.

List and Dictionary Methods

+

Loops and Conditional Logic

+
+

Figure 1

+ +
Line graphs showing average, maximum, and minimum inflammation across all patients over a 40-day period.

+

Figure 2

+ +
Loop variable 'num' being assigned the value of each element in the list odds in turn and then being printed

+

Figure 3

+ +
A flowchart diagram of the if-else construct that tests if variable num is greater than 100

+

Figure 4

+ +
A flowchart diagram of a conditional section with multiple elif conditions and some possible outcomes.

+

Figure 5

+ +
A flowchart diagram of a conditional section with multiple if statements and some possible outcomes.

Alternatives to Loops

+

Creating Functions

+
+

Figure 1

+ +
Labeled parts of a Python function definition

Data Analysis

+

Visualizations

+

Errors and Exceptions

+
+
+
+
+ + +
+ + +
+ + + + + diff --git a/instructor/index.html b/instructor/index.html new file mode 100644 index 0000000..b8b7fe1 --- /dev/null +++ b/instructor/index.html @@ -0,0 +1,555 @@ + +Python for Official Statistics: Summary and Schedule +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+

Summary and Schedule

+ + +

Python for Official Statistics will teach participants the basics of +Python for its use in creating Official Statistics. Participants will +learn basic programming principles, and employ them in the manipulation +of data and data structures.

+
+
+ +
+
+

Prerequisites +

+
+

FIXME

+
+
+
+ + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

+ The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor. +

+

FIXME

+ +
+ + +
+
+ + + diff --git a/instructor/instructor-notes.html b/instructor/instructor-notes.html new file mode 100644 index 0000000..f8568c1 --- /dev/null +++ b/instructor/instructor-notes.html @@ -0,0 +1,506 @@ + + + + + +Python for Official Statistics: Instructor Notes + + + + + + + + + + + + +
+ Python for Official Statistics +
+ +
+
+ + + + + + + + + +
+ + +
+ + + + + diff --git a/instructor/key-points.html b/instructor/key-points.html new file mode 100644 index 0000000..7c01c04 --- /dev/null +++ b/instructor/key-points.html @@ -0,0 +1,602 @@ + + + + + +Python for Official Statistics: Key Points + + + + + + + + + + + + +
+ Python for Official Statistics +
+ +
+
+ + + + + + +
+
+ + +

Introduction

+
+
    +
  • Python is an interpreted language.
  • +
  • Code is commonly developed inside an integrated development +environment.
  • +
  • A typical Python workflow uses base Python and additional Python +packages developed for statistical programming purposes.
  • +
  • In-line and external documentation helps ensure that your code is +readable.
  • +
  • You can find help through the built-in help function and external +resources.
  • +

Python Fundamentals

+
+
    +
  • Basic data types in Python include integers, strings, and +floating-point numbers.
  • +
  • Use variable = value to assign a value to a variable in +order to record it in memory.
  • +
  • Variables are created on demand whenever a value is assigned to +them.
  • +
  • Use print(something) to display the value of +something.
  • +
  • Use # some kind of explanation to add comments to +programs.
  • +
  • Built-in functions are always available to use.
  • +

Data Transformation

+

List and Dictionary Methods

+
+
    +
  • Lists can contain any Python object including other lists
  • +
  • Lists are ordered (i.e. indexed) and can therefore be sliced by index number
  • +
  • Unlike strings and integers, the values inside a list can be +modified in place
  • +
  • A list which contains other lists is referred to as a nested +list
  • +
  • Dictionaries store key-value pairs and are accessed by key rather than by position; since Python 3.7 they preserve insertion order
  • +
  • Dictionary keys are unique
  • +
  • A dictionary which contains other dictionaries is referred to as a +nested dictionary
  • +
  • Values inside nested lists and dictionaries can be accessed by an +additional index
  • +

Loops and Conditional Logic

+
+
    +
  • Use for variable in sequence to process the elements of +a sequence one at a time.
  • +
  • The body of a for loop must be indented.
  • +
  • Use len(thing) to determine the length of something +that contains other values.
  • +
  • Use if condition to start a conditional statement, +elif condition to provide additional tests, and +else to provide a default.
  • +
  • The bodies of the branches of conditional statements must be +indented.
  • +
  • Use == to test for equality.
  • +
  • +X and Y is only true if both X and +Y are true.
  • +
  • +X or Y is true if either X or +Y, or both, are true.
  • +
  • Zero, the empty string, and the empty list are considered false; all +other numbers, strings, and lists are considered true.
  • +
  • +True and False represent truth +values.
  • +

Alternatives to Loops

+
+
    +
  • NULL
  • +

Creating Functions

+
+
    +
  • Define a function using +def function_name(parameter).
  • +
  • The body of a function must be indented.
  • +
  • Call a function using function_name(value).
  • +
  • Numbers are stored as integers or floating-point numbers.
  • +
  • Variables defined within a function can only be seen and used within +the body of the function.
  • +
  • Variables created outside of any function are called global +variables.
  • +
  • Within a function, we can access global variables.
  • +
  • Variables created within a function override global variables if +their names match.
  • +
  • Use help(thing) to view help for something.
  • +
  • Put docstrings in functions to provide help for that function.
  • +
  • Specify default values for parameters when defining a function using +name=value in the parameter list.
  • +
  • Parameters can be passed by matching based on name, by position, or +by omitting them (in which case the default value is used).
  • +
  • Put code whose parameters change frequently in a function, then call +it with different parameter values to customize its behavior.
  • +

Data Analysis

+
+
    +
  • NULL
  • +

Visualizations

+
+
    +
  • NULL
  • +

Errors and Exceptions

+
+
    +
  • NULL
  • +
+
+
+
+ + +
+ + +
+ + + + + diff --git a/instructor/profiles.html b/instructor/profiles.html new file mode 100644 index 0000000..b09e68f --- /dev/null +++ b/instructor/profiles.html @@ -0,0 +1,408 @@ + +Python for Official Statistics: Learner Profiles +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Learner Profiles

+ +

This is a placeholder file. Please add content here.

+ +
+
+ + +
+
+ + + diff --git a/instructor/reference.html b/instructor/reference.html new file mode 100644 index 0000000..5865281 --- /dev/null +++ b/instructor/reference.html @@ -0,0 +1,808 @@ + +Python for Official Statistics: Glossary +
+ Python for Official Statistics +
+ +
+
+ + + + + +
+
+

Glossary

+

Last updated on 2024-07-11 | + + Edit this page

+ + + + + +
+ +
+ + +

Glossary +

+
argument
+
+A value given to a function or program when it runs. The term is often +used interchangeably (and inconsistently) with parameter. +
+
assertion
+
+An expression which is supposed to be true at a particular point in a +program. Programmers typically put assertions in their code to check for +errors; if the assertion fails (i.e., if the expression evaluates as +false), the program halts and produces an error message. See also: invariant, precondition, postcondition. +
+
assign
+
+To give a value a name by associating a variable with it. +
+
body
+
+(of a function): the statements that are executed when a function runs. +
+
call stack
+
+A data structure inside a running program that keeps track of active +function calls. +
+
case-insensitive
+
+Treating text as if upper and lower case characters of the same letter +were the same. See also: case-sensitive. +
+
case-sensitive
+
+Treating text as if upper and lower case characters of the same letter +are different. See also: case-insensitive. +
+
comment
+
+A remark in a program that is intended to help human readers understand +what is going on, but is ignored by the computer. Comments in Python, R, +and the Unix shell start with a # character and run to the +end of the line; comments in SQL start with --, and other +languages have other conventions. +
+
compose
+
+To apply one function to the result of another, such as +f(g(x)). +
+
conditional statement
+
+A statement in a program that might or might not be executed depending +on whether a test is true or false. +
+
comma-separated values
+
+(CSV) A common textual representation for tables in which the values in +each row are separated by commas. +
+
default value
+
+A value to use for a parameter if nothing is +specified explicitly. +
+
defensive programming
+
+The practice of writing programs that check their own operation to catch +errors as early as possible. +
+
delimiter
+
+A character or characters used to separate individual values, such as +the commas between columns in a CSV file. +
+
docstring
+
+Short for “documentation string”, this refers to textual documentation +embedded in Python programs. Unlike comments, docstrings are preserved +in the running program and can be examined in interactive sessions. +
+
documentation
+
+Human-language text written to explain what software does, how it works, +or how to use it. +
+
dotted notation
+
+A two-part notation used in many programming languages in which +thing.component refers to the component +belonging to thing. +
+
empty string
+
+A character string containing no characters, often thought of as the +“zero” of text. +
+
encapsulation
+
+The practice of hiding something’s implementation details so that the +rest of a program can worry about what it does rather than +how it does it. +
+
floating-point number
+
+A number containing a fractional part and an exponent. See also: integer. +
+
for loop
+
+A loop that is executed once for each value in some kind of set, list, +or range. See also: while loop. +
+
function
+
+A named group of instructions that is executed when the function’s name is used in the code. Occurrence of a function name in the code is a function call. Functions may process input arguments and return a result. Functions may also be used for logically grouping together pieces of code. In such cases, they don’t need to return any meaningful value and can be written without a return statement at all. Such functions return the special value None, which is a way of saying “nothing” in Python. +
+
function call
+
+A use of a function in another piece of software. +
+
global variable
+
+A variable defined outside of a function. It can be used in global +statements, and read inside functions. +
+
heat map
+
+A graphical representation of two-dimensional data in which colors, +ranging on a scale of hue or intensity, represent the data values. +
+
Integrated Development +Environment (IDE)
+
+Software in which you write, edit, run, and debug code, such as JupyterLab or Visual Studio Code. +
+
immutable
+
+Unchangeable. The value of immutable data cannot be altered after it has +been created. See also: mutable. +
+
import
+
+To load a library into a program. +
+
in-place operators
+
+An operator such as += that provides a shorthand notation +for the common case in which the variable being assigned to is also an +operand on the right hand side of the assignment. For example, the +statement x += 3 means the same thing as +x = x + 3. +
+
index
+
+A subscript that specifies the location of a single value in a +collection, such as a single pixel in an image. +
+
inner loop
+
+A loop that is inside another loop. See also: outer loop. +
+
integer
+
+A whole number, such as -12343. See also: floating-point number. +
+
invariant
+
+An expression whose value doesn’t change during the execution of a +program, typically used in an assertion. See +also: precondition, postcondition. +
+
library
+
+A family of code units (functions, classes, variables) that implement a +set of related tasks. +
+
local variable
+
+A variable defined inside of a function, that exists only in the scope +of that function, meaning it cannot be accessed by code outside of the +function. +
+
loop variable
+
+The variable that keeps track of the progress of the loop. +
+
member
+
+A variable contained within an object. +
+
method
+
+A function which is tied to a particular object. +Each of an object’s methods typically implements one of the things it +can do, or one of the questions it can answer. +
+
mutable
+
+Changeable. The value of mutable data can be altered after it has been created. See also: immutable. +
+
notebook
+
+Interactive computational environment accessed via your web browser, in +which you can write and execute Python code and combine it with +explanatory text, mathematics and visualizations. Examples are IPython +or Jupyter notebooks. +
+
object
+
+A collection of conceptually related variables (members) and functions using those variables (methods). +
+
outer loop
+
+A loop that contains another loop. See also: inner +loop. +
+
parameter
+
+A variable named in the function’s declaration that is used to hold a +value passed into the call. The term is often used interchangeably (and +inconsistently) with argument. +
+
pipe
+
+A connection from the output of one program to the input of another. +When two or more programs are connected in this way, they are called a +“pipeline”. +
+
postcondition
+
+A condition that a function (or other block of code) guarantees is true +once it has finished running. Postconditions are often represented using +assertions. +
+
precondition
+
+A condition that must be true in order for a function (or other block of +code) to run correctly. +
+
regression
+
+To re-introduce a bug that was once fixed. +
+
return statement
+
+A statement that causes a function to stop executing and return a value +to its caller immediately. +
+
RGB
+
+An additive model that represents +colors as combinations of red, green, and blue. Each color’s value is +typically in the range 0..255 (i.e., a one-byte integer). +
+
sequence
+
+A collection of information that is presented in a specific order. For example, in Python, a string is a sequence of characters, while a list is a sequence of values of any type. +
+
shape
+
+An array’s dimensions, represented as a tuple. For example, a 5×3 array’s shape is (5,3). +
+
silent failure
+
+Failing without producing any warning messages. Silent failures are hard +to detect and debug. +
+
slice
+
+A regular subsequence of a larger sequence, such as the first five +elements or every second element. +
+
stack frame
+
+A data structure that provides storage for a function’s local variables. +Each time a function is called, a new stack frame is created and put on +the top of the call stack. When the function +returns, the stack frame is discarded. +
+
standard input
+
+A process’s default input stream. In interactive command-line +applications, it is typically connected to the keyboard; in a pipe, it receives data from the standard output of the preceding process. +
+
standard output
+
+A process’s default output stream. In interactive command-line +applications, data sent to standard output is displayed on the screen; +in a pipe, it is passed to the standard input of the next process. +
+
string
+
+Short for “character string”, a sequence of zero +or more characters. +
+
syntax
+
+The rules that define how code must be written for a computer to +understand. +
+
syntax error
+
+A programming error that occurs when statements are in an order or +contain characters not expected by the programming language. +
+
tab completion
+
+A feature of command-line interpreters, in which the program +automatically fills in partially typed commands upon pressing the +Tab key. +
+
test oracle
+
+A program, device, data set, or human being against which the results of +a test can be compared. +
+
test-driven +development
+
+The practice of writing unit tests before writing the code they +test. +
+
traceback
+
+The sequence of function calls that led to an error. +
+
tuple
+
+An immutable sequence +of values. +
+
type
+
+The classification of something in a program (for example, the contents +of a variable) as a kind of number (e.g. floating-point, integer), string, or something else. +
+
type of error
+
+Indicates the nature of an error in a program. For example, in Python, an IOError refers to problems with file input/output (in Python 3, IOError is an alias of OSError). See also: syntax error. +
+
variable
+
+A value that has a name associated with it. +
+
while loop
+
+A loop that keeps executing as long as some condition is true. See also: +for loop. +
+
+
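The short sketch below ties several of these terms together: a docstring, an assertion used as a precondition, an in-place operator, a slice, and a tuple as an immutable sequence. The function `mean` and the numbers are hypothetical examples, not part of any library.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # Precondition checked with an assertion: halts with an error if it is false
    assert len(values) > 0, 'values must not be empty'
    total = 0
    for value in values:
        total += value            # in-place operator: same as total = total + value
    return total / len(values)


scores = [4, 8, 15, 16, 23, 42]
print(scores[0:3])                # slice: [4, 8, 15]
print(mean(scores))               # 18.0
print(mean.__doc__)               # the docstring is kept in the running program

point = (5, 3)                    # tuple: an immutable sequence
try:
    point[0] = 6
except TypeError as err:
    print('TypeError:', err)      # tuples cannot be modified in place
```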
+ + +
+
+ + + diff --git a/key-points.html b/key-points.html new file mode 100644 index 0000000..a3c72f2 --- /dev/null +++ b/key-points.html @@ -0,0 +1,600 @@ + + + + + +Python for Official Statistics: Key Points + + + + + + + + + + + + +
+ Python for Official Statistics +
+ +
+
+ + + + + + +
+
+ + +

Introduction

+
+
    +
  • Python is an interpreted language.
  • +
  • Code is commonly developed inside an integrated development +environment.
  • +
  • A typical Python workflow uses base Python and additional Python +packages developed for statistical programming purposes.
  • +
  • In-line and external documentation helps ensure that your code is +readable.
  • +
  • You can find help through the built-in help function and external +resources.
  • +

Python Fundamentals

+
+
    +
  • Basic data types in Python include integers, strings, and +floating-point numbers.
  • +
  • Use variable = value to assign a value to a variable in +order to record it in memory.
  • +
  • Variables are created on demand whenever a value is assigned to +them.
  • +
  • Use print(something) to display the value of +something.
  • +
  • Use # some kind of explanation to add comments to +programs.
  • +
  • Built-in functions are always available to use.
  • +
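The short sketch below illustrates the points above: assignment, basic types, `print`, comments, and a built-in function. The weights are hypothetical values chosen for illustration.

```python
# Hypothetical values, for illustration only
weight_kg = 65.0                 # assignment creates the variable on demand
weight_lb = 2.2 * weight_kg      # a new variable computed from an existing one

print(weight_kg, weight_lb)      # print() displays values
print(type(weight_kg).__name__)  # float
print(type(42).__name__)         # int
print(type('kg').__name__)       # str
print(round(weight_lb))          # 143: round() is a built-in function
```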

Data Transformation

+

List and Dictionary Methods

+
+
    +
  • Lists can contain any Python object including other lists
  • +
  • Lists are ordered (i.e. indexed) and can therefore be sliced by index number
  • +
  • Unlike strings and integers, the values inside a list can be +modified in place
  • +
  • A list which contains other lists is referred to as a nested +list
  • +
  • Dictionaries store key-value pairs and are accessed by key rather than by position; since Python 3.7 they preserve insertion order
  • +
  • Dictionary keys are unique
  • +
  • A dictionary which contains other dictionaries is referred to as a +nested dictionary
  • +
  • Values inside nested lists and dictionaries can be accessed by an +additional index
  • +
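The sketch below illustrates the list and dictionary points above: slicing, in-place modification, unique keys, and nested access with an extra index. All names and numbers are hypothetical, chosen only for illustration.

```python
# Lists: ordered, indexable, sliceable, and modifiable in place
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
first_two = seasons[0:2]          # ['Spring', 'Summer']
seasons[2] = 'Autumn'             # changes the list in place

# Nested list: a second index reaches inside the inner list
quarters = [['Jan', 'Feb', 'Mar'], ['Apr', 'May', 'Jun']]
print(quarters[1][0])             # Apr

# Dictionary: values are looked up by unique key
population = {'north': 1200, 'south': 950}
population['north'] = 1250        # an existing key is updated, not duplicated
print(len(population))            # 2

# Nested dictionary: one key per level
regions = {'north': {'2023': 1200, '2024': 1250}}
print(regions['north']['2024'])   # 1250
```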

Loops and Conditional Logic

+
+
    +
  • Use for variable in sequence to process the elements of +a sequence one at a time.
  • +
  • The body of a for loop must be indented.
  • +
  • Use len(thing) to determine the length of something +that contains other values.
  • +
  • Use if condition to start a conditional statement, +elif condition to provide additional tests, and +else to provide a default.
  • +
  • The bodies of the branches of conditional statements must be +indented.
  • +
  • Use == to test for equality.
  • +
  • +X and Y is only true if both X and +Y are true.
  • +
  • +X or Y is true if either X or +Y, or both, are true.
  • +
  • Zero, the empty string, and the empty list are considered false; all +other numbers, strings, and lists are considered true.
  • +
  • +True and False represent truth +values.
  • +
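The sketch below illustrates these loop and conditional points: iteration with `for`, truthiness of zero and empty values, and `and`/`or` in an `if`/`elif`/`else` chain. The values are hypothetical.

```python
# Hypothetical values, chosen to show which ones Python treats as false
values = [0, 7, '', 'text', [], [1, 2]]
labels = []
for value in values:            # the loop body is indented
    if value:                   # 0, '' and [] count as false
        labels.append('true')
    else:
        labels.append('false')
print(len(values), labels)      # 6 ['false', 'true', 'false', 'true', 'false', 'true']

num = 150
if num > 100 and num % 2 == 0:      # 'and': both tests must be true
    size = 'large and even'
elif num > 100 or num < 0:          # 'or': at least one test must be true
    size = 'large or negative'
else:
    size = 'other'
print(size)        # large and even
print(num == 150)  # True
```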

Alternatives to Loops

+
+
    +
  • NULL
  • +

Creating Functions

+
+
    +
  • Define a function using +def function_name(parameter).
  • +
  • The body of a function must be indented.
  • +
  • Call a function using function_name(value).
  • +
  • Numbers are stored as integers or floating-point numbers.
  • +
  • Variables defined within a function can only be seen and used within +the body of the function.
  • +
  • Variables created outside of any function are called global +variables.
  • +
  • Within a function, we can access global variables.
  • +
  • Variables created within a function override global variables if +their names match.
  • +
  • Use help(thing) to view help for something.
  • +
  • Put docstrings in functions to provide help for that function.
  • +
  • Specify default values for parameters when defining a function using +name=value in the parameter list.
  • +
  • Parameters can be passed by matching based on name, by position, or +by omitting them (in which case the default value is used).
  • +
  • Put code whose parameters change frequently in a function, then call +it with different parameter values to customize its behavior.
  • +
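The sketch below illustrates several of the function points above. It covers a docstring, a default parameter value, positional and named arguments, reading a global variable, and a local variable shadowing a global one. The names `offset`, `adjust` and `shadow_demo` are hypothetical.

```python
offset = 100   # global variable: readable inside functions


def adjust(value, factor=2):
    """Return value * factor + offset (factor defaults to 2)."""
    result = value * factor + offset   # 'result' is local to adjust
    return result


def shadow_demo():
    offset = 0          # this local 'offset' hides the global one inside this function only
    return offset


print(adjust(3))                  # 106: default factor used
print(adjust(3, 4))               # 112: arguments matched by position
print(adjust(factor=5, value=3))  # 115: arguments matched by name
print(shadow_demo(), offset)      # 0 100: the global is unchanged
```

Calling `help(adjust)` displays the docstring.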

Data Analysis

+
+
    +
  • NULL
  • +

Visualizations

+
+
    +
  • NULL
  • +

Errors and Exceptions

+
+
    +
  • NULL
  • +
+
+
+
+ + +
+ + +
+ + + + + diff --git a/link.svg b/link.svg new file mode 100644 index 0000000..88ad827 --- /dev/null +++ b/link.svg @@ -0,0 +1,12 @@ + + + + + + diff --git a/md5sum.txt b/md5sum.txt new file mode 100644 index 0000000..3141a28 --- /dev/null +++ b/md5sum.txt @@ -0,0 +1,20 @@ +"file" "checksum" "built" "date" +"CODE_OF_CONDUCT.md" "c93c83c630db2fe2462240bf72552548" "site/built/CODE_OF_CONDUCT.md" "2024-07-11" +"LICENSE.md" "b24ebbb41b14ca25cf6b8216dda83e5f" "site/built/LICENSE.md" "2024-07-11" +"config.yaml" "74431342d2a5c8a8ec51a96d422da0ee" "site/built/config.yaml" "2024-07-11" +"index.md" "57929a3be492ca717a93078c8edcd09f" "site/built/index.md" "2024-07-11" +"episodes/01-introduction.md" "bf140f974f3a79c9be24d171a6dc019d" "site/built/01-introduction.md" "2024-07-11" +"episodes/02-python_fundamentals.md" "e804fce2e1ad688e13e2986d25564360" "site/built/02-python_fundamentals.md" "2024-07-11" +"episodes/03-data_transformation.md" "c334965445d8c9e8362ae5ad4274041f" "site/built/03-data_transformation.md" "2024-07-11" +"episodes/04-lists.md" "190f2d24f26924f61e265f4818270208" "site/built/04-lists.md" "2024-07-11" +"episodes/05-loops.md" "94757103936ff21aa5bf078b3ee64885" "site/built/05-loops.md" "2024-07-11" +"episodes/06-alternative_loops.md" "f22afc1200ee5d738e835320fac2fb9b" "site/built/06-alternative_loops.md" "2024-07-11" +"episodes/07-functions.md" "95fb4e8a9686e4e222bde075fda79458" "site/built/07-functions.md" "2024-07-11" +"episodes/08-data_analysis.md" "b690289d367a4da22ee3ac6431824bb5" "site/built/08-data_analysis.md" "2024-07-11" +"episodes/09-visualizations.md" "9c76ab5b7f8fcd6d5fe7675b8be84dcd" "site/built/09-visualizations.md" "2024-07-11" +"episodes/10-errors_exceptions.md" "517149df417b9da9703fdbe0256dabaf" "site/built/10-errors_exceptions.md" "2024-07-11" +"instructors/instructor-notes.md" "a59fd3b94c07c3fe3218c054a0f03277" "site/built/instructor-notes.md" "2024-07-11" +"learners/discuss.md" "2758e2e5abd231d82d25c6453d8abbc6" "site/built/discuss.md" 
"2024-07-11" +"learners/reference.md" "9fe67fb9df32a28661dbb591c082482c" "site/built/reference.md" "2024-07-11" +"learners/setup.md" "ac87d318cb43bd0279662183e274fea5" "site/built/setup.md" "2024-07-11" +"profiles/learner-profiles.md" "60b93493cf1da06dfd63255d73854461" "site/built/learner-profiles.md" "2024-07-11" diff --git a/mstile-150x150.png b/mstile-150x150.png new file mode 100644 index 0000000..8136f75 Binary files /dev/null and b/mstile-150x150.png differ diff --git a/pkgdown.css b/pkgdown.css new file mode 100644 index 0000000..80ea5b8 --- /dev/null +++ b/pkgdown.css @@ -0,0 +1,384 @@ +/* Sticky footer */ + +/** + * Basic idea: https://philipwalton.github.io/solved-by-flexbox/demos/sticky-footer/ + * Details: https://github.com/philipwalton/solved-by-flexbox/blob/master/assets/css/components/site.css + * + * .Site -> body > .container + * .Site-content -> body > .container .row + * .footer -> footer + * + * Key idea seems to be to ensure that .container and __all its parents__ + * have height set to 100% + * + */ + +html, body { + height: 100%; +} + +body { + position: relative; +} + +body > .container { + display: flex; + height: 100%; + flex-direction: column; +} + +body > .container .row { + flex: 1 0 auto; +} + +footer { + margin-top: 45px; + padding: 35px 0 36px; + border-top: 1px solid #e5e5e5; + color: #666; + display: flex; + flex-shrink: 0; +} +footer p { + margin-bottom: 0; +} +footer div { + flex: 1; +} +footer .pkgdown { + text-align: right; +} +footer p { + margin-bottom: 0; +} + +img.icon { + float: right; +} + +/* Ensure in-page images don't run outside their container */ +.contents img { + max-width: 100%; + height: auto; +} + +/* Fix bug in bootstrap (only seen in firefox) */ +summary { + display: list-item; +} + +/* Typographic tweaking ---------------------------------*/ + +.contents .page-header { + margin-top: calc(-60px + 1em); +} + +dd { + margin-left: 3em; +} + +/* Section anchors ---------------------------------*/ + +a.anchor { + 
display: none; + margin-left: 5px; + width: 20px; + height: 20px; + + background-image: url(./link.svg); + background-repeat: no-repeat; + background-size: 20px 20px; + background-position: center center; +} + +h1:hover .anchor, +h2:hover .anchor, +h3:hover .anchor, +h4:hover .anchor, +h5:hover .anchor, +h6:hover .anchor { + display: inline-block; +} + +/* Fixes for fixed navbar --------------------------*/ + +.contents h1, .contents h2, .contents h3, .contents h4 { + padding-top: 60px; + margin-top: -40px; +} + +/* Navbar submenu --------------------------*/ + +.dropdown-submenu { + position: relative; +} + +.dropdown-submenu>.dropdown-menu { + top: 0; + left: 100%; + margin-top: -6px; + margin-left: -1px; + border-radius: 0 6px 6px 6px; +} + +.dropdown-submenu:hover>.dropdown-menu { + display: block; +} + +.dropdown-submenu>a:after { + display: block; + content: " "; + float: right; + width: 0; + height: 0; + border-color: transparent; + border-style: solid; + border-width: 5px 0 5px 5px; + border-left-color: #cccccc; + margin-top: 5px; + margin-right: -10px; +} + +.dropdown-submenu:hover>a:after { + border-left-color: #ffffff; +} + +.dropdown-submenu.pull-left { + float: none; +} + +.dropdown-submenu.pull-left>.dropdown-menu { + left: -100%; + margin-left: 10px; + border-radius: 6px 0 6px 6px; +} + +/* Sidebar --------------------------*/ + +#pkgdown-sidebar { + margin-top: 30px; + position: -webkit-sticky; + position: sticky; + top: 70px; +} + +#pkgdown-sidebar h2 { + font-size: 1.5em; + margin-top: 1em; +} + +#pkgdown-sidebar h2:first-child { + margin-top: 0; +} + +#pkgdown-sidebar .list-unstyled li { + margin-bottom: 0.5em; +} + +/* bootstrap-toc tweaks ------------------------------------------------------*/ + +/* All levels of nav */ + +nav[data-toggle='toc'] .nav > li > a { + padding: 4px 20px 4px 6px; + font-size: 1.5rem; + font-weight: 400; + color: inherit; +} + +nav[data-toggle='toc'] .nav > li > a:hover, +nav[data-toggle='toc'] .nav > li > a:focus { + 
padding-left: 5px; + color: inherit; + border-left: 1px solid #878787; +} + +nav[data-toggle='toc'] .nav > .active > a, +nav[data-toggle='toc'] .nav > .active:hover > a, +nav[data-toggle='toc'] .nav > .active:focus > a { + padding-left: 5px; + font-size: 1.5rem; + font-weight: 400; + color: inherit; + border-left: 2px solid #878787; +} + +/* Nav: second level (shown on .active) */ + +nav[data-toggle='toc'] .nav .nav { + display: none; /* Hide by default, but at >768px, show it */ + padding-bottom: 10px; +} + +nav[data-toggle='toc'] .nav .nav > li > a { + padding-left: 16px; + font-size: 1.35rem; +} + +nav[data-toggle='toc'] .nav .nav > li > a:hover, +nav[data-toggle='toc'] .nav .nav > li > a:focus { + padding-left: 15px; +} + +nav[data-toggle='toc'] .nav .nav > .active > a, +nav[data-toggle='toc'] .nav .nav > .active:hover > a, +nav[data-toggle='toc'] .nav .nav > .active:focus > a { + padding-left: 15px; + font-weight: 500; + font-size: 1.35rem; +} + +/* orcid ------------------------------------------------------------------- */ + +.orcid { + font-size: 16px; + color: #A6CE39; + /* margins are required by official ORCID trademark and display guidelines */ + margin-left:4px; + margin-right:4px; + vertical-align: middle; +} + +/* Reference index & topics ----------------------------------------------- */ + +.ref-index th {font-weight: normal;} + +.ref-index td {vertical-align: top; min-width: 100px} +.ref-index .icon {width: 40px;} +.ref-index .alias {width: 40%;} +.ref-index-icons .alias {width: calc(40% - 40px);} +.ref-index .title {width: 60%;} + +.ref-arguments th {text-align: right; padding-right: 10px;} +.ref-arguments th, .ref-arguments td {vertical-align: top; min-width: 100px} +.ref-arguments .name {width: 20%;} +.ref-arguments .desc {width: 80%;} + +/* Nice scrolling for wide elements --------------------------------------- */ + +table { + display: block; + overflow: auto; +} + +/* Syntax highlighting ---------------------------------------------------- */ 
+ +pre, code, pre code { + background-color: #f8f8f8; + color: #333; +} +pre, pre code { + white-space: pre-wrap; + word-break: break-all; + overflow-wrap: break-word; +} + +pre { + border: 1px solid #eee; +} + +pre .img, pre .r-plt { + margin: 5px 0; +} + +pre .img img, pre .r-plt img { + background-color: #fff; +} + +code a, pre a { + color: #375f84; +} + +a.sourceLine:hover { + text-decoration: none; +} + +.fl {color: #1514b5;} +.fu {color: #000000;} /* function */ +.ch,.st {color: #036a07;} /* string */ +.kw {color: #264D66;} /* keyword */ +.co {color: #888888;} /* comment */ + +.error {font-weight: bolder;} +.warning {font-weight: bolder;} + +/* Clipboard --------------------------*/ + +.hasCopyButton { + position: relative; +} + +.btn-copy-ex { + position: absolute; + right: 0; + top: 0; + visibility: hidden; +} + +.hasCopyButton:hover button.btn-copy-ex { + visibility: visible; +} + +/* headroom.js ------------------------ */ + +.headroom { + will-change: transform; + transition: transform 200ms linear; +} +.headroom--pinned { + transform: translateY(0%); +} +.headroom--unpinned { + transform: translateY(-100%); +} + +/* mark.js ----------------------------*/ + +mark { + background-color: rgba(255, 255, 51, 0.5); + border-bottom: 2px solid rgba(255, 153, 51, 0.3); + padding: 1px; +} + +/* vertical spacing after htmlwidgets */ +.html-widget { + margin-bottom: 10px; +} + +/* fontawesome ------------------------ */ + +.fab { + font-family: "Font Awesome 5 Brands" !important; +} + +/* don't display links in code chunks when printing */ +/* source: https://stackoverflow.com/a/10781533 */ +@media print { + code a:link:after, code a:visited:after { + content: ""; + } +} + +/* Section anchors --------------------------------- + Added in pandoc 2.11: https://github.com/jgm/pandoc-templates/commit/9904bf71 +*/ + +div.csl-bib-body { } +div.csl-entry { + clear: both; +} +.hanging-indent div.csl-entry { + margin-left:2em; + text-indent:-2em; +} +div.csl-left-margin { + 
min-width:2em; + float:left; +} +div.csl-right-inline { + margin-left:2em; + padding-left:1em; +} +div.csl-indent { + margin-left: 2em; +} diff --git a/pkgdown.js b/pkgdown.js new file mode 100644 index 0000000..6f0eee4 --- /dev/null +++ b/pkgdown.js @@ -0,0 +1,108 @@ +/* http://gregfranko.com/blog/jquery-best-practices/ */ +(function($) { + $(function() { + + $('.navbar-fixed-top').headroom(); + + $('body').css('padding-top', $('.navbar').height() + 10); + $(window).resize(function(){ + $('body').css('padding-top', $('.navbar').height() + 10); + }); + + $('[data-toggle="tooltip"]').tooltip(); + + var cur_path = paths(location.pathname); + var links = $("#navbar ul li a"); + var max_length = -1; + var pos = -1; + for (var i = 0; i < links.length; i++) { + if (links[i].getAttribute("href") === "#") + continue; + // Ignore external links + if (links[i].host !== location.host) + continue; + + var nav_path = paths(links[i].pathname); + + var length = prefix_length(nav_path, cur_path); + if (length > max_length) { + max_length = length; + pos = i; + } + } + + // Add class to parent
  <li>, and enclosing <li> if in dropdown + if (pos >= 0) { + var menu_anchor = $(links[pos]); + menu_anchor.parent().addClass("active"); + menu_anchor.closest("li.dropdown").addClass("active"); + } + }); + + function paths(pathname) { + var pieces = pathname.split("/"); + pieces.shift(); // always starts with / + + var end = pieces[pieces.length - 1]; + if (end === "index.html" || end === "") + pieces.pop(); + return(pieces); + } + + // Returns -1 if not found + function prefix_length(needle, haystack) { + if (needle.length > haystack.length) + return(-1); + + // Special case for length-0 haystack, since for loop won't run + if (haystack.length === 0) { + return(needle.length === 0 ? 0 : -1); + } + + for (var i = 0; i < haystack.length; i++) { + if (needle[i] != haystack[i]) + return(i); + } + + return(haystack.length); + } + + /* Clipboard --------------------------*/ + + function changeTooltipMessage(element, msg) { + var tooltipOriginalTitle=element.getAttribute('data-original-title'); + element.setAttribute('data-original-title', msg); + $(element).tooltip('show'); + element.setAttribute('data-original-title', tooltipOriginalTitle); + } + + if(ClipboardJS.isSupported()) { + $(document).ready(function() { + var copyButton = ""; + + $("div.sourceCode").addClass("hasCopyButton"); + + // Insert copy buttons: + $(copyButton).prependTo(".hasCopyButton"); + + // Initialize tooltips: + $('.btn-copy-ex').tooltip({container: 'body'}); + + // Initialize clipboard: + var clipboardBtnCopies = new ClipboardJS('[data-clipboard-copy]', { + text: function(trigger) { + return trigger.parentNode.textContent.replace(/\n#>[^\n]*/g, ""); + } + }); + + clipboardBtnCopies.on('success', function(e) { + changeTooltipMessage(e.trigger, 'Copied!'); + e.clearSelection(); + }); + + clipboardBtnCopies.on('error', function(e) { + changeTooltipMessage(e.trigger,'Press Ctrl+C or Command+C to copy'); + }); + }); + } +})(window.jQuery || window.$) diff --git a/pkgdown.yml b/pkgdown.yml new file mode 100644 index 
0000000..f680ae2 --- /dev/null +++ b/pkgdown.yml @@ -0,0 +1,5 @@ +pandoc: 3.1.11 +pkgdown: 2.1.0 +pkgdown_sha: ~ +articles: {} +last_built: 2024-07-11T22:26Z diff --git a/profiles.html b/profiles.html new file mode 100644 index 0000000..c2b50f7 --- /dev/null +++ b/profiles.html @@ -0,0 +1,408 @@ + +Python for Official Statistics: Learner Profiles +
    + Python for Official Statistics +
    + +
    +
    + + + + + +
    +
    +

    Learner Profiles

    + +

    This is a placeholder file. Please add content here.

    + +
    +
    + + +
    +
    + + + diff --git a/reference.html b/reference.html new file mode 100644 index 0000000..5d84b25 --- /dev/null +++ b/reference.html @@ -0,0 +1,806 @@ + +Python for Official Statistics: Glossary +
    + Python for Official Statistics +
    + +
    +
    + + + + + +
    +
    +

    Glossary

    +

    Last updated on 2024-07-11

    + + + +
    + +
    + + +

    Glossary +

    +
    argument
    +
    +A value given to a function or program when it runs. The term is often +used interchangeably (and inconsistently) with parameter. +
    +
    assertion
    +
    +An expression which is supposed to be true at a particular point in a +program. Programmers typically put assertions in their code to check for +errors; if the assertion fails (i.e., if the expression evaluates as +false), the program halts and produces an error message. See also: invariant, precondition, postcondition. +
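The following is a minimal sketch of assertions used as preconditions and a postcondition. The function name `proportion` and its checks are hypothetical and exist only for illustration. Python skips `assert` statements when it is run with the `-O` option, so assertions are a debugging aid rather than a replacement for input validation.

```python
def proportion(part, total):
    """Return part / total as a proportion between 0 and 1."""
    # Preconditions: must hold before the calculation runs.
    assert total > 0, "total must be positive"
    assert 0 <= part <= total, "part must lie between 0 and total"

    result = part / total

    # Postcondition: guaranteed to hold once the calculation has finished.
    assert 0 <= result <= 1, "result must lie between 0 and 1"
    return result


print(proportion(25, 200))  # 0.125

try:
    proportion(5, 0)  # violates the first precondition
except AssertionError as err:
    print("AssertionError:", err)  # AssertionError: total must be positive
```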
    +
    assign
    +
    +To give a value a name by associating a variable with it. +
    +
    body
    +
    +(of a function): the statements that are executed when a function runs. +
    +
    call stack
    +
    +A data structure inside a running program that keeps track of active +function calls. +
    +
    case-insensitive
    +
    +Treating text as if upper and lower case characters of the same letter +were the same. See also: case-sensitive. +
    +
    case-sensitive
    +
    +Treating text as if upper and lower case characters of the same letter +are different. See also: case-insensitive. +
    +
    comment
    +
    +A remark in a program that is intended to help human readers understand +what is going on, but is ignored by the computer. Comments in Python, R, +and the Unix shell start with a # character and run to the +end of the line; comments in SQL start with --, and other +languages have other conventions. +
    +
    compose
    +
    +To apply one function to the result of another, such as +f(g(x)). +
    +
    conditional statement
    +
    +A statement in a program that might or might not be executed depending +on whether a test is true or false. +
    +
    comma-separated values
    +
    +(CSV) A common textual representation for tables in which the values in +each row are separated by commas. +
    +
    default value
    +
    +A value to use for a parameter if nothing is +specified explicitly. +
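A short sketch of a default value follows. The function `round_estimate` is a hypothetical example. It passes its `digits` parameter to Python's built-in `round()`, and `digits` falls back to 1 when the caller does not supply it.

```python
def round_estimate(value, digits=1):
    """Round an estimate to `digits` decimal places (1 unless specified)."""
    return round(value, digits)


print(round_estimate(3.14159))            # 3.1   (digits takes its default, 1)
print(round_estimate(3.14159, 3))         # 3.142 (digits given positionally)
print(round_estimate(3.14159, digits=2))  # 3.14  (digits given by name)
```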
    +
    defensive programming
    +
    +The practice of writing programs that check their own operation to catch +errors as early as possible. +
    +
    delimiter
    +
    +A character or characters used to separate individual values, such as +the commas between columns in a CSV file. +
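The sketch below uses Python's standard `csv` module to split rows on a delimiter, which links this entry to comma-separated values. The two small tables are made-up data. `io.StringIO` is used only so the example does not need a file on disk. Note that `csv.reader` returns every value as a string.

```python
import csv
import io

# The same made-up table, stored with two different delimiters.
comma_text = "region,population\nNorth,1200\nSouth,950\n"
semicolon_text = "region;population\nNorth;1200\nSouth;950\n"

# csv.reader splits each line on the given delimiter;
# io.StringIO lets a string be read like an open file.
comma_rows = list(csv.reader(io.StringIO(comma_text), delimiter=","))
semicolon_rows = list(csv.reader(io.StringIO(semicolon_text), delimiter=";"))

print(comma_rows)  # [['region', 'population'], ['North', '1200'], ['South', '950']]
print(comma_rows == semicolon_rows)  # True: same values, different delimiter
```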
    +
    docstring
    +
    +Short for “documentation string”, this refers to textual documentation +embedded in Python programs. Unlike comments, docstrings are preserved +in the running program and can be examined in interactive sessions. +
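Here is a short sketch contrasting a docstring with a comment, using a hypothetical function `to_thousands`. The docstring is kept on the function object as `__doc__`, but the comment is discarded when Python reads the code. In an interactive session, `help(to_thousands)` displays the docstring.

```python
def to_thousands(count):
    """Convert a count to thousands, e.g. 12500 -> 12.5."""
    # This comment is for human readers only; Python does not keep it.
    return count / 1000


print(to_thousands(12500))   # 12.5
print(to_thousands.__doc__)  # Convert a count to thousands, e.g. 12500 -> 12.5.
```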
    +
    documentation
    +
    +Human-language text written to explain what software does, how it works, +or how to use it. +
    +
    dotted notation
    +
    +A two-part notation used in many programming languages in which +thing.component refers to the component +belonging to thing. +
    +
    empty string
    +
    +A character string containing no characters, often thought of as the +“zero” of text. +
    +
    encapsulation
    +
    +The practice of hiding something’s implementation details so that the +rest of a program can worry about what it does rather than +how it does it. +
    +
    floating-point number
    +
    +A number containing a fractional part and an exponent. See also: integer. +
    +
    for loop
    +
    +A loop that is executed once for each value in some kind of set, list, +or range. See also: while loop. +
    +
    function
    +
    +A named group of instructions that is executed when the function’s name +is used in the code. An occurrence of a function name in the code is a function call. Functions may process input arguments and return a result. Functions may +also be used for logically grouping together pieces of code. In such +cases, they don’t need to return any meaningful value and can be written +without a return +statement at all. Such functions return the special value +None, which is Python’s way of saying “nothing”.
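This minimal sketch shows both kinds of function described in this entry. The names `share` and `report` are hypothetical. The first function returns a result to its caller. The second has no `return` statement, so calling it evaluates to `None`.

```python
def share(part, total):
    # Processes its input arguments and returns a result to the caller.
    return part / total


def report(label, value):
    # Groups statements together; there is no return statement.
    print(f"{label}: {value}")


print(share(50, 200))                # 0.25
result = report("Population", 1234)  # prints "Population: 1234"
print(result is None)                # True
```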
    +
    function call
    +
    +A use of a function in another piece of software. +
    +
    global variable
    +
    +A variable defined outside of a function. It can be used in global +statements, and read inside functions. +
    +
    heat map
    +
    +A graphical representation of two-dimensional data in which colors, +ranging on a scale of hue or intensity, represent the data values. +
    +
    Integrated Development +Environment (IDE)
    +
    +A software application in which you write, edit, run, and debug your code.
    +
    immutable
    +
    +Unchangeable. The value of immutable data cannot be altered after it has +been created. See also: mutable. +
    +
    import
    +
    +To load a library into a program. +
    +
    in-place operators
    +
    +An operator such as += that provides a shorthand notation +for the common case in which the variable being assigned to is also an +operand on the right hand side of the assignment. For example, the +statement x += 3 means the same thing as +x = x + 3. +
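A short sketch of in-place operators with a hypothetical running total follows. Besides `+=`, Python offers the same shorthand for other arithmetic operators, such as `*=` and `-=`.

```python
total = 10
total += 3    # same as: total = total + 3
print(total)  # 13

total *= 2    # same as: total = total * 2
print(total)  # 26

total -= 6    # same as: total = total - 6
print(total)  # 20
```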
    +
    index
    +
    +A subscript that specifies the location of a single value in a +collection, such as a single pixel in an image. +
    +
    inner loop
    +
    +A loop that is inside another loop. See also: outer loop. +
    +
    integer
    +
    +A whole number, such as -12343. See also: floating-point number. +
    +
    invariant
    +
    +An expression whose value doesn’t change during the execution of a +program, typically used in an assertion. See +also: precondition, postcondition. +
    +
    library
    +
    +A family of code units (functions, classes, variables) that implement a +set of related tasks. +
    +
    local variable
    +
    +A variable defined inside a function. It exists only in the scope +of that function, meaning it cannot be accessed by code outside of the +function.
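The minimal sketch below contrasts local and global variables using hypothetical names. `rate` is a global variable that is read inside the function. `tax` is local to `apply_rate` and no longer exists once the function returns.

```python
rate = 0.25  # global variable: defined outside any function


def apply_rate(amount):
    tax = amount * rate  # local variable; `rate` is read from the global scope
    return amount + tax


print(apply_rate(100))  # 125.0

try:
    print(tax)  # `tax` only existed inside apply_rate
except NameError as err:
    print("NameError:", err)
```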
    +
    loop variable
    +
    +The variable that keeps track of the progress of the loop. +
    +
    member
    +
    +A variable contained within an object. +
    +
    method
    +
    +A function which is tied to a particular object. +Each of an object’s methods typically implements one of the things it +can do, or one of the questions it can answer. +
    +
    mutable
    +
    +Changeable. The value of mutable data can be altered after it has been +created. See also: immutable.
    +
    notebook
    +
    +Interactive computational environment accessed via your web browser, in +which you can write and execute Python code and combine it with +explanatory text, mathematics and visualizations. Examples are IPython +or Jupyter notebooks. +
    +
    object
    +
    +A collection of conceptually related variables (members) and functions using those variables (methods). +
    +
    outer loop
    +
    +A loop that contains another loop. See also: inner +loop. +
    +
    parameter
    +
    +A variable named in the function’s declaration that is used to hold a +value passed into the call. The term is often used interchangeably (and +inconsistently) with argument. +
    +
    pipe
    +
    +A connection from the output of one program to the input of another. +When two or more programs are connected in this way, they are called a +“pipeline”. +
    +
    postcondition
    +
    +A condition that a function (or other block of code) guarantees is true +once it has finished running. Postconditions are often represented using +assertions. +
    +
    precondition
    +
    +A condition that must be true in order for a function (or other block of +code) to run correctly. +
    +
    regression
    +
    +To re-introduce a bug that was once fixed. +
    +
    return statement
    +
    +A statement that causes a function to stop executing and return a value +to its caller immediately. +
    +
    RGB
    +
    +An additive model that represents +colors as combinations of red, green, and blue. Each color’s value is +typically in the range 0..255 (i.e., a one-byte integer). +
    +
    sequence
    +
    +A collection of information that is presented in a specific order. For +example, in Python, a string is a sequence of +characters, while a list is a sequence of values of any type.
    +
    shape
    +
    +An array’s dimensions, represented in Python as a tuple. For example, a 5×3 +array’s shape is (5,3).
    +
    silent failure
    +
    +Failing without producing any warning messages. Silent failures are hard +to detect and debug. +
    +
    slice
    +
    +A regular subsequence of a larger sequence, such as the first five +elements or every second element. +
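This short sketch slices a hypothetical list of month labels. It covers the two cases in the definition: the first five elements and every second element. The same `start:stop:step` syntax also works on strings and tuples.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug"]

first_five = months[:5]     # start at index 0, stop before index 5
every_second = months[::2]  # whole list, step of 2

print(first_five)     # ['Jan', 'Feb', 'Mar', 'Apr', 'May']
print(every_second)   # ['Jan', 'Mar', 'May', 'Jul']
print("Python"[1:4])  # yth
```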
    +
    stack frame
    +
    +A data structure that provides storage for a function’s local variables. +Each time a function is called, a new stack frame is created and put on +the top of the call stack. When the function +returns, the stack frame is discarded. +
    +
    standard input
    +
    +A process’s default input stream. In interactive command-line +applications, it is typically connected to the keyboard; in a pipe, it receives data from the standard output of the preceding process. +
    +
    standard output
    +
    +A process’s default output stream. In interactive command-line +applications, data sent to standard output is displayed on the screen; +in a pipe, it is passed to the standard input of the next process. +
    +
    string
    +
    +Short for “character string”, a sequence of zero +or more characters. +
    +
    syntax
    +
    +The rules that define how code must be written for a computer to +understand. +
    +
    syntax error
    +
    +A programming error that occurs when statements are in an order or +contain characters not expected by the programming language. +
    +
    tab completion
    +
    +A feature of command-line interpreters, in which the program +automatically fills in partially typed commands upon pressing the +Tab key. +
    +
    test oracle
    +
    +A program, device, data set, or human being against which the results of +a test can be compared. +
    +
    test-driven +development
    +
    +The practice of writing unit tests before writing the code they +test. +
    +
    traceback
    +
    +The sequence of function calls that led to an error. +
    +
    tuple
    +
    +An immutable sequence +of values. +
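The minimal sketch below contrasts an immutable tuple with a mutable list; the values (a year and a quarter) are hypothetical. Assigning to an element of the list succeeds. The same assignment on the tuple raises a `TypeError`.

```python
period = (2024, 2)       # tuple: immutable
period_list = [2024, 2]  # list: mutable

period_list[1] = 3       # allowed: the list is changed in place
print(period_list)       # [2024, 3]

try:
    period[1] = 3        # not allowed: tuples cannot be altered
except TypeError as err:
    print("TypeError:", err)

print(period)            # (2024, 2)
```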
    +
    type
    +
    +The classification of something in a program (for example, the contents +of a variable) as a kind of number (e.g. floating-point, integer), string, or something else. +
    +
    type of error
    +
    +Indicates the nature of an error in a program. For example, in Python, +an IOError corresponds to problems with file input/output. See also: syntax error.
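Here is a short sketch of catching an error by its type. It assumes that a file named `missing_data.csv` (a hypothetical name) does not exist in the working directory. In Python 3, `IOError` is kept as an alias of `OSError`, and a missing file raises the more specific subclass `FileNotFoundError`.

```python
print(IOError is OSError)  # True: IOError is an alias of OSError in Python 3

try:
    # Hypothetical file name, assumed not to exist.
    with open("missing_data.csv") as handle:
        contents = handle.read()
    error_name = None
except OSError as err:
    # FileNotFoundError is a subclass of OSError, so this handler catches it.
    error_name = type(err).__name__

print(error_name)  # FileNotFoundError (if the file really is missing)
```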
    +
    variable
    +
    +A value that has a name associated with it. +
    +
    while loop
    +
    +A loop that keeps executing as long as some condition is true. See also: +for loop. +
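This minimal sketch contrasts the two kinds of loop using made-up values. The `for` loop runs once per item in a list. The `while` loop repeats until its condition becomes false.

```python
values = [4, 8, 15]

# for loop: executes once for each value in the list
total = 0
for value in values:
    total += value
print(total)  # 27

# while loop: keeps executing as long as the condition is true
count = 0
while count < 3:
    count += 1
print(count)  # 3
```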
    +
    +
    + + +
    +
    + + + diff --git a/safari-pinned-tab.svg b/safari-pinned-tab.svg new file mode 100644 index 0000000..8a74e60 --- /dev/null +++ b/safari-pinned-tab.svg @@ -0,0 +1,68 @@ + + + + +Created by potrace 1.14, written by Peter Selinger 2001-2017 + + + + + + + + diff --git a/site.webmanifest b/site.webmanifest new file mode 100644 index 0000000..f2302ff --- /dev/null +++ b/site.webmanifest @@ -0,0 +1,19 @@ +{ + "name": "The Carpentries", + "short_name": "The Carpentries", + "icons": [ + { + "src": "/android-chrome-192x192.png", + "sizes": "192x192", + "type": "image/png" + }, + { + "src": "/android-chrome-512x512.png", + "sizes": "512x512", + "type": "image/png" + } + ], + "theme_color": "#ffffff", + "background_color": "#ffffff", + "display": "standalone" +} diff --git a/sitemap.xml b/sitemap.xml new file mode 100644 index 0000000..21c5374 --- /dev/null +++ b/sitemap.xml @@ -0,0 +1,105 @@ + + + + https://UNECE.github.io/ModernStats_Python/01-introduction.html + + + https://UNECE.github.io/ModernStats_Python/02-python_fundamentals.html + + + https://UNECE.github.io/ModernStats_Python/03-data_transformation.html + + + https://UNECE.github.io/ModernStats_Python/04-lists.html + + + https://UNECE.github.io/ModernStats_Python/05-loops.html + + + https://UNECE.github.io/ModernStats_Python/06-alternative_loops.html + + + https://UNECE.github.io/ModernStats_Python/07-functions.html + + + https://UNECE.github.io/ModernStats_Python/08-data_analysis.html + + + https://UNECE.github.io/ModernStats_Python/09-visualizations.html + + + https://UNECE.github.io/ModernStats_Python/10-errors_exceptions.html + + + https://UNECE.github.io/ModernStats_Python/404.html + + + https://UNECE.github.io/ModernStats_Python/CODE_OF_CONDUCT.html + + + https://UNECE.github.io/ModernStats_Python/LICENSE.html + + + https://UNECE.github.io/ModernStats_Python/discuss.html + + + https://UNECE.github.io/ModernStats_Python/index.html + + + 
https://UNECE.github.io/ModernStats_Python/instructor/01-introduction.html + + + https://UNECE.github.io/ModernStats_Python/instructor/02-python_fundamentals.html + + + https://UNECE.github.io/ModernStats_Python/instructor/03-data_transformation.html + + + https://UNECE.github.io/ModernStats_Python/instructor/04-lists.html + + + https://UNECE.github.io/ModernStats_Python/instructor/05-loops.html + + + https://UNECE.github.io/ModernStats_Python/instructor/06-alternative_loops.html + + + https://UNECE.github.io/ModernStats_Python/instructor/07-functions.html + + + https://UNECE.github.io/ModernStats_Python/instructor/08-data_analysis.html + + + https://UNECE.github.io/ModernStats_Python/instructor/09-visualizations.html + + + https://UNECE.github.io/ModernStats_Python/instructor/10-errors_exceptions.html + + + https://UNECE.github.io/ModernStats_Python/instructor/404.html + + + https://UNECE.github.io/ModernStats_Python/instructor/CODE_OF_CONDUCT.html + + + https://UNECE.github.io/ModernStats_Python/instructor/LICENSE.html + + + https://UNECE.github.io/ModernStats_Python/instructor/discuss.html + + + https://UNECE.github.io/ModernStats_Python/instructor/index.html + + + https://UNECE.github.io/ModernStats_Python/instructor/profiles.html + + + https://UNECE.github.io/ModernStats_Python/instructor/reference.html + + + https://UNECE.github.io/ModernStats_Python/profiles.html + + + https://UNECE.github.io/ModernStats_Python/reference.html + +