How do I find reliable and safe resources or code online?
+
+
+
+
+
+
+
Objectives
+
identify basic concepts in programming
+
+
+
+
+
+
Programming in Python
+
+
In the most general terms, programming is the process of writing
+instructions for a computer. In this course we will be using Python as
+the language to communicate with the computer.
+
+
Strictly speaking, Python is an interpreted language, rather than a
+compiled language, meaning we are not communicating directly with the
+computer when we use Python. When we run Python code, our Python source
+code is first translated into byte code, which is then executed by the
+Python virtual machine.
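As a rough illustration of the byte code step, the sketch below uses the built-in compile() function to translate a small expression into a code object and then evaluates it. The expression and the "<example>" label are placeholders chosen for this example only.

```python
# Translate a tiny piece of source code into a code object (byte code).
source = "1 + 1"
code_object = compile(source, "<example>", "eval")

# The raw byte code is stored as a bytes object on the code object.
print(type(code_object.co_code))

# The Python virtual machine executes the byte code when we evaluate it.
result = eval(code_object)
print(result)
```

In day-to-day work Python performs both steps for us automatically, so we rarely need to call compile() ourselves.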
+
+
Programming is a wide topic including a variety of techniques and
+tools. In this course we’ll be focusing on programming for statistical
+analysis.
+
+
IDEs
+
IDE stands for Integrated Development Environment. IDEs are where you
+will write, edit, and debug Python scripts, so you want to choose one
+that makes you feel comfortable and includes the functionality that you
+need. Some open-source IDEs for Python include JupyterLab and Visual Studio
+Code.
+
+
+
Packages
+
Packages, or libraries, are extensions to the statistical programming
+language. They contain code, data, and documentation in a standardised
+collection format that can be installed by users, typically via a
+centralised software repository. A typical Python workflow will use base
+Python (the core operations and functions provided by your Python
+installation) as well as specialised data analysis and scientific
+packages like NumPy, SciPy and Pandas.
+
+
Best Practices
+
+
Let’s review some basic concepts that every programmer should
+keep in mind.
+
+
Documentation
+
Have you ever returned to a task and tried to read a note that you
+quickly scrawled for yourself the last time you were working on it? Have
+you ever inherited a project from a colleague and found you have no idea
+what remains to be done?
+
It can be very challenging to return to your own work or a
+colleague’s, and this goes doubly for programming. Documentation is one
+way we can reduce the burden on our future selves and our colleagues.
+
+
Inline Documentation
+
As a new programmer, inline documentation can be the most helpful.
+Inline documentation refers to writing comments on the same line as your
+code. For example, if we wrote a line of code to sum 1+1, we might
+document it as follows:
+
+
PYTHON
+
+
1 + 1  # adding the numbers 1 and 1 together.
+
+
Although this is a very simple line of code and it might seem like
+overkill to document it in this way, these types of comments can be very
+helpful in jogging your memory when returning to a project. Inline
+comments can also help you to break multi-step programs into digestible
+and readable pieces.
+
+
+
External Documentation
+
Sometimes you require more detail than you can comfortably fit in
+your inline documentation. In this case it can be helpful to create
+separate files to document your project. This type of documentation will
+typically focus on the goals, scope, and any special instructions
+relating to your project rather than the details of your code. The most
+common type of external documentation is a README file. It is best
+practice to create a basic README file for any project. A basic README
+should include:
+
a brief description of the project,
+
any special instructions for installation or use,
+
the authors and any references.
+
README files are just text files, and it is best practice to save
+your README file as a README.md Markdown document. This
+file format is automatically recognised by code repositories like
+GitHub, so your README contents are displayed alongside your code
+repository.
+
+
+
DocStrings
+
In chapter 7: functions we’ll learn
+about documentation specific to functions known as DocStrings.
+
+
+
Getting Help
+
+
Later on, in chapter 10: Errors
+and Exceptions we will cover errors in more detail. However, before
+we get there it’s very likely you’ll need some assistance writing Python
+code.
+
+
Built-in Help
+
There is a help
+function built into base Python. You can use it to investigate
+built-in functions, data types, and more. For example, say we want to
+know more about the print() function in Python:
+
+
PYTHON
+
+
help(print)
+
+
+
OUTPUT
+
+
Help on built-in function print in module builtins:
+
+print(...)
+ print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
+
+ Prints the values to a stream, or to sys.stdout by default.
+ Optional keyword arguments:
+ file: a file-like object (stream); defaults to the current sys.stdout.
+ sep: string inserted between values, default a space.
+ end: string appended after the last value, default a newline.
+-- More --
+
+
+
+
Finding Resources online
+
Stack Overflow is a valuable
+resource for programmers of all levels. It can be daunting to post your
+own question! Fortunately, chances are someone else has already asked a
+similar question!
+
It can also be helpful to do a general search for a particular topic
+or error message. It’s very likely the first few results will be from
+Stack Overflow, followed by a few from official documentation, and then
+you may start seeing results from personal blogs or third parties. These
+third-party results can sometimes be valuable, but we should be cautious!
+Here are a few things to keep in mind when you are looking for online
+resources:
+
Don’t download or install anything unless you are certain of what it
+is and why you need it.
+
Don’t copy or run code unless you fully understand what it
+does.
+
Python is an open-source language; official documentation and
+resources will not be behind a paywall.
+
You may not find a resource or solution to fit your exact needs. Try
+to be flexible and adapt online solutions to fit your needs.
+
+
+
+
+
+
Key Points
+
+
+
Python is an interpreted language.
+
Code is commonly developed inside an integrated development
+environment.
+
A typical Python workflow uses base Python and additional Python
+packages developed for statistical programming purposes.
+
In-line and external documentation helps ensure that your code is
+readable.
+
You can find help through the built-in help function and external
+resources.
Can I change the value associated with a variable after I create
+it?
+
+
+
+
+
+
+
Objectives
+
Assign values to variables.
+
+
+
+
+
+
Variables
+
+
Any Python interpreter can be used as a calculator:
+
+
PYTHON
+
+
3+5*4
+
+
+
OUTPUT
+
+
23
+
+
This is great but not very interesting. To do anything useful with
+data, we need to assign its value to a variable. In Python, we
+can assign a value to a variable, using the equals sign
+=. For example, we can track the weight of a patient who
+weighs 60 kilograms by assigning the value 60 to a variable
+weight_kg:
+
+
PYTHON
+
+
weight_kg = 60
+
+
From now on, whenever we use weight_kg, Python will
+substitute the value we assigned to it. In layperson’s terms, a
+variable is a name for a value.
+
Variable names in Python cannot start with a digit and are case
+sensitive. This means that, for example:
+
weight0 is a valid variable name, whereas
+0weight is not
+
weight and Weight are different
+variables
+
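As a small sketch of these naming rules, the string method isidentifier() reports whether a piece of text would be accepted as a name. The values 60 and 75 below are made up purely for illustration.

```python
# str.isidentifier() reports whether a string follows Python's naming rules.
print('weight0'.isidentifier())   # digits are allowed after the first character
print('0weight'.isidentifier())   # names cannot start with a digit

# Names are case sensitive, so these are two separate variables.
weight = 60
Weight = 75
print(weight, Weight)
```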
Types of data
+
+
Python knows various types of data. Three common ones are:
+
integer numbers
+
floating point numbers, and
+
strings.
+
In the example above, variable weight_kg has an integer
+value of 60. If we want to more precisely track the weight
+of our patient, we can use a floating point value by executing:
+
+
PYTHON
+
+
weight_kg = 60.3
+
+
To create a string, we add single or double quotes around some text.
+To identify and track a patient throughout our study, we can assign each
+person a unique identifier by storing it in a string:
+
+
PYTHON
+
+
patient_id = '001'
+
+
Using Variables in Python
+
+
Once we have data stored with variable names, we can make use of it
+in calculations. We may want to store our patient’s weight in pounds as
+well as kilograms:
+
+
PYTHON
+
+
weight_lb = 2.2 * weight_kg
+
+
We might decide to add a prefix to our patient identifier:
+
+
PYTHON
+
+
patient_id = 'inflam_' + patient_id
+
+
Built-in Python functions
+
+
To carry out common tasks with data and variables in Python, the
+language provides us with several built-in functions. To display information to
+the screen, we use the print function:
+
+
PYTHON
+
+
print(weight_lb)
+print(patient_id)
+
+
+
OUTPUT
+
+
132.66
+inflam_001
+
+
When we want to make use of a function, referred to as calling the
+function, we follow its name by parentheses. The parentheses are
+important: if you leave them off, the function doesn’t actually run!
+Sometimes you will include values or variables inside the parentheses
+for the function to use. In the case of print, we use the
+parentheses to tell the function what value we want to display. We will
+learn more about how functions work and how to create our own in later
+episodes.
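To make the role of the parentheses concrete, here is a minimal sketch. The name show is a hypothetical alias introduced only for this example; it is not part of Python.

```python
# Without parentheses we only refer to the function object; nothing is displayed.
show = print
print(callable(show))

# With parentheses the function is called and actually runs.
show('weight in kilograms:', 60.3)
```

The first assignment simply gives the print function a second name, which is why no text appears until the function is called with parentheses.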
+
We can display multiple things at once using only one
+print call:
+
+
PYTHON
+
+
print(patient_id, 'weight in kilograms:', weight_kg)
+
+
+
OUTPUT
+
+
inflam_001 weight in kilograms: 60.3
+
+
We can also call a function inside of another function call. For example,
+Python has a built-in function called type that tells you a
+value’s data type:
+
+
PYTHON
+
+
print(type(60.3))
+print(type(patient_id))
+
+
+
OUTPUT
+
+
<class 'float'>
+<class 'str'>
+
+
Moreover, we can do arithmetic with variables right inside the
+print function:
+
+
PYTHON
+
+
print('weight in pounds:', 2.2 * weight_kg)
+
+
+
OUTPUT
+
+
weight in pounds: 132.66
+
+
The above command, however, did not change the value of
+weight_kg:
+
+
PYTHON
+
+
print(weight_kg)
+
+
+
OUTPUT
+
+
60.3
+
+
To change the value of the weight_kg variable, we have
+to assign weight_kg a new value using the
+equals sign =:
+
+
PYTHON
+
+
weight_kg = 65.0
+print('weight in kilograms is now:', weight_kg)
+
+
+
OUTPUT
+
+
weight in kilograms is now: 65.0
+
+
+
+
+
+
+
Variables as Sticky Notes
+
+
+
A variable in Python is analogous to a sticky note with a name
+written on it: assigning a value to a variable is like putting that
+sticky note on a particular value.
+
Using this analogy, we can investigate how assigning a value to one
+variable does not change values of other, seemingly
+related, variables. For example, let’s store the subject’s weight in
+pounds in its own variable:
+
+
PYTHON
+
+
# There are 2.2 pounds per kilogram
+weight_lb = 2.2 * weight_kg
+print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
+
+
+
OUTPUT
+
+
weight in kilograms: 65.0 and in pounds: 143.0
+
+
Everything in a line of code following the ‘#’ symbol is a comment that is ignored by Python.
+Comments allow programmers to leave explanatory notes for other
+programmers or their future selves.
+
Similar to above, the expression 2.2 * weight_kg is
+evaluated to 143.0, and then this value is assigned to the
+variable weight_lb (i.e. the sticky note
+weight_lb is placed on 143.0). At this point,
+each variable is “stuck” to completely distinct and unrelated
+values.
+
Let’s now change weight_kg:
+
+
PYTHON
+
+
weight_kg = 100.0
+print('weight in kilograms is now:', weight_kg, 'and weight in pounds is still:', weight_lb)
+
+
+
OUTPUT
+
+
weight in kilograms is now: 100.0 and weight in pounds is still: 143.0
+
+
Since weight_lb doesn’t “remember” where its value comes
+from, it is not updated when we change weight_kg.
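If we do want the weight in pounds to reflect the new weight in kilograms, one option is simply to run the calculation again. The sketch below assumes the same values as the example above; stale_weight_lb is a hypothetical name used only to keep hold of the old value.

```python
weight_kg = 65.0
weight_lb = 2.2 * weight_kg   # computed once from the current weight_kg

weight_kg = 100.0             # re-assigning weight_kg leaves weight_lb alone
stale_weight_lb = weight_lb

weight_lb = 2.2 * weight_kg   # re-run the calculation to refresh weight_lb
print('stale value:', stale_weight_lb, 'refreshed value:', weight_lb)
```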
+
+
+
+
+
+
+
+
+
Check Your Understanding
+
+
+
What values do the variables mass and age
+have after each of the following statements? Test your answer by
+executing the lines.
+
+
PYTHON
+
+
mass = 47.5
+age = 122
+mass = mass * 2.0
+age = age - 20
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
`mass` holds a value of 47.5, `age` does not exist
+`mass` still holds a value of 47.5, `age` holds a value of 122
+`mass` now has a value of 95.0, `age`'s value is still 122
+`mass` still has a value of 95.0, `age` now holds 102
+
+
+
+
+
+
+
+
+
+
+
Sorting Out References
+
+
+
Python allows you to assign multiple values to multiple variables in
+one line by separating the variables and values with commas. What does
+the following program print out?
+
+
PYTHON
+
+
first, second = 'Grace', 'Hopper'
+third, fourth = second, first
+print(third, fourth)
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
Hopper Grace
+
+
+
+
+
+
+
+
+
+
+
Seeing Data Types
+
+
+
What are the data types of the following variables?
+
+
diff --git a/03-data_transformation.html b/03-data_transformation.html
new file mode 100644
index 0000000..c6a82aa
--- /dev/null
+++ b/03-data_transformation.html
@@ -0,0 +1,863 @@
+
+Python for Official Statistics: Data Transformation
+
+
+
+
+
+
+
+ Pre-Alpha
+
+ This lesson is in the pre-alpha phase, which means that it is in early development, but has not yet been taught.
+
+
+
+
Explain what a library is and what libraries are used for.
+
Import a Python library and use the functions it contains.
+
Read tabular data from a file into a program.
+
Select individual values and subsections from data.
+
Perform operations on arrays of data.
+
+
+
+
+
+
Words are useful, but what’s more useful are the sentences and
+stories we build with them. Similarly, while a lot of powerful, general
+tools are built into Python, specialized tools built up from these basic
+units live in libraries that can be
+called upon when needed.
+
Loading data into Python
+
+
To begin processing the clinical trial inflammation data, we need to
+load it into Python. Python can work with many different file types.
+Text files can be loaded into Python by using the base Python
+function
+
+
PYTHON
+
+
open("filename.txt", "r")
+
+
where “r” means read only, or if you want to write to the file, you
+can use “w”.
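As a minimal sketch of the two modes, the example below writes a line to a file and reads it back. The temporary folder, the file name and the text written are all placeholders chosen for this example; using a with block is one common way to make sure the file is closed afterwards.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this example.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "filename.txt")

# "w" opens the file for writing (creating it, or overwriting an existing file).
with open(path, "w") as handle:
    handle.write("patient_id,weight_kg\n")

# "r" opens the file read only; the with statement closes it automatically.
with open(path, "r") as handle:
    contents = handle.read()

print(contents)
```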
+
However, our patient data is in a .csv file, which is more commonly
+loaded by using a library. Python has hundreds of thousands of libraries
+to choose from to help carry out your work. Importing a library is like
+getting a piece of lab equipment out of a storage locker and setting it
+up on the bench. Libraries provide additional functionality to the basic
+Python package, much like a new piece of equipment adds functionality to
+a lab space. Just like in the lab, importing too many libraries can
+sometimes complicate and slow down your programs - so we only import
+what we need for each program. There are a couple of common Python
+libraries for loading and working with data.
+
pandas
+
+
The first library we will present is called pandas. pandas is a
+Python library containing a set of functions and specialised data
+structures that have been designed to help Python programmers to perform
+data analysis tasks in a structured way.
+
Most of the things that pandas can do can be done with basic Python,
+but the collected set of pandas functions and data structures makes
+data analysis tasks more consistent in terms of syntax and therefore
+aids readability.
+
Remember to write the library name with a lower case ‘p’, because that is
+the name of the package and Python is case sensitive.
+
+
Importing the pandas library
+
Importing the pandas library is done in exactly the same way as for
+any other library. In almost all examples of Python code using the
+pandas library, it will have been imported and given an alias of
+pd. We will follow the same convention.
+
+
PYTHON
+
+
import pandas as pd
+
+
+
+
Pandas data structures
+
There are two main data structures used by pandas: the Series
+and the DataFrame. The Series equates in general to a vector or a list.
+The DataFrame is equivalent to a table. Each column in a pandas
+DataFrame is a pandas Series data structure.
+
We will mainly be looking at the DataFrame.
+
We can easily create a pandas DataFrame by reading a .csv file.
+
+
+
Reading a csv file
+
When we read a csv dataset in base Python we did so by opening the
+dataset, reading and processing a record at a time and then closing the
+dataset after we had read the last record. Reading datasets in this way
+is slow and places all of the responsibility for extracting individual
+data items of information from the records on the programmer.
+
The main advantage of this approach, however, is that you only have
+to store one dataset record in memory at a time. This means that if you
+have the time, you can process datasets of any size.
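The record-at-a-time approach described above might look roughly like the sketch below, which uses the csv module from base Python. The column names and values are invented, and an in-memory string stands in for a real file so that the example is self-contained.

```python
import csv
import io

# A small in-memory stand-in for a csv file; the contents are made up.
csv_text = "patient_id,weight_kg\n001,60.3\n002,65.0\n"

total_weight = 0.0
record_count = 0

# csv.DictReader yields one record at a time, so only one row is held in memory.
with io.StringIO(csv_text) as dataset:
    for record in csv.DictReader(dataset):
        total_weight += float(record["weight_kg"])
        record_count += 1

print('records read:', record_count, 'total weight:', total_weight)
```

Note that converting each value from text to a number is left to the programmer here, which is part of the extra responsibility mentioned above.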
+
In pandas, csv files are read as complete datasets. You do not have
+to explicitly open and close the dataset. All of the dataset records are
+assembled into a DataFrame. If your dataset has column headers in the
+first record then these can be used as the DataFrame column names. You
+can explicitly state this in the parameters to the call, but pandas is
+usually able to infer that there is a header row and use it
+automatically.
+
To tell Python that we’d like to start using pandas, we need to import it:
+
+
PYTHON
+
+
import pandas as pd
+
+
Often, libraries are given an alias or a short form name, in this
+case pandas is given the alias “pd”. Aliases for common data analysis
+libraries include:
+
+
PYTHON
+
+
import pandas as pd
+import numpy as np
+import matplotlib.pyplot as plt
+import seaborn as sns
+
+
Once we’ve imported the library, we can ask the library to read our
+data file for us:
+
+
PYTHON
+
+
pd.read_csv("filename.csv")
+
+
pandas is a commonly used library for working with and analysing
+data. However, we will be working with a different package for the
+remainder of this course. If you would like to learn more about data
+manipulation and analysis using pandas, we recommend checking out Data Analysis and
+Visualization with Python for Social Scientists.
+
+
numpy
+
+
The second package that we will present is called NumPy, which stands for Numerical
+Python. In general, you should use this library when you want to do
+fancy things with lots of numbers, especially if you have matrices or
+arrays. NumPy arrays are typically lighter weight and faster than
+plain Python lists, particularly when working with large datasets.
+
We will be using this package to work with our clinical trial
+inflammation data.
+
To tell Python that we’d like to start using NumPy, we need to import it:
+
+
PYTHON
+
+
import numpy as np
+
+
Now that we have imported the library, we can ask the library (by
+using the alias np) to read our data file for us by calling
+np.loadtxt(fname='inflammation-01.csv', delimiter=',').
+
The expression np.loadtxt(...) is a function call that asks Python
+to run the function
+loadtxt which belongs to the np library. Dot
+notation in Python is used to access an attribute (property) of an
+object or to invoke one of its methods:
+object_name.property gives you the value of that property, and
+object_name.method() calls that method on object_name.
+
As an example, John Smith is the John that belongs to the Smith
+family. We could use the dot notation to write his name
+smith.john, just as loadtxt is a function that
+belongs to the np library.
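A short sketch of dot notation in action is given below, using the math library from base Python and a string. The patient identifier is the one from the previous lesson; the names root and shouted are hypothetical and exist only in this example.

```python
import math

# library.name: pi is a value that belongs to the math library.
print(math.pi)

# library.function(): sqrt is a function that belongs to the math library.
root = math.sqrt(16)
print(root)

# object.method(): upper is a method that belongs to the string object.
patient_id = 'inflam_001'
shouted = patient_id.upper()
print(shouted)
```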
+
np.loadtxt has two parameters: the name of the file we
+want to read and the delimiter
+that separates values on a line. These both need to be character strings
+(or strings for short), so we put
+them in quotes.
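To try these two parameters without needing the inflammation file, the sketch below loads a tiny made-up dataset. It relies on the fact that np.loadtxt also accepts a file-like object in place of a file name, so an in-memory io.StringIO object stands in for a file on disk; the numbers are invented.

```python
import io
import numpy as np

# A tiny in-memory stand-in for a csv file; these values are invented.
fake_file = io.StringIO("0,1,2\n3,4,5\n")

# fname is the file to read (a file name or a file-like object);
# delimiter is the string that separates values on each line.
small_data = np.loadtxt(fname=fake_file, delimiter=',')
print(small_data)
```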
+
Since we haven’t told it to do anything else with the function’s
+output, the notebook displays it.
+In this case, that output is the data we just loaded. By default, only a
+few rows and columns are shown (with ... to omit elements
+when displaying big arrays). Note that, to save space when displaying
+NumPy arrays, Python does not show us trailing zeros, so
+1.0 becomes 1..
+
Our call to np.loadtxt read our file but didn’t save the
+data in memory. To do that, we need to assign the array to a variable.
+In a similar manner to how we assign a single value to a variable, we
+can also assign an array of values to a variable using the same syntax.
+Let’s re-run np.loadtxt and save the returned data:
+
+
PYTHON
+
+
data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+
This statement doesn’t produce any output because we’ve assigned the
+output to the variable data. If we want to check that the
+data have been loaded, we can print the variable’s value with
+print(data).
+
Now that the data are in memory, we can manipulate them. First, let’s
+ask what type of thing
+data refers to:
+
+
PYTHON
+
+
print(type(data))
+
+
+
OUTPUT
+
+
<class 'numpy.ndarray'>
+
+
The output tells us that data currently refers to an
+N-dimensional array, the functionality for which is provided by the
+NumPy library. These data correspond to arthritis patients’
+inflammation. The rows are the individual patients, and the columns are
+their daily inflammation measurements.
+
+
+
+
+
+
Data Type
+
+
+
A NumPy array contains one or more elements of the same type. The
+type function will only tell you that a variable is a NumPy
+array but won’t tell you the type of thing inside the array. We can find
+out the type of the data contained in the NumPy array through its
+dtype attribute, for example with print(data.dtype).
+
With the following command, we can see the array’s shape:
+
+
PYTHON
+
+
print(data.shape)
+
+
+
OUTPUT
+
+
(60, 40)
+
+
The output tells us that the data array variable
+contains 60 rows and 40 columns. When we created the variable
+data to store our arthritis data, we did not only create
+the array; we also created information about the array, called members or attributes. This extra
+information describes data in the same way an adjective
+describes a noun. data.shape is an attribute of
+data which describes the dimensions of data.
+We use the same dotted notation for the attributes of variables that we
+use for the functions in libraries because they have the same
+part-and-whole relationship.
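The difference between the type of the container, the type of its contents and its shape can be sketched on a small made-up array. The name example and its values are placeholders; they are not the inflammation data.

```python
import numpy as np

# A small made-up array standing in for the inflammation data.
example = np.array([[0.0, 1.0, 2.0],
                    [3.0, 4.0, 5.0]])

print(type(example))   # the kind of container: a NumPy array
print(example.dtype)   # the type of the elements stored inside the array
print(example.shape)   # the dimensions as (rows, columns)
```

Notice that dtype and shape are written without parentheses because they are attributes that describe the array, not functions that we call.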
+
If we want to get a single number from the array, we must provide an
+index in square brackets after the
+variable name, just as we do in math when referring to an element of a
+matrix. Our inflammation data has two dimensions, so we will need to use
+two indices to refer to one specific value:
+
+
PYTHON
+
+
print('first value in data:', data[0, 0])
+
+
+
OUTPUT
+
+
first value in data: 0.0
+
+
+
PYTHON
+
+
print('middle value in data:', data[29, 19])
+
+
+
OUTPUT
+
+
middle value in data: 16.0
+
+
The expression data[29, 19] accesses the element at row
+30, column 20. While this expression may not surprise you,
+data[0, 0] might. Programming languages like Fortran,
+MATLAB and R start counting at 1 because that’s what human beings have
+done for thousands of years. Languages in the C family (including C++,
+Java, Perl, and Python) count from 0 because it represents an offset
+from the first value in the array (the second value is offset by one
+index from the first value). This is closer to the way that computers
+represent arrays (if you are interested in the historical reasons behind
+counting indices from zero, you can read Mike
+Hoye’s blog post). As a result, if we have an M×N array in Python,
+its indices go from 0 to M-1 on the first axis and 0 to N-1 on the
+second. It takes a bit of getting used to, but one way to remember the
+rule is that the index is how many steps we have to take from the start
+to get the item we want.
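The index range described above can be checked on a small array. The sketch below assumes a made-up array with M = 3 rows and N = 4 columns; the name grid is hypothetical.

```python
import numpy as np

# A made-up array with M = 3 rows and N = 4 columns.
grid = np.arange(12).reshape(3, 4)
rows, columns = grid.shape

first = grid[0, 0]                    # zero steps from the start on both axes
last = grid[rows - 1, columns - 1]    # the largest valid indices are M-1 and N-1
print('first:', first, 'last:', last)

# Asking for index M on the first axis steps past the end of the array.
try:
    grid[rows, 0]
except IndexError as error:
    print('IndexError:', error)
```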
+
+
+
+
+
+
In the Corner
+
+
+
What may also surprise you is that when Python displays an array, it
+shows the element with index [0, 0] in the upper left
+corner rather than the lower left. This is consistent with the way
+mathematicians draw matrices but different from the Cartesian
+coordinates. The indices are (row, column) instead of (column, row) for
+the same reason, which can be confusing when plotting data.
+
+
+
+
Slicing data
+
+
An index like [30, 20] selects a single element of an
+array, but we can select whole sections as well. For example, we can
+select the first ten days (columns) of values for the first four
+patients (rows) with data[0:4, 0:10].
+
The slice 0:4 means,
+“Start at index 0 and go up to, but not including, index 4”. Again, the
+up-to-but-not-including takes a bit of getting used to, but the rule is
+that the difference between the upper and lower bounds is the number of
+values in the slice.
+
We also don’t have to include the upper and lower bound on the slice.
+If we don’t include the lower bound, Python uses 0 by default; if we
+don’t include the upper, the slice runs to the end of the axis, and if
+we don’t include either (i.e., if we use ‘:’ on its own), the slice
+includes everything:
+
+
PYTHON
+
+
small = data[:3, 36:]
+print('small is:')
+print(small)
+
+
The above example selects rows 0 through 2 and columns 36 through to
+the end of the array.
+
+
OUTPUT
+
+
small is:
+[[ 2. 3. 0. 0.]
+ [ 1. 1. 0. 1.]
+ [ 2. 2. 1. 1.]]
+
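The rule that the upper bound minus the lower bound gives the number of values can be checked with the shape attribute. The sketch below uses an array of zeros with the same shape as the inflammation data (60 rows, 40 columns) rather than the real file; the name example is a placeholder.

```python
import numpy as np

# A made-up array with the same shape as the inflammation data.
example = np.zeros((60, 40))

# Upper bound minus lower bound gives the number of values along each axis.
print(example[0:4, 0:10].shape)   # 4 - 0 rows and 10 - 0 columns
print(example[:3, 36:].shape)     # rows 0 to 2 and columns 36 to 39
print(example[:, :].shape)        # ':' on its own keeps the whole axis
```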
+
diff --git a/04-lists.html b/04-lists.html
new file mode 100644
index 0000000..2c34ab1
--- /dev/null
+++ b/04-lists.html
@@ -0,0 +1,1105 @@
+
+Python for Official Statistics: List and Dictionary Methods
+
+
+
+
+
+
+
+
+
+
+
Understand the properties and behaviours of lists and
+dictionaries
+
Access values in lists and dictionaries
+
Create and access values from nested lists and dictionaries
+
+
+
+
+
+
Values can also be stored in other Python data types such as lists,
+dictionaries, sets and tuples. Storing objects in a list is a fast and
+versatile way to apply transformations across a sequence of values.
+Storing objects in a dictionary as key-value pairs is useful for
+extracting specific values i.e. performing lookup operations.
+
Create and access lists
+
+
Lists have the following properties and behaviours:
+
A single list can store different primitive object types and even
+other lists
+
Lists are ordered and have a 0-based index
+
Lists can be appended to using the methods append() or
+insert()
+
+
Values inside a list can be removed using the methods
+remove() or pop()
+
+
Two lists can be concatenated with the operator +
+
+
Values inside a list can be conditionally iterated through
+
A list is mutable i.e. the values inside a list can be modified in
+place
+
To create a list, values are contained within square brackets
+i.e. [] and individually separated by commas. The function
+list() can also be used to create a list of values from an
+iterable object like a string, set or tuple.
+
+
PYTHON
+
+
# Create a list of integers using []
+list_1 = [1, 3, 5, 7]
+print(list_1)
+
+
+
OUTPUT
+
+
[1, 3, 5, 7]
+
+
+
PYTHON
+
+
# Unlike atomic vectors in R, a list can contain multiple primitive object types
+list_2 = [1, "one", 1.0, True]
+print(list_2)
+
+
+
OUTPUT
+
+
[1, 'one', 1.0, True]
+
+
+
PYTHON
+
+
# You can also use list() on an iterable object to convert it into a list
+string = 'abcdefg'
+list_3 = list(string)
+print(list_3)
+
+
+
OUTPUT
+
+
['a', 'b', 'c', 'd', 'e', 'f', 'g']
+
+
Because lists have a 0-based index, we can access individual values
+by their list index position. For 0-based indexes, the first value
+always starts at position 0 i.e. the first element has an index of 0.
+Accessing multiple values by their index positions is also referred to
+as slicing or subsetting a list.
+
Note that we can use negative numbers as indices in Python. When we
+do so, the index -1 gives us the last element in the list,
+-2 gives us the second to last element in the list, and so
+on.
+
+
PYTHON
+
+
# A syntax quirk for slicing values is to add 1 to the last value's index
+# To extract from index 0 to 2, we need to slice from [0:2+1] or [0:3]
+
+# Extract the first three values from list_3
+print('first 3 values:', list_3[0:3])
+
+# Start from index 0 and extract values from each subsequent second position
+print('every second value:', list_3[0::2])
+
+# Start from index 1, end at index 3 and extract from each subsequent second position
+print('every second value from index 1 to 3:', list_3[1:4:2])
+
+
+
OUTPUT
+
+
first 3 values: ['a', 'b', 'c']
+every second value: ['a', 'c', 'e', 'g']
+every second value from index 1 to 3: ['b', 'd']
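Negative indices, mentioned earlier in this section, can be combined with the same slicing syntax. The following is a small sketch that rebuilds the same seven letters under the hypothetical name letters.

```python
letters = list('abcdefg')

# Negative indices count backwards from the end of the list.
last = letters[-1]
second_to_last = letters[-2]
print('last:', last, 'second to last:', second_to_last)

# Negative indices also work in slices: this takes the final three values.
final_three = letters[-3:]
print('final three values:', final_three)
```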
+
+
Change list values
+
+
Data which can be modified in place is called mutable, while data
+which cannot be modified is called immutable. Strings and numbers are
+immutable in that when we want to change the value of a string or number
+variable, we can only replace the old value with a completely new
+value.
+
+
PYTHON
+
+
string = 'abcde'
+string[0] = 'b'  # Produces a type error as strings are immutable
+
+# TypeError: 'str' object does not support item assignment
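Since a string cannot be edited in place, one way to get the intended result is to build a new string and re-assign the variable. The sketch below assumes we wanted to replace the first character with 'b'; new_string is a hypothetical name.

```python
string = 'abcde'

# Build a completely new string rather than editing the old one in place.
new_string = 'b' + string[1:]
print('original string:', string)
print('new string:', new_string)

# Re-assigning the variable moves the name onto the new value.
string = new_string
```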
+
+
In contrast, lists are mutable and we can modify them after they have
+been created. We can change individual values, append new values, or
+reorder the whole list through sorting.
+
+
PYTHON
+
+
list_4 = ['apple', 'pear', 'plum']
+print('original list_4:', list_4)
+
+# Change the first value i.e. modify the list in place
+list_4[0] = 'banana'
+print('modified list_4:', list_4)
+
+# Add new value to list using the method .insert(index number, value)
+list_4.insert(1, 'apple') # Index 1 refers to the second position
+print('appended list_4:', list_4)
+
+
+
PYTHON
+
+
# Sorting a list also modifies it in place
+list_5 = [2, 1, 3, 7]
+list_5.sort()
+print('list_5:', list_5)
+
+
+
OUTPUT
+
+
list_5: [1, 2, 3, 7]
+
+
However, be careful when modifying data in-place. If two variables
+refer to the same list, and you modify the list value, it will change
+for both variables!
+
+
PYTHON
+
+
# After the assignment list_6 = list_5, both list_6 and list_5 point to the
+# same list object; list_6 is not a copy of list_5.
+
+list_6 = list_5
+print('list_5:', list_5)
+print('list_6:', list_6)
+
+# Change the first value in list_6 from 1 to 2
+list_6[0] = 2
+
+print('modified list_6:', list_6)
+print('unmodified list_5:', list_5)
+
+# Warning: list_5 and list_6 have both been modified in place!
Because of this behaviour, code which modifies data in place should
+be handled with care. You can avoid this behaviour by explicitly
+creating a copy of the original list and modifying only the copy.
+This is why creating a copy of the original data object can be useful in
+Python.
+
+
PYTHON
+
+
list_5 = [1, 2, 3, 7]
+list_7 = list_5.copy()
+print('list_5:', list_5)
+print('list_7:', list_7)
+
+# As list_7 is a completely new object copied from list_5, modifying list_7 does
+# not affect list_5.
+
+list_7[0] = 2
+print('modified list_7:', list_7)
+print('unmodified list_5:', list_5)
+
+
There are a lot of functions and methods which can be applied to
+lists, such as len(), max(),
+index() and so forth. Most arithmetic operators do not work
+on lists of integers; the main exception is + (and * with an
+integer, which repeats the list).
+
Note that + concatenates two lists into a single longer
+list, rather than outputting the sum of two lists of numbers.
+
+
PYTHON
+
+
list_8 = [1, 2, 3]
+list_9 = [4, 5, 6]
+
+list_8 + list_9 # This concatenates the lists and does not sum the two lists together
+
+
+
OUTPUT
+
+
[1, 2, 3, 4, 5, 6]
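If an element-by-element sum is what you actually need, one possible approach in base Python is to pair up the values with zip() and add each pair. This is only a sketch; pairwise_sum is a hypothetical name.

```python
list_8 = [1, 2, 3]
list_9 = [4, 5, 6]

# zip() pairs up values at the same position; we add each pair together.
pairwise_sum = [a + b for a, b in zip(list_8, list_9)]
print(pairwise_sum)
```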
+
+
In your spare time after this workshop, you can search for different
+list functions and methods and test them out yourselves.
+
Nested lists
+
+
We have previously mentioned that lists can be used to store other
+Python object types, including lists. This means that we can create
+nested lists in Python i.e. lists containing lists containing values.
+This property is useful when we have a collection of values that we want
+to access or transform as a subgroup.
+
To create a nested list, we also use [] or
+list() to contain one or more lists of values of
+interest.
+
+
PYTHON
+
+
veg_stock = [
+ ['lettuce', 'lettuce', 'tomato', 'zucchini'],
+ ['lettuce', 'lettuce', 'carrot', 'zucchini'],
+ ['lettuce', 'basil', 'tomato', 'zucchini']
+ ]
+
+# Check that veg_stock is a list object
+print(type(veg_stock))
+
+# Check that the first value in veg_stock is itself a list
+print(veg_stock[0], 'has type', type(veg_stock[0]))
+
+
+
OUTPUT
+
+
<class 'list'>
+['lettuce', 'lettuce', 'tomato', 'zucchini'] has type <class 'list'>
+
+
To extract a sub-list from the veg_stock list
+object, we refer to its index like we would with any other value inside
+a list, e.g. veg_stock[0] points to the first sub-list and
+veg_stock[1] points to the second sub-list within the veg_stock list.
+
To access an individual string value inside a sub-list, we make use
+of a second index, which points to an individual value inside the
+sub-list.
+
+
PYTHON
+
+
print(veg_stock[0]) # Access the first sub-list
+print(veg_stock[0][0]) # Access the first value in the first sub-list
+
+print(type(veg_stock[0])) # The first value in veg_stock is a list
+print(type(veg_stock[0][0])) # The first value in the first list in veg_stock is a string
In general, however, when we are analysing a large collection of
+values, the best practice is to structure those values in columns and
+rows as a tabular Pandas data frame object. This is covered in another
+Carpentries Course called Python
+for Social Sciences.
+
Lists are still incredibly versatile and useful when you have a
+collection of values that need to be efficiently accessed or
+transformed. For example, data frame column names are commonly extracted
+and stored inside a list, so that the same transformation can then be
+mapped across multiple columns.
+
Create and access dictionaries
+
+
A dictionary is a Python data type that is particularly suited for
+enabling quick lookup operations on unstructured data sets.
+
A dictionary can be thought of as a list in which every item or
+value is associated with a unique key (i.e. a self-defined index of
+unique strings or numbers) and is looked up by that key rather than by
+its position. The index values are called keys and a dictionary
+contains key-value pairs with the format
+{key: value(s)}.
+
Dictionaries can be created by listing individual key-value pairs
+inside {} or using dict().
+
+
PYTHON
+
+
# A key-value pair can contain single or multiple values
+# Keys are treated as case sensitive and unique
+# Multiple values are first stored inside a list
+
+teams = {
+'data science': ['Mei Ling', 'Paul', 'Gwen', 'Suresh'],
+'user design': ['Amy', 'Linh', 'Sasha'],
+'software dev': ['David', 'Prya'],
+'comms': 'Taylor'
+ }
+
+
When using dict(), we need to indicate which key is
+associated with which value. This can be done using a list of
+(key, value) tuples, by direct association i.e. using =, or using
+zip(), which pairs up the items of two iterables (such as lists)
+into tuples.
+
+
PYTHON
+
+
# To use dict(), key-value pairs can be stored inside tuples
+ds_emp_status = dict([
+    ('Mei Ling', 'full time'),
+    ('Paul', 'full time'),
+    ('Gwen', 'part time'),
+    ('Suresh', 'part time')
+])
+
+# Key-value pairs can also be assigned by direct association (keyword arguments)
+# Keys are written as bare names, not wrapped in '', using this approach,
+# so each key must be a valid Python identifier
+ud_emp_status = dict(
+    Amy='full time',
+    Linh='full time',
+    Sasha='casual'
+)
+
+# zip() can also be used if each key has only one value
+sd_emp_status = dict(zip(
+    ['David', 'Prya'],
+    ['full time', 'full time']
+))
+
+
To access a specific value inside a dictionary, we need to specify
+its key using []. This is similar to slicing or subsetting
+a list by specifying its index using [].
+
+
PYTHON
+
+
# Access the values associated with the key 'data science'
+print(teams['data science'])
+
+print('The object teams is of type', type(teams))
+print('The dict value', teams['data science'], 'is of type', type(teams['data science']))
+
+
+
OUTPUT
+
+
['Mei Ling', 'Paul', 'Gwen', 'Suresh']
+The object teams is of type <class 'dict'>
+The dict value ['Mei Ling', 'Paul', 'Gwen', 'Suresh'] is of type <class 'list'>
+
+
We can also access a value from a dictionary using the
+get() method.
+
+
PYTHON
+
+
print(teams.get('user design'))
+
+# get() also enables us to return an alternate string when the key is not found
+# This prevents our code from returning an error message that halts the analysis
+
+print(teams.get('data engineering', 'WARNING: key does not exist'))
+
+
+
OUTPUT
+
+
['Amy', 'Linh', 'Sasha']
+WARNING: key does not exist
+
+
To access data inside a dictionary, we can also perform the following
+other actions:
+
Check whether a key exists in a dictionary using the keyword
+in
+
+
Retrieve unique dictionary keys using dict.keys()
+
+
Retrieve dictionary values using dict.values()
+
+
Retrieve dictionary items using dict.items()
+
+
+
PYTHON
+
+
# Check whether a key exists in a dictionary
+print('data science' in teams)
+print('Data Science' in teams) # Keys are case sensitive
+
+# Retrieve all dictionary keys
+print(teams.keys())
+print(sd_emp_status.keys())
+
+# Retrieve all dictionary values
+print(sd_emp_status.values())
+
+# Retrieve all dictionary key-value pairs
+print(sd_emp_status.items())
To add a new key-value pair to an existing dictionary, we can create
+a new key and directly attach a new value to it using = or
+alternatively use the method update().
+
+
PYTHON
+
+
print('original dict items:', sd_emp_status.items())
+
+# Add new key-value pair using direct assignment
+sd_emp_status['Mohammad'] = 'full time'
+
+# Add new key-value pair using update({'key': 'value'})
+sd_emp_status.update({'Carrie': 'part time'})
+
+print('updated dict items:', sd_emp_status.items())
Because keys are unique, a dictionary cannot contain two keys with
+the same name. This means that adding an item using a key that is
+already present in the dictionary will cause the previous value to be
+overwritten.
+
+
PYTHON
+
+
print('original dict items:', sd_emp_status.items())
+
+# As the key 'Carrie' already exists, its value will be overwritten
+sd_emp_status['Carrie'] = 'full time'
+print('updated dict items:', sd_emp_status.items())
To remove a key-value pair from an existing dictionary, we can use the
+del keyword or the method pop(). Using
+pop() also enables us to return an alternate value if we
+try to remove a non-existing key, which prevents our code from raising
+an error that halts the analysis.
+
+
PYTHON
+
+
print('original dict items:', sd_emp_status.items())
+
+# Delete dictionary keys using del and pop()
+del sd_emp_status['Mohammad']
+sd_emp_status.pop('Carrie')
+sd_emp_status.pop('Anuradha', 'WARNING: key does not exist') # Does not generate an error
+
+print('modified dict items:', sd_emp_status.items())
Similar to lists, dictionaries can be nested as we can also store
+dictionaries as values inside a key-value pair using {}.
+Nested dictionaries are useful when we need to store unstructured data
+in a complex structure. For example, JSON data is commonly used for
+transmitting data in web applications and often exists in a nested
+structure that can be stored using nested dictionaries in Python.
+
+
PYTHON
+
+
# Individual dictionaries are enclosed in {} and separated by a comma
+nested_dict = {
+    'dict_1': {  # The first key's value is a dictionary of key-value pairs
+        'key_1a': 'value_1a',
+        'key_1b': 'value_1b'
+    },
+    'dict_2': {  # The second key's value is another dictionary of key-value pairs
+        'key_2a': 'value_2a',
+        'key_2b': 'value_2b'
+    }
+}
+
+print(nested_dict)
Similar to working with nested lists, to extract a value from a
+sub-dictionary, we specify both the main dictionary and
+sub-dictionary keys using [].
+
+
PYTHON
+
+
# Extract the value for key_2a in dict_2
+print('original value:', nested_dict['dict_2']['key_2a'])
+
+# Adding or updating a value can be done through the same approach
+nested_dict['dict_2']['key_2a'] = "modified_value_2a"
+
+print('modified value:', nested_dict['dict_2']['key_2a'])
+
+
+
OUTPUT
+
+
original value: value_2a
+modified value: modified_value_2a
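As a rough illustration of the earlier remark that JSON data maps onto nested dictionaries, the standard-library json module can parse a JSON string into nested Python objects. This is only a sketch: the JSON text and its field names are invented for the example.

```python
import json

# Hypothetical JSON text, e.g. as it might arrive from a web application
json_text = '{"team": {"name": "data science", "members": ["Mei Ling", "Paul"]}}'

# json.loads() parses the string into nested Python dictionaries and lists
parsed = json.loads(json_text)

print(type(parsed))                  # the outer object is a dictionary
print(parsed['team']['name'])        # access a nested value using two keys
print(parsed['team']['members'][0])  # a list stored inside the nested dictionary
```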
+
+
Optional: converting lists and dictionaries to Pandas data
+frames
+
+
Lists and dictionaries can be easily converted into a tabular Pandas
+data frame format. This can be useful when you need to create a small
+data set for unit testing purposes.
+
+
PYTHON
+
+
# Import pandas library
+import pandas as pd
+
+# Create a dictionary with each key-value pair representing a data frame column
+data = {
+'col_1': [3, 2, 1, 0],
+'col_2': ['a', 'b', 'c', 'd']
+ }
+
+df = pd.DataFrame.from_dict(data)
+
+print(df) # Outputs data as a tabular Pandas data frame
+print(type(df))
+
+
+
OUTPUT
+
+
col_1 col_2
+0 3 a
+1 2 b
+2 1 c
+3 0 d
+<class 'pandas.core.frame.DataFrame'>
+
+
+
+
+
+
+
Key Points
+
+
+
Lists can contain any Python object including other lists
+
Lists are ordered i.e. indexed and can therefore be sliced by index
+number
+
Unlike strings and integers, the values inside a list can be
+modified in place
+
A list which contains other lists is referred to as a nested
+list
+
Dictionaries store key-value pairs, and their values are
+accessed by key rather than by position
+
Dictionary keys are unique
+
A dictionary which contains other dictionaries is referred to as a
+nested dictionary
+
Values inside nested lists and dictionaries can be accessed by an
+additional index or key
+
+
diff --git a/05-loops.html b/05-loops.html
new file mode 100644
index 0000000..849a7bb
--- /dev/null
+++ b/05-loops.html
@@ -0,0 +1,1591 @@
+
+Python for Official Statistics: Loops and Conditional Logic
+
+
+
+
+
+
+
+ Pre-Alpha
+
+ This lesson is in the pre-alpha phase, which means that it is in early development, but has not yet been taught.
+
+
+
+
In the episode about visualizing
+data, we will see Python code that plots values of interest from our
+first inflammation dataset (inflammation-01.csv) and
+reveals some suspicious features.
+
We have a dozen data sets right now and potentially more on the way
+if Dr. Maverick can keep up their surprisingly fast clinical trial rate.
+We want to create plots for all of our data sets with a single
+statement. To do that, we’ll have to teach the computer how to repeat
+things.
+
An example task that we might want to repeat is accessing numbers in
+a list, which we will do by printing each number on a line of its
+own.
+
+
PYTHON
+
+
odds = [1, 3, 5, 7]
+
+
In Python, a list is basically an ordered
+collection of elements, and every element has a unique number associated
+with it — its index. This means that we can access elements in a list
+using their indices. For example, we can get the first number in the
+list odds, by using odds[0]. One way to print
+each number is to use four print statements:
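A minimal sketch of what those four print statements might look like (reconstructed from the description above rather than copied from the lesson):

```python
odds = [1, 3, 5, 7]

# One print statement per element, each using a hard-coded index
print(odds[0])
print(odds[1])
print(odds[2])
print(odds[3])
```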
Writing one print statement per element is a poor approach for three reasons:
+
Not scalable. Imagine you need to print a list
+that has hundreds of elements. It might be easier to type them in
+manually.
+
Difficult to maintain. If we want to decorate
+each printed element with an asterisk or any other character, we would
+have to change four lines of code. While this might not be a problem for
+small lists, it would definitely be a problem for longer ones.
+
Fragile. If we use it with a list that has more
+elements than what we initially envisioned, it will only display part of
+the list’s elements. A shorter list, on the other hand, will cause an
+error because it will be trying to display elements of the list that do
+not exist.
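The fragility described in the last point can be demonstrated with a short sketch. The three-element list is an assumed example, and the try/except block is added here only so that the script keeps running after the error.

```python
odds = [1, 3, 5]  # one element shorter than the code below expects

error_message = None
try:
    print(odds[0])
    print(odds[1])
    print(odds[2])
    print(odds[3])  # there is no fourth element, so this raises IndexError
except IndexError as err:
    error_message = str(err)
    print('IndexError:', error_message)
```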
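Without any error handling, running the four print statements on a list with only three elements stops with the following error:
+
---------------------------------------------------------------------------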
+IndexError Traceback (most recent call last)
+<ipython-input-3-7974b6cdaf14> in <module>()
+ 3 print(odds[1])
+ 4 print(odds[2])
+----> 5 print(odds[3])
+
+IndexError: list index out of range
A for loop is a better approach. It is shorter, certainly shorter than something that prints every
+number in a hundred-number list, and more robust as well:
+
+
PYTHON
+
+
odds = [1, 3, 5, 7, 9, 11]
+for num in odds:
+    print(num)
+
+
+
OUTPUT
+
+
1
+3
+5
+7
+9
+11
+
+
The improved version uses a for
+loop to repeat an operation — in this case, printing — once for each
+thing in a sequence. The general form of a loop is:
+
+
PYTHON
+
+
for variable in collection:
+    # do things using variable, such as print
+
+
Using the odds example above, the loop can be pictured as a diagram
+(not reproduced here) in which each number (num) in the variable
+odds is looped through and printed one number after
+another. The other numbers in the diagram denote which loop cycle the
+number was printed in (1 being the first loop cycle, and 6 being the
+final loop cycle).
+
We can call the loop
+variable anything we like, but there must be a colon at the end of
+the line starting the loop, and we must indent anything we want to run
+inside the loop. Unlike many other languages, there is no command to
+signify the end of the loop body (e.g., end for);
+everything indented after the for statement belongs to the
+loop.
+
+
+
+
+
+
What’s in a name?
+
+
+
In the example above, the loop variable was given the name
+num as a mnemonic; it is short for ‘number’. We can choose
+any name we want for variables. We might just as easily have chosen the
+name banana for the loop variable, as long as we use the
+same name when we invoke the variable inside the loop:
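A minimal sketch of such a loop, reusing the odds list from earlier in this episode:

```python
odds = [1, 3, 5, 7, 9, 11]

# The loop variable can have any name, as long as the body uses the same one
for banana in odds:
    print(banana)
```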
It is a good idea to choose variable names that are meaningful,
+otherwise it would be more difficult to understand what the loop is
+doing.
+
+
+
+
Here’s another loop that repeatedly updates a variable:
+
+
PYTHON
+
+
length = 0
+names = ['Curie', 'Darwin', 'Turing']
+for value in names:
+    length = length + 1
+print('There are', length, 'names in the list.')
+
+
+
OUTPUT
+
+
There are 3 names in the list.
+
+
It’s worth tracing the execution of this little program step by step.
+Since there are three names in names, the statement on line
+4 will be executed three times. The first time around,
+length is zero (the value assigned to it on line 1) and
+value is Curie. The statement adds 1 to the
+old value of length, producing 1, and updates
+length to refer to that new value. The next time around,
+value is Darwin and length is 1,
+so length is updated to be 2. After one more update,
+length is 3; since there is nothing left in
+names for Python to process, the loop finishes and the
+print function on line 5 tells us our final answer.
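One way to follow this trace is to print the variables on every pass through the loop. The sketch below adds a print call and a trace list (both illustrative additions, not part of the lesson's program) to the same code:

```python
length = 0
names = ['Curie', 'Darwin', 'Turing']
trace = []  # records (value, length) after each pass through the loop
for value in names:
    length = length + 1
    trace.append((value, length))
    print('value is', value, 'and length is now', length)
print('There are', length, 'names in the list.')
```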
+
Note that a loop variable
+is a variable that is being used to record progress in a loop. It still
+exists after the loop is over, and we can re-use variables previously
+defined as loop variables as
+well:
+
+
PYTHON
+
+
name = 'Rosalind'
+for name in ['Curie', 'Darwin', 'Turing']:
+    print(name)
+print('after the loop, name is', name)
+
+
+
OUTPUT
+
+
Curie
+Darwin
+Turing
+after the loop, name is Turing
+
+
Note also that finding the length of an object is such a common
+operation that Python actually has a built-in function to do it called
+len:
+
+
PYTHON
+
+
print(len([0, 1, 2, 3]))
+
+
+
OUTPUT
+
+
4
+
+
len is much faster than any function we could write
+ourselves, and much easier to read than a two-line loop; it will also
+give us the length of many other data types we haven’t seen yet, so we
+should always use it when we can.
+
+
+
+
+
+
From 1 to N
+
+
+
Python has a built-in function called range that
+generates a sequence of numbers. range can accept 1, 2, or 3
+parameters.
+
If one parameter is given, range generates a sequence
+of that length, starting at zero and incrementing by 1. For example,
+range(3) produces the numbers 0, 1, 2.
+
If two parameters are given, range starts at the first
+and ends just before the second, incrementing by one. For example,
+range(2, 5) produces 2, 3, 4.
+
If range is given 3 parameters, it starts at the first
+one, ends just before the second one, and increments by the third one.
+For example, range(3, 10, 2) produces
+3, 5, 7, 9.
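Because range produces its numbers lazily, wrapping it in list() is a convenient way to inspect them. A short sketch of the three forms described above (the variable names are illustrative):

```python
one_param = list(range(3))            # start at 0, stop before 3
two_params = list(range(2, 5))        # start at 2, stop before 5
three_params = list(range(3, 10, 2))  # start at 3, stop before 10, step by 2

print(one_param)
print(two_params)
print(three_params)
```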
+
Using range, write a loop that prints the first 3 natural
+numbers:
+
+
OUTPUT
+
+
1
+2
+3
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
for number in range(1, 4):
+    print(number)
+
+
+
+
+
+
+
+
+
+
+
Understanding the loops
+
+
+
Given the following loop:
+
+
PYTHON
+
+
word = 'oxygen'
+for letter in word:
+    print(letter)
+
+
How many times is the body of the loop executed?
+
3 times
+
4 times
+
5 times
+
6 times
+
+
+
+
+
+
+
+
+
The body of the loop is executed 6 times.
+
+
+
+
+
+
+
+
+
+
Computing Powers With Loops
+
+
+
Exponentiation is built into Python:
+
+
PYTHON
+
+
print(5**3)
+
+
+
OUTPUT
+
+
125
+
+
Write a loop that calculates the same result as 5 ** 3
+using multiplication (and without exponentiation).
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
result = 1
+for number in range(0, 3):
+    result = result * 5
+print(result)
+
+
+
+
+
+
+
+
+
+
+
Summing a List
+
+
+
Write a loop that calculates the sum of elements in a list by adding
+each element and printing the final value, so
+[124, 402, 36] prints 562
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
numbers = [124, 402, 36]
+summed = 0
+for num in numbers:
+ summed = summed + num
+print(summed)
+
+
+
+
+
+
+
+
+
+
+
Computing the Value of a Polynomial
+
+
+
The built-in function enumerate takes a sequence (e.g.,
+a list) and generates a new sequence of the
+same length. Each element of the new sequence is a pair composed of the
+index (0, 1, 2,…) and the value from the original sequence:
+
+
PYTHON
+
+
for idx, val in enumerate(a_list):
+    # Do something using idx and val
+
+
The code above loops through a_list, assigning the index
+to idx and the value to val.
+
Suppose you have encoded a polynomial as a list of coefficients in
+the following way: the first element is the constant term, the second
+element is the coefficient of the linear term, the third is the
+coefficient of the quadratic term, etc.
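For example, with the assumed values x = 5 and coefs = [2, 4, 3] (chosen here purely for illustration), the polynomial 2 + 4x + 3x² can be evaluated by writing out each term by hand:

```python
x = 5
coefs = [2, 4, 3]  # constant, linear and quadratic coefficients

# Each coefficient multiplies x raised to the power of its position in the list
y = coefs[0] * x**0 + coefs[1] * x**1 + coefs[2] * x**2
print(y)
```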
Write a loop using enumerate(coefs) which computes the
+value y of any polynomial, given x and
+coefs.
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
y = 0
+for idx, coef in enumerate(coefs):
+    y = y + coef * x**idx
+
+
+
+
+
+
Making Choices with Conditional Logic
+
+
How can we use Python to automatically recognize different situations
+we encounter with our data and take a different action for each? In this
+lesson, we’ll learn how to write code that runs only when certain
+conditions are true.
+
+
Conditionals
+
We can ask Python to take different actions, depending on a
+condition, with an if statement:
+
+
PYTHON
+
+
num = 37
+if num > 100:
+    print('greater')
+else:
+    print('not greater')
+print('done')
+
+
+
OUTPUT
+
+
not greater
+done
+
+
The second line of this code uses the keyword if to tell
+Python that we want to make a choice. If the test that follows the
+if statement is true, the body of the if
+(i.e., the set of lines indented underneath it) is executed, and
+“greater” is printed. If the test is false, the body of the
+else is executed instead, and “not greater” is printed.
+Only one or the other is ever executed before continuing on with program
+execution to print “done”.
+
Conditional
+statements don’t have to include an else. If there
+isn’t one, Python simply does nothing if the test is false:
+
+
PYTHON
+
+
num = 53
+print('before conditional...')
+if num > 100:
+    print(num, 'is greater than 100')
+print('...after conditional')
+
+
+
OUTPUT
+
+
before conditional...
+...after conditional
+
+
We can also chain several tests together using elif,
+which is short for “else if”. The following Python code uses
+elif to print the sign of a number.
+
+
PYTHON
+
+
num = -3
+
+if num > 0:
+    print(num, 'is positive')
+elif num == 0:
+    print(num, 'is zero')
+else:
+    print(num, 'is negative')
+
+
+
OUTPUT
+
+
-3 is negative
+
+
Note that to test for equality we use a double equals sign
+== rather than a single equals sign = which is
+used to assign values.
+
+
+
+
+
+
Comparing in Python
+
+
+
Along with the > and == operators we
+have already used for comparing values in our conditionals, there are a
+few more options to know about:
+
+>: greater than
+
+<: less than
+
+==: equal to
+
+!=: does not equal
+
+>=: greater than or equal to
+
+<=: less than or equal to
+
+
+
+
We can also combine tests using and and or.
+and is only true if both parts are true:
+
+
PYTHON
+
+
if (1 > 0) and (-1 >= 0):
+    print('both parts are true')
+else:
+    print('at least one part is false')
+
+
+
OUTPUT
+
+
at least one part is false
+
+
while or is true if at least one part is true:
+
+
PYTHON
+
+
if (1 < 0) or (1 >= 0):
+    print('at least one test is true')
+
+
+
OUTPUT
+
+
at least one test is true
+
+
+
+
+
+
+
+True and False
+
+
+
True and False are special words in Python
+called booleans, which represent truth values. An expression
+such as 1 < 0 evaluates to False,
+while -1 < 0 evaluates to True.
+
+
+
+
+
+
Checking Our Data
+
Now that we’ve seen how conditionals work, we can use them to check
+for the suspicious features we saw in our inflammation data. We are
+about to use functions provided by the numpy module again.
+Therefore, if you’re working in a new Python session, make sure to load
+the module with:
+
+
PYTHON
+
+
import numpy
+
+
From the first couple of plots, we saw that maximum daily
+inflammation exhibits a strange behavior and rises by one unit a day.
+Wouldn’t it be a good idea to detect such behavior and report it as
+suspicious? Let’s do that! However, instead of checking every single day
+of the study, let’s merely check if maximum inflammation in the
+beginning (day 0) and in the middle (day 20) of the study are equal to
+the corresponding day numbers.
We also saw a different problem in the third dataset; the minima per
+day were all zero (looks like a healthy person snuck into our study). We
+can also check for this with an elif condition:
+
+
PYTHON
+
+
elif numpy.sum(numpy.amin(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+
+
And if neither of these conditions are true, we can use
+else to give the all-clear:
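Putting the three branches together, the complete check might look like the sketch below. The lesson's full code is not reproduced in this section, so the function name check_data, the messages for the first and last branches, and the synthetic arrays are assumptions made for illustration. Only the elif test is taken from the text above.

```python
import numpy

def check_data(data):
    """Return a message describing the first suspicious feature found."""
    max_inflammation_0 = numpy.amax(data, axis=0)[0]    # maximum on day 0
    max_inflammation_20 = numpy.amax(data, axis=0)[20]  # maximum on day 20

    if max_inflammation_0 == 0 and max_inflammation_20 == 20:
        return 'Suspicious looking maxima!'
    elif numpy.sum(numpy.amin(data, axis=0)) == 0:
        return 'Minima add up to zero!'
    else:
        return 'Seems OK!'

# Synthetic data sets (3 patients x 40 days), invented for this example
rising = numpy.tile(numpy.arange(40), (3, 1))  # every maximum equals the day number
flat = numpy.ones((3, 40))
flat[0, :] = 0                                 # one patient with all-zero readings
normal = numpy.ones((3, 40))

print(check_data(rising))
print(check_data(flat))
print(check_data(normal))
```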
In this way, we have asked Python to do something different depending
+on the condition of our data. Here we printed messages in all cases, but
+we could also imagine not using the else catch-all so that
+messages are only printed when something is wrong, freeing us from
+having to manually examine every plot for features we’ve seen
+before.
Which of the following would be printed if you were to run this code?
+Why did you pick this answer?
+
A
+
B
+
C
+
B and C
+
+
+
+
+
+
+
+
+
C gets printed because the first two conditions,
+4 > 5 and 4 == 5, are not true, but
+4 < 5 is true. In this case, only one of these
+conditions can be true at a time, but in other scenarios multiple
+elif conditions could be met. In these scenarios, only the
+action associated with the first true elif condition will
+occur, starting from the top of the conditional section.
+
This contrasts with the case of multiple if statements,
+where every action can occur as long as their condition is met.
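The difference can be seen in a small sketch. The value 10 and the messages are invented for the example:

```python
num = 10

# Chained with elif: Python stops at the first condition that is true
chained = []
if num > 0:
    chained.append('positive')
elif num > 5:
    chained.append('greater than 5')

# Separate if statements: every condition is tested independently
separate = []
if num > 0:
    separate.append('positive')
if num > 5:
    separate.append('greater than 5')

print(chained)
print(separate)
```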
+
+
+
+
+
+
+
+
+
+
What Is Truth?
+
+
+
True and False booleans are not the only
+values in Python that are true and false. In fact, any value
+can be used in an if or elif. After reading
+and running the code below, explain what the rule is for which values
+are considered true and which are considered false.
+
+
PYTHON
+
+
if '':
+    print('empty string is true')
+if 'word':
+    print('word is true')
+if []:
+    print('empty list is true')
+if [1, 2, 3]:
+    print('non-empty list is true')
+if 0:
+    print('zero is true')
+if 1:
+    print('one is true')
+
+
+
+
+
+
+
+
+
+
That’s Not Not What I Meant
+
+
+
Sometimes it is useful to check whether some condition is
+not true. The Boolean operator not can do this
+explicitly. After reading and running the code below, write some
+if statements that use not to test the rule
+that you formulated in the previous challenge.
+
+
PYTHON
+
+
if not '':
+    print('empty string is not true')
+if not 'word':
+    print('word is not true')
+if not not True:
+    print('not not True is true')
+
+
+
+
+
+
+
+
+
+
Close Enough
+
+
+
Write some conditions that print True if the variable
+a is within 10% of the variable b and
+False otherwise. Compare your implementation with your
+partner’s. Do you get the same answer for all possible pairs of
+numbers?
a = 5
+b = 5.1
+
+if abs(a - b) <= 0.1 * abs(b):
+    print('True')
+else:
+    print('False')
+
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
print(abs(a - b) <= 0.1 * abs(b))
+
+
This works because the Booleans True and
+False have string representations which can be printed.
+
+
+
+
+
+
+
+
+
+
In-Place Operators
+
+
+
Python (and most other languages in the C family) provides in-place operators that
+work like this:
+
+
PYTHON
+
+
x = 1  # original value
+x += 1  # add one to x, assigning result back to x
+x *= 3  # multiply x by 3
+print(x)
+
+
+
OUTPUT
+
+
6
+
+
Write some code that sums the positive and negative numbers in a list
+separately, using in-place operators. Do you think the result is more or
+less readable than writing the same without in-place operators?
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
positive_sum = 0
+negative_sum = 0
+test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
+for num in test_list:
+    if num > 0:
+        positive_sum += num
+    elif num == 0:
+        pass
+    else:
+        negative_sum += num
+print(positive_sum, negative_sum)
+
+
Here pass means “don’t do anything”. In this particular
+case, it’s not actually needed, since if num == 0 neither
+sum needs to change, but it illustrates the use of elif and
+pass.
+
+
+
+
+
+
+
+
+
+
Sorting a List Into Buckets
+
+
+
In our data folder, large data sets are stored in files
+whose names start with “inflammation-” and small data sets – in files
+whose names start with “small-”. We also have some other files that we
+do not care about at this point. We’d like to break all these files into
+three lists called large_files, small_files,
+and other_files, respectively.
+
Add code to the template below to do this. Note that the string
+method startswith
+returns True if and only if the string it is called on
+starts with the string passed as an argument, that is:
+
+
PYTHON
+
+
'String'.startswith('Str')
+
+
+
OUTPUT
+
+
True
+
+
But
+
+
PYTHON
+
+
'String'.startswith('str')
+
+
+
OUTPUT
+
+
False
+
+
Use the following Python code as your starting point:
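The starting template itself is not reproduced in this section. As a stand-in, here is a minimal sketch of one possible template together with a solution. The file names in filenames are hypothetical.

```python
filenames = ['inflammation-01.csv',
             'myscript.py',
             'inflammation-02.csv',
             'small-01.csv',
             'small-02.csv']
large_files = []
small_files = []
other_files = []

# Sort each file name into a bucket based on how the name starts
for filename in filenames:
    if filename.startswith('inflammation-'):
        large_files.append(filename)
    elif filename.startswith('small-'):
        small_files.append(filename)
    else:
        other_files.append(filename)

print('large files:', large_files)
print('small files:', small_files)
print('other files:', other_files)
```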
Write a loop that counts the number of vowels in a character
+string.
+
Test it on a few individual words and full sentences.
+
Once you are done, compare your solution to your neighbor’s. Did you
+make the same decisions about how to handle the letter ‘y’ (which some
+people think is a vowel, and some do not)?
+
+
Solution
+
vowels = 'aeiouAEIOU'
+sentence = 'Mary had a little lamb.'
+count = 0
+for char in sentence:
+ if char in vowels:
+ count += 1
+
+print('The number of vowels in this string is ' + str(count))
+
+
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Use for variable in sequence to process the elements of
+a sequence one at a time.
+
The body of a for loop must be indented.
+
Use len(thing) to determine the length of something
+that contains other values.
+
Use if condition to start a conditional statement,
+elif condition to provide additional tests, and
+else to provide a default.
+
The bodies of the branches of conditional statements must be
+indented.
+
Use == to test for equality.
+
+X and Y is only true if both X and
+Y are true.
+
+X or Y is true if either X or
+Y, or both, are true.
+
Zero, the empty string, and the empty list are considered false; all
+other numbers, strings, and lists are considered true.
+
+
diff --git a/06-alternative_loops.html b/06-alternative_loops.html
new file mode 100644
index 0000000..6ee802d
--- /dev/null
+++ b/06-alternative_loops.html
@@ -0,0 +1,489 @@
+
+Python for Official Statistics: Alternatives to Loops
+
+
+
+
+
+
+
+
+
+
+
What are functions, and how can I use them in Python?
+
How can I define new functions?
+
What’s the difference between defining and calling a function?
+
What happens when I call a function?
+
+
+
+
+
+
+
Objectives
+
Identify what a function is
+
Create new functions
+
Set default values for function parameters.
+
Explain why we should divide programs into small, single-purpose
+functions.
+
+
+
+
+
+
At this point, we’ve seen that code can have Python make decisions
+about what it sees in our data. What if we want to convert some of our
+data, like taking a temperature in Fahrenheit and converting it to
+Celsius? We could write something like this for converting a single
+number:
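A minimal sketch of such a one-off conversion, using the illustrative variable names fahrenheit_val and celsius_val and an arbitrary input of 99:

```python
fahrenheit_val = 99

# Subtract the 32-degree offset, then rescale from Fahrenheit to Celsius degrees
celsius_val = (fahrenheit_val - 32) * (5 / 9)
print(celsius_val)
```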
But we would be in trouble as soon as we had to do this more than a
+couple times. Cutting and pasting it is going to make our code get very
+long and very repetitive, very quickly. We’d like a way to package our
+code so that it is easier to reuse, a shorthand way of re-executing
+longer pieces of code. In Python we can use ‘functions’. Let’s start by
+defining a function fahr_to_celsius that converts
+temperatures from Fahrenheit to Celsius:
+
+
PYTHON
+
+
def explicit_fahr_to_celsius(temp):
+    # Assign the converted value to a variable
+    converted = ((temp - 32) * (5 / 9))
+    # Return the value of the new variable
+    return converted
+
+def fahr_to_celsius(temp):
+    # Return the converted value directly using the return
+    # statement, without creating a new variable. This code does
+    # the same thing as the previous function, which is more explicit
+    # in showing how the return statement works.
+    return ((temp - 32) * (5 / 9))
+
+
The function definition opens with the keyword def
+followed by the name of the function (fahr_to_celsius) and
+a parenthesized list of parameter names (temp). The body of the function — the statements
+that are executed when it runs — is indented below the definition line.
+The body concludes with a return keyword followed by the
+return value.
+
When we call the function, the values we pass to it are assigned to
+those variables so that we can use them inside the function. Inside the
+function, we use a return
+statement to send a result back to whoever asked for it.
+
Let’s try running our function.
+
+
PYTHON
+
+
fahr_to_celsius(32)
+
+
This command should call our function, using “32” as the input and
+return the function value.
+
In fact, calling our own function is no different from calling any
+other function:
+
+
PYTHON
+
+
print('freezing point of water:', fahr_to_celsius(32), 'C')
+print('boiling point of water:', fahr_to_celsius(212), 'C')
+
+
+
OUTPUT
+
+
freezing point of water: 0.0 C
+boiling point of water: 100.0 C
+
+
We’ve successfully called the function that we defined, and we have
+access to the value that we returned.
+
Composing Functions
+
+
Now that we’ve seen how to turn Fahrenheit into Celsius, we can also
+write the function to turn Celsius into Kelvin:
+
+
PYTHON
+
+
def celsius_to_kelvin(temp_c):
+    return temp_c + 273.15
+
+print('freezing point of water in Kelvin:', celsius_to_kelvin(0.))
+
+
+
OUTPUT
+
+
freezing point of water in Kelvin: 273.15
+
+
What about converting Fahrenheit to Kelvin? We could write out the
+formula, but we don’t need to. Instead, we can compose the two functions we have
+already created:
+
+
PYTHON
+
+
def fahr_to_kelvin(temp_f):
+    temp_c = fahr_to_celsius(temp_f)
+    temp_k = celsius_to_kelvin(temp_c)
+    return temp_k
+
+print('boiling point of water in Kelvin:', fahr_to_kelvin(212.0))
+
+
+
OUTPUT
+
+
boiling point of water in Kelvin: 373.15
+
+
This is our first taste of how larger programs are built: we define
+basic operations, then combine them in ever-larger chunks to get the
+effect we want. Real-life functions will usually be larger than the ones
+shown here — typically half a dozen to a few dozen lines — but they
+shouldn’t ever be much longer than that, or the next person who reads it
+won’t be able to understand what’s going on.
+
Variable Scope
+
+
In composing our temperature conversion functions, we created
+variables inside of those functions, temp,
+temp_c, temp_f, and temp_k. We
+refer to these variables as local variables because they no
+longer exist once the function is done executing. If we try to access
+their values outside of the function, we will encounter an error:
+
+
PYTHON
+
+
print('Again, temperature in Kelvin was:', temp_k)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-1-eed2471d229b> in <module>
+----> 1 print('Again, temperature in Kelvin was:', temp_k)
+
+NameError: name 'temp_k' is not defined
+
+
If you want to reuse the temperature in Kelvin after you have
+calculated it with fahr_to_kelvin, you can store the result
+of the function call in a variable:
+
+
PYTHON
+
+
temp_kelvin = fahr_to_kelvin(212.0)
+print('temperature in Kelvin was:', temp_kelvin)
+
+
+
OUTPUT
+
+
temperature in Kelvin was: 373.15
+
+
The variable temp_kelvin, being defined outside any
+function, is said to be global.
+
Inside a function, one can read the value of such global
+variables:
+
+
PYTHON
+
+
def print_temperatures():
+    print('temperature in Fahrenheit was:', temp_fahr)
+    print('temperature in Kelvin was:', temp_kelvin)
+
+temp_fahr = 212.0
+temp_kelvin = fahr_to_kelvin(temp_fahr)
+
+print_temperatures()
+
+
+
OUTPUT
+
+
temperature in Fahrenheit was: 212.0
+temperature in Kelvin was: 373.15
+
+
By giving our functions human-readable names, we can more easily read
+and understand what is happening in our code. Even
+better, if at some later date we want to use either of those pieces of
+code again, we can do so in a single line.
+
Testing and Documenting
+
+
Once we start putting things in functions so that we can re-use them,
+we need to start testing that those functions are working correctly. To
+see how to do this, let’s write a function to offset a dataset so that
+its mean value shifts to a user-defined value:
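+
A minimal sketch of such a function, matching the version shown in the Documentation section below and assuming NumPy is imported as numpy, as in the rest of this lesson:

```python
import numpy


def offset_mean(data, target_mean_value):
    # Subtract the current mean, then add the requested one.
    return (data - numpy.mean(data)) + target_mean_value
```
+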
We could test this on our actual data, but since we don’t know what
+the values ought to be, it will be hard to tell if the result was
+correct. Instead, let’s use NumPy to create a matrix of 0’s and then
+offset its values to have a mean value of 3:
+
+
PYTHON
+
+
z = numpy.zeros((2,2))
+print(offset_mean(z, 3))
+
+
+
OUTPUT
+
+
[[ 3. 3.]
+ [ 3. 3.]]
+
+
That looks right, so let’s try offset_mean on our real
+data:
+
+
PYTHON
+
+
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+print(offset_mean(data, 0))
It’s hard to tell from the default output whether the result is
+correct, but there are a few tests that we can run to reassure us:
+
+
PYTHON
+
+
print('original min, mean, and max are:', numpy.amin(data), numpy.mean(data), numpy.amax(data))
+offset_data = offset_mean(data, 0)
+print('min, mean, and max of offset data are:',
+ numpy.amin(offset_data),
+ numpy.mean(offset_data),
+ numpy.amax(offset_data))
+
+
+
OUTPUT
+
+
original min, mean, and max are: 0.0 6.14875 20.0
+min, mean, and max of offset data are: -6.14875 2.84217094304e-16 13.85125
+
+
That seems almost right: the original mean was about 6.1, so the
+minimum, which used to be zero, is now about -6.1. The mean of the offset data
+isn’t quite zero (we’ll explore why not in the challenges), but it’s
+pretty close. We can even go further and check that the standard
+deviation hasn’t changed:
+
+
PYTHON
+
+
print('std dev before and after:', numpy.std(data), numpy.std(offset_data))
+
+
+
OUTPUT
+
+
std dev before and after: 4.61383319712 4.61383319712
+
+
Those values look the same, but we probably wouldn’t notice if they
+were different in the sixth decimal place. Let’s do this instead:
+
+
PYTHON
+
+
print('difference in standard deviations before and after:',
+ numpy.std(data) - numpy.std(offset_data))
+
+
+
OUTPUT
+
+
difference in standard deviations before and after: -3.5527136788e-15
+
+
Again, the difference is very small. It’s still possible that our
+function is wrong, but it seems unlikely enough that we should probably
+get back to doing our analysis.
+
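Checks like these can also be written so that Python makes the comparison for us. The sketch below is one way to do that, assuming a small made-up array in place of the inflammation data; numpy.isclose allows for the tiny floating-point differences seen above.

```python
import numpy


def offset_mean(data, target_mean_value):
    return (data - numpy.mean(data)) + target_mean_value


# Made-up sample values, used only for illustration.
sample = numpy.array([[0.0, 1.0, 3.0], [2.0, 6.0, 12.0]])
shifted = offset_mean(sample, 0)

# The mean should land on the target, up to rounding error.
mean_ok = bool(numpy.isclose(numpy.mean(shifted), 0.0))
# Shifting every value by the same amount should not change the spread.
std_ok = bool(numpy.isclose(numpy.std(shifted), numpy.std(sample)))

print('mean matches target:', mean_ok)
print('std dev unchanged:', std_ok)
```
+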
Documentation
+
+
We have one more task first, though: we should write some documentation for our function
+to remind ourselves later what it’s for and how to use it.
+
The usual way to put documentation in software is to add comments like this:
+
+
PYTHON
+
+
# offset_mean(data, target_mean_value):
+# return a new array containing the original data with its mean offset to match the desired value.
+def offset_mean(data, target_mean_value):
+    return (data - numpy.mean(data)) + target_mean_value
+
+
There’s a better way, though. If the first thing in a function is a
+string that isn’t assigned to a variable, that string is attached to the
+function as its documentation:
+
+
PYTHON
+
+
def offset_mean(data, target_mean_value):
+    """Return a new array containing the original data
+       with its mean offset to match the desired value."""
+    return (data - numpy.mean(data)) + target_mean_value
+
+
This is better because we can now ask Python’s built-in help system
+to show us the documentation for the function:
+
+
PYTHON
+
+
help(offset_mean)
+
+
+
OUTPUT
+
+
Help on function offset_mean in module __main__:
+
+offset_mean(data, target_mean_value)
+    Return a new array containing the original data
+    with its mean offset to match the desired value.
+
+
A string like this is called a docstring. We don’t need to use
+triple quotes when we write one, but if we do, we can break the string
+across multiple lines:
+
+
PYTHON
+
+
def offset_mean(data, target_mean_value):
+    """Return a new array containing the original data
+       with its mean offset to match the desired value.
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3], 0)
+    array([-1.,  0.,  1.])
+    """
+    return (data - numpy.mean(data)) + target_mean_value
+
+help(offset_mean)
+
+
+
OUTPUT
+
+
Help on function offset_mean in module __main__:
+
+offset_mean(data, target_mean_value)
+ Return a new array containing the original data
+ with its mean offset to match the desired value.
+
+ Examples
+ --------
+ >>> offset_mean([1, 2, 3], 0)
+ array([-1., 0., 1.])
+
+
Defining Defaults
+
+
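We have passed parameters to functions in two ways: directly, as in
+type(data), and by name, as in
+numpy.loadtxt(fname='something.csv', delimiter=','). In
+fact, we can pass the filename to loadtxt without the
+fname=, as in numpy.loadtxt('something.csv', delimiter=','),
+but we still need to say delimiter=. Calling
+numpy.loadtxt('inflammation-01.csv', ',') instead fails: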
Traceback (most recent call last):
+ File "<stdin>", line 1, in <module>
+ File "/Users/username/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1041, in loadtxt
+ dtype = np.dtype(dtype)
+ File "/Users/username/anaconda3/lib/python3.6/site-packages/numpy/core/_internal.py", line 199, in _commastring
+ newitem = (dtype, eval(repeats))
+ File "<string>", line 1
+ ,
+ ^
+SyntaxError: unexpected EOF while parsing
+
+
To understand what’s going on, and make our own functions easier to
+use, let’s re-define our offset_mean function like
+this:
+
+
PYTHON
+
+
def offset_mean(data, target_mean_value=0.0):
+    """Return a new array containing the original data
+       with its mean offset to match the desired value (0 by default).
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3])
+    array([-1.,  0.,  1.])
+    """
+    return (data - numpy.mean(data)) + target_mean_value
+
+
The key change is that the second parameter is now written
+target_mean_value=0.0 instead of just
+target_mean_value. If we call the function with two
+arguments, it works as it did before:
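+
For instance, a sketch with a 2-by-2 array of zeros as test data (the name test_data is illustrative):

```python
import numpy


def offset_mean(data, target_mean_value=0.0):
    return (data - numpy.mean(data)) + target_mean_value


test_data = numpy.zeros((2, 2))
# Both arguments are given, so the default of 0.0 is not used.
print(offset_mean(test_data, 3))
```
+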
But we can also now call it with just one parameter, in which case
+target_mean_value is automatically assigned the default value of 0.0:
+
+
PYTHON
+
+
more_data = 5 + numpy.zeros((2, 2))
+print('data before mean offset:')
+print(more_data)
+print('offset data:')
+print(offset_mean(more_data))
+
+
+
OUTPUT
+
+
data before mean offset:
+[[ 5. 5.]
+ [ 5. 5.]]
+offset data:
+[[ 0. 0.]
+ [ 0. 0.]]
+
+
This is handy: if we usually want a function to work one way, but
+occasionally need it to do something else, we can allow people to pass a
+parameter when they need to but provide a default to make the normal
+case easier. The example below shows how Python matches values to
+parameters:
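+
A sketch of such an example, assuming a small function named display (the name used in the call further down) whose parameters a, b and c default to 1, 2 and 3. The default for c is a guess, since only the defaults of a and b can be read off the output shown later.

```python
def display(a=1, b=2, c=3):
    print('a:', a, 'b:', b, 'c:', c)


print('no parameters:')
display()
print('one parameter:')
display(55)       # 55 goes to a, the first parameter
print('two parameters:')
display(55, 66)   # 55 goes to a, 66 goes to b
```
+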
As this example shows, parameters are matched up from left to right,
+and any that haven’t been given a value explicitly get their default
+value. We can override this behavior by naming the value as we pass it
+in:
+
+
PYTHON
+
+
print('only setting the value of c')
+display(c=77)
+
+
+
OUTPUT
+
+
only setting the value of c
+a: 1 b: 2 c: 77
+
+
With that in hand, let’s look at the help for
+numpy.loadtxt:
+
+
PYTHON
+
+
help(numpy.loadtxt)
+
+
+
OUTPUT
+
+
Help on function loadtxt in module numpy.lib.npyio:
+
+loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes')
+ Load data from a text file.
+
+ Each row in the text file must have the same number of values.
+
+ Parameters
+ ----------
+...
+
+
There’s a lot of information here, but the most important part is the
+first couple of lines, which list the function’s parameters and their default values.
+
This tells us that loadtxt has one parameter called
+fname that doesn’t have a default value, and nine others
+that do. If we call the function like this:
+
+
PYTHON
+
+
numpy.loadtxt('inflammation-01.csv', ',')
+
+
then the filename is assigned to fname (which is what we
+want), but the delimiter string ',' is assigned to
+dtype rather than delimiter, because
+dtype is the second parameter in the list. Since
+',' isn’t a known dtype, our code produced
+an error message when we tried to run it. When we call
+loadtxt we don’t have to provide fname= for
+the filename because it’s the first item in the list, but if we want the
+',' to be assigned to the parameter delimiter,
+we do have to provide delimiter=, since
+delimiter is not the second parameter in
+the list.
+
Readable Functions
+
+
Consider these two functions:
+
+
PYTHON
+
+
def s(p):
+    a = 0
+    for v in p:
+        a += v
+    m = a / len(p)
+    d = 0
+    for v in p:
+        d += (v - m) * (v - m)
+    return numpy.sqrt(d / (len(p) - 1))
+
+def std_dev(sample):
+    sample_sum = 0
+    for value in sample:
+        sample_sum += value
+
+    sample_mean = sample_sum / len(sample)
+
+    sum_squared_devs = 0
+    for value in sample:
+        sum_squared_devs += (value - sample_mean) * (value - sample_mean)
+
+    return numpy.sqrt(sum_squared_devs / (len(sample) - 1))
+
+
The functions s and std_dev are
+computationally equivalent (they both calculate the sample standard
+deviation), but to a human reader, they look very different. You
+probably found std_dev much easier to read and understand
+than s.
+
As this example illustrates, both documentation and a programmer’s
+coding style combine to determine how easy it is for others to
+read and understand the programmer’s code. Choosing meaningful variable
+names and using blank spaces to break the code into logical “chunks” are
+helpful techniques for producing readable code. This is useful
+not only for sharing code with others, but also for the original
+programmer. If you need to revisit code that you wrote months ago and
+haven’t thought about since then, you will appreciate the value of
+readable code!
+
+
+
+
+
+
Combining Strings
+
+
+
“Adding” two strings produces their concatenation:
+'a' + 'b' is 'ab'. Write a function called
+fence that takes two parameters called
+original and wrapper and returns a new string
+that has the wrapper character at the beginning and end of the original.
+A call to your function should look like this:
+
+
PYTHON
+
+
print(fence('name', '*'))
+
+
+
OUTPUT
+
+
*name*
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def fence(original, wrapper):
+    return wrapper + original + wrapper
+
+
+
+
+
+
+
+
+
+
+
Return versus print
+
+
+
Note that return and print are not
+interchangeable. print is a Python function that
+prints data to the screen. It lets us, the users, see
+the data. The return statement, on the other hand, makes data
+visible to the program. Let’s have a look at the following function:
+
+
PYTHON
+
+
def add(a, b):
+    print(a + b)
+
+
Question: What will we see if we execute the
+following commands?
+
+
PYTHON
+
+
A = add(7, 3)
+print(A)
+
+
+
+
+
+
+
+
+
+
Python will first execute the function add with
+a = 7 and b = 3, and, therefore, print
+10. However, because function add does not
+have a line that starts with return (no return
+“statement”), it will, by default, return nothing, which in
+Python is represented by None. Therefore, None will be
+assigned to A and the last line (print(A))
+will print None. As a result, we will see:
+
+
OUTPUT
+
+
10
+None
+
+
+
+
+
+
+
+
+
+
+
Selecting Characters From Strings
+
+
+
If the variable s refers to a string, then
+s[0] is the string’s first character and s[-1]
+is its last. Write a function called outer that returns a
+string made up of just the first and last characters of its input. A
+call to your function should look like this:
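+
As a sketch, with 'helium' as an arbitrary example input and one possible implementation of outer (the parameter name input_string is illustrative):

```python
def outer(input_string):
    # First character plus last character.
    return input_string[0] + input_string[-1]


print(outer('helium'))
```
+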
Write a function rescale that takes an array as input
+and returns a corresponding array of values scaled to lie in the range
+0.0 to 1.0. (Hint: If L and H are the lowest
+and highest values in the original array, then the replacement for a
+value v should be (v-L) / (H-L).)
Run the commands help(numpy.arange) and
+help(numpy.linspace) to see how to use these functions to
+generate regularly-spaced values, then use those values to test your
+rescale function. Once you’ve successfully tested your
+function, add a docstring that explains what it does.
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def rescale(input_array):
+    """Takes an array as input, and returns a corresponding array scaled so
+    that 0 corresponds to the minimum and 1 to the maximum value of the input array.
+
+    Examples:
+    >>> rescale(numpy.arange(10.0))
+    array([ 0. , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
+     0.55555556, 0.66666667, 0.77777778, 0.88888889, 1. ])
+    >>> rescale(numpy.linspace(0, 100, 5))
+    array([ 0. , 0.25, 0.5 , 0.75, 1. ])
+    """
+    L = numpy.amin(input_array)
+    H = numpy.amax(input_array)
+    output_array = (input_array - L) / (H - L)
+    return output_array
+
+
+
+
+
+
+
+
+
+
+
Defining Defaults
+
+
+
Rewrite the rescale function so that it scales data to
+lie between 0.0 and 1.0 by default, but will
+allow the caller to specify lower and upper bounds if they want. Compare
+your implementation to your neighbor’s: do the two functions always
+behave the same way?
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def rescale(input_array, low_val=0.0, high_val=1.0):
+    """rescales input array values to lie between low_val and high_val"""
+    L = numpy.amin(input_array)
+    H = numpy.amax(input_array)
+    intermed_array = (input_array - L) / (H - L)
+    output_array = intermed_array * (high_val - low_val) + low_val
+    return output_array
+
+
+
+
+
+
+
+
+
+
+
Variables Inside and Outside Functions
+
+
+
What does the following piece of code display when run — and why?
+
+
PYTHON
+
+
f = 0
+k = 0
+
+def f2k(f):
+    k = ((f - 32) * (5.0 / 9.0)) + 273.15
+    return k
+
+print(f2k(8))
+print(f2k(41))
+print(f2k(32))
+
+print(k)
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
259.81666666666666
+278.15
+273.15
+0
+
+
k is 0 because the k inside the function
+f2k doesn’t know about the k defined outside
+the function. When the f2k function is called, it creates a
+local variable
+k. The function returns the value of its local k, but it does not
+alter the k outside of its local copy. Therefore the original
+value of the global k remains unchanged. Beware that a local
+k is created because the statements inside f2k
+assign a new value to it. If k were only
+read, the function would simply retrieve the global k
+value.
+
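The difference between reading and assigning can be seen in a short sketch; the function names read_only and assign_local are made up for this illustration.

```python
k = 0


def read_only():
    # No assignment to k here, so Python looks up the global k.
    return k


def assign_local():
    # Assigning to k creates a new local variable; the global is untouched.
    k = 99
    return k


print(read_only())     # value of the global k
print(assign_local())  # value of the local k
print(k)               # the global k, unchanged by assign_local
```
+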
+
+
+
+
+
+
+
+
+
Mixing Default and Non-Default Parameters
+
+
+
Given the following code:
+
+
PYTHON
+
+
def numbers(one, two=2, three, four=4):
+    n = str(one) + str(two) + str(three) + str(four)
+    return n
+
+print(numbers(1, three=3))
+
+
what do you expect will be printed? What is actually printed? What
+rule do you think Python is following?
+
1. 1234
+
2. one2three4
+
3. 1239
+
4. SyntaxError
+
Given that, what does the following piece of code display when
+run?
+
+
PYTHON
+
+
def func(a, b=3, c=6):
+    print('a: ', a, 'b: ', b, 'c:', c)
+
+func(-1, 2)
+
+
1. a: b: 3 c: 6
+
2. a: -1 b: 3 c: 6
+
3. a: -1 b: 2 c: 6
+
4. a: b: -1 c: 2
+
+
+
+
+
+
+
+
+
Attempting to define the numbers function results in
+option 4, a SyntaxError. The parameters two and
+four are given default values. Because one and
+three are not given default values, they are required to be
+included as arguments when the function is called and must be placed
+before any parameters that have default values in the function
+definition. Here three comes after two=2, which breaks that rule.
+
The given call to func displays
+a: -1 b: 2 c: 6. -1 is assigned to the first parameter
+a, 2 is assigned to the next parameter b, and
+c is not passed a value, so it uses its default value
+6.
+
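One way to repair numbers, offered as a sketch rather than the only possible fix, is to move the parameters without defaults to the front; callers can then still set two or four by name.

```python
def numbers(one, three, two=2, four=4):
    # Required parameters first, parameters with defaults after them.
    n = str(one) + str(two) + str(three) + str(four)
    return n


print(numbers(1, three=3))
print(numbers(1, 3, four=9))
```
+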
+
+
+
+
+
+
+
+
+
Readable Code
+
+
+
Revise a function you wrote for one of the previous exercises to try
+to make the code more readable. Then, collaborate with one of your
+neighbors to critique each other’s functions and discuss how your
+function implementations could be further improved to make them more
+readable.
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Define a function using
+def function_name(parameter).
+
The body of a function must be indented.
+
Call a function using function_name(value).
+
Numbers are stored as integers or floating-point numbers.
+
Variables defined within a function can only be seen and used within
+the body of the function.
+
Variables created outside of any function are called global
+variables.
+
Within a function, we can access global variables.
+
Variables created within a function override global variables if
+their names match.
+
Use help(thing) to view help for something.
+
Put docstrings in functions to provide help for that function.
+
Specify default values for parameters when defining a function using
+name=value in the parameter list.
+
Parameters can be passed by matching based on name, by position, or
+by omitting them (in which case the default value is used).
+
Put code whose parameters change frequently in a function, then call
+it with different parameter values to customize its behavior.
+
+
diff --git a/08-data_analysis.html b/08-data_analysis.html
new file mode 100644
index 0000000..3cc6743
--- /dev/null
+++ b/08-data_analysis.html
@@ -0,0 +1,491 @@
+
+Python for Official Statistics: Data Analysis
+
+
+
+
+
+
+
+ Pre-Alpha
+
+ This lesson is in the pre-alpha phase, which means that it is in early development, but has not yet been taught.
+
+
+
+
+
+
diff --git a/10-errors_exceptions.html b/10-errors_exceptions.html
new file mode 100644
index 0000000..8989987
--- /dev/null
+++ b/10-errors_exceptions.html
@@ -0,0 +1,1184 @@
+
+Python for Official Statistics: Errors and Exceptions
+
+
+
+
+
+
+
+
+
+
+
identify different errors and correct bugs associated with them
+
+
+
+
+
+
Every programmer encounters errors, both those who are just
+beginning, and those who have been programming for years. Encountering
+errors and exceptions can be very frustrating at times, and can make
+coding feel like a hopeless endeavour. However, understanding what the
+different types of errors are and when you are likely to encounter them
+can help a lot. Once you know why you get certain types of
+errors, they become much easier to fix.
+
Errors in Python have a very specific form, called a traceback. Let’s examine one:
+
+
PYTHON
+
+
# This code has an intentional error. You can type it directly or
+# use it for reference to understand the error message below.
+def favorite_ice_cream():
+    ice_creams = [
+        'chocolate',
+        'vanilla',
+        'strawberry'
+    ]
+    print(ice_creams[3])
+
+favorite_ice_cream()
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+IndexError Traceback (most recent call last)
+<ipython-input-1-70bd89baa4df> in <module>()
+ 9 print(ice_creams[3])
+ 10
+----> 11 favorite_ice_cream()
+
+<ipython-input-1-70bd89baa4df> in favorite_ice_cream()
+ 7 'strawberry'
+ 8 ]
+----> 9 print(ice_creams[3])
+ 10
+ 11 favorite_ice_cream()
+
+IndexError: list index out of range
+
+
This particular traceback has two levels. You can determine the
+number of levels by counting the arrows on the left-hand
+side. In this case:
+
The first shows code from the cell above, with an arrow pointing
+to Line 11 (which is favorite_ice_cream()).
+
The second shows some code in the function
+favorite_ice_cream, with an arrow pointing to Line 9 (which
+is print(ice_creams[3])).
+
The last level is the actual place where the error occurred. The
+other level(s) show what function the program executed to get to the
+next level down. So, in this case, the program first called the function
+favorite_ice_cream. Inside this function, the program
+encountered an error on Line 9, when it tried to run the code
+print(ice_creams[3]).
+
+
+
+
+
+
Long Tracebacks
+
+
+
Sometimes, you might see a traceback that is very long, sometimes
+even 20 levels deep! This can make it seem like something
+horrible happened, but the length of the error message does not reflect
+severity; rather, it indicates that your program called many functions
+before it encountered the error. Most of the time, the actual place
+where the error occurred is at the bottom-most level, so you can skip
+down the traceback to the bottom.
+
+
+
+
So what error did the program actually encounter? In the last line of
+the traceback, Python helpfully tells us the category or type of error
+(in this case, it is an IndexError) and a more detailed
+error message (in this case, it says “list index out of range”).
+
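The same two pieces of information, the error type and its message, are also available inside a program. The sketch below reuses the ice cream list from the example above and catches the exception instead of letting it stop the program; the variable names error_type and error_message are illustrative.

```python
ice_creams = ['chocolate', 'vanilla', 'strawberry']

try:
    print(ice_creams[3])
except IndexError as err:
    # type(err).__name__ is the category shown in the traceback;
    # str(err) is the detailed message that follows the colon.
    error_type = type(err).__name__
    error_message = str(err)
    print(error_type + ':', error_message)
```
+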
If you encounter an error and don’t know what it means, it is still
+important to read the traceback closely. That way, if you fix the error,
+but encounter a new one, you can tell that the error changed.
+Additionally, sometimes knowing where the error occurred is
+enough to fix it, even if you don’t entirely understand the message.
+
If you do encounter an error you don’t recognize, try looking at the
+official
+documentation on errors. However, note that you may not always be
+able to find the error there, as it is possible to create custom errors.
+In that case, hopefully the custom error message is informative enough
+to help you figure out what went wrong. Libraries like pandas and NumPy
+have these custom errors, but the procedure to figure them out is the
+same: go to the last line of the traceback, and read the error
+message there. The documentation for these libraries will often provide
+the information you need about any functions you are using. There are
+also large communities of users for data libraries that can help as
+well!
+
+
+
+
+
+
Reading Error Messages
+
+
+
Read the Python code and the resulting traceback below, and answer
+the following questions:
+
How many levels does the traceback have?
+
What is the function name where the error occurred?
+
On which line number in this function did the error occur?
+
What is the type of error?
+
What is the error message?
+
+
PYTHON
+
+
# This code has an intentional error. Do not type it directly;
+# use it for reference to understand the error message below.
+def print_message(day):
+    messages = [
+        'Hello, world!',
+        'Today is Tuesday!',
+        'It is the middle of the week.',
+        'Today is Donnerstag in German!',
+        'Last day of the week!',
+        'Hooray for the weekend!',
+        'Aw, the weekend is almost over.'
+    ]
+    print(messages[day])
+
+def print_sunday_message():
+    print_message(7)
+
+print_sunday_message()
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+IndexError Traceback (most recent call last)
+<ipython-input-7-3ad455d81842> in <module>
+ 16 print_message(7)
+ 17
+---> 18 print_sunday_message()
+ 19
+
+<ipython-input-7-3ad455d81842> in print_sunday_message()
+ 14
+ 15 def print_sunday_message():
+---> 16 print_message(7)
+ 17
+ 18 print_sunday_message()
+
+<ipython-input-7-3ad455d81842> in print_message(day)
+ 11 'Aw, the weekend is almost over.'
+ 12 ]
+---> 13 print(messages[day])
+ 14
+ 15 def print_sunday_message():
+
+IndexError: list index out of range
+
+
+
+
+
+
+
+
+
+
3 levels
+
print_message
+
13
+
IndexError
+
+list index out of range. You can then infer that
+7 is not the right index to use with
+messages.
+
+
+
+
+
+
+
+
+
+
Better errors on newer Pythons
+
+
+
Newer versions of Python have improved error printouts. If you are
+debugging errors, it is often helpful to use the latest Python version,
+even if you support older versions of Python.
+
+
+
+
Type Errors
+
+
One of the most common types of errors in Python is the type
+error. These errors occur when you try to perform an operation on
+an object in Python that cannot support it. This happens easily when
+working with large datasets where values are expected to be of a particular type, such as
+strings or integers. When we write a function expecting integers,
+we will not get an error until we encounter an operation that cannot
+handle strings. For example:
File "<ipython-input-3-6bb841ea1423>", line 3
+ letter=my_string["e"]
+ ^
+TypeError: string indices must be integers
+
+
We get this error because we are trying to use an index to access
+part of our string, which requires an integer. Instead, we entered a
+character and received a type error. This is fixed by replacing “e” with
+2.
+
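A minimal sketch of this situation, assuming my_string holds an arbitrary example word (its original value is not shown above):

```python
my_string = 'hello'  # assumed example value

try:
    letter = my_string['e']   # a string index must be an integer
except TypeError as err:
    print('TypeError:', err)

letter = my_string[2]         # an integer index works
print(letter)
```
+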
In the case of datasets, we often see type errors when a mathematical
+operation, such as taking a mean, is performed on a column that contains
+characters, either as a result of formatting or introduced through
+error. As a result, correcting the error can involve simply removing the
+characters from the strings using regular expressions, or if the
+characters have resulted in incorrect data, removing those observations
+from the dataset.
+
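As a sketch of the first option, the snippet below strips non-numeric characters from a few made-up values with the standard-library re module before averaging them. The values and the helper name to_number are invented for this illustration; real data may call for a more careful pattern.

```python
import re

raw_values = ['12', '15 kg', '9', '1,004']  # made-up column values

# Averaging the raw strings fails because they are not numbers.
try:
    sum(raw_values) / len(raw_values)
except TypeError as err:
    print('TypeError:', err)


def to_number(text):
    # Keep only digits, decimal points, and minus signs, then convert.
    return float(re.sub(r'[^0-9.\-]', '', text))


cleaned = [to_number(value) for value in raw_values]
print(sum(cleaned) / len(cleaned))
```
+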
Syntax Errors
+
+
When you forget a colon at the end of a line, accidentally add one
+space too many when indenting under an if statement, or
+forget a parenthesis, you will encounter a syntax error. This means that
+Python couldn’t figure out how to read your program. This is similar to
+forgetting punctuation in English: for example, this text is difficult
+to read there is no punctuation there is also no capitalization why is
+this hard because you have to figure out where each sentence ends you
+also have to figure out where each sentence begins to some extent it
+might be ambiguous if there should be a sentence break or not
+
People can typically figure out what is meant by text with no
+punctuation, but people are much smarter than computers. If Python
+doesn’t know how to read the program, it will give up and inform you
+with an error. For example:
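+
One way to see this without stopping the program is sketched here with the built-in compile function. The function name some_function is the one used later in this lesson; its body is assumed.

```python
# Source code with a missing colon on line 1 (and an extra space on line 4).
source = (
    "def some_function()\n"
    "    msg = 'hello, world!'\n"
    "    print(msg)\n"
    "     return msg\n"
)

try:
    compile(source, '<example>', 'exec')
except SyntaxError as err:
    error_name = type(err).__name__
    error_line = err.lineno
    print(error_name, 'on line', error_line)
```

Typing the function definition directly into Python reports the same error as a traceback.
+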
Here, Python tells us that there is a SyntaxError on
+line 1, and even puts a little arrow in the place where there is an
+issue. In this case the problem is that the function definition is
+missing a colon at the end.
+
Actually, the function above has two issues with syntax. If
+we fix the problem with the colon, we see that there is also an
+IndentationError, which means that the lines in the
+function definition do not all have the same indentation:
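+
A sketch of this second problem, again using the built-in compile function and a function called some_function whose body is assumed:

```python
# The colon is in place, but line 4 is indented one space more than lines 2 and 3.
source = (
    "def some_function():\n"
    "    msg = 'hello, world!'\n"
    "    print(msg)\n"
    "     return msg\n"
)

try:
    compile(source, '<example>', 'exec')
except IndentationError as err:
    error_name = type(err).__name__
    error_line = err.lineno
    print(error_name, 'on line', error_line)
```
+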
Both SyntaxError and IndentationError
+indicate a problem with the syntax of your program, but an
+IndentationError is more specific: it always means
+that there is a problem with how your code is indented.
+
+
+
+
+
+
Tabs and Spaces
+
+
+
Some indentation errors are harder to spot than others. In
+particular, mixing spaces and tabs can be difficult to spot because they
+are both whitespace. In the
+example below, the first two lines in the body of the function
+some_function are indented with tabs, while the third line
+is indented with spaces. If you’re working in a Jupyter notebook, be sure to copy
+and paste this example rather than trying to type it in manually because
+Jupyter automatically replaces tabs with spaces.
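+
A sketch of such a mix, with the body of some_function assumed. The tabs are written as \t inside a string so that they survive copying; in the plain function definition the two kinds of whitespace look the same.

```python
# Lines 2 and 3 are indented with a tab, line 4 with eight spaces.
source = (
    "def some_function():\n"
    "\tmsg = 'hello, world!'\n"
    "\tprint(msg)\n"
    "        return msg\n"
)

try:
    compile(source, '<example>', 'exec')
except TabError as err:
    error_name = type(err).__name__
    print(error_name + ':', err.msg)
```
+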
Visually it is impossible to spot the error. Fortunately, Python does
+not allow you to mix tabs and spaces.
+
+
ERROR
+
+
File "<ipython-input-5-653b36fbcd41>", line 4
+ return msg
+ ^
+TabError: inconsistent use of tabs and spaces in indentation
+
+
+
+
+
Variable Name Errors
+
+
Another very common type of error is called a NameError,
+and occurs when you try to use a variable that does not exist. For
+example:
+
+
PYTHON
+
+
print(a)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-7-9d7b17ad5387> in <module>()
+----> 1 print(a)
+
+NameError: name 'a' is not defined
+
+
Variable name errors come with some of the most informative error
+messages, which are usually of the form “name ‘the_variable_name’ is not
+defined”.
+
Why does this error message occur? That’s a harder question to
+answer, because it depends on what your code is supposed to do. However,
+there are a few very common reasons why you might have an undefined
+variable. The first is that you meant to use a string, but forgot to put quotes around
+it:
+
+
PYTHON
+
+
print(hello)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-8-9553ee03b645> in <module>()
+----> 1 print(hello)
+
+NameError: name 'hello' is not defined
+
+
The second reason is that you might be trying to use a variable that
+does not yet exist. In the following example, count should
+have been defined (e.g., with count = 0) before the for
+loop:
+
+
PYTHON
+
+
for number in range(10):
+    count = count + number
+print('The count is:', count)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-9-dd6a12d7ca5c> in <module>()
+ 1 for number in range(10):
+----> 2 count = count + number
+ 3 print('The count is:', count)
+
+NameError: name 'count' is not defined
+
+
Finally, the third possibility is that you made a typo when you were
+writing your code. Let’s say we fixed the error above by adding the line
+Count = 0 before the for loop. Frustratingly, this actually
+does not fix the error. Remember that variables are case-sensitive, so the variable
+count is different from Count. We still get
+the same error, because we still have not defined
+count:
+
+
PYTHON
+
+
Count = 0
+for number in range(10):
+    count = count + number
+print('The count is:', count)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-10-d77d40059aea> in <module>()
+ 1 Count = 0
+ 2 for number in range(10):
+----> 3 count = count + number
+ 4 print('The count is:', count)
+
+NameError: name 'count' is not defined
+
+
Index Errors
+
+
Next up are errors having to do with containers (like lists and
+strings) and the items within them. If you try to access an item in a
+list or a string that does not exist, then you will get an error. This
+makes sense: if you asked someone what day they would like to get
+coffee, and they answered “caturday”, you might be a bit annoyed. Python
+gets similarly annoyed if you try to ask it for an item that doesn’t
+exist:
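+
A sketch of such a request, assuming a three-item list named letters (its contents are a guess) and catching the error so that the message can be printed:

```python
letters = ['a', 'b', 'c']  # assumed contents

print('Letter #1 is', letters[0])
print('Letter #2 is', letters[1])
print('Letter #3 is', letters[2])

try:
    print('Letter #4 is', letters[3])  # there is no fourth item
except IndexError as err:
    error_message = str(err)
    print('IndexError:', error_message)
```

Without the try and except lines, the last print stops the program with a traceback like this one:
+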
---------------------------------------------------------------------------
+IndexError Traceback (most recent call last)
+<ipython-input-11-d817f55b7d6c> in <module>()
+ 3 print('Letter #2 is', letters[1])
+ 4 print('Letter #3 is', letters[2])
+----> 5 print('Letter #4 is', letters[3])
+
+IndexError: list index out of range
+
+
Here, Python is telling us that there is an IndexError
+in our code, meaning we tried to access a list index that did not
+exist.
+
File Errors
+
+
The last type of error we’ll cover today is the most common type of
+error when using Python with data: those associated with reading and
+writing files. If you try to read a file
+that does not exist, you will receive a FileNotFoundError
+telling you so. If you attempt to write to a file that was opened
+read-only, Python 3 raises an UnsupportedOperation error.
+More generally, problems with input and output manifest as
+OSErrors, which may show up as a more specific subclass;
+you can see the
+list in the Python docs. They all have a unique UNIX
+errno, which you can see in the error message.
+
+
PYTHON
+
+
file_handle = open('myfile.txt', 'r')
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+FileNotFoundError Traceback (most recent call last)
+<ipython-input-14-f6e1ac4aee96> in <module>()
+----> 1 file_handle = open('myfile.txt', 'r')
+
+FileNotFoundError: [Errno 2] No such file or directory: 'myfile.txt'
+
+
One reason for receiving this error is that you specified an
+incorrect path to the file. For example, if I am currently in a folder
+called myproject, and I have a file in
+myproject/writing/myfile.txt, but I try to open
+myfile.txt, this will fail. The correct path would be
+writing/myfile.txt. It is also possible that the file name
+or its path contains a typo. There may also be specific settings based
+on your organization if you are using shared, networked, or cloud-based
+drives. It is best to check with your IT administrators if you are still
+encountering issues reading in a file after troubleshooting.
+
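Before opening a file, it can help to check where Python is looking. The sketch below uses the standard-library pathlib module with the writing/myfile.txt path from the example above; the file is not expected to exist on your machine.

```python
from pathlib import Path

# The folder Python treats as "here" when resolving relative paths.
print('current folder:', Path.cwd())

candidate = Path('writing') / 'myfile.txt'
found = candidate.exists()
if found:
    print('found', candidate)
else:
    print('no file at', candidate.resolve())
```
+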
A related issue can occur if you use the “read” flag instead of the
+“write” flag. Python will not give you an error if you try to open a
+file for writing when the file does not exist. However, if you meant to
+open a file for reading, but accidentally opened it for writing, and
+then try to read from it, you will get an
+UnsupportedOperation error telling you that the file was
+not opened for reading:
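A minimal sketch of this situation follows. The scratch file in a temporary folder is an assumption made so that the example does not overwrite real data, because opening an existing file with the write flag empties it.

```python
import io
import os
import tempfile

# Hypothetical scratch file so the example does not touch real data
scratch_path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

file_handle = open(scratch_path, "w")  # opened for writing, not reading
try:
    file_handle.read()
except io.UnsupportedOperation as error:
    message = str(error)
    print("UnsupportedOperation:", message)
finally:
    file_handle.close()
```

The error class lives in the io module and is a subclass of OSError, so a broader except OSError clause would also catch it.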
If you are getting a read or write error on a file or folder that you
+are able to open and/or edit with other programs, you may need to
+contact an IT administrator to check the permissions granted to you and
+any programs you are using.
+
These are the most common errors with files, though many others
+exist. If you get an error that you’ve never seen before, searching the
+Internet for that error type often reveals common reasons why you might
+get that error.
+
+
+
+
+
+
Identifying Syntax Errors
+
+
+
Read the code below, and (without running it) try to identify what
+the errors are.
+
Run the code, and read the error message. Is it a
+SyntaxError or an IndentationError?
+
Fix the error.
+
Repeat steps 2 and 3, until you have fixed all the errors.
+
+
PYTHON
+
+
def another_function
+  print('Syntax errors are annoying.')
+   print('But at least Python tells us about them!')
+  print('So they are usually not too hard to fix.')
+
+
+
+
+
+
+
+
+
+
SyntaxError for missing (): at end of first
+line, IndentationError for mismatch between second and
+third lines. A fixed version is:
+
+
PYTHON
+
+
def another_function():
+    print('Syntax errors are annoying.')
+    print('But at least Python tells us about them!')
+    print('So they are usually not too hard to fix.')
+
+
+
+
+
+
+
+
+
+
+
Identifying Variable Name Errors
+
+
+
Read the code below, and (without running it) try to identify what
+the errors are.
+
Run the code, and read the error message. What type of
+NameError do you think this is? In other words, is it a
+string with no quotes, a misspelled variable, or a variable that should
+have been defined but was not?
+
Fix the error.
+
Repeat steps 2 and 3, until you have fixed all the errors.
+
+
PYTHON
+
+
for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (Number % 3) == 0:
+        message = message + a
+    else:
+        message = message + 'b'
+print(message)
+
+
+
+
+
+
+
+
+
+
3 NameErrors for number being misspelled,
+for message not defined, and for a not being
+in quotes.
+
Fixed version:
+
+
PYTHON
+
+
message = ''
+for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (number % 3) == 0:
+        message = message + 'a'
+    else:
+        message = message + 'b'
+print(message)
+
+
+
+
+
+
+
+
+
+
+
Identifying Index Errors
+
+
+
Read the code below, and (without running it) try to identify what
+the errors are.
+
Run the code, and read the error message. What type of error is
+it?
+
Fix the error.
+
+
PYTHON
+
+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[4])
+
+
+
+
+
+
+
+
+
+
IndexError; the last entry is seasons[3],
+so seasons[4] doesn’t make sense. A fixed version is:
+
+
PYTHON
+
+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[-1])
+
+
+
+
+
+
A Final Note About Correcting Errors
+
+
There are a lot of very helpful answers online for many error messages.
+However, when working with official statistics, we also need to exercise
+some caution. Be wary of any answers that ask you to
+download a package from someone’s personal GitHub repository or other
+file sharing service. Try to find the type of error first and understand
+what the issue is before downloading anything claiming to fix the error.
+If the error is the result of an issue with a version of a package,
+check if there are any security vulnerabilities with that version, and
+use a package manager to move between package versions.
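Before changing package versions, it helps to know which version is currently installed. The sketch below uses the standard library's importlib.metadata (Python 3.8 or later); the helper name installed_version and the package names in the loop are illustrative assumptions, not part of the lesson.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# The last name is deliberately made up to show the "not installed" case
for name in ["numpy", "pandas", "not-a-real-package-xyz"]:
    print(name, "->", installed_version(name))
```

The version string can then be compared against the security advisories published for that package before deciding whether to upgrade or downgrade.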
+
+
diff --git a/404.html b/404.html
new file mode 100644
index 0000000..48eb95f
--- /dev/null
+++ b/404.html
@@ -0,0 +1,445 @@
+
+Python for Official Statistics: Page not found
+
+
+
+
+
+
+
+ Pre-Alpha
+
+ This lesson is in the pre-alpha phase, which means that it is in early development, but has not yet been taught.
+
+ Python for Official Statistics
+
Page not found
+
+
Our apologies!
+
+
We cannot seem to find the page you are looking for. Here are some
+tips that may help:
+
+
diff --git a/CODE_OF_CONDUCT.html b/CODE_OF_CONDUCT.html
new file mode 100644
index 0000000..c7a51e7
--- /dev/null
+++ b/CODE_OF_CONDUCT.html
@@ -0,0 +1,456 @@
+
+Python for Official Statistics: Contributor Code of Conduct
to Share—copy and redistribute the material in any
+medium or format
+
to Adapt—remix, transform, and build upon the
+material
+
for any purpose, even commercially.
+
The licensor cannot revoke these freedoms as long as you follow the
+license terms.
+
Under the following terms:
+
Attribution—You must give appropriate credit
+(mentioning that your work is derived from work that is Copyright (c)
+The Carpentries and, where practical, linking to https://carpentries.org/), provide a link to the
+license, and indicate if changes were made. You may do so in any
+reasonable manner, but not in any way that suggests the licensor
+endorses you or your use.
+
No additional restrictions—You may not apply
+legal terms or technological measures that legally restrict others from
+doing anything the license permits. With the understanding
+that:
+
Notices:
+
You do not have to comply with the license for elements of the
+material in the public domain or where your use is permitted by an
+applicable exception or limitation.
+
No warranties are given. The license may not give you all of the
+permissions necessary for your intended use. For example, other rights
+such as publicity, privacy, or moral rights may limit how you use the
+material.
+
Software
+
+
Except where otherwise noted, the example programs and other software
+provided by The Carpentries are made available under the OSI-approved MIT
+license.
+
Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+“Software”), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
Trademark
+
+
“The Carpentries”, “Software Carpentry”, “Data Carpentry”, and
+“Library Carpentry” and their respective logos are registered trademarks
+of Community Initiatives.
How do I find reliable and safe resources or code online?
+
+
+
+
+
+
+
+
Objectives
+
+
identify basic concepts in programming
+
+
+
+
+
+
+
Programming in Python
+
+
+
+
In most general terms, programming is the process of writing
+instructions for a computer. In this course we will be using Python as
+the language to communicate with the computer.
+
+
Strictly speaking, Python is an interpreted language, rather than a
+compiled language, meaning we are not communicating directly with the
+computer when we use Python. When we run Python code, our Python source
+code is first translated into byte code, which is then executed by the
+Python virtual machine.
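To make the byte code step a little more concrete, here is a small sketch using the standard library's dis module. The function add_one is a made-up example, and the exact instruction listing depends on your Python version.

```python
import dis


def add_one(x):
    return x + 1


# The compiled byte code is stored on the function's code object
print(type(add_one.__code__.co_code))

# dis prints a human-readable listing of the byte code instructions
dis.dis(add_one)
```

You never need to read byte code to use Python; the listing only shows that a translation step happens between your source code and the machine.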
+
+
Programming is a wide topic including a variety of techniques and
+tools. In this course we’ll be focusing on programming for statistical
+analysis.
+
+
IDEs
+
+
IDE stands for Integrated Development Environment. IDEs are where you
+will write, edit, and debug Python scripts, so you want to choose one
+that makes you feel comfortable and includes the functionality that you
+need. Some open-source IDEs for Python include JupyterLab and Visual Studio
+Code.
+
+
+
Packages
+
+
Packages, or libraries, are extensions to the statistical programming
+language. They contain code, data, and documentation in a standardised
+collection format that can be installed by users, typically via a
+centralised software repository. A typical Python workflow will use base
+Python (the core operations and functions provided by your Python
+installation) as well as specialised data analysis and scientific
+packages like NumPy, SciPy and Pandas.
+
+
Best Practices
+
+
+
+
Let’s review some basic concepts that any programmer should always
+keep in mind.
+
+
Documentation
+
+
Have you ever returned to a task and tried to read a note that you
+quickly scrawled for yourself the last time you were working on it? Have
+you ever inherited a project from a colleague and found you have no idea
+what remains to be done?
+
It can be very challenging to return to your own work or a
+colleague’s and this goes doubly for programming. Documentation is one
+way we can reduce the burden on our future selves and our colleagues.
+
+
Inline Documentation
+
+
As a new programmer, inline documentation can be the most helpful.
+Inline documentation refers to writing comments on the same line as your
+code. For example, if we wrote a line of code to sum 1+1, we might
+document it as follows:
+
+
PYTHON
+
+
1 + 1  # adding the numbers 1 and 1 together.
+
+
Although this is a very simple line of code and it might seem like
+overkill to document it in this way, these types of comments can be very
+helpful in jogging your memory when returning to a project. Inline
+comments can also help you to break multi-step programs into digestible
+and readable pieces.
+
+
+
External Documentation
+
+
Sometimes you require more detail than you can comfortably fit in
+your inline documentation. In this case it can be helpful to create
+separate files to document your project. This type of documentation will
+typically focus on the goals, scope, and any special instructions
+relating to your project rather than the details of your code. The most
+common type of external documentation is a README file. It is best
+practice to create a basic README file for any project. A basic README
+should include:
+
+
a brief description of the project,
+
any special instructions for installation or use,
+
the authors and any references.
+
+
README files are just text files, and it is best practice to save
+your README file as a README.md markdown document. This
+file format is automatically recognised by code repositories like
+GitHub, so your README contents are displayed alongside your code
+repository.
+
+
+
DocStrings
+
+
In chapter 7: functions we’ll learn
+about documentation specific to functions known as DocStrings.
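As a brief preview, and only as a sketch, a DocString is a string placed on the first line inside a function. The function kg_to_lb below is a hypothetical example, not one defined by the lesson.

```python
def kg_to_lb(weight_kg):
    """Convert a weight in kilograms to pounds (assuming 2.2 pounds per kilogram)."""
    return 2.2 * weight_kg


help(kg_to_lb)           # displays the docstring
print(kg_to_lb.__doc__)  # the same text is stored on the function
```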
+
+
+
Getting Help
+
+
+
+
Later on, in chapter 10: Errors
+and Exceptions we will cover errors in more detail. However, before
+we get there it’s very likely you’ll need some assistance writing Python
+code.
+
+
Built-in Help
+
+
There is a help
+function built into base Python. You can use it to investigate
+built-in functions, data types, and more. For example, say we want to
+know more about the print() function in Python:
+
+
PYTHON
+
+
help(print)
+
+
+
OUTPUT
+
+
Help on built-in function print in module builtins:
+
+print(...)
+ print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
+
+ Prints the values to a stream, or to sys.stdout by default.
+ Optional keyword arguments:
+ file: a file-like object (stream); defaults to the current sys.stdout.
+ sep: string inserted between values, default a space.
+ end: string appended after the last value, default a newline.
+-- More --
+
+
+
+
Finding Resources Online
+
+
Stack Overflow is a valuable
+resource for programmers of all levels. It can be daunting to post your
+own question! Fortunately, chances are someone else has already asked a
+similar question!
It can also be helpful to do a general search for a particular topic
+or error message. It’s very likely the first few results will be from
+Stack Overflow, followed by a few from official documentation and then
+you may start seeing results from personal blogs or third parties. These
+third party results can sometimes be valuable but we should be cautious!
+Here are a few things to keep in mind when you are looking for online
+resources:
+
+
Don’t download or install anything unless you are certain of what it
+is and why you need it.
+
Don’t copy or run code unless you fully understand what it
+does.
+
Python is an open-source language; official documentation and
+resources will not be behind a paywall.
+
You may not find a resource or solution to fit your exact needs. Try
+to be flexible and adapt online solutions to fit your needs.
+
+
+
+
+
+
+
Key Points
+
+
+
+
Python is an interpreted language.
+
Code is commonly developed inside an integrated development
+environment.
+
A typical Python workflow uses base Python and additional Python
+packages developed for statistical programming purposes.
+
Inline and external documentation help ensure that your code is
+readable.
+
You can find help through the built-in help function and external
+resources.
Can I change the value associated with a variable after I create
+it?
+
+
+
+
+
+
+
+
Objectives
+
+
Assign values to variables.
+
+
+
+
+
+
+
Variables
+
+
+
+
Any Python interpreter can be used as a calculator:
+
+
PYTHON
+
+
3 + 5 * 4
+
+
+
OUTPUT
+
+
23
+
+
This is great but not very interesting. To do anything useful with
+data, we need to assign its value to a variable. In Python, we
+can assign a value to a variable, using the equals sign
+=. For example, we can track the weight of a patient who
+weighs 60 kilograms by assigning the value 60 to a variable
+weight_kg:
+
+
PYTHON
+
+
weight_kg = 60
+
+
From now on, whenever we use weight_kg, Python will
+substitute the value we assigned to it. In layperson’s terms, a
+variable is a name for a value.
+
Variable names cannot start with a digit and are case sensitive. For example:
+weight0 is a valid variable name, whereas
+0weight is not
+
+weight and Weight are different
+variables
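These naming rules can be checked directly. The sketch below uses the built-in string method isidentifier(), which reports whether a piece of text would be accepted as a name; the values 60 and 75 are made up for illustration.

```python
# str.isidentifier() reports whether a string follows Python's naming rules
print('weight0'.isidentifier())   # True: digits are allowed after the first character
print('0weight'.isidentifier())   # False: a name cannot start with a digit

# Names are case sensitive, so these are two separate variables
weight = 60
Weight = 75
print(weight, Weight)
```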
+
Types of data
+
+
+
+
Python knows various types of data. Three common ones are:
+
+
integer numbers
+
floating point numbers, and
+
strings.
+
+
In the example above, variable weight_kg has an integer
+value of 60. If we want to more precisely track the weight
+of our patient, we can use a floating point value by executing:
+
+
PYTHON
+
+
weight_kg = 60.3
+
+
To create a string, we add single or double quotes around some text.
+To identify and track a patient throughout our study, we can assign each
+person a unique identifier by storing it in a string:
+
+
PYTHON
+
+
patient_id = '001'
+
+
Using Variables in Python
+
+
+
+
Once we have data stored with variable names, we can make use of it
+in calculations. We may want to store our patient’s weight in pounds as
+well as kilograms:
+
+
PYTHON
+
+
weight_lb = 2.2 * weight_kg
+
+
We might decide to add a prefix to our patient identifier:
+
+
PYTHON
+
+
patient_id = 'inflam_' + patient_id
+
+
Built-in Python functions
+
+
+
+
To carry out common tasks with data and variables in Python, the
+language provides us with several built-in functions. To display information to
+the screen, we use the print function:
+
+
PYTHON
+
+
print(weight_lb)
+print(patient_id)
+
+
+
OUTPUT
+
+
132.66
+inflam_001
+
+
When we want to make use of a function, referred to as calling the
+function, we follow its name by parentheses. The parentheses are
+important: if you leave them off, the function doesn’t actually run!
+Sometimes you will include values or variables inside the parentheses
+for the function to use. In the case of print, we use the
+parentheses to tell the function what value we want to display. We will
+learn more about how functions work and how to create our own in later
+episodes.
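The difference the parentheses make can be seen with the built-in function len, which returns the length of a value. This is a small sketch; the variable names are chosen only for illustration.

```python
word = 'inflammation'

not_called = len      # no parentheses: just another name for the function
called = len(word)    # parentheses: the function runs and returns a value

print(not_called)     # shows the function itself, not a result
print(called)         # 12
```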
+
We can display multiple things at once using only one
+print call:
+
+
PYTHON
+
+
print(patient_id, 'weight in kilograms:', weight_kg)
+
+
+
OUTPUT
+
+
inflam_001 weight in kilograms: 60.3
+
+
We can also call a function inside of another function call. For example,
+Python has a built-in function called type that tells you a
+value’s data type:
+
+
PYTHON
+
+
print(type(60.3))
+print(type(patient_id))
+
+
+
OUTPUT
+
+
<class 'float'>
+<class 'str'>
+
+
Moreover, we can do arithmetic with variables right inside the
+print function:
+
+
PYTHON
+
+
print('weight in pounds:', 2.2 * weight_kg)
+
+
+
OUTPUT
+
+
weight in pounds: 132.66
+
+
The above command, however, did not change the value of
+weight_kg:
+
+
PYTHON
+
+
print(weight_kg)
+
+
+
OUTPUT
+
+
60.3
+
+
To change the value of the weight_kg variable, we have
+to assign weight_kg a new value using the
+equals = sign:
+
+
PYTHON
+
+
weight_kg = 65.0
+print('weight in kilograms is now:', weight_kg)
+
+
+
OUTPUT
+
+
weight in kilograms is now: 65.0
+
+
+
+
+
+
+
Variables as Sticky Notes
+
+
+
A variable in Python is analogous to a sticky note with a name
+written on it: assigning a value to a variable is like putting that
+sticky note on a particular value.
+
Using this analogy, we can investigate how assigning a value to one
+variable does not change values of other, seemingly
+related, variables. For example, let’s store the subject’s weight in
+pounds in its own variable:
+
+
PYTHON
+
+
# There are 2.2 pounds per kilogram
+weight_lb = 2.2 * weight_kg
+print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
+
+
+
OUTPUT
+
+
weight in kilograms: 65.0 and in pounds: 143.0
+
+
Everything in a line of code following the ‘#’ symbol is a comment that is ignored by Python.
+Comments allow programmers to leave explanatory notes for other
+programmers or their future selves.
+
Similar to above, the expression 2.2 * weight_kg is
+evaluated to 143.0, and then this value is assigned to the
+variable weight_lb (i.e. the sticky note
+weight_lb is placed on 143.0). At this point,
+each variable is “stuck” to completely distinct and unrelated
+values.
+
Let’s now change weight_kg:
+
+
PYTHON
+
+
weight_kg = 100.0
+print('weight in kilograms is now:', weight_kg, 'and weight in pounds is still:', weight_lb)
+
+
+
OUTPUT
+
+
weight in kilograms is now: 100.0 and weight in pounds is still: 143.0
+
+
Since weight_lb doesn’t “remember” where its value comes
+from, it is not updated when we change weight_kg.
+
+
+
+
+
+
+
+
+
Check Your Understanding
+
+
+
What values do the variables mass and age
+have after each of the following statements? Test your answer by
+executing the lines.
+
+
PYTHON
+
+
mass = 47.5
+age = 122
+mass = mass * 2.0
+age = age - 20
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
`mass` holds a value of 47.5, `age` does not exist
+`mass` still holds a value of 47.5, `age` holds a value of 122
+`mass` now has a value of 95.0, `age`'s value is still 122
+`mass` still has a value of 95.0, `age` now holds 102
+
+
+
+
+
+
+
+
+
+
+
Sorting Out References
+
+
+
Python allows you to assign multiple values to multiple variables in
+one line by separating the variables and values with commas. What does
+the following program print out?
+
+
PYTHON
+
+
first, second = 'Grace', 'Hopper'
+third, fourth = second, first
+print(third, fourth)
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
Hopper Grace
+
+
+
+
+
+
+
+
+
+
+
Seeing Data Types
+
+
+
What are the data types of the following variables?
Explain what a library is and what libraries are used for.
+
Import a Python library and use the functions it contains.
+
Read tabular data from a file into a program.
+
Select individual values and subsections from data.
+
Perform operations on arrays of data.
+
+
+
+
+
+
+
Words are useful, but what’s more useful are the sentences and
+stories we build with them. Similarly, while a lot of powerful, general
+tools are built into Python, specialized tools built up from these basic
+units live in libraries that can be
+called upon when needed.
+
Loading data into Python
+
+
+
+
To begin processing the clinical trial inflammation data, we need to
+load it into Python. Python can work with many different file types.
+Text files can be loaded into Python by using the base Python
+function
+
+
PYTHON
+
+
open("filename.txt", "r")
+
+
where “r” means read only, or if you want to write to the file, you
+can use “w”.
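A minimal sketch of both modes is shown below. The file is created in a temporary folder (an assumption so the example can run anywhere), and the with statement is used so that the file is closed automatically.

```python
import os
import tempfile

# Hypothetical file created only for this sketch
file_path = os.path.join(tempfile.mkdtemp(), "filename.txt")

# "w" opens the file for writing (creating it if needed)
with open(file_path, "w") as file_handle:
    file_handle.write("first line\nsecond line\n")

# "r" opens the file read-only; the with-block closes it automatically
with open(file_path, "r") as file_handle:
    lines = file_handle.readlines()

print(lines)
```

Be careful with "w": opening an existing file in write mode erases its previous contents.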
+
However, our patient data is in a .csv file, which is more commonly
+loaded by using a library. Python has hundreds of thousands of libraries
+to choose from to help carry out your work. Importing a library is like
+getting a piece of lab equipment out of a storage locker and setting it
+up on the bench. Libraries provide additional functionality to the basic
+Python package, much like a new piece of equipment adds functionality to
+a lab space. Just like in the lab, importing too many libraries can
+sometimes complicate and slow down your programs - so we only import
+what we need for each program. A couple of common Python
+libraries are used to load and work with data.
+
pandas
+
+
+
+
The first library we will present is called pandas. pandas is a
+Python library containing a set of functions and specialised data
+structures that have been designed to help Python programmers to perform
+data analysis tasks in a structured way.
+
Most of the things that pandas can do can be done with basic Python,
+but the collected set of pandas functions and data structures makes the
+data analysis tasks more consistent in terms of syntax and therefore
+aids readability.
+
Remember to write the library name with a lower case ‘p’ because that is the
+name of the package and Python is case sensitive.
+
+
Importing the pandas library
+
+
Importing the pandas library is done in exactly the same way as for
+any other library. In almost all examples of Python code using the
+pandas library, it will have been imported and given an alias of
+pd. We will follow the same convention.
+
+
PYTHON
+
+
import pandas as pd
+
+
+
+
Pandas data structures
+
+
There are two main data structures used by pandas: the Series
+and the DataFrame. The Series equates in general to a vector or a list.
+The DataFrame is equivalent to a table. Each column in a pandas
+DataFrame is a pandas Series data structure.
+
We will mainly be looking at the DataFrame.
+
We can easily create a pandas DataFrame by reading a .csv file.
+
+
+
Reading a csv file
+
+
When we read a csv dataset in base Python we did so by opening the
+dataset, reading and processing a record at a time and then closing the
+dataset after we had read the last record. Reading datasets in this way
+is slow and places all of the responsibility for extracting individual
+data items of information from the records on the programmer.
+
The main advantage of this approach, however, is that you only have
+to store one dataset record in memory at a time. This means that if you
+have the time, you can process datasets of any size.
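The record-at-a-time approach can be sketched with the standard library's csv module. The in-memory text below stands in for an open file, and its column names and values are made up for illustration.

```python
import csv
import io

# Stand-in for an open csv file; the column names and values are made up
csv_text = "patient_id,weight_kg\n001,60.3\n002,65.0\n003,100.0\n"

total_weight = 0.0
record_count = 0
with io.StringIO(csv_text) as dataset:
    reader = csv.DictReader(dataset)
    for record in reader:                            # one record is held at a time
        total_weight += float(record["weight_kg"])   # we convert types ourselves
        record_count += 1

print(record_count, "records, mean weight:", round(total_weight / record_count, 1))
```

Note how the programmer is responsible for converting each text value to a number, which is the extra work the paragraph above refers to.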
+
In pandas, csv files are read as complete datasets. You do not have
+to explicitly open and close the dataset. All of the dataset records are
+assembled into a DataFrame. If your dataset has column headers in the
+first record then these can be used as the DataFrame column names. You
+can explicitly state this in the parameters to the call, but pandas is
+usually able to infer that there is a header row and use it
+automatically.
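As a hedged sketch of stating the header row explicitly, the example below reads made-up csv text from memory rather than from a file on disk. The header and dtype parameters are part of pandas.read_csv; the column names and values are assumptions for illustration.

```python
import io

import pandas as pd

# Stand-in for a csv file on disk; the column names and values are made up
csv_text = "patient_id,weight_kg\n001,60.3\n002,65.0\n"

# header=0 states explicitly that the first record holds the column names;
# dtype keeps the identifiers as strings so the leading zeros survive
df = pd.read_csv(io.StringIO(csv_text), header=0, dtype={"patient_id": str})

print(df.columns.tolist())
print(df.shape)
```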
+
To tell Python that we’d like to start using pandas, we need to import it:
+
+
PYTHON
+
+
import pandas as pd
+
+
Often, libraries are given an alias or a short form name, in this
+case pandas is given the alias “pd”. Aliases for common data analysis
+libraries include:
+
+
PYTHON
+
+
import pandas as pd
+import numpy as np
+import matplotlib.pyplot as plt
+import seaborn as sns
+
+
Once we’ve imported the library, we can ask the library to read our
+data file for us:
+
+
PYTHON
+
+
pd.read_csv("filename.csv")
+
+
pandas is a commonly used library for working with and analysing
+data. However, we will be working with a different package for the
+remainder of this course. If you would like to learn more about data
+manipulation and analysis using pandas, we recommend checking out Data Analysis and
+Visualization with Python for Social Scientists.
+
+
numpy
+
+
+
+
The second package that we will present is called NumPy, which stands for Numerical
+Python. In general, you should use this library when you want to do
+fancy things with lots of numbers, especially if you have matrices or
+arrays. NumPy arrays are typically lighter weight and perform better
+than base Python lists, particularly when working with large datasets.
+
We will be using this package to work with our clinical trial
+inflammation data.
+
To tell Python that we’d like to start using NumPy, we need to import it:
+
+
PYTHON
+
+
import numpy as np
+
+
Now that we have imported the library, we can ask the library (by
+using the alias np) to read our data file for us:
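The lesson's data file is inflammation-01.csv. So that the call can be tried without that file, the sketch below passes np.loadtxt a small in-memory stand-in with made-up values.

```python
import io

import numpy as np

# Stand-in for a csv file on disk (the values are made up)
csv_text = io.StringIO("0,0,1,3\n0,1,2,1\n0,1,1,3\n")

# loadtxt accepts a file name or an open file-like object
print(np.loadtxt(fname=csv_text, delimiter=','))
```

In a notebook the print call is optional, because the value of the last expression in a cell is displayed automatically.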
The expression np.loadtxt(...) is a function call that asks Python
+to run the function
+loadtxt which belongs to the np library. The
+dot notation in Python is used most often to access an object’s
+attribute (property) or to call one of its methods:
+object_name.property gives you the value of that property, and
+object_name.method() calls that method on object_name.
+
As an example, John Smith is the John that belongs to the Smith
+family. We could use the dot notation to write his name
+smith.john, just as loadtxt is a function that
+belongs to the np library.
+
We pass np.loadtxt two arguments: the name of the file we
+want to read and the delimiter
+that separates values on a line. These both need to be character strings
+(or strings for short), so we put
+them in quotes.
+
Since we haven’t told it to do anything else with the function’s
+output, the notebook displays it.
+In this case, that output is the data we just loaded. By default, only a
+few rows and columns are shown (with ... to omit elements
+when displaying big arrays). Note that, to save space when displaying
+NumPy arrays, Python does not show us trailing zeros, so
+1.0 becomes 1..
+
Our call to np.loadtxt read our file but didn’t save the
+data in memory. To do that, we need to assign the array to a variable.
+In a similar manner to how we assign a single value to a variable, we
+can also assign an array of values to a variable using the same syntax.
+Let’s re-run np.loadtxt and save the returned data:
+
+
PYTHON
+
+
data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+
This statement doesn’t produce any output because we’ve assigned the
+output to the variable data. If we want to check that the
+data have been loaded, we can print the variable’s value:
Now that the data are in memory, we can manipulate them. First, let’s
+ask what type of thing
+data refers to:
+
+
PYTHON
+
+
print(type(data))
+
+
+
OUTPUT
+
+
<class 'numpy.ndarray'>
+
+
The output tells us that data currently refers to an
+N-dimensional array, the functionality for which is provided by the
+NumPy library. These data correspond to arthritis patients’
+inflammation. The rows are the individual patients, and the columns are
+their daily inflammation measurements.
+
+
+
+
+
+
Data Type
+
+
+
A NumPy array contains one or more elements of the same type. The
+type function will only tell you that a variable is a NumPy
+array but won’t tell you the type of thing inside the array. We can find
+out the type of the data contained in the NumPy array.
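One way to do this is through the array's dtype attribute. The sketch below uses a small made-up array in place of the inflammation data.

```python
import numpy as np

# Small made-up array standing in for the inflammation data
data = np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])

print(type(data))    # the container: a NumPy array
print(data.dtype)    # the type of the elements stored inside it
```

Here dtype reports float64, meaning every element is a 64-bit floating point number.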
With the following command, we can see the array’s shape:
+
+
PYTHON
+
+
print(data.shape)
+
+
+
OUTPUT
+
+
(60, 40)
+
+
The output tells us that the data array variable
+contains 60 rows and 40 columns. When we created the variable
+data to store our arthritis data, we did not only create
+the array; we also created information about the array, called members or attributes. This extra
+information describes data in the same way an adjective
+describes a noun. data.shape is an attribute of
+data which describes the dimensions of data.
+We use the same dotted notation for the attributes of variables that we
+use for the functions in libraries because they have the same
+part-and-whole relationship.
+
If we want to get a single number from the array, we must provide an
+index in square brackets after the
+variable name, just as we do in math when referring to an element of a
+matrix. Our inflammation data has two dimensions, so we will need to use
+two indices to refer to one specific value:
+
+
PYTHON
+
+
print('first value in data:', data[0, 0])
+
+
+
OUTPUT
+
+
first value in data: 0.0
+
+
+
PYTHON
+
+
print('middle value in data:', data[29, 19])
+
+
+
OUTPUT
+
+
middle value in data: 16.0
+
+
The expression data[29, 19] accesses the element at row
+30, column 20. While this expression may not surprise you,
+data[0, 0] might. Programming languages like Fortran,
+MATLAB and R start counting at 1 because that’s what human beings have
+done for thousands of years. Languages in the C family (including C++,
+Java, Perl, and Python) count from 0 because it represents an offset
+from the first value in the array (the second value is offset by one
+index from the first value). This is closer to the way that computers
+represent arrays (if you are interested in the historical reasons behind
+counting indices from zero, you can read Mike
+Hoye’s blog post). As a result, if we have an M×N array in Python,
+its indices go from 0 to M-1 on the first axis and 0 to N-1 on the
+second. It takes a bit of getting used to, but one way to remember the
+rule is that the index is how many steps we have to take from the start
+to get the item we want.
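The index range for an M×N array can be checked with a small sketch. The array grid below is made up, with M = 3 rows and N = 4 columns.

```python
import numpy as np

# Made-up array with M = 3 rows and N = 4 columns holding the values 0 to 11
grid = np.arange(12).reshape(3, 4)
rows, columns = grid.shape

first_value = grid[0, 0]                    # zero steps from the start
last_value = grid[rows - 1, columns - 1]    # indices M-1 and N-1

print(first_value, last_value)
```

Asking for grid[3, 4] instead would raise an IndexError, because those indices are one step past the end of each axis.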
+
+
+
+
+
+
In the Corner
+
+
+
What may also surprise you is that when Python displays an array, it
+shows the element with index [0, 0] in the upper left
+corner rather than the lower left. This is consistent with the way
+mathematicians draw matrices but different from the Cartesian
+coordinates. The indices are (row, column) instead of (column, row) for
+the same reason, which can be confusing when plotting data.
+
+
+
+
Slicing data
+
+
+
+
An index like [30, 20] selects a single element of an
+array, but we can select whole sections as well. For example, we can
+select the first ten days (columns) of values for the first four
+patients (rows) like this:
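A sketch of that selection follows. The array is a made-up stand-in with the same shape as the lesson's data (60 patients, 40 days), so the values printed will differ from the real inflammation readings.

```python
import numpy as np

# Made-up stand-in with the same shape as the lesson's data (60 patients, 40 days)
data = np.arange(60 * 40, dtype=float).reshape(60, 40)

first_block = data[0:4, 0:10]   # rows 0 to 3 and columns 0 to 9
print(first_block.shape)
print(first_block)
```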
The slice 0:4 means,
+“Start at index 0 and go up to, but not including, index 4”. Again, the
+up-to-but-not-including takes a bit of getting used to, but the rule is
+that the difference between the upper and lower bounds is the number of
+values in the slice.
We also don’t have to include the upper and lower bound on the slice.
+If we don’t include the lower bound, Python uses 0 by default; if we
+don’t include the upper, the slice runs to the end of the axis, and if
+we don’t include either (i.e., if we use ‘:’ on its own), the slice
+includes everything:
+
+
PYTHON
+
+
small = data[:3, 36:]
+print('small is:')
+print(small)
+
+
The above example selects rows 0 through 2 and columns 36 through to
+the end of the array.
+
+
OUTPUT
+
+
small is:
+[[ 2. 3. 0. 0.]
+ [ 1. 1. 0. 1.]
+ [ 2. 2. 1. 1.]]
Understand the properties and behaviours of lists and
+dictionaries
+
Access values in lists and dictionaries
+
Create and access values from nested lists and dictionaries
+
+
+
+
+
+
+
Values can also be stored in other Python data types such as lists,
+dictionaries, sets and tuples. Storing objects in a list is a fast and
+versatile way to apply transformations across a sequence of values.
+Storing objects in a dictionary as key-value pairs is useful for
+extracting specific values i.e. performing lookup operations.
+
Create and access lists
+
+
+
+
Lists have the following properties and behaviours:
+
+
A single list can store different primitive object types and even
+other lists
+
Lists are ordered and have a 0-based index
+
Lists can be appended to using the methods append() or
+insert()
+
+
Values inside a list can be removed using the methods
+remove() or pop()
+
+
Two lists can be concatenated with the operator +
+
+
Values inside a list can be conditionally iterated through
+
A list is mutable i.e. the values inside a list can be modified in
+place
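Several of the behaviours listed above can be seen in one short session. The following is an illustrative sketch only; the list `fruits` and its contents are hypothetical, and the loop with `if` previews ideas that are explained later in the lesson.

```python
fruits = ['apple', 'pear']   # hypothetical example list

fruits.append('plum')        # add a value to the end
fruits.insert(0, 'banana')   # add a value at index 0
print(fruits)                # ['banana', 'apple', 'pear', 'plum']

fruits.remove('pear')        # remove the first matching value
last = fruits.pop()          # remove and return the last value
print(fruits, last)          # ['banana', 'apple'] plum

combined = fruits + ['kiwi', 'fig']   # + builds a new, longer list
print(combined)              # ['banana', 'apple', 'kiwi', 'fig']

# Conditional iteration: collect only the values that pass a test
long_names = []
for name in combined:
    if len(name) > 4:
        long_names.append(name)
print(long_names)            # ['banana', 'apple']
```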
+
+
To create a list, values are contained within square brackets
+i.e. [] and individually separated by commas. The function
+list() can also be used to create a list of values from an
+iterable object like a string, set or tuple.
+
+
PYTHON
+
+
# Create a list of integers using []
+list_1 = [1, 3, 5, 7]
+print(list_1)
+
+
+
OUTPUT
+
+
[1, 3, 5, 7]
+
+
+
PYTHON
+
+
# Unlike atomic vectors in R, a list can contain multiple primitive object types
+list_2 = [1, "one", 1.0, True]
+print(list_2)
+
+
+
OUTPUT
+
+
[1, 'one', 1.0, True]
+
+
+
PYTHON
+
+
# You can also use list() on an iterable object to convert it into a list
+string = 'abcdefg'
+list_3 = list(string)
+print(list_3)
+
+
+
OUTPUT
+
+
['a', 'b', 'c', 'd', 'e', 'f', 'g']
+
+
Because lists have a 0-based index, we can access individual values
+by their list index position. For 0-based indexes, the first value
+always starts at position 0 i.e. the first element has an index of 0.
+Accessing multiple values by their index positions is also referred to
+as slicing or subsetting a list.
+
Note that we can use negative numbers as indices in Python. When we
+do so, the index -1 gives us the last element in the list,
+-2 gives us the second to last element in the list, and so
+on.
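A short sketch of negative indexing, using a list called `letters` that holds the same values as list_3 (the name is chosen here only so the example runs on its own):

```python
letters = list('abcdefg')   # same values as list_3

print(letters[-1])     # last element: g
print(letters[-2])     # second to last element: f
print(letters[-3:])    # last three elements: ['e', 'f', 'g']
```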
# A slice stops just before its end index, so add 1 to the last index you want
+# To extract index 0 to 2, we need to slice with [0:2+1] i.e. [0:3]
+
+# Extract the first three values from list_3
+print('first 3 values:', list_3[0:3])
+
+# Start from index 0 and extract values from each subsequent second position
+print('every second value:', list_3[0::2])
+
+# Start from index 1, end at index 3 and extract from each subsequent second position
+print('every second value from index 1 to 3:', list_3[1:4:2])
+
+
+
OUTPUT
+
+
first 3 values: ['a', 'b', 'c']
+every second value: ['a', 'c', 'e', 'g']
+every second value from index 1 to 3: ['b', 'd']
+
+
Change list values
+
+
+
+
Data which can be modified in place is called mutable, while data
+which cannot be modified is called immutable. Strings and numbers are
+immutable in that when we want to change the value of a string or number
+variable, we can only replace the old value with a completely new
+value.
+
+
PYTHON
+
+
string = 'abcde'
+string[0] = 'b'  # Produces a type error as strings are immutable
+
+# TypeError: 'str' object does not support item assignment
+
+
In contrast, lists are mutable and we can modify them after they have
+been created. We can change individual values, append new values, or
+reorder the whole list through sorting.
+
+
PYTHON
+
+
list_4 = ['apple', 'pear', 'plum']
+print('original list_4:', list_4)
+
+# Change the first value i.e. modify the list in place
+list_4[0] = 'banana'
+print('modified list_4:', list_4)
+
+# Add new value to list using the method .insert(index number, value)
+list_4.insert(1, 'apple') # Index 1 refers to the second position
+print('appended list_4:', list_4)
# Sorting a list also modifies it in place
+list_5 = [2, 1, 3, 7]
+list_5.sort()
+print('list_5:', list_5)
+
+
+
OUTPUT
+
+
list_5: [1, 2, 3, 7]
+
+
However, be careful when modifying data in-place. If two variables
+refer to the same list, and you modify the list value, it will change
+for both variables!
+
+
PYTHON
+
+
# When we assign list_6 to list_5, it means both list_6 and list_5 point to the
+# same list object, not that list_6 is a copy of list_5.
+
+list_6 = list_5
+print('list_5:', list_5)
+print('list_6:', list_6)
+
+# Change the first value in list_6 from 1 to 2
+list_6[0] = 2
+
+print('modified list_6:', list_6)
+print('unmodified list_5:', list_5)
+
+# Warning: list_5 and list_6 have both been modified in place!
Because of this behaviour, code which modifies data in place should
+be handled with care. You can avoid this behaviour by explicitly
+creating a copy of the original list and modifying only the copy.
+
+
PYTHON
+
+
list_5 = [1, 2, 3, 7]
+list_7 = list_5.copy()
+print('list_5:', list_5)
+print('list_7:', list_7)
+
+# As list_7 is a completely new object copied from list_5, modifying list_7 does
+# not affect list_5.
+
+list_7[0] = 2
+print('modified list_7:', list_7)
+print('unmodified list_5:', list_5)
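One way to tell an alias from a copy is Python's `is` operator, which checks whether two names refer to the same object. The operator is not used elsewhere in this lesson, so treat the following as a supplementary sketch; the variable names are hypothetical.

```python
original = [1, 2, 3, 7]

alias = original               # a second name for the same list object
duplicate = original.copy()    # a new list holding the same values

print(alias is original)       # True: same object
print(duplicate is original)   # False: different object
print(duplicate == original)   # True: equal values

alias[0] = 99
print(original)     # [99, 2, 3, 7], changed through the alias
print(duplicate)    # [1, 2, 3, 7], unaffected
```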
There are a lot of functions and methods which can be applied to
+lists, such as len(), max(),
+index() and so forth. Arithmetic operators do not work element by
+element on lists; the only ones defined are + and * (with an integer),
+which join and repeat lists.
+
Note that + concatenates two lists into a single longer
+list, rather than outputting the sum of two lists of numbers.
+
+
PYTHON
+
+
list_8 = [1, 2, 3]
+list_9 = [4, 5, 6]
+
+list_8 + list_9 # This concatenates the lists and does not sum the two lists together
+
+
+
OUTPUT
+
+
[1, 2, 3, 4, 5, 6]
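The sketch below tries a few of the functions mentioned above on the same two lists, and shows one possible way to get an element-by-element sum when that is what you actually want. The `zip()` loop is an illustrative choice, not the only approach.

```python
list_8 = [1, 2, 3]
list_9 = [4, 5, 6]

print(len(list_8))        # 3
print(max(list_9))        # 6
print(list_9.index(5))    # 1, the position of the value 5
print(list_8 * 2)         # [1, 2, 3, 1, 2, 3], repetition rather than doubling

# Element-by-element sum, written with zip() and a loop
pairwise_sum = []
for left, right in zip(list_8, list_9):
    pairwise_sum.append(left + right)
print(pairwise_sum)       # [5, 7, 9]
```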
+
+
In your spare time after this workshop, you can search for different
+list functions and methods and test them out yourselves.
+
Nested lists
+
+
+
+
We have previously mentioned that lists can be used to store other
+Python object types, including lists. This means that we can create
+nested lists in Python i.e. lists containing lists containing values.
+This property is useful when we have a collection of values that we want
+to access or transform as a subgroup.
+
To create a nested list, we also use [] or
+list() to contain one or more lists of values of
+interest.
+
+
PYTHON
+
+
veg_stock = [
+ ['lettuce', 'lettuce', 'tomato', 'zucchini'],
+ ['lettuce', 'lettuce', 'carrot', 'zucchini'],
+ ['lettuce', 'basil', 'tomato', 'zucchini']
+ ]
+
+# Check that veg_stock is a list object
+print(type(veg_stock))
+
+# Check that the first value in veg_stock is itself a list
+print(veg_stock[0], 'has type', type(veg_stock[0]))
+
+
+
OUTPUT
+
+
<class 'list'>
+['lettuce', 'lettuce', 'tomato', 'zucchini'] has type <class 'list'>
+
+
To extract a sub-list from the veg_stock list
+object, we refer to its index like we would with any other value inside
+a list i.e. veg_stock[0] points to the first sub-list and veg_stock[1]
+points to the second sub-list within the veg_stock list.
+
To access an individual string value inside a sub-list, we make use
+of a second index, which points to an individual value inside the
+sub-list.
+
+
PYTHON
+
+
print(veg_stock[0]) # Access the first sub-list
+print(veg_stock[0][0]) # Access the first value in the first sub-list
+
+print(type(veg_stock[0])) # The first value in veg_stock is a list
+print(type(veg_stock[0][0])) # The first value in the first list in veg_stock is a string
In general, however, when we are analysing a large collection of
+values, the best practice is to structure those values in columns and
+rows as a tabular Pandas data frame object. This is covered in another
+Carpentries Course called Python
+for Social Sciences.
+
Lists are still incredibly versatile and useful when you have a
+collection of values that need to be efficiently accessed or
+transformed. For example, data frame column names are commonly extracted
+and stored inside a list, so that the same transformation can then be
+mapped across multiple columns.
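As a rough illustration of that pattern, the sketch below applies one clean-up step to every name in a list. The column names are invented for the example and no data frame is needed to run it.

```python
# Hypothetical column names, as they might be read from a data frame
column_names = ['Patient ID', 'Visit Date', 'Inflammation Score']

clean_names = []
for name in column_names:
    # Same transformation for every name: lower case, underscores for spaces
    clean_names.append(name.lower().replace(' ', '_'))

print(clean_names)
# ['patient_id', 'visit_date', 'inflammation_score']
```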
+
Create and access dictionaries
+
+
+
+
A dictionary is a Python data type that is particularly suited for
+enabling quick lookup operations on unstructured data sets.
+
A dictionary can be thought of as a list whose values are looked up
+by a unique key (i.e. a self-defined index of unique strings or numbers)
+rather than by position. Since Python 3.7 dictionaries keep their items
+in insertion order, but they still cannot be sliced by position. A
+dictionary contains key-value pairs with the format
+{key: value(s)}.
+
Dictionaries can be created by listing individual key-value pairs
+inside {} or using dict().
+
+
PYTHON
+
+
# A key-value pair can contain single or multiple values
+# Keys are treated as case sensitive and unique
+# Multiple values are first stored inside a list
+
+teams = {
+'data science': ['Mei Ling', 'Paul', 'Gwen', 'Suresh'],
+'user design': ['Amy', 'Linh', 'Sasha'],
+'software dev': ['David', 'Prya'],
+'comms': 'Taylor'
+ }
+
+
When using dict(), we need to indicate which key is
+associated with which value. This can be done using tuples,
+direct association i.e. using =, or using
+zip(), which pairs up the items of two or more iterables
+into tuples.
+
+
PYTHON
+
+
# To use dict(), key-value pairs can be stored inside tuples
+ds_emp_status = dict([
+ ('Mei Ling', 'full time'),
+ ('Paul', 'full time'),
+ ('Gwen', 'part time'),
+ ('Suresh', 'part time')
+ ])
+
+# Key-value pairs can also be assigned by direct association
+# Keys are written as bare names here (not wrapped in ''), so they must be valid identifiers
+ud_emp_status = dict(
+    Amy='full time',
+    Linh='full time',
+    Sasha='casual'
+ )
+
+# zip() can also be used if each key has only one value
+sd_emp_status = dict(zip(
+ ['David', 'Prya'],
+ ['full time', 'full time']
+ ))
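To see that these approaches are interchangeable, the sketch below builds the same small dictionary in three ways and compares the results. It also prints what zip() produces before dict() consumes it. The variable names are chosen for this example only.

```python
# zip() pairs the first items together, then the second items, and so on
pairs = list(zip(['David', 'Prya'], ['full time', 'full time']))
print(pairs)   # [('David', 'full time'), ('Prya', 'full time')]

from_braces = {'David': 'full time', 'Prya': 'full time'}
from_tuples = dict(pairs)
from_keywords = dict(David='full time', Prya='full time')

# All three dictionaries hold the same key-value pairs
print(from_braces == from_tuples == from_keywords)   # True
```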
+
+
To access a specific value inside a dictionary, we need to specify
+its key using []. This is similar to slicing or subsetting
+a list by specifying its index using [].
+
+
PYTHON
+
+
# Access the values associated with the key 'data science'
+print(teams['data science'])
+
+print('The object teams is of type', type(teams))
+print('The dict value', teams['data science'], 'is of type', type(teams['data science']))
+
+
+
OUTPUT
+
+
['Mei Ling', 'Paul', 'Gwen', 'Suresh']
+The object teams is of type <class 'dict'>
+The dict value ['Mei Ling', 'Paul', 'Gwen', 'Suresh'] is of type <class 'list'>
+
+
We can also access a value from a dictionary using the
+get() method.
+
+
PYTHON
+
+
print(teams.get('user design'))
+
+# get() also enables us to return an alternate string when the key is not found
+# This prevents our code from returning an error message that halts the analysis
+
+print(teams.get('data engineering', 'WARNING: key does not exist'))
+
+
+
OUTPUT
+
+
['Amy', 'Linh', 'Sasha']
+WARNING: key does not exist
+
+
To access data inside a dictionary, we can also perform the following
+other actions:
+
+
Check whether a key exists in a dictionary using the keyword
+in
+
+
Retrieve unique dictionary keys using dict.keys()
+
+
Retrieve dictionary values using dict.values()
+
+
Retrieve dictionary items using dict.items()
+
+
+
+
PYTHON
+
+
# Check whether a key exists in a dictionary
+print('data science' in teams)
+print('Data Science' in teams) # Keys are case sensitive
+
+# Retrieve all dictionary keys
+print(teams.keys())
+print(sd_emp_status.keys())
+
+# Retrieve all dictionary values
+print(sd_emp_status.values())
+
+# Retrieve all dictionary key-value pairs
+print(sd_emp_status.items())
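The sketch below repeats these checks on a small dictionary that is rebuilt inline so the example runs on its own. Wrapping keys() and values() in list() is an optional choice made here to get plain lists when printing, and the loop over items() previews the for loops covered later.

```python
sd_emp_status = {'David': 'full time', 'Prya': 'full time'}

print('David' in sd_emp_status)        # True
print('david' in sd_emp_status)        # False: keys are case sensitive
print(list(sd_emp_status.keys()))      # ['David', 'Prya']
print(list(sd_emp_status.values()))    # ['full time', 'full time']

# items() yields (key, value) tuples that can be unpacked in a loop
for name, status in sd_emp_status.items():
    print(name, 'works', status)
```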
To add a new key-value pair to an existing dictionary, we can create
+a new key and directly attach a new value to it using = or
+alternatively use the method update().
+
+
PYTHON
+
+
print('original dict items:', sd_emp_status.items())
+
+# Add new key-value pair using direct assignment
+sd_emp_status['Mohammad'] = 'full time'
+
+# Add new key-value pair using update({'key': 'value'})
+sd_emp_status.update({'Carrie': 'part time'})
+
+print('updated dict items:', sd_emp_status.items())
Because keys are unique, a dictionary cannot contain two keys with
+the same name. This means that adding an item using a key that is
+already present in the dictionary will cause the previous value to be
+overwritten.
+
+
PYTHON
+
+
print('original dict items:', sd_emp_status.items())
+
+# As the key 'Carrie' already exists, its value will be overwritten
+sd_emp_status['Carrie'] = 'full time'
+print('updated dict items:', sd_emp_status.items())
To remove a key-value pair from an existing dictionary, we can use the
+del keyword or the method pop(). Using
+pop() also enables us to return an alternate value if we
+try to remove a non-existing key, which prevents our code from raising
+an error that halts the analysis.
+
+
PYTHON
+
+
print('original dict items:', sd_emp_status.items())
+
+# Delete dictionary keys using del and pop()
+del sd_emp_status['Mohammad']
+sd_emp_status.pop('Carrie')
+sd_emp_status.pop('Anuradha', 'WARNING: key does not exist') # Does not generate an error
+
+print('modified dict items:', sd_emp_status.items())
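The difference between the two approaches shows up when the key is missing. The sketch below uses a small hypothetical dictionary named `status`, and a try/except block (not otherwise covered in this lesson) so that the error from del can be displayed without stopping the script.

```python
status = {'David': 'full time', 'Carrie': 'part time'}  # hypothetical data

removed = status.pop('Carrie')               # returns the removed value
missing = status.pop('Anuradha', 'absent')   # returns the fallback instead
print(removed, missing)                      # part time absent

try:
    del status['Anuradha']                   # del has no fallback option
except KeyError as error:
    print('KeyError raised for key:', error)

print(status)                                # {'David': 'full time'}
```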
Similar to lists, dictionaries can be nested as we can also store
+dictionaries as values inside a key-value pair using {}.
+Nested dictionaries are useful when we need to store unstructured data
+in a complex structure. For example, JSON data is commonly used for
+transmitting data in web applications and often exists in a nested
+structure that can be stored using nested dictionaries in Python.
+
+
PYTHON
+
+
# Individual dictionaries are enclosed in {} and separated by a comma
+nested_dict = {
+'dict_1': { # First key is a dictionary of key-value pairs
+'key_1a': 'value_1a',
+'key_1b': 'value_1b'
+ },
+'dict_2': { # Second key is another dictionary of key-value pairs
+'key_2a': 'value_2a',
+'key_2b': 'value_2b'
+ }
+ }
+
+print(nested_dict)
Similar to working with nested lists, to extract a value from a
+sub-dictionary, we specify both the main dictionary and
+sub-dictionary keys using [].
+
+
PYTHON
+
+
# Extract the value for key 2a in dict_2
+print('original value:', nested_dict['dict_2']['key_2a'])
+
+# Adding or updating a value can be done through the same approach
+nested_dict['dict_2']['key_2a'] = "modified_value_2a"
+
+print('modified value:', nested_dict['dict_2']['key_2a'])
+
+
+
OUTPUT
+
+
original value: value_2a
+modified value: modified_value_2a
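Since JSON was mentioned as a common source of nested data, here is a minimal sketch that uses the standard library's json module. The payload string is invented for illustration; real JSON would usually come from a file or a web request.

```python
import json

# A hypothetical JSON payload, for illustration only
payload = '{"patient": {"id": 7, "visits": [{"day": 0, "score": 1.5}]}}'

# JSON objects become dictionaries and JSON arrays become lists
record = json.loads(payload)

print(type(record))                              # <class 'dict'>
print(record['patient']['id'])                   # 7
print(record['patient']['visits'][0]['score'])   # 1.5
```

Each extra pair of square brackets steps one level deeper, mixing dictionary keys and list indices as the structure requires.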
+
+
Optional: converting lists and dictionaries to Pandas data
+frames
+
+
+
+
Lists and dictionaries can be easily converted into a tabular Pandas
+data frame format. This can be useful when you need to create a small
+data set for unit testing purposes.
+
+
PYTHON
+
+
# Import pandas library
+import pandas as pd
+
+# Create a dictionary with each key-value pair representing a data frame column
+data = {
+'col_1': [3, 2, 1, 0],
+'col_2': ['a', 'b', 'c', 'd']
+ }
+
+df = pd.DataFrame.from_dict(data)
+
+print(df) # Outputs data as a tabular Pandas data frame
+print(type(df))
+
+
+
OUTPUT
+
+
   col_1 col_2
+0      3     a
+1      2     b
+2      1     c
+3      0     d
+<class 'pandas.core.frame.DataFrame'>
+
+
+
+
+
+
+
Key Points
+
+
+
+
Lists can contain any Python object including other lists
+
Lists are ordered i.e. indexed and can therefore be sliced by index
+number
+
Unlike strings and integers, the values inside a list can be
+modified in place
+
A list which contains other lists is referred to as a nested
+list
+
Dictionaries store key-value pairs and are accessed by key rather
+than by position
+
Dictionary keys are unique
+
A dictionary which contains other dictionaries is referred to as a
+nested dictionary
+
Values inside nested lists and dictionaries can be accessed by an
+additional index
In the episode about visualizing
+data, we will see Python code that plots values of interest from our
+first inflammation dataset (inflammation-01.csv) and
+reveals some suspicious features.
+
We have a dozen data sets right now and potentially more on the way
+if Dr. Maverick can keep up their surprisingly fast clinical trial rate.
+We want to create plots for all of our data sets with a single
+statement. To do that, we’ll have to teach the computer how to repeat
+things.
+
An example task that we might want to repeat is accessing numbers in
+a list, which we will do by printing each number on a line of its
+own.
+
+
PYTHON
+
+
odds = [1, 3, 5, 7]
+
+
In Python, a list is basically an ordered
+collection of elements, and every element has a unique number associated
+with it — its index. This means that we can access elements in a list
+using their indices. For example, we can get the first number in the
+list odds, by using odds[0]. One way to print
+each number is to use four print statements:
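Reusing the odds list defined above, that approach would look roughly like this:

```python
odds = [1, 3, 5, 7]

# One print statement per element, each with a hard-coded index
print(odds[0])
print(odds[1])
print(odds[2])
print(odds[3])
```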
Printing each element with its own statement has three drawbacks:
+
Not scalable. Imagine you need to print a list
+that has hundreds of elements. It might be easier to type them in
+manually.
+
Difficult to maintain. If we want to decorate
+each printed element with an asterisk or any other character, we would
+have to change four lines of code. While this might not be a problem for
+small lists, it would definitely be a problem for longer ones.
+
Fragile. If we use it with a list that has more
+elements than what we initially envisioned, it will only display part of
+the list’s elements. A shorter list, on the other hand, will cause an
+error because it will be trying to display elements of the list that do
+not exist.
---------------------------------------------------------------------------
+IndexError Traceback (most recent call last)
+<ipython-input-3-7974b6cdaf14> in <module>()
+ 3 print(odds[1])
+ 4 print(odds[2])
+----> 5 print(odds[3])
+
+IndexError: list index out of range
This is shorter — certainly shorter than something that prints every
+number in a hundred-number list — and more robust as well:
+
+
PYTHON
+
+
odds = [1, 3, 5, 7, 9, 11]
+for num in odds:
+    print(num)
+
+
+
OUTPUT
+
+
1
+3
+5
+7
+9
+11
+
+
The improved version uses a for
+loop to repeat an operation — in this case, printing — once for each
+thing in a sequence. The general form of a loop is:
+
+
PYTHON
+
+
for variable in collection:
+    # do things using variable, such as print
+
+
Using the odds example above, the loop might look like this:
+
where each number (num) in the variable
+odds is looped through and printed one number after
+another. The other numbers in the diagram denote which loop cycle the
+number was printed in (1 being the first loop cycle, and 6 being the
+final loop cycle).
+
We can call the loop
+variable anything we like, but there must be a colon at the end of
+the line starting the loop, and we must indent anything we want to run
+inside the loop. Unlike many other languages, there is no command to
+signify the end of the loop body (e.g., end for);
+everything indented after the for statement belongs to the
+loop.
+
+
+
+
+
+
What’s in a name?
+
+
+
In the example above, the loop variable was given the name
+num as a mnemonic; it is short for ‘number’. We can choose
+any name we want for variables. We might just as easily have chosen the
+name banana for the loop variable, as long as we use the
+same name when we invoke the variable inside the loop:
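A sketch of that renamed loop, assuming the same odds list as before:

```python
odds = [1, 3, 5, 7, 9, 11]
for banana in odds:
    # The name is arbitrary, but it must match inside the loop body
    print(banana)
```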
It is a good idea to choose variable names that are meaningful,
+otherwise it would be more difficult to understand what the loop is
+doing.
+
+
+
+
Here’s another loop that repeatedly updates a variable:
+
+
PYTHON
+
+
length = 0
+names = ['Curie', 'Darwin', 'Turing']
+for value in names:
+    length = length + 1
+print('There are', length, 'names in the list.')
+
+
+
OUTPUT
+
+
There are 3 names in the list.
+
+
It’s worth tracing the execution of this little program step by step.
+Since there are three names in names, the statement on line
+4 will be executed three times. The first time around,
+length is zero (the value assigned to it on line 1) and
+value is Curie. The statement adds 1 to the
+old value of length, producing 1, and updates
+length to refer to that new value. The next time around,
+value is Darwin and length is 1,
+so length is updated to be 2. After one more update,
+length is 3; since there is nothing left in
+names for Python to process, the loop finishes and the
+print function on line 5 tells us our final answer.
+
Note that a loop variable
+is a variable that is being used to record progress in a loop. It still
+exists after the loop is over, and we can re-use variables previously
+defined as loop variables as
+well:
+
+
PYTHON
+
+
name = 'Rosalind'
+for name in ['Curie', 'Darwin', 'Turing']:
+    print(name)
+print('after the loop, name is', name)
+
+
+
OUTPUT
+
+
Curie
+Darwin
+Turing
+after the loop, name is Turing
+
+
Note also that finding the length of an object is such a common
+operation that Python actually has a built-in function to do it called
+len:
+
+
PYTHON
+
+
print(len([0, 1, 2, 3]))
+
+
+
OUTPUT
+
+
4
+
+
len is much faster than any function we could write
+ourselves, and much easier to read than a two-line loop; it will also
+give us the length of many other data types we haven’t seen yet, so we
+should always use it when we can.
+
+
+
+
+
+
From 1 to N
+
+
+
Python has a built-in function called range that
+generates a sequence of numbers. range can accept 1, 2, or 3
+parameters.
+
+
If one parameter is given, range generates a sequence
+of that length, starting at zero and incrementing by 1. For example,
+range(3) produces the numbers 0, 1, 2.
+
If two parameters are given, range starts at the first
+and ends just before the second, incrementing by one. For example,
+range(2, 5) produces 2, 3, 4.
+
If range is given 3 parameters, it starts at the first
+one, ends just before the second one, and increments by the third one.
+For example, range(3, 10, 2) produces
+3, 5, 7, 9.
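The three forms can be checked quickly by converting each range to a list. Calling list() is only a convenience for displaying the values here; a for loop can use a range directly.

```python
print(list(range(3)))          # [0, 1, 2]
print(list(range(2, 5)))       # [2, 3, 4]
print(list(range(3, 10, 2)))   # [3, 5, 7, 9]
```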
+
+
Using range, write a loop that uses range
+to print the first 3 natural numbers:
+
+
OUTPUT
+
+
1
+2
+3
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
for number in range(1, 4):
+    print(number)
+
+
+
+
+
+
+
+
+
+
+
Understanding the loops
+
+
+
Given the following loop:
+
+
PYTHON
+
+
word = 'oxygen'
+for letter in word:
+    print(letter)
+
+
How many times is the body of the loop executed?
+
+
3 times
+
4 times
+
5 times
+
6 times
+
+
+
+
+
+
+
+
+
+
The body of the loop is executed 6 times.
+
+
+
+
+
+
+
+
+
+
Computing Powers With Loops
+
+
+
Exponentiation is built into Python:
+
+
PYTHON
+
+
print(5**3)
+
+
+
OUTPUT
+
+
125
+
+
Write a loop that calculates the same result as 5 ** 3
+using multiplication (and without exponentiation).
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
result = 1
+for number in range(0, 3):
+    result = result * 5
+print(result)
+
+
+
+
+
+
+
+
+
+
+
Summing a List
+
+
+
Write a loop that calculates the sum of elements in a list by adding
+each element and printing the final value, so
+[124, 402, 36] prints 562
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
numbers = [124, 402, 36]
+summed = 0
+for num in numbers:
+    summed = summed + num
+print(summed)
+
+
+
+
+
+
+
+
+
+
+
Computing the Value of a Polynomial
+
+
+
The built-in function enumerate takes a sequence (e.g.,
+a list) and generates a new sequence of the
+same length. Each element of the new sequence is a pair composed of the
+index (0, 1, 2,…) and the value from the original sequence:
+
+
PYTHON
+
+
for idx, val in enumerate(a_list):
+    # Do something using idx and val
+
+
The code above loops through a_list, assigning the index
+to idx and the value to val.
+
Suppose you have encoded a polynomial as a list of coefficients in
+the following way: the first element is the constant term, the second
+element is the coefficient of the linear term, the third is the
+coefficient of the quadratic term, etc.
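To make the encoding concrete, here is a small worked example written out term by term, without a loop. The values of `x` and `coefs` are assumptions chosen for illustration.

```python
x = 5               # hypothetical input value
coefs = [2, 4, 3]   # encodes the polynomial 2 + 4*x + 3*x**2

# Each coefficient multiplies x raised to the power of its index
y = coefs[0] * x**0 + coefs[1] * x**1 + coefs[2] * x**2
print(y)            # 2 + 20 + 75 = 97
```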
Write a loop using enumerate(coefs) which computes the
+value y of any polynomial, given x and
+coefs.
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
y = 0
+for idx, coef in enumerate(coefs):
+    y = y + coef * x**idx
+
+
+
+
+
+
Making Choices with Conditional Logic
+
+
+
+
How can we use Python to automatically recognize different situations
+we encounter with our data and take a different action for each? In this
+lesson, we’ll learn how to write code that runs only when certain
+conditions are true.
+
+
Conditionals
+
+
We can ask Python to take different actions, depending on a
+condition, with an if statement:
+
+
PYTHON
+
+
num = 37
+if num > 100:
+    print('greater')
+else:
+    print('not greater')
+print('done')
+
+
+
OUTPUT
+
+
not greater
+done
+
+
The second line of this code uses the keyword if to tell
+Python that we want to make a choice. If the test that follows the
+if statement is true, the body of the if
+(i.e., the set of lines indented underneath it) is executed, and
+“greater” is printed. If the test is false, the body of the
+else is executed instead, and “not greater” is printed.
+Only one or the other is ever executed before continuing on with program
+execution to print “done”:
+
Conditional
+statements don’t have to include an else. If there
+isn’t one, Python simply does nothing if the test is false:
+
+
PYTHON
+
+
num = 53
+print('before conditional...')
+if num > 100:
+    print(num, 'is greater than 100')
+print('...after conditional')
+
+
+
OUTPUT
+
+
before conditional...
+...after conditional
+
+
We can also chain several tests together using elif,
+which is short for “else if”. The following Python code uses
+elif to print the sign of a number.
+
+
PYTHON
+
+
num = -3
+
+if num > 0:
+    print(num, 'is positive')
+elif num == 0:
+    print(num, 'is zero')
+else:
+    print(num, 'is negative')
+
+
+
OUTPUT
+
+
-3 is negative
+
+
Note that to test for equality we use a double equals sign
+== rather than a single equals sign = which is
+used to assign values.
+
+
+
+
+
+
Comparing in Python
+
+
+
Along with the > and == operators we
+have already used for comparing values in our conditionals, there are a
+few more options to know about:
+
+
+>: greater than
+
+<: less than
+
+==: equal to
+
+!=: does not equal
+
+>=: greater than or equal to
+
+<=: less than or equal to
+
+
+
+
+
We can also combine tests using and and or.
+and is only true if both parts are true:
+
+
PYTHON
+
+
if (1 > 0) and (-1 >= 0):
+    print('both parts are true')
+else:
+    print('at least one part is false')
+
+
+
OUTPUT
+
+
at least one part is false
+
+
while or is true if at least one part is true:
+
+
PYTHON
+
+
if (1 < 0) or (1 >= 0):
+    print('at least one test is true')
+
+
+
OUTPUT
+
+
at least one test is true
+
+
+
+
+
+
+
+True and False
+
+
+
True and False are special words in Python
+called booleans, which represent truth values. A statement
+such as 1 < 0 returns the value False,
+while -1 < 0 returns the value True.
+
+
+
+
+
+
Checking Our Data
+
+
Now that we’ve seen how conditionals work, we can use them to check
+for the suspicious features we saw in our inflammation data. We are
+about to use functions provided by the numpy module again.
+Therefore, if you’re working in a new Python session, make sure to load
+the module with:
+
+
PYTHON
+
+
import numpy
+
+
From the first couple of plots, we saw that maximum daily
+inflammation exhibits a strange behavior: it rises by one unit a day.
+Wouldn’t it be a good idea to detect such behavior and report it as
+suspicious? Let’s do that! However, instead of checking every single day
+of the study, let’s merely check if maximum inflammation in the
+beginning (day 0) and in the middle (day 20) of the study are equal to
+the corresponding day numbers.
We also saw a different problem in the third dataset; the minima per
+day were all zero (looks like a healthy person snuck into our study). We
+can also check for this with an elif condition:
+
+
PYTHON
+
+
elif numpy.sum(numpy.amin(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+
+
And if neither of these conditions is true, we can use
+else to give the all-clear:
In this way, we have asked Python to do something different depending
+on the condition of our data. Here we printed messages in all cases, but
+we could also imagine not using the else catch-all so that
+messages are only printed when something is wrong, freeing us from
+having to manually examine every plot for features we’ve seen
+before.
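Putting the three branches together, a self-contained sketch might look like the following. It is an approximation under stated assumptions: it uses small hand-written tables and plain Python (max, min, zip) in place of the NumPy calls and the inflammation files, it checks days 0 and 2 rather than 0 and 20 because the tables are tiny, and the function name `check_data` and the first and last messages are hypothetical.

```python
def check_data(rows):
    """Describe a table with one row per patient and one column per day."""
    columns = list(zip(*rows))                  # regroup the values by day
    daily_max = [max(day) for day in columns]
    daily_min = [min(day) for day in columns]

    if daily_max[0] == 0 and daily_max[2] == 2:
        return 'Suspicious looking maxima!'
    elif sum(daily_min) == 0:
        return 'Minima add up to zero!'
    else:
        return 'Seems OK!'


print(check_data([[0, 1, 2], [0, 0, 1]]))   # maxima equal the day numbers
print(check_data([[0, 3, 5], [0, 0, 0]]))   # one row is all zeros
print(check_data([[1, 3, 5], [2, 1, 4]]))   # neither problem present
```

Only one branch runs for each table, so a data set showing both problems would be reported under the first test that matches.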
Which of the following would be printed if you were to run this code?
+Why did you pick this answer?
+
+
A
+
B
+
C
+
B and C
+
+
+
+
+
+
+
+
+
+
C gets printed because the first two conditions,
+4 > 5 and 4 == 5, are not true, but
+4 < 5 is true. In this case, only one of these
+conditions can be true at a time, but in other scenarios multiple
+elif conditions could be met. In these scenarios, only the
+action associated with the first true elif condition will
+occur, starting from the top of the conditional section.
+
This contrasts with the case of multiple if statements,
+where every action can occur as long as their condition is met.
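The contrast can be seen by running the same two overlapping tests both ways. This is an illustrative sketch; the variable `value` and the messages are made up for the example.

```python
value = 4   # hypothetical value that satisfies both tests

# elif chain: Python stops at the first test that is true
chain_hits = []
if value < 10:
    chain_hits.append('less than 10')
elif value < 5:
    chain_hits.append('less than 5')
print(chain_hits)       # ['less than 10']

# Separate if statements: every test that is true runs its action
separate_hits = []
if value < 10:
    separate_hits.append('less than 10')
if value < 5:
    separate_hits.append('less than 5')
print(separate_hits)    # ['less than 10', 'less than 5']
```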
+
+
+
+
+
+
+
+
+
+
+
What Is Truth?
+
+
+
True and False booleans are not the only
+values in Python that are true and false. In fact, any value
+can be used in an if or elif. After reading
+and running the code below, explain what the rule is for which values
+are considered true and which are considered false.
+
+
PYTHON
+
+
if '':
+    print('empty string is true')
+if 'word':
+    print('word is true')
+if []:
+    print('empty list is true')
+if [1, 2, 3]:
+    print('non-empty list is true')
+if 0:
+    print('zero is true')
+if 1:
+    print('one is true')
+
+
+
+
+
+
+
+
+
+
That’s Not Not What I Meant
+
+
+
Sometimes it is useful to check whether some condition is
+not true. The Boolean operator not can do this
+explicitly. After reading and running the code below, write some
+if statements that use not to test the rule
+that you formulated in the previous challenge.
+
+
PYTHON
+
+
if not '':
+    print('empty string is not true')
+if not 'word':
+    print('word is not true')
+if not not True:
+    print('not not True is true')
+
+
+
+
+
+
+
+
+
+
Close Enough
+
+
+
Write some conditions that print True if the variable
+a is within 10% of the variable b and
+False otherwise. Compare your implementation with your
+partner’s. Do you get the same answer for all possible pairs of
+numbers?
a = 5
+b = 5.1
+
+if abs(a - b) <= 0.1 * abs(b):
+    print('True')
+else:
+    print('False')
+
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
print(abs(a - b) <= 0.1 * abs(b))
+
+
This works because the Booleans True and
+False have string representations which can be printed.
+
+
+
+
+
+
+
+
+
+
In-Place Operators
+
+
+
Python (and most other languages in the C family) provides in-place operators that
+work like this:
+
+
PYTHON
+
+
x = 1  # original value
+x += 1  # add one to x, assigning result back to x
+x *= 3  # multiply x by 3
+print(x)
+
+
+
OUTPUT
+
+
6
+
+
Write some code that sums the positive and negative numbers in a list
+separately, using in-place operators. Do you think the result is more or
+less readable than writing the same without in-place operators?
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
positive_sum = 0
+negative_sum = 0
+test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
+for num in test_list:
+    if num > 0:
+        positive_sum += num
+    elif num == 0:
+        pass
+    else:
+        negative_sum += num
+print(positive_sum, negative_sum)
+
+
Here pass means “don’t do anything”. In this particular
+case, it’s not actually needed, since if num == 0 neither
+sum needs to change, but it illustrates the use of elif and
+pass.
+
+
+
+
+
+
+
+
+
+
Sorting a List Into Buckets
+
+
+
In our data folder, large data sets are stored in files
+whose names start with “inflammation-” and small data sets in files
+whose names start with “small-”. We also have some other files that we
+do not care about at this point. We’d like to break all these files into
+three lists called large_files, small_files,
+and other_files, respectively.
+
Add code to the template below to do this. Note that the string
+method startswith
+returns True if and only if the string it is called on
+starts with the string passed as an argument, that is:
+
+
PYTHON
+
+
'String'.startswith('Str')
+
+
+
OUTPUT
+
+
True
+
+
But
+
+
PYTHON
+
+
'String'.startswith('str')
+
+
+
OUTPUT
+
+
False
+
+
Use the following Python code as your starting point:
Write a loop that counts the number of vowels in a character
+string.
+
Test it on a few individual words and full sentences.
+
Once you are done, compare your solution to your neighbor’s. Did you
+make the same decisions about how to handle the letter ‘y’ (which some
+people think is a vowel, and some do not)?
+
+
+
Solution
+
+
vowels = 'aeiouAEIOU'
+sentence = 'Mary had a little lamb.'
+count = 0
+for char in sentence:
+    if char in vowels:
+        count += 1
+
+print('The number of vowels in this string is ' + str(count))
+
+
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
+
Use for variable in sequence to process the elements of
+a sequence one at a time.
+
The body of a for loop must be indented.
+
Use len(thing) to determine the length of something
+that contains other values.
+
Use if condition to start a conditional statement,
+elif condition to provide additional tests, and
+else to provide a default.
+
The bodies of the branches of conditional statements must be
+indented.
+
Use == to test for equality.
+
+X and Y is only true if both X and
+Y are true.
+
+X or Y is true if either X or
+Y, or both, are true.
+
Zero, the empty string, and the empty list are considered false; all
+other numbers, strings, and lists are considered true.
What are functions, and how can I use them in Python?
+
How can I define new functions?
+
What’s the difference between defining and calling a function?
+
What happens when I call a function?
+
+
+
+
+
+
+
+
Objectives
+
+
identify what a function is
+
create new functions
+
Set default values for function parameters.
+
Explain why we should divide programs into small, single-purpose
+functions.
+
+
+
+
+
+
+
At this point, we’ve seen how to write code that makes decisions
+about what it sees in our data. What if we want to convert some of our
+data, for example taking a temperature in Fahrenheit and converting it to
+Celsius? We could write a one-off calculation for converting a single
+number.
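For instance, a one-off conversion might look like the sketch below (the variable names and the value 99 are hypothetical, chosen only for illustration):

```python
# Hypothetical variable names, chosen only for this illustration.
fahrenheit_val = 99
celsius_val = (fahrenheit_val - 32) * (5 / 9)
print(celsius_val)
```

Converting a second temperature this way would mean copying the calculation and renaming the variables.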
But we would be in trouble as soon as we had to do this more than a
+couple times. Cutting and pasting it is going to make our code get very
+long and very repetitive, very quickly. We’d like a way to package our
+code so that it is easier to reuse, a shorthand way of re-executing
+longer pieces of code. In Python we can use ‘functions’. Let’s start by
+defining a function fahr_to_celsius that converts
+temperatures from Fahrenheit to Celsius:
+
+
PYTHON
+
+
def explicit_fahr_to_celsius(temp):
+    # Assign the converted value to a variable
+    converted = ((temp - 32) * (5/9))
+    # Return the value of the new variable
+    return converted
+
+def fahr_to_celsius(temp):
+    # Return the converted value directly with the return statement,
+    # without creating a new variable. This function does the same thing
+    # as the previous one; the previous version is simply more explicit
+    # about how the return statement works.
+    return ((temp - 32) * (5/9))
+
+
The function definition opens with the keyword def
+followed by the name of the function (fahr_to_celsius) and
+a parenthesized list of parameter names (temp). The body of the function — the statements
+that are executed when it runs — is indented below the definition line.
+The body concludes with a return keyword followed by the
+return value.
+
When we call the function, the values we pass to it are assigned to
+those parameters so that we can use them inside the function. Inside the
+function, we use a return
+statement to send a result back to whoever asked for it.
+
Let’s try running our function.
+
+
PYTHON
+
+
fahr_to_celsius(32)
+
+
This command should call our function, using 32 as the input, and
+return the converted value.
+
In fact, calling our own function is no different from calling any
+other function:
+
+
PYTHON
+
+
print('freezing point of water:', fahr_to_celsius(32), 'C')
+print('boiling point of water:', fahr_to_celsius(212), 'C')
+
+
+
OUTPUT
+
+
freezing point of water: 0.0 C
+boiling point of water: 100.0 C
+
+
We’ve successfully called the function that we defined, and we have
+access to the value that we returned.
+
Composing Functions
+
+
+
+
Now that we’ve seen how to turn Fahrenheit into Celsius, we can also
+write the function to turn Celsius into Kelvin:
+
+
PYTHON
+
+
def celsius_to_kelvin(temp_c):
+    return temp_c + 273.15
+
+print('freezing point of water in Kelvin:', celsius_to_kelvin(0.))
+
+
+
OUTPUT
+
+
freezing point of water in Kelvin: 273.15
+
+
What about converting Fahrenheit to Kelvin? We could write out the
+formula, but we don’t need to. Instead, we can compose the two functions we have
+already created:
+
+
PYTHON
+
+
def fahr_to_kelvin(temp_f):
+    temp_c = fahr_to_celsius(temp_f)
+    temp_k = celsius_to_kelvin(temp_c)
+    return temp_k
+
+print('boiling point of water in Kelvin:', fahr_to_kelvin(212.0))
+
+
+
OUTPUT
+
+
boiling point of water in Kelvin: 373.15
+
+
This is our first taste of how larger programs are built: we define
+basic operations, then combine them in ever-larger chunks to get the
+effect we want. Real-life functions will usually be larger than the ones
+shown here (typically half a dozen to a few dozen lines), but they
+shouldn’t ever be much longer than that, or the next person who reads them
+won’t be able to understand what’s going on.
+
Variable Scope
+
+
+
+
In composing our temperature conversion functions, we created
+variables inside of those functions, temp,
+temp_c, temp_f, and temp_k. We
+refer to these variables as local variables because they no
+longer exist once the function is done executing. If we try to access
+their values outside of the function, we will encounter an error:
+
+
PYTHON
+
+
print('Again, temperature in Kelvin was:', temp_k)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-1-eed2471d229b> in <module>
+----> 1 print('Again, temperature in Kelvin was:', temp_k)
+
+NameError: name 'temp_k' is not defined
+
+
If you want to reuse the temperature in Kelvin after you have
+calculated it with fahr_to_kelvin, you can store the result
+of the function call in a variable:
+
+
PYTHON
+
+
temp_kelvin = fahr_to_kelvin(212.0)
+print('temperature in Kelvin was:', temp_kelvin)
+
+
+
OUTPUT
+
+
temperature in Kelvin was: 373.15
+
+
The variable temp_kelvin, being defined outside any
+function, is said to be global.
+
Inside a function, one can read the value of such global
+variables:
+
+
PYTHON
+
+
def print_temperatures():
+    print('temperature in Fahrenheit was:', temp_fahr)
+    print('temperature in Kelvin was:', temp_kelvin)
+
+temp_fahr = 212.0
+temp_kelvin = fahr_to_kelvin(temp_fahr)
+
+print_temperatures()
+
+
+
OUTPUT
+
+
temperature in Fahrenheit was: 212.0
+temperature in Kelvin was: 373.15
+
+
By giving our functions human-readable names, we can more easily read
+and understand what is happening in our code. Even
+better, if at some later date we want to use any of those functions
+again, we can do so in a single line.
+
Testing and Documenting
+
+
+
+
Once we start putting things in functions so that we can re-use them,
+we need to start testing that those functions are working correctly. To
+see how to do this, let’s write a function to offset a dataset so that
+its mean value shifts to a user-defined value:
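The definition used in the rest of this section (it reappears further down with documentation added) can be sketched as follows, assuming NumPy has been imported under the name numpy:

```python
import numpy

def offset_mean(data, target_mean_value):
    # Subtract the current mean, then add the target mean.
    return (data - numpy.mean(data)) + target_mean_value
```

Subtracting the mean first centres the data on zero, so adding the target value afterwards gives the result exactly that mean, up to floating-point rounding.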
We could test this on our actual data, but since we don’t know what
+the values ought to be, it will be hard to tell if the result was
+correct. Instead, let’s use NumPy to create a matrix of 0’s and then
+offset its values to have a mean value of 3:
+
+
PYTHON
+
+
z = numpy.zeros((2,2))
+print(offset_mean(z, 3))
+
+
+
OUTPUT
+
+
[[ 3. 3.]
+ [ 3. 3.]]
+
+
That looks right, so let’s try offset_mean on our real
+data:
+
+
PYTHON
+
+
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+print(offset_mean(data, 0))
It’s hard to tell from the default output whether the result is
+correct, but there are a few tests that we can run to reassure us:
+
+
PYTHON
+
+
print('original min, mean, and max are:', numpy.amin(data), numpy.mean(data), numpy.amax(data))
+offset_data = offset_mean(data, 0)
+print('min, mean, and max of offset data are:',
+ numpy.amin(offset_data),
+ numpy.mean(offset_data),
+ numpy.amax(offset_data))
+
+
+
OUTPUT
+
+
original min, mean, and max are: 0.0 6.14875 20.0
+min, mean, and max of offset data are: -6.14875 2.84217094304e-16 13.85125
+
+
That seems almost right: the original mean was about 6.1, so the
+lower bound from zero is now about -6.1. The mean of the offset data
+isn’t quite zero — we’ll explore why not in the challenges — but it’s
+pretty close. We can even go further and check that the standard
+deviation hasn’t changed:
+
+
PYTHON
+
+
print('std dev before and after:', numpy.std(data), numpy.std(offset_data))
+
+
+
OUTPUT
+
+
std dev before and after: 4.61383319712 4.61383319712
+
+
Those values look the same, but we probably wouldn’t notice if they
+were different in the sixth decimal place. Let’s do this instead:
+
+
PYTHON
+
+
print('difference in standard deviations before and after:',
+ numpy.std(data) - numpy.std(offset_data))
+
+
+
OUTPUT
+
+
difference in standard deviations before and after: -3.5527136788e-15
+
+
Again, the difference is very small. It’s still possible that our
+function is wrong, but it seems unlikely enough that we should probably
+get back to doing our analysis.
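Rather than judging "very small" by eye, the comparison can be written down explicitly. The sketch below uses math.isclose from the standard library on the two values printed above; the tolerance of 1e-9 is an assumption chosen for illustration, not a recommendation from the lesson:

```python
import math

# Values copied from the printed output above.
offset_mean_value = 2.84217094304e-16
std_difference = -3.5527136788e-15

# With the default settings, only 0.0 itself counts as "close" to 0.0,
# so an absolute tolerance is needed when comparing against zero.
print(math.isclose(offset_mean_value, 0.0))                # False
print(math.isclose(offset_mean_value, 0.0, abs_tol=1e-9))  # True
print(math.isclose(std_difference, 0.0, abs_tol=1e-9))     # True
```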
+
Documentation
+
+
+
+
We have one more task first, though: we should write some documentation for our function
+to remind ourselves later what it’s for and how to use it.
+
The usual way to put documentation in software is to add comments like this:
+
+
PYTHON
+
+
# offset_mean(data, target_mean_value):
+# return a new array containing the original data with its mean offset to match the desired value.
+def offset_mean(data, target_mean_value):
+    return (data - numpy.mean(data)) + target_mean_value
+
+
There’s a better way, though. If the first thing in a function is a
+string that isn’t assigned to a variable, that string is attached to the
+function as its documentation:
+
+
PYTHON
+
+
def offset_mean(data, target_mean_value):
+    """Return a new array containing the original data with its mean offset to match the desired value."""
+    return (data - numpy.mean(data)) + target_mean_value
+
+
This is better because we can now ask Python’s built-in help system
+to show us the documentation for the function:
+
+
PYTHON
+
+
help(offset_mean)
+
+
+
OUTPUT
+
+
Help on function offset_mean in module __main__:
+
+offset_mean(data, target_mean_value)
+ Return a new array containing the original data with its mean offset to match the desired value.
+
+
A string like this is called a docstring. We don’t need to use
+triple quotes when we write one, but if we do, we can break the string
+across multiple lines:
+
+
PYTHON
+
+
def offset_mean(data, target_mean_value):
+    """Return a new array containing the original data
+    with its mean offset to match the desired value.
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3], 0)
+    array([-1., 0., 1.])
+    """
+    return (data - numpy.mean(data)) + target_mean_value
+
+help(offset_mean)
+
+
+
OUTPUT
+
+
Help on function offset_mean in module __main__:
+
+offset_mean(data, target_mean_value)
+ Return a new array containing the original data
+ with its mean offset to match the desired value.
+
+ Examples
+ --------
+ >>> offset_mean([1, 2, 3], 0)
+ array([-1., 0., 1.])
+
+
Defining Defaults
+
+
+
+
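We have passed parameters to functions in two ways: directly, as in
+type(data), and by name, as in
+numpy.loadtxt(fname='something.csv', delimiter=','). In
+fact, we can pass the filename to loadtxt without the
+fname=, but we still need to say delimiter=. Calling
+numpy.loadtxt('inflammation-01.csv', ',') fails with a traceback like this:
+
+
ERROR
+
+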
Traceback (most recent call last):
+ File "<stdin>", line 1, in <module>
+ File "/Users/username/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1041, in loadtxt
+ dtype = np.dtype(dtype)
+ File "/Users/username/anaconda3/lib/python3.6/site-packages/numpy/core/_internal.py", line 199, in _commastring
+ newitem = (dtype, eval(repeats))
+ File "<string>", line 1
+ ,
+ ^
+SyntaxError: unexpected EOF while parsing
+
+
To understand what’s going on, and make our own functions easier to
+use, let’s re-define our offset_mean function like
+this:
+
+
PYTHON
+
+
def offset_mean(data, target_mean_value=0.0):
+    """Return a new array containing the original data
+    with its mean offset to match the desired value (0 by default).
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3])
+    array([-1., 0., 1.])
+    """
+    return (data - numpy.mean(data)) + target_mean_value
+
+
The key change is that the second parameter is now written
+target_mean_value=0.0 instead of just
+target_mean_value. If we call the function with two
+arguments, it works as it did before:
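For instance, a two-argument call might look like the following sketch (test_data is a hypothetical name, and NumPy is assumed to be imported as numpy):

```python
import numpy

def offset_mean(data, target_mean_value=0.0):
    """Return a new array containing the original data
    with its mean offset to match the desired value (0 by default)."""
    return (data - numpy.mean(data)) + target_mean_value

test_data = numpy.zeros((2, 2))  # hypothetical test array
print(offset_mean(test_data, 3))
```

Every element of the result is 3.0, because the explicit second argument replaces the default.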
But we can also now call it with just one parameter, in which case
+target_mean_value is automatically assigned the default value of 0.0:
+
+
PYTHON
+
+
more_data = 5 + numpy.zeros((2, 2))
+print('data before mean offset:')
+print(more_data)
+print('offset data:')
+print(offset_mean(more_data))
+
+
+
OUTPUT
+
+
data before mean offset:
+[[ 5. 5.]
+ [ 5. 5.]]
+offset data:
+[[ 0. 0.]
+ [ 0. 0.]]
+
+
This is handy: if we usually want a function to work one way, but
+occasionally need it to do something else, we can allow people to pass a
+parameter when they need to but provide a default to make the normal
+case easier. The example below shows how Python matches values to
+parameters:
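A minimal sketch of such an example, assuming a small function named display whose parameters have the defaults 1, 2 and 3 (the first two defaults match the output shown further down; the default for c is an assumption):

```python
def display(a=1, b=2, c=3):
    print('a:', a, 'b:', b, 'c:', c)

print('no parameters:')
display()
print('one parameter:')
display(55)
print('two parameters:')
display(55, 66)
```

With one value, only a changes; with two values, a and b change and c keeps its default.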
As this example shows, parameters are matched up from left to right,
+and any that haven’t been given a value explicitly get their default
+value. We can override this behavior by naming the value as we pass it
+in:
+
+
PYTHON
+
+
print('only setting the value of c')
+display(c=77)
+
+
+
OUTPUT
+
+
only setting the value of c
+a: 1 b: 2 c: 77
+
+
With that in hand, let’s look at the help for
+numpy.loadtxt:
+
+
PYTHON
+
+
help(numpy.loadtxt)
+
+
+
OUTPUT
+
+
Help on function loadtxt in module numpy.lib.npyio:
+
+loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes')
+ Load data from a text file.
+
+ Each row in the text file must have the same number of values.
+
+ Parameters
+ ----------
+...
+
+
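There’s a lot of information here, but the most important part is the
+first couple of lines:
+
+
OUTPUT
+
+
loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes')
+
+
This tells us that loadtxt has one parameter called
+fname that doesn’t have a default value, and nine others
+that do. If we call the function like this: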
+
+
PYTHON
+
+
numpy.loadtxt('inflammation-01.csv', ',')
+
+
then the filename is assigned to fname (which is what we
+want), but the delimiter string ',' is assigned to
+dtype rather than delimiter, because
+dtype is the second parameter in the list. However
+',' isn’t a known dtype so our code produced
+an error message when we tried to run it. When we call
+loadtxt we don’t have to provide fname= for
+the filename because it’s the first item in the list, but if we want the
+',' to be assigned to the variable delimiter,
+we do have to provide delimiter= for the second
+parameter since delimiter is not the second parameter in
+the list.
+
Readable functions
+
+
+
+
Consider these two functions:
+
+
PYTHON
+
+
def s(p):
+    a = 0
+    for v in p:
+        a += v
+    m = a / len(p)
+    d = 0
+    for v in p:
+        d += (v - m) * (v - m)
+    return numpy.sqrt(d / (len(p) - 1))
+
+def std_dev(sample):
+    sample_sum = 0
+    for value in sample:
+        sample_sum += value
+
+    sample_mean = sample_sum / len(sample)
+
+    sum_squared_devs = 0
+    for value in sample:
+        sum_squared_devs += (value - sample_mean) * (value - sample_mean)
+
+    return numpy.sqrt(sum_squared_devs / (len(sample) - 1))
+
+
The functions s and std_dev are
+computationally equivalent (they both calculate the sample standard
+deviation), but to a human reader, they look very different. You
+probably found std_dev much easier to read and understand
+than s.
+
As this example illustrates, both documentation and a programmer’s
+coding style combine to determine how easy it is for others to
+read and understand the programmer’s code. Choosing meaningful variable
+names and using blank spaces to break the code into logical “chunks” are
+helpful techniques for producing readable code. This is useful
+not only for sharing code with others, but also for the original
+programmer. If you need to revisit code that you wrote months ago and
+haven’t thought about since then, you will appreciate the value of
+readable code!
+
+
+
+
+
+
Combining Strings
+
+
+
“Adding” two strings produces their concatenation:
+'a' + 'b' is 'ab'. Write a function called
+fence that takes two parameters called
+original and wrapper and returns a new string
+that has the wrapper character at the beginning and end of the original.
+A call to your function should look like this:
+
+
PYTHON
+
+
print(fence('name', '*'))
+
+
+
OUTPUT
+
+
*name*
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def fence(original, wrapper):
+    return wrapper + original + wrapper
+
+
+
+
+
+
+
+
+
+
+
Return versus print
+
+
+
Note that return and print are not
+interchangeable. print is a Python function that
+prints data to the screen. It enables us, the users, to see
+the data. A return statement, on the other hand, makes data
+visible to the program. Let’s have a look at the following function:
+
+
PYTHON
+
+
def add(a, b):
+    print(a + b)
+
+
Question: What will we see if we execute the
+following commands?
+
+
PYTHON
+
+
A = add(7, 3)
+print(A)
+
+
+
+
+
+
+
+
+
+
Python will first execute the function add with
+a = 7 and b = 3, and, therefore, print
+10. However, because function add does not
+have a line that starts with return (no return
+“statement”), it will, by default, return nothing, which in Python
+is called None. Therefore, None will be
+assigned to A and the last line (print(A))
+will print None. As a result, we will see:
+
+
OUTPUT
+
+
10
+None
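For contrast, here is a sketch of the same function written with return instead of print. In this version A holds the sum, and only the final print produces output:

```python
def add(a, b):
    return a + b

A = add(7, 3)
print(A)
```

This prints 10 once, and A can be reused in later calculations.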
+
+
+
+
+
+
+
+
+
+
+
Selecting Characters From Strings
+
+
+
If the variable s refers to a string, then
+s[0] is the string’s first character and s[-1]
+is its last. Write a function called outer that returns a
+string made up of just the first and last characters of its input. A
+call to your function should look like this:
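As an illustration, with 'helium' as an arbitrary input word, the call and one possible solution might look like this:

```python
def outer(input_string):
    # First character plus last character.
    return input_string[0] + input_string[-1]

print(outer('helium'))
```

Here the call prints hm. Note that this sketch assumes the input is not an empty string, since indexing an empty string raises an IndexError.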
Write a function rescale that takes an array as input
+and returns a corresponding array of values scaled to lie in the range
+0.0 to 1.0. (Hint: If L and H are the lowest
+and highest values in the original array, then the replacement for a
+value v should be (v-L) / (H-L).)
Run the commands help(numpy.arange) and
+help(numpy.linspace) to see how to use these functions to
+generate regularly-spaced values, then use those values to test your
+rescale function. Once you’ve successfully tested your
+function, add a docstring that explains what it does.
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def rescale(input_array):
+    """Takes an array as input, and returns a corresponding array scaled so
+    that 0 corresponds to the minimum and 1 to the maximum value of the input array.
+
+    Examples:
+    >>> rescale(numpy.arange(10.0))
+    array([ 0. , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
+    0.55555556, 0.66666667, 0.77777778, 0.88888889, 1. ])
+    >>> rescale(numpy.linspace(0, 100, 5))
+    array([ 0. , 0.25, 0.5 , 0.75, 1. ])
+    """
+    L = numpy.amin(input_array)
+    H = numpy.amax(input_array)
+    output_array = (input_array - L) / (H - L)
+    return output_array
+
+
+
+
+
+
+
+
+
+
+
Defining Defaults
+
+
+
Rewrite the rescale function so that it scales data to
+lie between 0.0 and 1.0 by default, but will
+allow the caller to specify lower and upper bounds if they want. Compare
+your implementation to your neighbor’s: do the two functions always
+behave the same way?
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def rescale(input_array, low_val=0.0, high_val=1.0):
+    """rescales input array values to lie between low_val and high_val"""
+    L = numpy.amin(input_array)
+    H = numpy.amax(input_array)
+    intermed_array = (input_array - L) / (H - L)
+    output_array = intermed_array * (high_val - low_val) + low_val
+    return output_array
+
+
+
+
+
+
+
+
+
+
+
Variables Inside and Outside Functions
+
+
+
What does the following piece of code display when run — and why?
+
+
PYTHON
+
+
f = 0
+k = 0
+
+def f2k(f):
+    k = ((f - 32) * (5.0/9.0)) + 273.15
+    return k
+
+print(f2k(8))
+print(f2k(41))
+print(f2k(32))
+
+print(k)
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
259.81666666666666
+278.15
+273.15
+0
+
+
k is 0 because the k inside the function
+f2k doesn’t know about the k defined outside
+the function. When the f2k function is called, it creates a
+local variable
+k. The function returns the value of its local k but does not
+alter the k defined outside it. Therefore the original
+value of k remains unchanged. Beware that a local
+k is created because the statements inside f2k
+assign a new value to it. If k were only
+read, the function would simply retrieve the global k
+value.
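The difference between reading and assigning can be seen in a short sketch. The function names read_k and assign_k are hypothetical, introduced only for this illustration:

```python
k = 0

def read_k():
    # No assignment to k here, so the name refers to the global k.
    return k

def assign_k():
    # Assigning to k creates a new local variable that hides the global one.
    k = 99
    return k

print(read_k())    # 0
print(assign_k())  # 99
print(k)           # 0, the global k is unchanged
```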
+
+
+
+
+
+
+
+
+
+
Mixing Default and Non-Default Parameters
+
+
+
Given the following code:
+
+
PYTHON
+
+
def numbers(one, two=2, three, four=4):
+    n = str(one) + str(two) + str(three) + str(four)
+    return n
+
+print(numbers(1, three=3))
+
+
what do you expect will be printed? What is actually printed? What
+rule do you think Python is following?
+
+
1. 1234
+
2. one2three4
+
3. 1239
+
4. SyntaxError
+
+
Given that, what does the following piece of code display when
+run?
+
+
PYTHON
+
+
def func(a, b=3, c=6):
+    print('a: ', a, 'b: ', b, 'c:', c)
+
+func(-1, 2)
+
+
+
1. a: b: 3 c: 6
+
2. a: -1 b: 3 c: 6
+
3. a: -1 b: 2 c: 6
+
4. a: b: -1 c: 2
+
+
+
+
+
+
+
+
+
+
Attempting to define the numbers function results in
+4. SyntaxError. The defined parameters two and
+four are given default values. Because one and
+three are not given default values, they are required to be
+included as arguments when the function is called and must be placed
+before any parameters that have default values in the function
+definition.
+
The given call to func displays
+a: -1 b: 2 c: 6. -1 is assigned to the first parameter
+a, 2 is assigned to the next parameter b, and
+c is not passed a value, so it uses its default value
+6.
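As a sketch of how the definition could be repaired, the parameters without defaults can be moved ahead of those with defaults. This reordering is one possible fix, not the only one:

```python
def numbers(one, three, two=2, four=4):
    n = str(one) + str(two) + str(three) + str(four)
    return n

print(numbers(1, three=3))
```

With this ordering the definition is accepted, and the call prints 1234 because two and four fall back to their defaults.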
+
+
+
+
+
+
+
+
+
+
Readable Code
+
+
+
Revise a function you wrote for one of the previous exercises to try
+to make the code more readable. Then, collaborate with one of your
+neighbors to critique each other’s functions and discuss how your
+function implementations could be further improved to make them more
+readable.
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
+
Define a function using
+def function_name(parameter).
+
The body of a function must be indented.
+
Call a function using function_name(value).
+
Numbers are stored as integers or floating-point numbers.
+
Variables defined within a function can only be seen and used within
+the body of the function.
+
Variables created outside of any function are called global
+variables.
+
Within a function, we can access global variables.
+
Variables created within a function override global variables if
+their names match.
+
Use help(thing) to view help for something.
+
Put docstrings in functions to provide help for that function.
+
Specify default values for parameters when defining a function using
+name=value in the parameter list.
+
Parameters can be passed by matching based on name, by position, or
+by omitting them (in which case the default value is used).
+
Put code whose parameters change frequently in a function, then call
+it with different parameter values to customize its behavior.
identify different errors and correct bugs associated with them
+
+
+
+
+
+
+
Every programmer encounters errors, both those who are just
+beginning, and those who have been programming for years. Encountering
+errors and exceptions can be very frustrating at times, and can make
+coding feel like a hopeless endeavour. However, understanding what the
+different types of errors are and when you are likely to encounter them
+can help a lot. Once you know why you get certain types of
+errors, they become much easier to fix.
+
Errors in Python have a very specific form, called a traceback. Let’s examine one:
+
+
PYTHON
+
+
# This code has an intentional error. You can type it directly or
+# use it for reference to understand the error message below.
+def favorite_ice_cream():
+    ice_creams = [
+        'chocolate',
+        'vanilla',
+        'strawberry'
+    ]
+    print(ice_creams[3])
+
+favorite_ice_cream()
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+IndexError Traceback (most recent call last)
+<ipython-input-1-70bd89baa4df> in <module>()
+ 9 print(ice_creams[3])
+ 10
+----> 11 favorite_ice_cream()
+
+<ipython-input-1-70bd89baa4df> in favorite_ice_cream()
+ 7 'strawberry'
+ 8 ]
+----> 9 print(ice_creams[3])
+ 10
+ 11 favorite_ice_cream()
+
+IndexError: list index out of range
+
+
This particular traceback has two levels. You can determine the
+number of levels by looking for the number of arrows on the left hand
+side. In this case:
+
+
The first shows code from the cell above, with an arrow pointing
+to Line 11 (which is favorite_ice_cream()).
+
The second shows some code in the function
+favorite_ice_cream, with an arrow pointing to Line 9 (which
+is print(ice_creams[3])).
+
+
The last level is the actual place where the error occurred. The
+other level(s) show what function the program executed to get to the
+next level down. So, in this case, the program first performed a function call to the function
+favorite_ice_cream. Inside this function, the program
+encountered an error on Line 9, when it tried to run the code
+print(ice_creams[3]).
+
+
+
+
+
+
Long Tracebacks
+
+
+
Sometimes, you might see a traceback that is very long -- sometimes
+they might even be 20 levels deep! This can make it seem like something
+horrible happened, but the length of the error message does not reflect
+severity, rather, it indicates that your program called many functions
+before it encountered the error. Most of the time, the actual place
+where the error occurred is at the bottom-most level, so you can skip
+down the traceback to the bottom.
+
+
+
+
So what error did the program actually encounter? In the last line of
+the traceback, Python helpfully tells us the category or type of error
+(in this case, it is an IndexError) and a more detailed
+error message (in this case, it says “list index out of range”).
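The same two pieces of information, the error type and the error message, can also be pulled out in code. The sketch below uses a try/except block, which this lesson has not introduced, purely to show where each piece comes from:

```python
def favorite_ice_cream():
    ice_creams = ['chocolate', 'vanilla', 'strawberry']
    print(ice_creams[3])

try:
    favorite_ice_cream()
except IndexError as error:
    error_type = type(error).__name__   # the category of error
    error_message = str(error)          # the more detailed message
    print(error_type + ':', error_message)
```

The printed line matches the last line of the traceback above.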
+
If you encounter an error and don’t know what it means, it is still
+important to read the traceback closely. That way, if you fix the error,
+but encounter a new one, you can tell that the error changed.
+Additionally, sometimes knowing where the error occurred is
+enough to fix it, even if you don’t entirely understand the message.
+
If you do encounter an error you don’t recognize, try looking at the
+official
+documentation on errors. However, note that you may not always be
+able to find the error there, as it is possible to create custom errors.
+In that case, hopefully the custom error message is informative enough
+to help you figure out what went wrong. Libraries like pandas and NumPy
+have these custom errors, but the procedure to figure them out is the
+same: go to the last line of the traceback, and look at the error
+message there. The documentation for these libraries will often provide
+the information you need about any functions you are using. There are
+also large communities of users for data libraries that can help as
+well!
+
+
+
+
+
+
Reading Error Messages
+
+
+
Read the Python code and the resulting traceback below, and answer
+the following questions:
+
+
How many levels does the traceback have?
+
What is the function name where the error occurred?
+
On which line number in this function did the error occur?
+
What is the type of error?
+
What is the error message?
+
+
+
PYTHON
+
+
# This code has an intentional error. Do not type it directly;
+# use it for reference to understand the error message below.
+def print_message(day):
+    messages = [
+        'Hello, world!',
+        'Today is Tuesday!',
+        'It is the middle of the week.',
+        'Today is Donnerstag in German!',
+        'Last day of the week!',
+        'Hooray for the weekend!',
+        'Aw, the weekend is almost over.'
+    ]
+    print(messages[day])
+
+def print_sunday_message():
+    print_message(7)
+
+print_sunday_message()
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+IndexError Traceback (most recent call last)
+<ipython-input-7-3ad455d81842> in <module>
+ 16 print_message(7)
+ 17
+---> 18 print_sunday_message()
+ 19
+
+<ipython-input-7-3ad455d81842> in print_sunday_message()
+ 14
+ 15 def print_sunday_message():
+---> 16 print_message(7)
+ 17
+ 18 print_sunday_message()
+
+<ipython-input-7-3ad455d81842> in print_message(day)
+ 11 'Aw, the weekend is almost over.'
+ 12 ]
+---> 13 print(messages[day])
+ 14
+ 15 def print_sunday_message():
+
+IndexError: list index out of range
+
+
+
+
+
+
+
+
+
+
+
1. 3 levels
+
2. print_message
+
3. 13
+
4. IndexError
+
5. list index out of range. You can then infer that
+7 is not the right index to use with
+messages.
+
+
+
+
+
+
+
+
+
+
+
Better errors on newer Pythons
+
+
+
Newer versions of Python have improved error printouts. If you are
+debugging errors, it is often helpful to use the latest Python version,
+even if you support older versions of Python.
+
+
+
+
Type Errors
+
+
+
+
One of the most common types of errors in Python is the type
+error. These errors occur when you try to perform an operation on
+an object in Python that cannot support it. This happens easily when
+working with large datasets in which values are expected to be of a
+particular type, such as strings or integers. When we write a function expecting integers,
+we will not get an error until it reaches an operation that cannot
+handle the strings it was given. For example:
File "<ipython-input-3-6bb841ea1423>", line 3
+ letter=my_string["e"]
+ ^
+TypeError: string indices must be integers
+
+
We get this error because we are trying to use an index to access
+part of our string, which requires an integer. Instead, we passed a
+character and received a type error. This is fixed by replacing “e” with
+an integer index such as 2.
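A small sketch of the failing and the corrected indexing, using the hypothetical string 'open' (chosen here because its character at index 2 is 'e'):

```python
my_string = 'open'  # hypothetical example string

try:
    letter = my_string['e']   # a character is not a valid index
except TypeError as error:
    print('TypeError:', error)

letter = my_string[2]         # an integer index works
print(letter)
```

The exact wording of the TypeError message varies a little between Python versions, so the message printed here may differ slightly from the one shown above.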
+
In the case of datasets, we often see type errors when a mathematical
+operation, such as taking a mean, is performed on a column that contains
+characters, either as a result of formatting or introduced through
+error. As a result, correcting the error can involve simply removing the
+characters from the strings using regular expressions, or if the
+characters have resulted in incorrect data, removing those observations
+from the dataset.
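As a rough sketch of that clean-up, assuming a made-up column of measurements stored as strings, stray characters can be stripped with a regular expression and unusable entries dropped. The pattern below only suits plain non-negative decimal numbers:

```python
import re
from statistics import mean

# Made-up column values: some contain units or stray spaces, one is unusable.
raw_values = ['4.5', '6.0 mg', ' 7.5', 'missing']

cleaned = []
for value in raw_values:
    # Keep only digits and the decimal point.
    digits = re.sub(r'[^0-9.]', '', value)
    if digits:
        cleaned.append(float(digits))

print(cleaned)
print(mean(cleaned))
```

Whether to strip characters or drop the observation entirely depends on whether the remaining number can still be trusted.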
+
Syntax Errors
+
+
+
+
When you forget a colon at the end of a line, accidentally add one
+space too many when indenting under an if statement, or
+forget a parenthesis, you will encounter a syntax error. This means that
+Python couldn’t figure out how to read your program. This is similar to
+forgetting punctuation in English: for example, this text is difficult
+to read there is no punctuation there is also no capitalization why is
+this hard because you have to figure out where each sentence ends you
+also have to figure out where each sentence begins to some extent it
+might be ambiguous if there should be a sentence break or not
+
People can typically figure out what is meant by text with no
+punctuation, but people are much smarter than computers. If Python
+doesn’t know how to read the program, it will give up and inform you
+with an error. For example:
Here, Python tells us that there is a SyntaxError on
+line 1, and even puts a little arrow in the place where there is an
+issue. In this case the problem is that the function definition is
+missing a colon at the end.
+
Actually, the function above has two issues with syntax. If
+we fix the problem with the colon, we see that there is also an
+IndentationError, which means that the lines in the
+function definition do not all have the same indentation:
Both SyntaxError and IndentationError
+indicate a problem with the syntax of your program, but an
+IndentationError is more specific: it always means
+that there is a problem with how your code is indented.
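Both kinds of error can be reproduced safely by storing faulty code in strings and passing them to the built-in compile function. This is a sketch; the two snippets are hypothetical stand-ins for the examples discussed above:

```python
# Two hypothetical snippets, stored as strings so that this example itself runs.
missing_colon = (
    "def some_function()\n"
    "    msg = 'hello, world!'\n"
    "    return msg\n"
)
extra_space = (
    "def some_function():\n"
    "    msg = 'hello, world!'\n"
    "    print(msg)\n"
    "     return msg\n"
)

results = []
for source in (missing_colon, extra_space):
    try:
        compile(source, '<example>', 'exec')
    except SyntaxError as error:
        # IndentationError is a subclass of SyntaxError, so both land here.
        results.append((type(error).__name__, error.lineno))

print(results)
```

The first snippet is reported as a SyntaxError on line 1, the second as an IndentationError on line 4.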
+
+
+
+
+
+
Tabs and Spaces
+
+
+
Some indentation errors are harder to spot than others. In
+particular, mixing spaces and tabs can be difficult to spot because they
+are both whitespace. In the
+example below, the first two lines in the body of the function
+some_function are indented with tabs, while the third line
+is indented with spaces. If you’re working in a Jupyter notebook, be sure to copy
+and paste this example rather than trying to type it in manually because
+Jupyter automatically replaces tabs with spaces.
Visually it is impossible to spot the error. Fortunately, Python does
+not allow you to mix tabs and spaces.
+
+
ERROR
+
+
File "<ipython-input-5-653b36fbcd41>", line 4
+ return msg
+ ^
+TabError: inconsistent use of tabs and spaces in indentation
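Because tabs and spaces look alike on screen, one way to make the mix visible is to write the code as a string with explicit \t escapes and hand it to compile. This is a sketch with a hypothetical function body:

```python
# The tab characters are written as \t so that they are visible here.
mixed_indentation = (
    "def some_function():\n"
    "\tmsg = 'hello, world!'\n"
    "\tprint(msg)\n"
    "        return msg\n"
)

try:
    compile(mixed_indentation, '<example>', 'exec')
except TabError as error:
    error_name = type(error).__name__
    print(error_name, 'on line', error.lineno)
```

Many editors can also display whitespace characters or convert tabs to spaces, which avoids the problem in the first place.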
+
+
+
+
+
Variable Name Errors
+
+
+
+
Another very common type of error is called a NameError,
+and occurs when you try to use a variable that does not exist. For
+example:
+
+
PYTHON
+
+
print(a)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-7-9d7b17ad5387> in <module>()
+----> 1 print(a)
+
+NameError: name 'a' is not defined
+
+
Variable name errors come with some of the most informative error
+messages, which are usually of the form “name ‘the_variable_name’ is not
+defined”.
+
Why does this error message occur? That’s a harder question to
+answer, because it depends on what your code is supposed to do. However,
+there are a few very common reasons why you might have an undefined
+variable. The first is that you meant to use a string, but forgot to put quotes around
+it:
+
+
PYTHON
+
+
print(hello)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-8-9553ee03b645> in <module>()
+----> 1 print(hello)
+
+NameError: name 'hello' is not defined
+
+
The second reason is that you might be trying to use a variable that
+does not yet exist. In the following example, count should
+have been defined (e.g., with count = 0) before the for
+loop:
+
+
PYTHON
+
+
for number in range(10):
+    count = count + number
+print('The count is:', count)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-9-dd6a12d7ca5c> in <module>()
+ 1 for number in range(10):
+----> 2 count = count + number
+ 3 print('The count is:', count)
+
+NameError: name 'count' is not defined
+
+
Finally, the third possibility is that you made a typo when you were
+writing your code. Let’s say we fixed the error above by adding the line
+Count = 0 before the for loop. Frustratingly, this actually
+does not fix the error. Remember that variables are case-sensitive, so the variable
+count is different from Count. We still get
+the same error, because we still have not defined
+count:
+
+
PYTHON
+
+
Count = 0
+for number in range(10):
+    count = count + number
+print('The count is:', count)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-10-d77d40059aea> in <module>()
+ 1 Count = 0
+ 2 for number in range(10):
+----> 3 count = count + number
+ 4 print('The count is:', count)
+
+NameError: name 'count' is not defined
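For comparison, a version that avoids the NameError might look like the sketch below. The only change is that count is defined before the loop, with exactly the same lower-case spelling that the loop uses.

```python
count = 0  # define the variable before the loop, using the same spelling
for number in range(10):
    count = count + number
print('The count is:', count)
```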
+
+
Index Errors
+
+
+
+
Next up are errors having to do with containers (like lists and
+strings) and the items within them. If you try to access an item in a
+list or a string that does not exist, then you will get an error. This
+makes sense: if you asked someone what day they would like to get
+coffee, and they answered “caturday”, you might be a bit annoyed. Python
+gets similarly annoyed if you try to ask it for an item that doesn’t
+exist:
---------------------------------------------------------------------------
+IndexError Traceback (most recent call last)
+<ipython-input-11-d817f55b7d6c> in <module>()
+ 3 print('Letter #2 is', letters[1])
+ 4 print('Letter #3 is', letters[2])
+----> 5 print('Letter #4 is', letters[3])
+
+IndexError: list index out of range
+
+
Here, Python is telling us that there is an IndexError
+in our code, meaning we tried to access a list index that did not
+exist.
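A short sketch of how this can be checked in practice, assuming a hypothetical three-item list named letters: len() tells us how many items there are, so the last valid index is always one less than that.

```python
letters = ['a', 'b', 'c']  # hypothetical three-item list

print('The list has', len(letters), 'items')
print('The last valid index is', len(letters) - 1)

try:
    print('Letter #4 is', letters[3])
except IndexError as error:
    message = str(error)
    print('IndexError:', message)
```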
+
File Errors
+
+
+
+
The last type of error we’ll cover today is the most common type of
+error when using Python with data: errors associated with reading and
+writing files. If you try to read a file
+that does not exist, you will receive a FileNotFoundError
+telling you so. If you attempt to write to a file that was opened
+read-only, Python 3 raises an io.UnsupportedOperation error.
+More generally, problems with input and output manifest as
+OSErrors, which may show up as a more specific subclass;
+you can see the
+list in the Python docs. They all have a unique UNIX
+errno, which you can see in the error message.
+
+
PYTHON
+
+
file_handle = open('myfile.txt', 'r')
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+FileNotFoundError Traceback (most recent call last)
+<ipython-input-14-f6e1ac4aee96> in <module>()
+----> 1 file_handle = open('myfile.txt', 'r')
+
+FileNotFoundError: [Errno 2] No such file or directory: 'myfile.txt'
+
+
One reason for receiving this error is that you specified an
+incorrect path to the file. For example, if I am currently in a folder
+called myproject, and I have a file in
+myproject/writing/myfile.txt, but I try to open
+myfile.txt, this will fail. The correct path would be
+writing/myfile.txt. It is also possible that the file name
+or its path contains a typo. There may also be specific settings based
+on your organization if you are using shared, networked, or cloud-based
+drives. It is best to check with your IT administrators if you are still
+encountering issues reading in a file after troubleshooting.
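When troubleshooting a path, it can help to print the current folder and check whether the file exists before opening it. The sketch below assumes a hypothetical file name that does not exist, and uses only the standard library modules pathlib and errno.

```python
import errno
from pathlib import Path

# Hypothetical path; this file is assumed not to exist.
path = Path('writing') / 'no_such_file_for_this_example.txt'

print('Current folder:', Path.cwd())
print('File exists?', path.exists())

try:
    with open(path, 'r') as file_handle:
        contents = file_handle.read()
except FileNotFoundError as error:
    caught_errno = error.errno
    print('Could not open', error.filename)
    print('errno:', caught_errno, '(ENOENT)' if caught_errno == errno.ENOENT else '')
```

The errno value printed here is the same number that appears in square brackets in the error message.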
+
A related issue can occur if you use the “read” flag instead of the
+“write” flag. Python will not give you an error if you try to open a
+file for writing when the file does not exist. However, if you meant to
+open a file for reading, but accidentally opened it for writing, and
+then try to read from it, you will get an
+UnsupportedOperation error telling you that the file was
+not opened for reading:
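The situation described above can be sketched as follows. To avoid overwriting a real file, this version works in a temporary folder; the file name is hypothetical.

```python
import io
import os
import tempfile

# Work in a temporary folder so no real file is overwritten.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'myfile.txt')  # hypothetical file name

    # Opening with 'w' creates the file (or empties an existing one).
    with open(path, 'w') as file_handle:
        try:
            file_handle.read()
        except io.UnsupportedOperation as error:
            message = str(error)
            print('UnsupportedOperation:', message)
```

Note that opening an existing file with 'w' empties it, so a mistaken flag can also cause data loss.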
If you are getting a read or write error on a file or folder that you
+are able to open and/or edit with other programs, you may need to
+contact an IT administrator to check the permissions granted to you and
+any programs you are using.
+
These are the most common errors with files, though many others
+exist. If you get an error that you’ve never seen before, searching the
+Internet for that error type often reveals common reasons why you might
+get that error.
+
+
+
+
+
+
Identifying Syntax Errors
+
+
+
+
1. Read the code below, and (without running it) try to identify what
+the errors are.
+
2. Run the code, and read the error message. Is it a
+SyntaxError or an IndentationError?
+
3. Fix the error.
+
4. Repeat steps 2 and 3, until you have fixed all the errors.
+
+
+
PYTHON
+
+
def another_function
+  print('Syntax errors are annoying.')
+   print('But at least Python tells us about them!')
+  print('So they are usually not too hard to fix.')
+
+
+
+
+
+
+
+
+
+
SyntaxError for missing (): at end of first
+line, IndentationError for mismatch between second and
+third lines. A fixed version is:
+
+
PYTHON
+
+
def another_function():
+    print('Syntax errors are annoying.')
+    print('But at least Python tells us about them!')
+    print('So they are usually not too hard to fix.')
+
+
+
+
+
+
+
+
+
+
+
Identifying Variable Name Errors
+
+
+
+
1. Read the code below, and (without running it) try to identify what
+the errors are.
+
2. Run the code, and read the error message. What type of
+NameError do you think this is? In other words, is it a
+string with no quotes, a misspelled variable, or a variable that should
+have been defined but was not?
+
3. Fix the error.
+
4. Repeat steps 2 and 3, until you have fixed all the errors.
+
+
+
PYTHON
+
+
for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (Number % 3) == 0:
+        message = message + a
+    else:
+        message = message + 'b'
+print(message)
+
+
+
+
+
+
+
+
+
+
There are three NameErrors: number is misspelled as Number,
+message is never defined, and a is not
+in quotes.
+
Fixed version:
+
+
PYTHON
+
+
message = ''
+for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (number % 3) == 0:
+        message = message + 'a'
+    else:
+        message = message + 'b'
+print(message)
+
+
+
+
+
+
+
+
+
+
+
Identifying Index Errors
+
+
+
+
1. Read the code below, and (without running it) try to identify what
+the errors are.
+
2. Run the code, and read the error message. What type of error is
+it?
+
3. Fix the error.
+
+
+
PYTHON
+
+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[4])
+
+
+
+
+
+
+
+
+
+
IndexError; the last entry is seasons[3],
+so seasons[4] doesn’t make sense. A fixed version is:
+
+
PYTHON
+
+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[-1])
+
+
+
+
+
+
A Final Note About Correcting Errors
+
+
+
+
There are a lot of very helpful answers online for many error messages.
+However, when working with official statistics, we also need to exercise
+some caution. Be wary of any answers that ask you to
+download a package from someone’s personal GitHub repository or other
+file sharing service. Try to find the type of error first and understand
+what the issue is before downloading anything claiming to fix the error.
+If the error is the result of an issue with a version of a package,
+check if there are any security vulnerabilities with that version, and
+use a package manager to move between package versions.
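Before changing package versions, it helps to know which version is currently installed. The sketch below uses importlib.metadata from the standard library (Python 3.8 or later); the helper name installed_version is hypothetical, and numpy is used only as an example package name.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# 'numpy' is only an example name; substitute the package named in your error.
print('numpy version:', installed_version('numpy'))
```

With the version number in hand, you can look it up in the package's release notes or security advisories before asking your package manager to install a different one.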
');
+ },
+
+ createChildNavList: function($parent) {
+ var $childList = this.createNavList();
+ $parent.append($childList);
+ return $childList;
+ },
+
+ generateNavEl: function(anchor, text) {
+ var $a = $('<a></a>');
+ $a.attr('href', '#' + anchor);
+ $a.text(text);
+ var $li = $('<li></li>');
+ $li.append($a);
+ return $li;
+ },
+
+ generateNavItem: function(headingEl) {
+ var anchor = this.generateAnchor(headingEl);
+ var $heading = $(headingEl);
+ var text = $heading.data('toc-text') || $heading.text();
+ return this.generateNavEl(anchor, text);
+ },
+
+ // Find the first heading level (`<h1>`, then `<h2>`, etc.) that has more than one element. Defaults to 1 (for `<h1>`).
+ getTopLevel: function($scope) {
+ for (var i = 1; i <= 6; i++) {
+ var $headings = this.findOrFilter($scope, 'h' + i);
+ if ($headings.length > 1) {
+ return i;
+ }
+ }
+
+ return 1;
+ },
+
+ // returns the elements for the top level, and the next below it
+ getHeadings: function($scope, topLevel) {
+ var topSelector = 'h' + topLevel;
+
+ var secondaryLevel = topLevel + 1;
+ var secondarySelector = 'h' + secondaryLevel;
+
+ return this.findOrFilter($scope, topSelector + ',' + secondarySelector);
+ },
+
+ getNavLevel: function(el) {
+ return parseInt(el.tagName.charAt(1), 10);
+ },
+
+ populateNav: function($topContext, topLevel, $headings) {
+ var $context = $topContext;
+ var $prevNav;
+
+ var helpers = this;
+ $headings.each(function(i, el) {
+ var $newNav = helpers.generateNavItem(el);
+ var navLevel = helpers.getNavLevel(el);
+
+ // determine the proper $context
+ if (navLevel === topLevel) {
+ // use top level
+ $context = $topContext;
+ } else if ($prevNav && $context === $topContext) {
+ // create a new level of the tree and switch to it
+ $context = helpers.createChildNavList($prevNav);
+ } // else use the current $context
+
+ $context.append($newNav);
+
+ $prevNav = $newNav;
+ });
+ },
+
+ parseOps: function(arg) {
+ var opts;
+ if (arg.jquery) {
+ opts = {
+ $nav: arg
+ };
+ } else {
+ opts = arg;
+ }
+ opts.$scope = opts.$scope || $(document.body);
+ return opts;
+ }
+ },
+
+ // accepts a jQuery object, or an options object
+ init: function(opts) {
+ opts = this.helpers.parseOps(opts);
+
+ // ensure that the data attribute is in place for styling
+ opts.$nav.attr('data-toggle', 'toc');
+
+ var $topContext = this.helpers.createChildNavList(opts.$nav);
+ var topLevel = this.helpers.getTopLevel(opts.$scope);
+ var $headings = this.helpers.getHeadings(opts.$scope, topLevel);
+ this.helpers.populateNav($topContext, topLevel, $headings);
+ }
+ };
+
+ $(function() {
+ $('nav[data-toggle="toc"]').each(function(i, el) {
+ var $nav = $(el);
+ Toc.init($nav);
+ });
+ });
+})();
diff --git a/config.yaml b/config.yaml
new file mode 100644
index 0000000..d4938f1
--- /dev/null
+++ b/config.yaml
@@ -0,0 +1,90 @@
+#------------------------------------------------------------
+# Values for this lesson.
+#------------------------------------------------------------
+
+# Which carpentry is this (swc, dc, lc, or cp)?
+# swc: Software Carpentry
+# dc: Data Carpentry
+# lc: Library Carpentry
+# cp: Carpentries (to use for instructor training for instance)
+# incubator: The Carpentries Incubator
+carpentry: 'cp'
+
+# Overall title for pages.
+title: 'Python for Official Statistics'
+
+# Date the lesson was created (YYYY-MM-DD, this is empty by default)
+created: 2023-03-06
+
+# Comma-separated list of keywords for the lesson
+keywords: 'software, data, lesson, The Carpentries'
+
+# Life cycle stage of the lesson
+# possible values: pre-alpha, alpha, beta, stable
+life_cycle: 'pre-alpha'
+
+# License of the lesson materials (recommended CC-BY 4.0)
+license: 'CC-BY 4.0'
+
+# Link to the source repository for this lesson
+source: 'https://github.com/UNECE/ModernStats_Python'
+
+# Default branch of your lesson
+branch: 'main'
+
+# Who to contact if there are any issues
+contact: 'team@carpentries.org'
+
+# Navigation ------------------------------------------------
+#
+# Use the following menu items to specify the order of
+# individual pages in each dropdown section. Leave blank to
+# include all pages in the folder.
+#
+# Example -------------
+#
+# episodes:
+# - introduction.md
+# - first-steps.md
+#
+# learners:
+# - setup.md
+#
+# instructors:
+# - instructor-notes.md
+#
+# profiles:
+# - one-learner.md
+# - another-learner.md
+
+# Order of episodes in your lesson
+episodes:
+- 01-introduction.md
+- 02-python_fundamentals.md
+- 03-data_transformation.md
+- 04-lists.md
+- 05-loops.md
+- 06-alternative_loops.md
+- 07-functions.md
+- 08-data_analysis.md
+- 09-visualizations.md
+- 10-errors_exceptions.md
+
+# Information for Learners
+learners:
+
+# Information for Instructors
+instructors:
+
+# Learner Profiles
+profiles:
+
+# Customisation ---------------------------------------------
+#
+# This space below is where custom yaml items (e.g. pinning
+# sandpaper and varnish versions) should live
+
+
+url: 'https://UNECE.github.io/ModernStats_Python'
+analytics: carpentries
+lang: en
diff --git a/discuss.html b/discuss.html
new file mode 100644
index 0000000..f009eac
--- /dev/null
+++ b/discuss.html
@@ -0,0 +1,449 @@
+
+Python for Official Statistics: Discussion
+ Pre-Alpha
+
+ This lesson is in the pre-alpha phase, which means that it is in early development, but has not yet been taught.
+
+
+
+
+
+
+
+
diff --git a/index.html b/index.html
new file mode 100644
index 0000000..e2d7808
--- /dev/null
+++ b/index.html
@@ -0,0 +1,443 @@
+
+Python for Official Statistics: Summary and Setup
Summary and Setup
+
+
+
Python for Official Statistics will teach participants the basics of
+Python for its use in creating Official Statistics. Participants will
+learn basic programming principles, and employ them in the manipulation
+of data and data structures.
How do I find reliable and safe resources or code online?
+
+
+
+
+
+
+
Objectives
+
identify basic concepts in programming
+
+
+
+
+
+
Programming in Python
+
+
In most general terms, programming is the process of writing
+instructions for a computer. In this course we will be using Python as
+the language to communicate with the computer.
+
+
Strictly speaking, Python is an interpreted language, rather than a
+compiled language, meaning we are not communicating directly with the
+computer when we use Python. When we run Python code, our Python source
+code is first translated into byte code, which is then executed by the
+Python virtual machine.
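The translation step can be observed from within Python itself. As a rough sketch, the standard library module dis can list the byte code instructions that a small expression is translated into; the exact instruction names vary between Python versions, so no particular listing is shown here.

```python
import dis

# Compile a one-line expression into a code object (Python's byte code container).
code = compile('1 + 1', '<example>', 'eval')

print(type(code.co_code))  # the raw byte code is stored as bytes
dis.dis(code)              # print a human-readable listing of the instructions
result = eval(code)        # the Python virtual machine executes the byte code
print(result)
```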
+
+
Programming is a wide topic including a variety of techniques and
+tools. In this course we’ll be focusing on programming for statistical
+analysis.
+
+
IDEs
+
IDE stands for Integrated Development Environment. IDEs are where you
+will write, edit, and debug Python scripts, so you want to choose one
+that makes you feel comfortable and includes the functionality that you
+need. Some open-source IDEs for Python include JupyterLab and Visual Studio
+Code.
+
+
+
Packages
+
Packages, or libraries, are extensions to the statistical programming
+language. They contain code, data, and documentation in a standardised
+collection format that can be installed by users, typically via a
+centralised software repository. A typical Python workflow will use base
+Python (the core operations and functions provided by your Python
+installation) as well as specialised data analysis and scientific
+packages like NumPy, SciPy and Pandas.
+
+
Best Practices
+
+
Let’s review some basic concepts that any programmer should always
+keep in mind.
+
+
Documentation
+
Have you ever returned to a task and tried to read a note that you
+quickly scrawled for yourself the last time you were working on it? Have
+you ever inherited a project from a colleague and found you have no idea
+what remains to be done?
+
It can be very challenging to return to your own work or a
+colleague’s, and this goes doubly for programming. Documentation is one
+way we can reduce the burden on our future selves and our colleagues.
+
+
Inline Documentation
+
As a new programmer, inline documentation can be the most helpful.
+Inline documentation refers to writing comments on the same line as your
+code. For example, if we wrote a line of code to sum 1+1, we might
+document it as follows:
+
+
PYTHON
+
+
1 + 1  # adding the numbers 1 and 1 together.
+
+
Although this is a very simple line of code and it might seem like
+overkill to document it in this way, these types of comments can be very
+helpful in jogging your memory when returning to a project. Inline
+comments can also help you to break multi-step programs into digestible
+and readable pieces.
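As an illustration of breaking a multi-step program into readable pieces, here is a small sketch with one inline comment per step. The list of weights is hypothetical sample data.

```python
weights_kg = [60.3, 65.0, 100.0]   # hypothetical sample of weights in kilograms

total = sum(weights_kg)            # step 1: add up all the weights
count = len(weights_kg)            # step 2: count how many weights there are
mean_weight = total / count        # step 3: divide to get the mean

print('mean weight in kilograms:', round(mean_weight, 1))
```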
+
+
+
External Documentation
+
Sometimes you require more detail than you can comfortably fit in
+your inline documentation. In this case it can be helpful to create
+separate files to document your project. This type of documentation will
+typically focus on the goals, scope, and any special instructions
+relating to your project rather than the details of your code. The most
+common type of external documentation is a README file. It is best
+practice to create a basic README file for any project. A basic README
+should include:
+
a brief description of the project,
+
any special instructions for installation or use,
+
the authors and any references.
+
README files are just text files, and it is best practice to save
+your README file as a README.md markdown document. This
+file format is automatically recognised by code repositories like
+GitHub, so your README contents are displayed alongside your code
+repository.
+
+
+
DocStrings
+
In chapter 7: functions we’ll learn
+about documentation specific to functions known as DocStrings.
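As a brief preview, and only as a sketch, a DocString is a string placed on the first line inside a function definition. The function kg_to_lb below is a hypothetical example, not one defined elsewhere in this lesson.

```python
def kg_to_lb(weight_kg):
    """Convert a weight in kilograms to pounds (using 2.2 pounds per kilogram)."""
    return 2.2 * weight_kg


print(kg_to_lb.__doc__)   # the docstring is stored with the function
help(kg_to_lb)            # the built-in help function displays it too
```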
+
+
+
Getting Help
+
+
Later on, in chapter 10: Errors
+and Exceptions we will cover errors in more detail. However, before
+we get there it’s very likely you’ll need some assistance writing Python
+code.
+
+
Built-in Help
+
There is a help
+function built into base Python. You can use it to investigate
+built-in functions, data types, and more. For example, say we want to
+know more about the print() function in Python:
+
+
PYTHON
+
+
help(print)
+
+
+
OUTPUT
+
+
Help on built-in function print in module builtins:
+
+print(...)
+ print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
+
+ Prints the values to a stream, or to sys.stdout by default.
+ Optional keyword arguments:
+ file: a file-like object (stream); defaults to the current sys.stdout.
+ sep: string inserted between values, default a space.
+ end: string appended after the last value, default a newline.
+-- More --
+
+
+
+
Finding Resources online
+
Stack Overflow is a valuable
+resource for programmers of all levels. It can be daunting to post your
+own question! Fortunately, chances are someone else has already asked a
+similar question!
It can also be helpful to do a general search for a particular topic
+or error message. It’s very likely the first few results will be from
+Stack Overflow, followed by a few from official documentation and then
+you may start seeing results from personal blogs or third parties. These
+third party results can sometimes be valuable but we should be cautious!
+Here are a few things to keep in mind when you are looking for online
+resources:
+
Don’t download or install anything unless you are certain of what it
+is and why you need it.
+
Don’t copy or run code unless you fully understand what it
+does.
+
Python is an open-source language; official documentation and
+resources will not be behind a paywall.
+
You may not find a resource or solution to fit your exact needs. Try
+to be flexible and adapt online solutions to fit your needs.
+
+
+
+
+
+
Key Points
+
+
+
Python is an interpreted language.
+
Code is commonly developed inside an integrated development
+environment.
+
A typical Python workflow uses base Python and additional Python
+packages developed for statistical programming purposes.
+
In-line and external documentation helps ensure that your code is
+readable.
+
You can find help through the built-in help function and external
+resources.
Can I change the value associated with a variable after I create
+it?
+
+
+
+
+
+
+
Objectives
+
Assign values to variables.
+
+
+
+
+
+
Variables
+
+
Any Python interpreter can be used as a calculator:
+
+
PYTHON
+
+
3 + 5 * 4
+
+
+
OUTPUT
+
+
23
+
+
This is great but not very interesting. To do anything useful with
+data, we need to assign its value to a variable. In Python, we
+can assign a value to a variable, using the equals sign
+=. For example, we can track the weight of a patient who
+weighs 60 kilograms by assigning the value 60 to a variable
+weight_kg:
+
+
PYTHON
+
+
weight_kg = 60
+
+
From now on, whenever we use weight_kg, Python will
+substitute the value we assigned to it. In layperson’s terms, a
+variable is a name for a value.
+
Variable names cannot start with a digit and are case sensitive. This
+means that, for example:
+
+weight0 is a valid variable name, whereas
+0weight is not
+
+weight and Weight are different
+variables
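These naming rules can be checked with the string method isidentifier(), as in the sketch below. The candidate names are just examples. One caveat: isidentifier() also returns True for reserved words such as for, which still cannot be used as variable names.

```python
candidate_names = ['weight0', '0weight', 'weight_kg']

for name in candidate_names:
    # str.isidentifier() reports whether the text follows Python's naming rules
    print(name, '->', name.isidentifier())

print('weight' == 'Weight')  # names that differ only in case are different names
```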
+
Types of data
+
+
Python knows various types of data. Three common ones are:
+
integer numbers
+
floating point numbers, and
+
strings.
+
In the example above, variable weight_kg has an integer
+value of 60. If we want to more precisely track the weight
+of our patient, we can use a floating point value by executing:
+
+
PYTHON
+
+
weight_kg = 60.3
+
+
To create a string, we add single or double quotes around some text.
+To identify and track a patient throughout our study, we can assign each
+person a unique identifier by storing it in a string:
+
+
PYTHON
+
+
patient_id = '001'
+
+
Using Variables in Python
+
+
Once we have data stored with variable names, we can make use of it
+in calculations. We may want to store our patient’s weight in pounds as
+well as kilograms:
+
+
PYTHON
+
+
weight_lb = 2.2 * weight_kg
+
+
We might decide to add a prefix to our patient identifier:
+
+
PYTHON
+
+
patient_id = 'inflam_' + patient_id
+
+
Built-in Python functions
+
+
To carry out common tasks with data and variables in Python, the
+language provides us with several built-in functions. To display information to
+the screen, we use the print function:
+
+
PYTHON
+
+
print(weight_lb)
+print(patient_id)
+
+
+
OUTPUT
+
+
132.66
+inflam_001
+
+
When we want to make use of a function, referred to as calling the
+function, we follow its name by parentheses. The parentheses are
+important: if you leave them off, the function doesn’t actually run!
+Sometimes you will include values or variables inside the parentheses
+for the function to use. In the case of print, we use the
+parentheses to tell the function what value we want to display. We will
+learn more about how functions work and how to create our own in later
+episodes.
+
We can display multiple things at once using only one
+print call:
+
+
PYTHON
+
+
print(patient_id, 'weight in kilograms:', weight_kg)
+
+
+
OUTPUT
+
+
inflam_001 weight in kilograms: 60.3
+
+
We can also call a function inside of another function call. For example,
+Python has a built-in function called type that tells you a
+value’s data type:
+
+
PYTHON
+
+
print(type(60.3))
+print(type(patient_id))
+
+
+
OUTPUT
+
+
<class 'float'>
+<class 'str'>
+
+
Moreover, we can do arithmetic with variables right inside the
+print function:
+
+
PYTHON
+
+
print('weight in pounds:', 2.2 * weight_kg)
+
+
+
OUTPUT
+
+
weight in pounds: 132.66
+
+
The above command, however, did not change the value of
+weight_kg:
+
+
PYTHON
+
+
print(weight_kg)
+
+
+
OUTPUT
+
+
60.3
+
+
To change the value of the weight_kg variable, we have
+to assign weight_kg a new value using the
+equals = sign:
+
+
PYTHON
+
+
weight_kg = 65.0
+print('weight in kilograms is now:', weight_kg)
+
+
+
OUTPUT
+
+
weight in kilograms is now: 65.0
+
+
+
+
+
+
+
Variables as Sticky Notes
+
+
+
A variable in Python is analogous to a sticky note with a name
+written on it: assigning a value to a variable is like putting that
+sticky note on a particular value.
+
Using this analogy, we can investigate how assigning a value to one
+variable does not change values of other, seemingly
+related, variables. For example, let’s store the subject’s weight in
+pounds in its own variable:
+
+
PYTHON
+
+
# There are 2.2 pounds per kilogram
+weight_lb = 2.2 * weight_kg
+print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
+
+
+
OUTPUT
+
+
weight in kilograms: 65.0 and in pounds: 143.0
+
+
Everything in a line of code following the ‘#’ symbol is a comment that is ignored by Python.
+Comments allow programmers to leave explanatory notes for other
+programmers or their future selves.
+
Similar to above, the expression 2.2 * weight_kg is
+evaluated to 143.0, and then this value is assigned to the
+variable weight_lb (i.e. the sticky note
+weight_lb is placed on 143.0). At this point,
+each variable is “stuck” to completely distinct and unrelated
+values.
+
Let’s now change weight_kg:
+
+
PYTHON
+
+
weight_kg = 100.0
+print('weight in kilograms is now:', weight_kg, 'and weight in pounds is still:', weight_lb)
+
+
+
OUTPUT
+
+
weight in kilograms is now: 100.0 and weight in pounds is still: 143.0
+
+
Since weight_lb doesn’t “remember” where its value comes
+from, it is not updated when we change weight_kg.
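If we do want weight_lb to reflect the new weight, we have to repeat the calculation ourselves. A minimal sketch, where stale_weight_lb is a hypothetical name used only to keep the old value for comparison:

```python
weight_kg = 65.0
weight_lb = 2.2 * weight_kg      # weight_lb is computed once, from the current value

weight_kg = 100.0                # changing weight_kg does not touch weight_lb
stale_weight_lb = weight_lb

weight_lb = 2.2 * weight_kg      # repeat the calculation to bring weight_lb up to date
print('weight in kilograms:', weight_kg, 'and in pounds:', round(weight_lb, 1))
```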
+
+
+
+
+
+
+
+
+
Check Your Understanding
+
+
+
What values do the variables mass and age
+have after each of the following statements? Test your answer by
+executing the lines.
+
+
PYTHON
+
+
mass = 47.5
+age = 122
+mass = mass * 2.0
+age = age - 20
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
`mass` holds a value of 47.5, `age` does not exist
+`mass` still holds a value of 47.5, `age` holds a value of 122
+`mass` now has a value of 95.0, `age`'s value is still 122
+`mass` still has a value of 95.0, `age` now holds 102
+
+
+
+
+
+
+
+
+
+
+
Sorting Out References
+
+
+
Python allows you to assign multiple values to multiple variables in
+one line by separating the variables and values with commas. What does
+the following program print out?
+
+
PYTHON
+
+
first, second = 'Grace', 'Hopper'
+third, fourth = second, first
+print(third, fourth)
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
Hopper Grace
+
+
+
+
+
+
+
+
+
+
+
Seeing Data Types
+
+
+
What are the data types of the following variables?
+
+
diff --git a/instructor/03-data_transformation.html b/instructor/03-data_transformation.html
new file mode 100644
index 0000000..2f61a80
--- /dev/null
+++ b/instructor/03-data_transformation.html
@@ -0,0 +1,865 @@
+
+Python for Official Statistics: Data Transformation
Explain what a library is and what libraries are used for.
+
Import a Python library and use the functions it contains.
+
Read tabular data from a file into a program.
+
Select individual values and subsections from data.
+
Perform operations on arrays of data.
+
+
+
+
+
+
Words are useful, but what’s more useful are the sentences and
+stories we build with them. Similarly, while a lot of powerful, general
+tools are built into Python, specialized tools built up from these basic
+units live in libraries that can be
+called upon when needed.
+
Loading data into Python
+
+
To begin processing the clinical trial inflammation data, we need to
+load it into Python. Python can work with many different file types.
+Text files can be loaded into Python by using the base Python
+function
+
+
PYTHON
+
+
open("filename.txt", "r")
+
+
where “r” means read only, or if you want to write to the file, you
+can use “w”.
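The two modes can be tried safely with a throwaway file. The sketch below writes a file with "w" and then reads it back with "r"; it uses a temporary folder, and the file contents are made up for illustration.

```python
import os
import tempfile

# Use a temporary folder so the example does not touch any real files.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'filename.txt')

    # "w" opens the file for writing, creating it if necessary.
    with open(path, 'w') as file_handle:
        file_handle.write('first line\nsecond line\n')

    # "r" opens the file read-only; the with block closes it automatically.
    with open(path, 'r') as file_handle:
        lines = file_handle.read().splitlines()

print(lines)
```

Using open inside a with block is a common choice because the file is closed even if an error occurs partway through.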
+
However, our patient data is in a .csv file, which is more commonly
+loaded by using a library. Python has hundreds of thousands of libraries
+to choose from to help carry out your work. Importing a library is like
+getting a piece of lab equipment out of a storage locker and setting it
+up on the bench. Libraries provide additional functionality to the basic
+Python package, much like a new piece of equipment adds functionality to
+a lab space. Just like in the lab, importing too many libraries can
+sometimes complicate and slow down your programs - so we only import
+what we need for each program. There are a couple of common Python
+libraries for loading and working with data.
+
pandas
+
+
The first library we will present is called pandas. pandas is a
+Python library containing a set of functions and specialised data
+structures that have been designed to help Python programmers to perform
+data analysis tasks in a structured way.
+
Most of the things that pandas can do can be done with basic Python,
+but the collected set of pandas functions and data structures makes
+data analysis tasks more consistent in terms of syntax and therefore
+aids readability.
+
Remember to write the library name with a lower case ‘p’, because that is the
+name of the package and Python is case sensitive.
+
+
Importing the pandas library
+
Importing the pandas library is done in exactly the same way as for
+any other library. In almost all examples of Python code using the
+pandas library, it will have been imported and given an alias of
+pd. We will follow the same convention.
+
+
PYTHON
+
+
import pandas as pd
+
+
+
+
Pandas data structures
+
There are two main data structures used by pandas: the Series
+and the DataFrame. The Series equates in general to a vector or a list.
+The DataFrame is equivalent to a table. Each column in a pandas
+DataFrame is a pandas Series data structure.
+
We will mainly be looking at the DataFrame.
+
We can easily create a pandas DataFrame by reading a .csv file.
+
+
+
Reading a csv file
+
When we read a csv dataset in base Python, we do so by opening the
+dataset, reading and processing one record at a time, and then closing the
+dataset after we have read the last record. Reading datasets in this way
+is slow and places all of the responsibility for extracting individual
+data items of information from the records on the programmer.
+
The main advantage of this approach, however, is that you only have
+to store one dataset record in memory at a time. This means that if you
+have the time, you can process datasets of any size.
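The record-at-a-time approach can be sketched with the standard library csv module. The data below is hypothetical, and io.StringIO stands in for a file on disk so that the example is self-contained.

```python
import csv
import io

# Hypothetical data standing in for a csv file on disk.
csv_text = 'patient_id,weight_kg\n001,60.3\n002,65.0\n003,100.0\n'

total_weight = 0.0
record_count = 0

# io.StringIO lets us treat the text like an open file for this example.
with io.StringIO(csv_text) as dataset:
    reader = csv.reader(dataset)
    header = next(reader)            # the first record holds the column names
    for record in reader:            # only one record is held in memory at a time
        total_weight += float(record[1])
        record_count += 1

print(header)
print('records read:', record_count)
```

Notice that the programmer has to pick out the right column and convert it from text to a number by hand.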
+
In pandas, csv files are read as complete datasets. You do not have
+to explicitly open and close the dataset. All of the dataset records are
+assembled into a DataFrame. If your dataset has column headers in the
+first record then these can be used as the DataFrame column names. You
+can explicitly state this in the parameters to the call, but pandas is
+usually able to infer that there is a header row and use it
+automatically.
+
To tell Python that we’d like to start using pandas, we need to import it:
+
+
PYTHON
+
+
import pandas as pd
+
+
Often, libraries are given an alias or a short form name, in this
+case pandas is given the alias “pd”. Aliases for common data analysis
+libraries include:
+
+
PYTHON
+
+
import pandas as pd
+import numpy as np
+import matplotlib.pyplot as plt
+import seaborn as sns
+
+
Once we’ve imported the library, we can ask the library to read our
+data file for us:
+
+
PYTHON
+
+
pd.read_csv("filename.csv")
+
+
pandas is a commonly used library for working with and analysing
+data. However, we will be working with a different package for the
+remainder of this course. If you would like to learn more about data
+manipulation and analysis using pandas, we recommend checking out Data Analysis and
+Visualization with Python for Social Scientists.
+
+
numpy
+
+
The second package that we will present is called NumPy, which stands for Numerical
+Python. In general, you should use this library when you want to do
+fancy things with lots of numbers, especially if you have matrices or
+arrays. NumPy arrays are typically lighter weight and faster than
+built-in Python lists, particularly when working with large datasets.
+
We will be using this package to work with our clinical trial
+inflammation data.
+
To tell Python that we’d like to start using NumPy, we need to import it:
+
+
PYTHON
+
+
import numpy as np
+
+
Now that we have imported the library, we can ask the library (by
+using the alias np) to read our data file for us by calling
+np.loadtxt(fname='inflammation-01.csv', delimiter=',').
+
The expression np.loadtxt(...) is a function call that asks Python
+to run the function
+loadtxt which belongs to the np library. Dot
+notation in Python is mostly used to access an attribute (property) of
+an object or to call one of its methods:
+object_name.property gives the value of that property, and
+object_name.method() calls that method on object_name.
+
As an example, John Smith is the John that belongs to the Smith
+family. We could use the dot notation to write his name
+smith.john, just as loadtxt is a function that
+belongs to the np library.
+
We give np.loadtxt two arguments: the name of the file we
+want to read and the delimiter
+that separates values on a line. These both need to be character strings
+(or strings for short), so we put
+them in quotes.
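+
As a minimal sketch of these two arguments, the example below reads
+comma-separated values with np.loadtxt. To stay self-contained it
+passes an in-memory io.StringIO object in place of a file name. The
+name fake_file and its three rows of values are made up for
+illustration and are not part of the inflammation data.
+
```python
import io

import numpy as np

# Hypothetical stand-in for a small csv file: 3 rows, 4 values per row
fake_file = io.StringIO("0,1,2,3\n0,2,1,4\n0,1,3,2\n")

# delimiter=',' tells loadtxt that commas separate the values on a line
example = np.loadtxt(fname=fake_file, delimiter=',')
print(example)
```
+
With a real file, fname would simply be the file name written in quotes.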
+
Since we haven’t told it to do anything else with the function’s
+output, the notebook displays it.
+In this case, that output is the data we just loaded. By default, only a
+few rows and columns are shown (with ... to omit elements
+when displaying big arrays). Note that, to save space when displaying
+NumPy arrays, Python does not show us trailing zeros, so
+1.0 becomes 1..
+
Our call to np.loadtxt read our file but didn’t save the
+data in memory. To do that, we need to assign the array to a variable.
+In a similar manner to how we assign a single value to a variable, we
+can also assign an array of values to a variable using the same syntax.
+Let’s re-run np.loadtxt and save the returned data:
+
+
PYTHON
+
+
data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+
This statement doesn’t produce any output because we’ve assigned the
+output to the variable data. If we want to check that the
+data have been loaded, we can print the variable’s value with print(data).
+
Now that the data are in memory, we can manipulate them. First, let’s
+ask what type of thing
+data refers to:
+
+
PYTHON
+
+
print(type(data))
+
+
+
OUTPUT
+
+
<class 'numpy.ndarray'>
+
+
The output tells us that data currently refers to an
+N-dimensional array, the functionality for which is provided by the
+NumPy library. These data correspond to arthritis patients’
+inflammation. The rows are the individual patients, and the columns are
+their daily inflammation measurements.
+
+
+
+
+
+
Data Type
+
+
+
A NumPy array contains one or more elements of the same type. The
+type function will only tell you that a variable is a NumPy
+array but won’t tell you the type of thing inside the array. We can find
+out the type of the data contained in the NumPy array from its dtype attribute.
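+
As a rough illustration, the sketch below checks both the container
+type and the element type of a small made-up array. The name example
+and its values are hypothetical stand-ins for the inflammation data.
+
```python
import numpy as np

# Hypothetical stand-in for the inflammation data: 3 patients x 4 days
example = np.array([[0.0, 1.0, 2.0, 3.0],
                    [0.0, 2.0, 1.0, 4.0],
                    [0.0, 1.0, 3.0, 2.0]])

print(type(example))   # the type of the container
print(example.dtype)   # the type of the elements inside it
```
+
Here dtype reports a floating-point type because every element was written with a decimal point.
+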
With the following command, we can see the array’s shape:
+
+
PYTHON
+
+
print(data.shape)
+
+
+
OUTPUT
+
+
(60, 40)
+
+
The output tells us that the data array variable
+contains 60 rows and 40 columns. When we created the variable
+data to store our arthritis data, we did not only create
+the array; we also created information about the array, called members or attributes. This extra
+information describes data in the same way an adjective
+describes a noun. data.shape is an attribute of
+data which describes the dimensions of data.
+We use the same dotted notation for the attributes of variables that we
+use for the functions in libraries because they have the same
+part-and-whole relationship.
+
If we want to get a single number from the array, we must provide an
+index in square brackets after the
+variable name, just as we do in math when referring to an element of a
+matrix. Our inflammation data has two dimensions, so we will need to use
+two indices to refer to one specific value:
+
+
PYTHON
+
+
print('first value in data:', data[0, 0])
+
+
+
OUTPUT
+
+
first value in data: 0.0
+
+
+
PYTHON
+
+
print('middle value in data:', data[29, 19])
+
+
+
OUTPUT
+
+
middle value in data: 16.0
+
+
The expression data[29, 19] accesses the element at row
+30, column 20. While this expression may not surprise you,
+data[0, 0] might. Programming languages like Fortran,
+MATLAB and R start counting at 1 because that’s what human beings have
+done for thousands of years. Languages in the C family (including C++,
+Java, Perl, and Python) count from 0 because it represents an offset
+from the first value in the array (the second value is offset by one
+index from the first value). This is closer to the way that computers
+represent arrays (if you are interested in the historical reasons behind
+counting indices from zero, you can read Mike
+Hoye’s blog post). As a result, if we have an M×N array in Python,
+its indices go from 0 to M-1 on the first axis and 0 to N-1 on the
+second. It takes a bit of getting used to, but one way to remember the
+rule is that the index is how many steps we have to take from the start
+to get the item we want.
+
+
+
+
+
+
In the Corner
+
+
+
What may also surprise you is that when Python displays an array, it
+shows the element with index [0, 0] in the upper left
+corner rather than the lower left. This is consistent with the way
+mathematicians draw matrices but different from Cartesian
+coordinates. The indices are (row, column) instead of (column, row) for
+the same reason, which can be confusing when plotting data.
+
+
+
+
Slicing data
+
+
An index like [30, 20] selects a single element of an
+array, but we can select whole sections as well. For example, we can
+select the first ten days (columns) of values for the first four
+patients (rows) with data[0:4, 0:10].
+
The slice 0:4 means,
+“Start at index 0 and go up to, but not including, index 4”. Again, the
+up-to-but-not-including takes a bit of getting used to, but the rule is
+that the difference between the upper and lower bounds is the number of
+values in the slice.
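+
The same kind of selection can be sketched on a small made-up array.
+The name grid and its contents (the numbers 0 to 71 arranged in 6 rows
+and 12 columns) are assumptions chosen so that the result is easy to
+check by eye.
+
```python
import numpy as np

# Hypothetical array: 6 rows and 12 columns holding the values 0..71
grid = np.arange(72).reshape(6, 12)

# Rows 0-3 and columns 0-9: the upper bounds 4 and 10 are not included
first_block = grid[0:4, 0:10]

print(first_block.shape)  # 4 - 0 rows and 10 - 0 columns
print(first_block)
```
+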
We also don’t have to include the upper and lower bound on the slice.
+If we don’t include the lower bound, Python uses 0 by default; if we
+don’t include the upper, the slice runs to the end of the axis, and if
+we don’t include either (i.e., if we use ‘:’ on its own), the slice
+includes everything:
+
+
PYTHON
+
+
small = data[:3, 36:]
+print('small is:')
+print(small)
+
+
The above example selects rows 0 through 2 and columns 36 through to
+the end of the array.
+
+
OUTPUT
+
+
small is:
+[[ 2. 3. 0. 0.]
+ [ 1. 1. 0. 1.]
+ [ 2. 2. 1. 1.]]
+
+
diff --git a/instructor/04-lists.html b/instructor/04-lists.html
new file mode 100644
index 0000000..5ba60d6
--- /dev/null
+++ b/instructor/04-lists.html
@@ -0,0 +1,1107 @@
+
+Python for Official Statistics: List and Dictionary Methods
+
+
+
+
+
+
+
+ Pre-Alpha
+
+ This lesson is in the pre-alpha phase, which means that it is in early development, but has not yet been taught.
+
+
+
+
Understand the properties and behaviours of lists and
+dictionaries
+
Access values in lists and dictionaries
+
Create and access values from nested lists and dictionaries
+
+
+
+
+
+
Values can also be stored in other Python data types such as lists,
+dictionaries, sets and tuples. Storing objects in a list is a fast and
+versatile way to apply transformations across a sequence of values.
+Storing objects in a dictionary as key-value pairs is useful for
+extracting specific values, i.e. performing lookup operations.
+
Create and access lists
+
+
Lists have the following properties and behaviours:
+
A single list can store different primitive object types and even
+other lists
+
Lists are ordered and have a 0-based index
+
Lists can be appended to using the methods append() or
+insert()
+
+
Values inside a list can be removed using the methods
+remove() or pop()
+
+
Two lists can be concatenated with the operator +
+
+
Values inside a list can be conditionally iterated through
+
A list is mutable i.e. the values inside a list can be modified in
+place
+
To create a list, values are contained within square brackets
+i.e. [] and individually separated by commas. The function
+list() can also be used to create a list of values from an
+iterable object like a string, set or tuple.
+
+
PYTHON
+
+
# Create a list of integers using []
+list_1 = [1, 3, 5, 7]
+print(list_1)
+
+
+
OUTPUT
+
+
[1, 3, 5, 7]
+
+
+
PYTHON
+
+
# Unlike atomic vectors in R, a list can contain multiple primitive object types
+list_2 = [1, "one", 1.0, True]
+print(list_2)
+
+
+
OUTPUT
+
+
[1, 'one', 1.0, True]
+
+
+
PYTHON
+
+
# You can also use list() on an iterable object to convert it into a list
+string = 'abcdefg'
+list_3 = list(string)
+print(list_3)
+
+
+
OUTPUT
+
+
['a', 'b', 'c', 'd', 'e', 'f', 'g']
+
+
Because lists have a 0-based index, we can access individual values
+by their list index position. For 0-based indexes, the first value
+always starts at position 0 i.e. the first element has an index of 0.
+Accessing multiple values by their index positions is also referred to
+as slicing or subsetting a list.
+
Note that we can use negative numbers as indices in Python. When we
+do so, the index -1 gives us the last element in the list,
+-2 gives us the second to last element in the list, and so
+on.
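+
A short sketch of positive and negative indexing, using a made-up list
+called letters rather than the objects defined above:
+
```python
letters = list('abcdefg')

print('first value:', letters[0])            # index 0 is the first element
print('last value:', letters[-1])            # index -1 is the last element
print('second to last value:', letters[-2])  # index -2 is the one before it
```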
+
+
PYTHON
+
+
# A syntax quirk for slicing is that the end index is not included
+# To extract from index 0 to 2, we need to slice from [0:2+1] or [0:3]
+
+# Extract the first three values from list_3
+print('first 3 values:', list_3[0:3])
+
+# Start from index 0 and extract values from each subsequent second position
+print('every second value:', list_3[0::2])
+
+# Start from index 1, end at index 3 and extract from each subsequent second position
+print('every second value from index 1 to 3:', list_3[1:4:2])
+
+
+
OUTPUT
+
+
first 3 values: ['a', 'b', 'c']
+every second value: ['a', 'c', 'e', 'g']
+every second value from index 1 to 3: ['b', 'd']
+
+
Change list values
+
+
Data which can be modified in place is called mutable, while data
+which cannot be modified is called immutable. Strings and numbers are
+immutable in that when we want to change the value of a string or number
+variable, we can only replace the old value with a completely new
+value.
+
+
PYTHON
+
+
string = 'abcde'
+string[0] = 'b'  # Produces a type error as strings are immutable
+
+# TypeError: 'str' object does not support item assignment
+
+
In contrast, lists are mutable and we can modify them after they have
+been created. We can change individual values, append new values, or
+reorder the whole list through sorting.
+
+
PYTHON
+
+
list_4 = ['apple', 'pear', 'plum']
+print('original list_4:', list_4)
+
+# Change the first value i.e. modify the list in place
+list_4[0] = 'banana'
+print('modified list_4:', list_4)
+
+# Add new value to list using the method .insert(index number, value)
+list_4.insert(1, 'apple') # Index 1 refers to the second position
+print('appended list_4:', list_4)
# Sorting a list also modifies it in place
+list_5 = [2, 1, 3, 7]
+list_5.sort()
+print('list_5:', list_5)
+
+
+
OUTPUT
+
+
list_5: [1, 2, 3, 7]
+
+
However, be careful when modifying data in-place. If two variables
+refer to the same list, and you modify the list value, it will change
+for both variables!
+
+
PYTHON
+
+
# When we assign list_6 to list_5, it means both list_6 and list_5 point to the
+# same list object, not that list_6 is a copy of list_5.
+
+list_6 = list_5
+print('list_5:', list_5)
+print('list_6:', list_6)
+
+# Change the first value in list_6 from 1 to 2
+list_6[0] = 2
+
+print('modified list_6:', list_6)
+print('unmodified list_5:', list_5)
+
+# Warning: list_5 and list_6 have both been modified in place!
Because of this behaviour, code which modifies data in place should
+be handled with care. You can avoid this behaviour by explicitly
+creating a copy of the original list and modifying only the copy.
+This is why creating a copy of the original data object can be useful in
+Python.
+
+
PYTHON
+
+
list_5 = [1, 2, 3, 7]
+list_7 = list_5.copy()
+print('list_5:', list_5)
+print('list_7:', list_7)
+
+# As list_7 is a completely new object copied from list_5, modifying list_7 does
+# not affect list_5.
+
+list_7[0] = 2
+print('modified list_7:', list_7)
+print('unmodified list_5:', list_5)
There are a lot of functions and methods which can be applied to
+lists, such as len(), max(),
+index() and so forth. Arithmetic operators do not perform
+element-wise calculations on lists of integers: operators such as - and /
+raise a TypeError, while + is defined but does not add the values.
+
Note that + concatenates two lists into a single longer
+list, rather than outputting the sum of two lists of numbers.
+
+
PYTHON
+
+
list_8 = [1, 2, 3]
+list_9 = [4, 5, 6]
+
+list_8 + list_9 # This concatenates the lists and does not sum the two lists together
+
+
+
OUTPUT
+
+
[1, 2, 3, 4, 5, 6]
+
+
In your spare time after this workshop, you can search for different
+list functions and methods and test them out yourselves.
+
Nested lists
+
+
We have previously mentioned that lists can be used to store other
+Python object types, including lists. This means that we can create
+nested lists in Python i.e. lists containing lists containing values.
+This property is useful when we have a collection of values that we want
+to access or transform as a subgroup.
+
To create a nested list, we also use [] or
+list() to contain one or more lists of values of
+interest.
+
+
PYTHON
+
+
veg_stock = [
+ ['lettuce', 'lettuce', 'tomato', 'zucchini'],
+ ['lettuce', 'lettuce', 'carrot', 'zucchini'],
+ ['lettuce', 'basil', 'tomato', 'zucchini']
+ ]
+
+# Check that veg_stock is a list object
+print(type(veg_stock))
+
+# Check that the first value in veg_stock is itself a list
+print(veg_stock[0], 'has type', type(veg_stock[0]))
+
+
+
OUTPUT
+
+
<class 'list'>
+['lettuce', 'lettuce', 'tomato', 'zucchini'] has type <class 'list'>
+
+
To extract a sub-list within the veg_stock list
+object, we refer to its index like we would with any other value inside
+a list, e.g. veg_stock[0] points to the first sub-list and
+veg_stock[1] points to the second sub-list within the veg_stock list.
+
To access an individual string value inside a sub-list, we make use
+of a second index, which points to an individual value inside the
+sub-list.
+
+
PYTHON
+
+
print(veg_stock[0]) # Access the first sub-list
+print(veg_stock[0][0]) # Access the first value in the first sub-list
+
+print(type(veg_stock[0])) # The first value in veg_stock is a list
+print(type(veg_stock[0][0])) # The first value in the first list in veg_stock is a string
In general, however, when we are analysing a large collection of
+values, the best practice is to structure those values in columns and
+rows as a tabular pandas data frame object. This is covered in another
+Carpentries course called Data Analysis and Visualization with Python
+for Social Scientists.
+
Lists are still incredibly versatile and useful when you have a
+collection of values that need to be efficiently accessed or
+transformed. For example, data frame column names are commonly extracted
+and stored inside a list, so that the same transformation can then be
+mapped across multiple columns.
+
Create and access dictionaries
+
+
A dictionary is a Python data type that is particularly suited for
+enabling quick lookup operations on unstructured data sets.
+
A dictionary can therefore be thought of as a list without a numeric
+index, where every item or value is instead associated with a unique key
+(i.e. a self-defined index of unique strings or numbers). The index
+values are called keys and a dictionary contains key-value pairs with
+the format {key: value(s)}.
+
Dictionaries can be created by listing individual key-value pairs
+inside {} or using dict().
+
+
PYTHON
+
+
# A key-value pair can contain single or multiple values
+# Keys are treated as case sensitive and unique
+# Multiple values are first stored inside a list
+
+teams = {
+    'data science': ['Mei Ling', 'Paul', 'Gwen', 'Suresh'],
+    'user design': ['Amy', 'Linh', 'Sasha'],
+    'software dev': ['David', 'Prya'],
+    'comms': 'Taylor'
+}
+
+
When using dict(), we need to indicate which key is
+associated with which value. This can be done using a list of
+(key, value) tuples, using keyword arguments (i.e. key=value), or using
+zip(), which pairs up the items of two iterables, such as two lists,
+into tuples.
+
+
PYTHON
+
+
# To use dict(), key-value pairs can be stored inside tuples
+ds_emp_status = dict([
+    ('Mei Ling', 'full time'),
+    ('Paul', 'full time'),
+    ('Gwen', 'part time'),
+    ('Suresh', 'part time')
+])
+
+# Key-value pairs can also be assigned using keyword arguments
+# Keys are written without quotes here and are stored as strings
+ud_emp_status = dict(
+    Amy='full time',
+    Linh='full time',
+    Sasha='casual'
+)
+
+# zip() can also be used if each key has only one value
+sd_emp_status = dict(zip(
+    ['David', 'Prya'],
+    ['full time', 'full time']
+))
+
+
To access a specific value inside a dictionary, we need to specify
+its key using []. This is similar to slicing or subsetting
+a list by specifying its index using [].
+
+
PYTHON
+
+
# Access the values associated with the key 'data science'
+print(teams['data science'])
+
+print('The object teams is of type', type(teams))
+print('The dict value', teams['data science'], 'is of type', type(teams['data science']))
+
+
+
OUTPUT
+
+
['Mei Ling', 'Paul', 'Gwen', 'Suresh']
+The object teams is of type <class 'dict'>
+The dict value ['Mei Ling', 'Paul', 'Gwen', 'Suresh'] is of type <class 'list'>
+
+
We can also access a value from a dictionary using the
+get() method.
+
+
PYTHON
+
+
print(teams.get('user design'))
+
+# get() also enables us to return an alternate value when the key is not found
+# This prevents our code from raising an error that halts the analysis
+
+print(teams.get('data engineering', 'WARNING: key does not exist'))
+
+
+
OUTPUT
+
+
['Amy', 'Linh', 'Sasha']
+WARNING: key does not exist
+
+
To access data inside a dictionary, we can also perform the following
+other actions:
+
Check whether a key exists in a dictionary using the keyword
+in
+
+
Retrieve unique dictionary keys using dict.keys()
+
+
Retrieve dictionary values using dict.values()
+
+
Retrieve dictionary items using dict.items()
+
+
+
PYTHON
+
+
# Check whether a key exists in a dictionary
+print('data science' in teams)
+print('Data Science' in teams)  # Keys are case sensitive
+
+# Retrieve all dictionary keys
+print(teams.keys())
+print(sd_emp_status.keys())
+
+# Retrieve all dictionary values
+print(sd_emp_status.values())
+
+# Retrieve all dictionary key-value pairs
+print(sd_emp_status.items())
To add a new key-value pair to an existing dictionary, we can create
+a new key and directly attach a new value to it using = or
+alternatively use the method update().
+
+
PYTHON
+
+
print('original dict items:', sd_emp_status.items())
+
+# Add new key-value pair using direct assignment
+sd_emp_status['Mohammad'] = 'full time'
+
+# Add new key-value pair using update({'key': 'value'})
+sd_emp_status.update({'Carrie': 'part time'})
+
+print('updated dict items:', sd_emp_status.items())
Because keys are unique, a dictionary cannot contain two keys with
+the same name. This means that adding an item using a key that is
+already present in the dictionary will cause the previous value to be
+overwritten.
+
+
PYTHON
+
+
print('original dict items:', sd_emp_status.items())
+
+# As the key 'Carrie' already exists, its value will be overwritten
+sd_emp_status['Carrie'] = 'full time'
+print('updated dict items:', sd_emp_status.items())
To remove a key-value pair from an existing dictionary, we can use the
+del keyword or the method pop(). Using
+pop() also enables us to return an alternate value if we
+try to remove a non-existing key, which prevents our code from raising
+an error that halts the analysis.
+
+
PYTHON
+
+
print('original dict items:', sd_emp_status.items())
+
+# Delete dictionary keys using del and pop()
+del sd_emp_status['Mohammad']
+sd_emp_status.pop('Carrie')
+sd_emp_status.pop('Anuradha', 'WARNING: key does not exist') # Does not generate an error
+
+print('modified dict items:', sd_emp_status.items())
Similar to lists, dictionaries can be nested, as we can also store
+a dictionary as the value inside a key-value pair using {}.
+Nested dictionaries are useful when we need to store unstructured data
+in a complex structure. For example, JSON data is commonly used for
+transmitting data in web applications and often exists in a nested
+structure that can be stored using nested dictionaries in Python.
+
+
PYTHON
+
+
# Individual dictionaries are enclosed in {} and separated by a comma
+nested_dict = {
+    'dict_1': {  # The first value is a dictionary of key-value pairs
+        'key_1a': 'value_1a',
+        'key_1b': 'value_1b'
+    },
+    'dict_2': {  # The second value is another dictionary of key-value pairs
+        'key_2a': 'value_2a',
+        'key_2b': 'value_2b'
+    }
+}
+
+print(nested_dict)
Similar to working with nested lists, to extract a value from a
+sub-dictionary, we specify both the main dictionary and
+sub-dictionary keys using [].
+
+
PYTHON
+
+
# Extract the value for key 2a in dict_2
+print('original value:', nested_dict['dict_2']['key_2a'])
+
+# Adding or updating a value can be done through the same approach
+nested_dict['dict_2']['key_2a'] = "modified_value_2a"
+
+print('modified value:', nested_dict['dict_2']['key_2a'])
+
+
+
OUTPUT
+
+
original value: value_2a
+modified value: modified_value_2a
+
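+
The JSON case mentioned above can be sketched with the json module from
+the Python standard library. The JSON text and the names json_text and
+record are made up for illustration.
+
```python
import json

# Hypothetical JSON text, as might be returned by a web application
json_text = '{"team": "data science", "lead": {"name": "Mei Ling", "status": "full time"}}'

# json.loads() converts JSON text into nested Python dictionaries
record = json.loads(json_text)

print(type(record))
print(record['lead']['status'])  # two keys: outer dictionary, then inner
```
+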
+
Optional: converting lists and dictionaries to Pandas data
+frames
+
+
Lists and dictionaries can be easily converted into a tabular Pandas
+data frame format. This can be useful when you need to create a small
+data set for unit testing purposes.
+
+
PYTHON
+
+
# Import pandas library
+import pandas as pd
+
+# Create a dictionary with each key-value pair representing a data frame column
+data = {
+    'col_1': [3, 2, 1, 0],
+    'col_2': ['a', 'b', 'c', 'd']
+}
+
+df = pd.DataFrame.from_dict(data)
+
+print(df) # Outputs data as a tabular Pandas data frame
+print(type(df))
+
+
+
OUTPUT
+
+
   col_1 col_2
+0      3     a
+1      2     b
+2      1     c
+3      0     d
+<class 'pandas.core.frame.DataFrame'>
+
+
+
+
+
+
+
Key Points
+
+
+
Lists can contain any Python object including other lists
+
Lists are ordered i.e. indexed and can therefore be sliced by index
+number
+
Unlike strings and integers, the values inside a list can be
+modified in place
+
A list which contains other lists is referred to as a nested
+list
+
Dictionaries are accessed by key rather than by position and are
+defined using key-value pairs
+
Dictionary keys are unique
+
A dictionary which contains other dictionaries is referred to as a
+nested dictionary
+
Values inside nested lists and dictionaries can be accessed by an
+additional index
+
+
diff --git a/instructor/05-loops.html b/instructor/05-loops.html
new file mode 100644
index 0000000..cfd400f
--- /dev/null
+++ b/instructor/05-loops.html
@@ -0,0 +1,1593 @@
+
+Python for Official Statistics: Loops and Conditional Logic
+
+
+
+
+
+
+
+
+
+
+
In the episode about visualizing
+data, we will see Python code that plots values of interest from our
+first inflammation dataset (inflammation-01.csv), which
+reveals some suspicious features.
+
We have a dozen data sets right now and potentially more on the way
+if Dr. Maverick can keep up their surprisingly fast clinical trial rate.
+We want to create plots for all of our data sets with a single
+statement. To do that, we’ll have to teach the computer how to repeat
+things.
+
An example task that we might want to repeat is accessing numbers in
+a list, which we will do by printing each number on a line of its
+own.
+
+
PYTHON
+
+
odds = [1, 3, 5, 7]
+
+
In Python, a list is basically an ordered
+collection of elements, and every element has a unique number associated
+with it — its index. This means that we can access elements in a list
+using their indices. For example, we can get the first number in the
+list odds, by using odds[0]. One way to print
+each number is to use four print statements:
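+
Written out in full, that approach might look like the sketch below,
+which assumes the four-element odds list defined above:
+
```python
odds = [1, 3, 5, 7]

# One print statement per element, each with a hard-coded index
print(odds[0])
print(odds[1])
print(odds[2])
print(odds[3])
```
+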
This is a bad approach for three reasons:
+
Not scalable. Imagine you need to print a list
+that has hundreds of elements. It might be easier to type them in
+manually.
+
Difficult to maintain. If we want to decorate
+each printed element with an asterisk or any other character, we would
+have to change four lines of code. While this might not be a problem for
+small lists, it would definitely be a problem for longer ones.
+
Fragile. If we use it with a list that has more
+elements than what we initially envisioned, it will only display part of
+the list’s elements. A shorter list, on the other hand, will cause an
+error because it will be trying to display elements of the list that do
+not exist.
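For example, using the four print statements on a list with only three
+elements ends with an error:
+
---------------------------------------------------------------------------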
+IndexError Traceback (most recent call last)
+<ipython-input-3-7974b6cdaf14> in <module>()
+ 3 print(odds[1])
+ 4 print(odds[2])
+----> 5 print(odds[3])
+
+IndexError: list index out of range
A for loop is a better approach. It is shorter, certainly shorter than
+something that prints every number in a hundred-number list, and more
+robust as well:
+
+
PYTHON
+
+
odds = [1, 3, 5, 7, 9, 11]
+for num in odds:
+    print(num)
+
+
+
OUTPUT
+
+
1
+3
+5
+7
+9
+11
+
+
The improved version uses a for
+loop to repeat an operation — in this case, printing — once for each
+thing in a sequence. The general form of a loop is:
+
+
PYTHON
+
+
for variable in collection:
+    # do things using variable, such as print
+
+
Using the odds example above, the loop might look like this:
+
where each number (num) in the variable
+odds is looped through and printed one number after
+another. The other numbers in the diagram denote which loop cycle the
+number was printed in (1 being the first loop cycle, and 6 being the
+final loop cycle).
+
We can call the loop
+variable anything we like, but there must be a colon at the end of
+the line starting the loop, and we must indent anything we want to run
+inside the loop. Unlike many other languages, there is no command to
+signify the end of the loop body (e.g., end for);
+everything indented after the for statement belongs to the
+loop.
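+
The indentation rule can be sketched as follows. The list seen is a
+hypothetical helper added only to record which values the loop body
+handled; append() is the list method described in the lists episode.
+
```python
seen = []
for num in [1, 3, 5]:
    # Indented lines belong to the loop body and run once per element
    print('inside the loop:', num)
    seen.append(num)
# The first unindented line ends the loop body and runs only once
print('after the loop, seen is', seen)
```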
+
+
+
+
+
+
What’s in a name?
+
+
+
In the example above, the loop variable was given the name
+num as a mnemonic; it is short for ‘number’. We can choose
+any name we want for variables. We might just as easily have chosen the
+name banana for the loop variable, as long as we use the
+same name when we invoke the variable inside the loop:
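+
For instance, the earlier loop might be rewritten as in this sketch,
+where only the name of the loop variable has changed:
+
```python
odds = [1, 3, 5, 7, 9, 11]
for banana in odds:
    # banana takes each value of odds in turn, exactly as num did
    print(banana)
```
+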
It is a good idea to choose variable names that are meaningful,
+otherwise it would be more difficult to understand what the loop is
+doing.
+
+
+
+
Here’s another loop that repeatedly updates a variable:
+
+
PYTHON
+
+
length = 0
+names = ['Curie', 'Darwin', 'Turing']
+for value in names:
+    length = length + 1
+print('There are', length, 'names in the list.')
+
+
+
OUTPUT
+
+
There are 3 names in the list.
+
+
It’s worth tracing the execution of this little program step by step.
+Since there are three names in names, the statement on line
+4 will be executed three times. The first time around,
+length is zero (the value assigned to it on line 1) and
+value is Curie. The statement adds 1 to the
+old value of length, producing 1, and updates
+length to refer to that new value. The next time around,
+value is Darwin and length is 1,
+so length is updated to be 2. After one more update,
+length is 3; since there is nothing left in
+names for Python to process, the loop finishes and the
+print function on line 5 tells us our final answer.
+
Note that a loop variable
+is a variable that is being used to record progress in a loop. It still
+exists after the loop is over, and we can re-use variables previously
+defined as loop variables as
+well:
+
+
PYTHON
+
+
name = 'Rosalind'
+for name in ['Curie', 'Darwin', 'Turing']:
+    print(name)
+print('after the loop, name is', name)
+
+
+
OUTPUT
+
+
Curie
+Darwin
+Turing
+after the loop, name is Turing
+
+
Note also that finding the length of an object is such a common
+operation that Python actually has a built-in function to do it called
+len:
+
+
PYTHON
+
+
print(len([0, 1, 2, 3]))
+
+
+
OUTPUT
+
+
4
+
+
len is much faster than any function we could write
+ourselves, and much easier to read than a two-line loop; it will also
+give us the length of many other data types we haven’t seen yet, so we
+should always use it when we can.
+
+
+
+
+
+
From 1 to N
+
+
+
Python has a built-in function called range that
+generates a sequence of numbers. range can accept 1, 2, or 3
+parameters.
+
If one parameter is given, range generates a sequence
+of that length, starting at zero and incrementing by 1. For example,
+range(3) produces the numbers 0, 1, 2.
+
If two parameters are given, range starts at the first
+and ends just before the second, incrementing by one. For example,
+range(2, 5) produces 2, 3, 4.
+
If range is given 3 parameters, it starts at the first
+one, ends just before the second one, and increments by the third one.
+For example, range(3, 10, 2) produces
+3, 5, 7, 9.
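+
The three forms can be checked with the sketch below. Wrapping each
+range in list() is an addition made here so that the generated numbers
+are displayed all at once; the variable names are made up.
+
```python
one_parameter = list(range(3))            # start at 0, stop before 3
two_parameters = list(range(2, 5))        # start at 2, stop before 5
three_parameters = list(range(3, 10, 2))  # start at 3, stop before 10, step 2

print(one_parameter)     # [0, 1, 2]
print(two_parameters)    # [2, 3, 4]
print(three_parameters)  # [3, 5, 7, 9]
```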
+
Using range, write a loop that
+prints the first 3 natural numbers:
+
+
OUTPUT
+
+
1
+2
+3
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
for number in range(1, 4):
+    print(number)
+
+
+
+
+
+
+
+
+
+
+
Understanding the loops
+
+
+
Given the following loop:
+
+
PYTHON
+
+
word = 'oxygen'
+for letter in word:
+    print(letter)
+
+
How many times is the body of the loop executed?
+
3 times
+
4 times
+
5 times
+
6 times
+
+
+
+
+
+
+
+
+
The body of the loop is executed 6 times.
+
+
+
+
+
+
+
+
+
+
Computing Powers With Loops
+
+
+
Exponentiation is built into Python:
+
+
PYTHON
+
+
print(5**3)
+
+
+
OUTPUT
+
+
125
+
+
Write a loop that calculates the same result as 5 ** 3
+using multiplication (and without exponentiation).
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
result = 1
+for number in range(0, 3):
+    result = result * 5
+print(result)
+
+
+
+
+
+
+
+
+
+
+
Summing a List
+
+
+
Write a loop that calculates the sum of elements in a list by adding
+each element and printing the final value, so
+[124, 402, 36] prints 562.
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
numbers = [124, 402, 36]
+summed = 0
+for num in numbers:
+    summed = summed + num
+print(summed)
+
+
+
+
+
+
+
+
+
+
+
Computing the Value of a Polynomial
+
+
+
The built-in function enumerate takes a sequence (e.g.,
+a list) and generates a new sequence of the
+same length. Each element of the new sequence is a pair composed of the
+index (0, 1, 2,…) and the value from the original sequence:
+
+
PYTHON
+
+
for idx, val in enumerate(a_list):
+    # Do something using idx and val
+
+
The code above loops through a_list, assigning the index
+to idx and the value to val.
+
Suppose you have encoded a polynomial as a list of coefficients in
+the following way: the first element is the constant term, the second
+element is the coefficient of the linear term, the third is the
+coefficient of the quadratic term, etc.
Write a loop using enumerate(coefs) which computes the
+value y of any polynomial, given x and
+coefs.
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
y = 0
+for idx, coef in enumerate(coefs):
+    y = y + coef * x**idx
+
+
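+
To check the loop on a concrete case, the sketch below assumes the
+example values x = 5 and coefs = [2, 4, 3], i.e. the polynomial
+2 + 4*x + 3*x**2. These inputs are chosen here for illustration.
+
```python
# Hypothetical inputs: the polynomial 2 + 4*x + 3*x**2 evaluated at x = 5
x = 5
coefs = [2, 4, 3]

y = 0
for idx, coef in enumerate(coefs):
    # idx is the power of x that this coefficient multiplies
    y = y + coef * x**idx

print(y)  # 2 + 4*5 + 3*25 = 97
```
+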
+
+
+
+
Making Choices with Conditional Logic
+
+
How can we use Python to automatically recognize different situations
+we encounter with our data and take a different action for each? In this
+lesson, we’ll learn how to write code that runs only when certain
+conditions are true.
+
+
Conditionals
+
We can ask Python to take different actions, depending on a
+condition, with an if statement:
+
+
PYTHON
+
+
num = 37
+if num > 100:
+    print('greater')
+else:
+    print('not greater')
+print('done')
+
+
+
OUTPUT
+
+
not greater
+done
+
+
The second line of this code uses the keyword if to tell
+Python that we want to make a choice. If the test that follows the
+if statement is true, the body of the if
+(i.e., the set of lines indented underneath it) is executed, and
+“greater” is printed. If the test is false, the body of the
+else is executed instead, and “not greater” is printed.
+Only one or the other is ever executed before program execution
+continues and prints “done”.
+
Conditional
+statements don’t have to include an else. If there
+isn’t one, Python simply does nothing if the test is false:
+
+
PYTHON
+
+
num = 53
+print('before conditional...')
+if num > 100:
+    print(num, 'is greater than 100')
+print('...after conditional')
+
+
+
OUTPUT
+
+
before conditional...
+...after conditional
+
+
We can also chain several tests together using elif,
+which is short for “else if”. The following Python code uses
+elif to print the sign of a number.
+
+
PYTHON
+
+
num = -3
+
+if num > 0:
+    print(num, 'is positive')
+elif num == 0:
+    print(num, 'is zero')
+else:
+    print(num, 'is negative')
+
+
+
OUTPUT
+
+
-3 is negative
+
+
Note that to test for equality we use a double equals sign
+== rather than a single equals sign = which is
+used to assign values.
+
+
+
+
+
+
Comparing in Python
+
+
+
Along with the > and == operators we
+have already used for comparing values in our conditionals, there are a
+few more options to know about:
+
+>: greater than
+
+<: less than
+
+==: equal to
+
+!=: does not equal
+
+>=: greater than or equal to
+
+<=: less than or equal to
+
+
+
+
We can also combine tests using and and or.
+and is only true if both parts are true:
+
+
PYTHON
+
+
if (1 > 0) and (-1 >= 0):
+    print('both parts are true')
+else:
+    print('at least one part is false')
+
+
+
OUTPUT
+
+
at least one part is false
+
+
while or is true if at least one part is true:
+
+
PYTHON
+
+
if (1 < 0) or (1 >= 0):
+    print('at least one test is true')
+
+
+
OUTPUT
+
+
at least one test is true
+
+
+
+
+
+
+
+True and False
+
+
+
True and False are special words in Python
+called booleans, which represent truth values. A statement
+such as 1 < 0 returns the value False,
+while -1 < 0 returns the value True.
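Because a comparison produces an ordinary value, its result can be stored in a variable and combined with other comparisons. A minimal sketch, with variable names chosen only for illustration:

```python
is_less = 1 < 0            # the comparison evaluates to a boolean
print(is_less)
print(type(is_less))

# Booleans can be combined with and / or like any other test
in_range = (-1 < 0) and (0 < 1)
print(in_range)
```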
+
+
+
+
+
+
Checking Our Data
+
Now that we’ve seen how conditionals work, we can use them to check
+for the suspicious features we saw in our inflammation data. We are
+about to use functions provided by the numpy module again.
+Therefore, if you’re working in a new Python session, make sure to load
+the module with:
+
+
PYTHON
+
+
import numpy
+
+
From the first couple of plots, we saw that maximum daily
+inflammation behaves strangely: it rises by one unit a day.
+Wouldn’t it be a good idea to detect such behavior and report it as
+suspicious? Let’s do that! However, instead of checking every single day
+of the study, let’s merely check if maximum inflammation in the
+beginning (day 0) and in the middle (day 20) of the study are equal to
+the corresponding day numbers.
We also saw a different problem in the third dataset; the minima per
+day were all zero (looks like a healthy person snuck into our study). We
+can also check for this with an elif condition:
+
+
PYTHON
+
+
elif numpy.sum(numpy.amin(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+
+
And if neither of these conditions is true, we can use
+else to give the all-clear:
In this way, we have asked Python to do something different depending
+on the condition of our data. Here we printed messages in all cases, but
+we could also imagine not using the else catch-all so that
+messages are only printed when something is wrong, freeing us from
+having to manually examine every plot for features we’ve seen
+before.
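The three checks described above can be sketched as a single if / elif / else chain. This is an illustration under stated assumptions, not the lesson's exact code: a small made-up list of lists stands in for the NumPy array, day 3 replaces day 20 because the toy data only has four days, and the messages other than 'Minima add up to zero!' are invented for the example.

```python
# Small made-up data set: each row is a patient, each column a day.
# Plain lists stand in for the NumPy array used in the lesson.
data = [
    [0, 1, 2, 3],
    [0, 0, 1, 2],
    [0, 1, 1, 3],
]

# Maximum and minimum inflammation for each day (column)
daily_max = [max(column) for column in zip(*data)]
daily_min = [min(column) for column in zip(*data)]

if daily_max[0] == 0 and daily_max[3] == 3:
    # maxima equal to the day numbers look artificial
    message = 'Suspicious looking maxima!'
elif sum(daily_min) == 0:
    message = 'Minima add up to zero!'
else:
    message = 'Seems OK!'
print(message)
```

Only the first matching branch runs, so a data set that fails both checks reaches the else and gets the all-clear.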
Which of the following would be printed if you were to run this code?
+Why did you pick this answer?
+
A
+
B
+
C
+
B and C
+
+
+
+
+
+
+
+
+
C gets printed because the first two conditions,
+4 > 5 and 4 == 5, are not true, but
+4 < 5 is true. In this case, only one of these
+conditions can be true at a time, but in other scenarios multiple
+elif conditions could be met. In these scenarios, only the
+action associated with the first true elif condition will
+occur, starting from the top of the conditional section.
+
This contrasts with the case of multiple if statements,
+where every action can occur as long as their condition is met.
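The difference between an elif chain and a series of separate if statements can be seen in a short sketch. The value 4 and the thresholds are arbitrary choices for illustration.

```python
value = 4  # example value chosen for illustration

# elif chain: testing stops at the first true condition
chain_result = []
if value < 5:
    chain_result.append('less than 5')
elif value < 10:
    chain_result.append('less than 10')

# separate if statements: every true condition runs its body
separate_result = []
if value < 5:
    separate_result.append('less than 5')
if value < 10:
    separate_result.append('less than 10')

print(chain_result)
print(separate_result)
```

Both conditions are true for 4, but only the separate if statements record both.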
+
+
+
+
+
+
+
+
+
+
What Is Truth?
+
+
+
True and False booleans are not the only
+values in Python that are true and false. In fact, any value
+can be used in an if or elif. After reading
+and running the code below, explain what the rule is for which values
+are considered true and which are considered false.
+
+
PYTHON
+
+
if '':
+    print('empty string is true')
+if 'word':
+    print('word is true')
+if []:
+    print('empty list is true')
+if [1, 2, 3]:
+    print('non-empty list is true')
+if 0:
+    print('zero is true')
+if 1:
+    print('one is true')
+
+
+
+
+
+
+
+
+
+
That’s Not Not What I Meant
+
+
+
Sometimes it is useful to check whether some condition is
+not true. The Boolean operator not can do this
+explicitly. After reading and running the code below, write some
+if statements that use not to test the rule
+that you formulated in the previous challenge.
+
+
PYTHON
+
+
if not '':
+    print('empty string is not true')
+if not 'word':
+    print('word is not true')
+if not not True:
+    print('not not True is true')
+
+
+
+
+
+
+
+
+
+
Close Enough
+
+
+
Write some conditions that print True if the variable
+a is within 10% of the variable b and
+False otherwise. Compare your implementation with your
+partner’s. Do you get the same answer for all possible pairs of
+numbers?
a = 5
+b = 5.1
+
+if abs(a - b) <= 0.1 * abs(b):
+    print('True')
+else:
+    print('False')
+
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
print(abs(a - b) <= 0.1 * abs(b))
+
+
This works because the Booleans True and
+False have string representations which can be printed.
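One reason two implementations may disagree is that "within 10%" can be measured relative to either number. The sketch below uses example values picked to expose this, and compares the test from the solution with the standard library function math.isclose, which measures the tolerance relative to the larger of the two magnitudes.

```python
import math

a = 10
b = 9.05  # example values chosen so that the order of comparison matters

# The test from the solution measures the tolerance relative to the
# second number only, so swapping the arguments can change the answer.
relative_to_b = abs(a - b) <= 0.1 * abs(b)
relative_to_a = abs(b - a) <= 0.1 * abs(a)
print(relative_to_b, relative_to_a)

# math.isclose gives the same answer in either order.
print(math.isclose(a, b, rel_tol=0.1), math.isclose(b, a, rel_tol=0.1))
```

Neither definition is wrong; the point is to decide which one is meant and apply it consistently.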
+
+
+
+
+
+
+
+
+
+
In-Place Operators
+
+
+
Python (and most other languages in the C family) provides in-place operators that
+work like this:
+
+
PYTHON
+
+
x = 1  # original value
+x += 1  # add one to x, assigning result back to x
+x *= 3  # multiply x by 3
+print(x)
+
+
+
OUTPUT
+
+
6
+
+
Write some code that sums the positive and negative numbers in a list
+separately, using in-place operators. Do you think the result is more or
+less readable than writing the same without in-place operators?
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
positive_sum = 0
+negative_sum = 0
+test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
+for num in test_list:
+    if num > 0:
+        positive_sum += num
+    elif num == 0:
+        pass
+    else:
+        negative_sum += num
+print(positive_sum, negative_sum)
+
+
Here pass means “don’t do anything”. In this particular
+case, it’s not actually needed, since if num == 0 neither
+sum needs to change, but it illustrates the use of elif and
+pass.
+
+
+
+
+
+
+
+
+
+
Sorting a List Into Buckets
+
+
+
In our data folder, large data sets are stored in files
+whose names start with “inflammation-” and small data sets in files
+whose names start with “small-”. We also have some other files that we
+do not care about at this point. We’d like to break all these files into
+three lists called large_files, small_files,
+and other_files, respectively.
+
Add code to the template below to do this. Note that the string
+method startswith
+returns True if and only if the string it is called on
+starts with the string passed as an argument, that is:
+
+
PYTHON
+
+
'String'.startswith('Str')
+
+
+
OUTPUT
+
+
True
+
+
But
+
+
PYTHON
+
+
'String'.startswith('str')
+
+
+
OUTPUT
+
+
False
+
+
Use the following Python code as your starting point:
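A minimal sketch of one possible approach, assuming a hypothetical `filenames` list (the file names are illustrative only):

```python
# Hypothetical file names, used only for illustration
filenames = ['inflammation-01.csv', 'myscript.py', 'inflammation-02.csv',
             'small-01.csv', 'small-02.csv']
large_files = []
small_files = []
other_files = []

for filename in filenames:
    if filename.startswith('inflammation-'):
        large_files.append(filename)
    elif filename.startswith('small-'):
        small_files.append(filename)
    else:
        # anything that matches neither prefix
        other_files.append(filename)

print('large_files:', large_files)
print('small_files:', small_files)
print('other_files:', other_files)
```

Because startswith is case-sensitive, a file named 'Small-03.csv' would land in other_files with this sketch.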
Write a loop that counts the number of vowels in a character
+string.
+
Test it on a few individual words and full sentences.
+
Once you are done, compare your solution to your neighbor’s. Did you
+make the same decisions about how to handle the letter ‘y’ (which some
+people think is a vowel, and some do not)?
+
+
Solution
+
vowels = 'aeiouAEIOU'
+sentence = 'Mary had a little lamb.'
+count = 0
+for char in sentence:
+    if char in vowels:
+        count += 1
+
+print('The number of vowels in this string is ' + str(count))
+
+
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Use for variable in sequence to process the elements of
+a sequence one at a time.
+
The body of a for loop must be indented.
+
Use len(thing) to determine the length of something
+that contains other values.
+
Use if condition to start a conditional statement,
+elif condition to provide additional tests, and
+else to provide a default.
+
The bodies of the branches of conditional statements must be
+indented.
+
Use == to test for equality.
+
+X and Y is only true if both X and
+Y are true.
+
+X or Y is true if either X or
+Y, or both, are true.
+
Zero, the empty string, and the empty list are considered false; all
+other numbers, strings, and lists are considered true.
+
+
diff --git a/instructor/06-alternative_loops.html b/instructor/06-alternative_loops.html
new file mode 100644
index 0000000..09acb1d
--- /dev/null
+++ b/instructor/06-alternative_loops.html
@@ -0,0 +1,491 @@
+
+Python for Official Statistics: Alternatives to Loops
+
+
+
+
+
+
+
+ Pre-Alpha
+
+ This lesson is in the pre-alpha phase, which means that it is in early development, but has not yet been taught.
+
+
+
+
What are functions, and how can I use them in Python?
+
How can I define new functions?
+
What’s the difference between defining and calling a function?
+
What happens when I call a function?
+
+
+
+
+
+
+
Objectives
+
identify what a function is
+
create new functions
+
Set default values for function parameters.
+
Explain why we should divide programs into small, single-purpose
+functions.
+
+
+
+
+
+
At this point, we’ve seen that code can have Python make decisions
+about what it sees in our data. What if we want to convert some of our
+data, such as taking a temperature in Fahrenheit and converting it to
+Celsius? We could write out the conversion directly for a single
+number.
But we would be in trouble as soon as we had to do this more than a
+couple of times. Cutting and pasting it is going to make our code get very
+long and very repetitive, very quickly. We’d like a way to package our
+code so that it is easier to reuse, a shorthand way of re-executing
+longer pieces of code. In Python we can use ‘functions’. Let’s start by
+defining a function fahr_to_celsius that converts
+temperatures from Fahrenheit to Celsius:
+
+
PYTHON
+
+
def explicit_fahr_to_celsius(temp):
+    # Assign the converted value to a variable
+    converted = ((temp - 32) * (5 / 9))
+    # Return the value of the new variable
+    return converted
+
+def fahr_to_celsius(temp):
+    # Return the converted value directly with the return statement,
+    # without creating a new variable. This does the same thing as
+    # the previous function, which is more explicit in showing
+    # how return works.
+    return ((temp - 32) * (5 / 9))
+
+
The function definition opens with the keyword def
+followed by the name of the function (fahr_to_celsius) and
+a parenthesized list of parameter names (temp). The body of the function — the statements
+that are executed when it runs — is indented below the definition line.
+The body concludes with a return keyword followed by the
+return value.
+
When we call the function, the values we pass to it are assigned to
+those variables so that we can use them inside the function. Inside the
+function, we use a return
+statement to send a result back to whoever asked for it.
+
Let’s try running our function.
+
+
PYTHON
+
+
fahr_to_celsius(32)
+
+
This command calls our function with 32 as the input and
+returns the function’s value.
+
In fact, calling our own function is no different from calling any
+other function:
+
+
PYTHON
+
+
print('freezing point of water:', fahr_to_celsius(32), 'C')
+print('boiling point of water:', fahr_to_celsius(212), 'C')
+
+
+
OUTPUT
+
+
freezing point of water: 0.0 C
+boiling point of water: 100.0 C
+
+
We’ve successfully called the function that we defined, and we have
+access to the value that we returned.
+
Composing Functions
+
+
Now that we’ve seen how to turn Fahrenheit into Celsius, we can also
+write the function to turn Celsius into Kelvin:
+
+
PYTHON
+
+
def celsius_to_kelvin(temp_c):
+    return temp_c + 273.15
+
+print('freezing point of water in Kelvin:', celsius_to_kelvin(0.))
+
+
+
OUTPUT
+
+
freezing point of water in Kelvin: 273.15
+
+
What about converting Fahrenheit to Kelvin? We could write out the
+formula, but we don’t need to. Instead, we can compose the two functions we have
+already created:
+
+
PYTHON
+
+
def fahr_to_kelvin(temp_f):
+    temp_c = fahr_to_celsius(temp_f)
+    temp_k = celsius_to_kelvin(temp_c)
+    return temp_k
+
+print('boiling point of water in Kelvin:', fahr_to_kelvin(212.0))
+
+
+
OUTPUT
+
+
boiling point of water in Kelvin: 373.15
+
+
This is our first taste of how larger programs are built: we define
+basic operations, then combine them in ever-larger chunks to get the
+effect we want. Real-life functions will usually be larger than the ones
+shown here — typically half a dozen to a few dozen lines — but they
+shouldn’t ever be much longer than that, or the next person who reads it
+won’t be able to understand what’s going on.
+
Variable Scope
+
+
In composing our temperature conversion functions, we created
+variables inside of those functions, temp,
+temp_c, temp_f, and temp_k. We
+refer to these variables as local variables because they no
+longer exist once the function is done executing. If we try to access
+their values outside of the function, we will encounter an error:
+
+
PYTHON
+
+
print('Again, temperature in Kelvin was:', temp_k)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-1-eed2471d229b> in <module>
+----> 1 print('Again, temperature in Kelvin was:', temp_k)
+
+NameError: name 'temp_k' is not defined
+
+
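The same behaviour can be reproduced in a self-contained sketch. The function `double` is hypothetical, and try / except is used here only so the example keeps running after the error is raised.

```python
def double(value):
    result = value * 2   # result is local to double
    return result

doubled = double(21)

try:
    print(result)        # result does not exist outside the function
except NameError as error:
    caught_message = str(error)
    print('NameError:', caught_message)

print(doubled)           # the returned value is available because we stored it
```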
If you want to reuse the temperature in Kelvin after you have
+calculated it with fahr_to_kelvin, you can store the result
+of the function call in a variable:
+
+
PYTHON
+
+
temp_kelvin = fahr_to_kelvin(212.0)
+print('temperature in Kelvin was:', temp_kelvin)
+
+
+
OUTPUT
+
+
temperature in Kelvin was: 373.15
+
+
The variable temp_kelvin, being defined outside any
+function, is said to be global.
+
Inside a function, one can read the value of such global
+variables:
+
+
PYTHON
+
+
def print_temperatures():
+    print('temperature in Fahrenheit was:', temp_fahr)
+    print('temperature in Kelvin was:', temp_kelvin)
+
+temp_fahr = 212.0
+temp_kelvin = fahr_to_kelvin(temp_fahr)
+
+print_temperatures()
+
+
+
OUTPUT
+
+
temperature in Fahrenheit was: 212.0
+temperature in Kelvin was: 373.15
+
+
By giving our functions human-readable names, we can more easily read
+and understand what is happening in our code. Even
+better, if at some later date we want to use any of those pieces of
+code again, we can do so in a single line.
+
Testing and Documenting
+
+
Once we start putting things in functions so that we can re-use them,
+we need to start testing that those functions are working correctly. To
+see how to do this, let’s write a function to offset a dataset so that
+its mean value shifts to a user-defined value:
We could test this on our actual data, but since we don’t know what
+the values ought to be, it will be hard to tell if the result was
+correct. Instead, let’s use NumPy to create a matrix of 0’s and then
+offset its values to have a mean value of 3:
+
+
PYTHON
+
+
z = numpy.zeros((2,2))
+print(offset_mean(z, 3))
+
+
+
OUTPUT
+
+
[[ 3. 3.]
+ [ 3. 3.]]
+
+
That looks right, so let’s try offset_mean on our real
+data:
+
+
PYTHON
+
+
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+print(offset_mean(data, 0))
It’s hard to tell from the default output whether the result is
+correct, but there are a few tests that we can run to reassure us:
+
+
PYTHON
+
+
print('original min, mean, and max are:', numpy.amin(data), numpy.mean(data), numpy.amax(data))
+offset_data = offset_mean(data, 0)
+print('min, mean, and max of offset data are:',
+ numpy.amin(offset_data),
+ numpy.mean(offset_data),
+ numpy.amax(offset_data))
+
+
+
OUTPUT
+
+
original min, mean, and max are: 0.0 6.14875 20.0
+min, mean, and max of offset data are: -6.14875 2.84217094304e-16 13.85125
+
+
That seems almost right: the original mean was about 6.1, so the
+lower bound from zero is now about -6.1. The mean of the offset data
+isn’t quite zero — we’ll explore why not in the challenges — but it’s
+pretty close. We can even go further and check that the standard
+deviation hasn’t changed:
+
+
PYTHON
+
+
print('std dev before and after:', numpy.std(data), numpy.std(offset_data))
+
+
+
OUTPUT
+
+
std dev before and after: 4.61383319712 4.61383319712
+
+
Those values look the same, but we probably wouldn’t notice if they
+were different in the sixth decimal place. Let’s do this instead:
+
+
PYTHON
+
+
print('difference in standard deviations before and after:',
+ numpy.std(data) - numpy.std(offset_data))
+
+
+
OUTPUT
+
+
difference in standard deviations before and after: -3.5527136788e-15
+
+
Again, the difference is very small. It’s still possible that our
+function is wrong, but it seems unlikely enough that we should probably
+get back to doing our analysis.
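Checks like these can also be written as assertions, so that Python reports a failure instead of relying on us to read the numbers. The sketch below is an assumption-laden stand-in: `offset_mean_list` is a hypothetical list-based version of the function, the data values are made up, and the standard library modules statistics and math replace NumPy.

```python
import math
import statistics

def offset_mean_list(values, target_mean_value):
    """List-based stand-in for offset_mean (hypothetical helper)."""
    current_mean = statistics.fmean(values)
    return [v - current_mean + target_mean_value for v in values]

values = [0.0, 3.0, 6.0, 20.0]           # made-up example data
shifted = offset_mean_list(values, 0.0)

# The new mean should be (very nearly) the target value ...
assert math.isclose(statistics.fmean(shifted), 0.0, abs_tol=1e-12)
# ... and shifting every value by the same amount should not change the spread.
assert math.isclose(statistics.pstdev(values), statistics.pstdev(shifted))
print('all checks passed')
```

Comparing with a tolerance rather than with == allows for the tiny rounding differences seen in the output above.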
+
Documentation
+
+
We have one more task first, though: we should write some documentation for our function
+to remind ourselves later what it’s for and how to use it.
+
The usual way to put documentation in software is to add comments like this:
+
+
PYTHON
+
+
# offset_mean(data, target_mean_value):
+# return a new array containing the original data with its mean offset to match the desired value.
+def offset_mean(data, target_mean_value):
+    return (data - numpy.mean(data)) + target_mean_value
+
+
There’s a better way, though. If the first thing in a function is a
+string that isn’t assigned to a variable, that string is attached to the
+function as its documentation:
+
+
PYTHON
+
+
def offset_mean(data, target_mean_value):
+    """Return a new array containing the original data
+    with its mean offset to match the desired value."""
+    return (data - numpy.mean(data)) + target_mean_value
+
+
This is better because we can now ask Python’s built-in help system
+to show us the documentation for the function:
+
+
PYTHON
+
+
help(offset_mean)
+
+
+
OUTPUT
+
+
Help on function offset_mean in module __main__:
+
+offset_mean(data, target_mean_value)
+ Return a new array containing the original data with its mean offset to match the desired value.
+
+
A string like this is called a docstring. We don’t need to use
+triple quotes when we write one, but if we do, we can break the string
+across multiple lines:
+
+
PYTHON
+
+
def offset_mean(data, target_mean_value):
+    """Return a new array containing the original data
+    with its mean offset to match the desired value.
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3], 0)
+    array([-1., 0., 1.])
+    """
+    return (data - numpy.mean(data)) + target_mean_value
+
+help(offset_mean)
+
+
+
OUTPUT
+
+
Help on function offset_mean in module __main__:
+
+offset_mean(data, target_mean_value)
+ Return a new array containing the original data
+ with its mean offset to match the desired value.
+
+ Examples
+ --------
+ >>> offset_mean([1, 2, 3], 0)
+ array([-1., 0., 1.])
+
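The docstring is stored on the function object itself, in its `__doc__` attribute, which is where help finds it. A minimal sketch with a hypothetical function `add_one`:

```python
def add_one(number):
    """Return number plus one."""
    return number + 1

# The docstring is an ordinary string attached to the function
print(add_one.__doc__)
```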
+
Defining Defaults
+
+
We have passed parameters to functions in two ways: directly, as in
+type(data), and by name, as in
+numpy.loadtxt(fname='something.csv', delimiter=','). In
+fact, we can pass the filename to loadtxt without the
+fname=:
Traceback (most recent call last):
+ File "<stdin>", line 1, in <module>
+ File "/Users/username/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1041, in loa
+dtxt
+ dtype = np.dtype(dtype)
+ File "/Users/username/anaconda3/lib/python3.6/site-packages/numpy/core/_internal.py", line 199, in
+_commastring
+ newitem = (dtype, eval(repeats))
+ File "<string>", line 1
+ ,
+ ^
+SyntaxError: unexpected EOF while parsing
+
+
To understand what’s going on, and make our own functions easier to
+use, let’s re-define our offset_mean function like
+this:
+
+
PYTHON
+
+
def offset_mean(data, target_mean_value=0.0):
+    """Return a new array containing the original data
+    with its mean offset to match the desired value (0 by default).
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3])
+    array([-1., 0., 1.])
+    """
+    return (data - numpy.mean(data)) + target_mean_value
+
+
The key change is that the second parameter is now written
+target_mean_value=0.0 instead of just
+target_mean_value. If we call the function with two
+arguments, it works as it did before:
But we can also now call it with just one parameter, in which case
+target_mean_value is automatically assigned the default value of 0.0:
+
+
PYTHON
+
+
more_data = 5 + numpy.zeros((2, 2))
+print('data before mean offset:')
+print(more_data)
+print('offset data:')
+print(offset_mean(more_data))
+
+
+
OUTPUT
+
+
data before mean offset:
+[[ 5. 5.]
+ [ 5. 5.]]
+offset data:
+[[ 0. 0.]
+ [ 0. 0.]]
+
+
This is handy: if we usually want a function to work one way, but
+occasionally need it to do something else, we can allow people to pass a
+parameter when they need to but provide a default to make the normal
+case easier. The example below shows how Python matches values to
+parameters:
As this example shows, parameters are matched up from left to right,
+and any that haven’t been given a value explicitly get their default
+value. We can override this behavior by naming the value as we pass it
+in:
+
+
PYTHON
+
+
print('only setting the value of c')
+display(c=77)
+
+
+
OUTPUT
+
+
only setting the value of c
+a: 1 b: 2 c: 77
+
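A minimal sketch of a `display` function consistent with the output above, assuming three parameters with default values 1, 2 and 3 (the default for c is a guess, since the output only reveals the defaults for a and b):

```python
def display(a=1, b=2, c=3):
    # Assumed definition: three parameters, each with a default value
    print('a:', a, 'b:', b, 'c:', c)

print('no parameters:')
display()
print('one parameter:')
display(55)          # 55 is matched to a, the first parameter
print('two parameters:')
display(55, 66)      # 55 goes to a, 66 to b
print('only setting the value of c')
display(c=77)        # a and b keep their defaults
```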
+
With that in hand, let’s look at the help for
+numpy.loadtxt:
+
+
PYTHON
+
+
help(numpy.loadtxt)
+
+
+
OUTPUT
+
+
Help on function loadtxt in module numpy.lib.npyio:
+
+loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, use
+cols=None, unpack=False, ndmin=0, encoding='bytes')
+ Load data from a text file.
+
+ Each row in the text file must have the same number of values.
+
+ Parameters
+ ----------
+...
+
+
There’s a lot of information here, but the most important part is the
+first couple of lines:
This tells us that loadtxt has one parameter called
+fname that doesn’t have a default value, and nine others
+that do. If we call the function like this:
+
+
PYTHON
+
+
numpy.loadtxt('inflammation-01.csv', ',')
+
+
then the filename is assigned to fname (which is what we
+want), but the delimiter string ',' is assigned to
+dtype rather than delimiter, because
+dtype is the second parameter in the list. However
+',' isn’t a known dtype so our code produced
+an error message when we tried to run it. When we call
+loadtxt we don’t have to provide fname= for
+the filename because it’s the first item in the list, but if we want the
+',' to be assigned to the variable delimiter,
+we do have to provide delimiter= for the second
+parameter since delimiter is not the second parameter in
+the list.
+
Readable functions
+
+
Consider these two functions:
+
+
PYTHON
+
+
def s(p):
+    a = 0
+    for v in p:
+        a += v
+    m = a / len(p)
+    d = 0
+    for v in p:
+        d += (v - m) * (v - m)
+    return numpy.sqrt(d / (len(p) - 1))
+
+def std_dev(sample):
+    sample_sum = 0
+    for value in sample:
+        sample_sum += value
+
+    sample_mean = sample_sum / len(sample)
+
+    sum_squared_devs = 0
+    for value in sample:
+        sum_squared_devs += (value - sample_mean) * (value - sample_mean)
+
+    return numpy.sqrt(sum_squared_devs / (len(sample) - 1))
+
+
The functions s and std_dev are
+computationally equivalent (they both calculate the sample standard
+deviation), but to a human reader, they look very different. You
+probably found std_dev much easier to read and understand
+than s.
+
As this example illustrates, both documentation and a programmer’s
+coding style combine to determine how easy it is for others to
+read and understand the programmer’s code. Choosing meaningful variable
+names and using blank spaces to break the code into logical “chunks” are
+helpful techniques for producing readable code. This is useful
+not only for sharing code with others, but also for the original
+programmer. If you need to revisit code that you wrote months ago and
+haven’t thought about since then, you will appreciate the value of
+readable code!
+
+
+
+
+
+
Combining Strings
+
+
+
“Adding” two strings produces their concatenation:
+'a' + 'b' is 'ab'. Write a function called
+fence that takes two parameters called
+original and wrapper and returns a new string
+that has the wrapper character at the beginning and end of the original.
+A call to your function should look like this:
+
+
PYTHON
+
+
print(fence('name', '*'))
+
+
+
OUTPUT
+
+
*name*
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def fence(original, wrapper):
+    return wrapper + original + wrapper
+
+
+
+
+
+
+
+
+
+
+
Return versus print
+
+
+
Note that return and print are not
+interchangeable. print is a Python function that
+prints data to the screen. It lets us, the users, see
+the data. The return statement, on the other hand, makes data
+visible to the program. Let’s have a look at the following function:
+
+
PYTHON
+
+
def add(a, b):
+    print(a + b)
+
+
Question: What will we see if we execute the
+following commands?
+
+
PYTHON
+
+
A = add(7, 3)
+print(A)
+
+
+
+
+
+
+
+
+
+
Python will first execute the function add with
+a = 7 and b = 3, and, therefore, print
+10. However, because the function add does not
+have a line that starts with return (no return
+“statement”), it will, by default, return nothing, which in Python
+is called None. Therefore, A will be
+assigned None and the last line (print(A))
+will print None. As a result, we will see:
+
+
OUTPUT
+
+
10
+None
+
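For comparison, here is a sketch of the same function written with return instead of print, so that the caller receives the sum:

```python
def add(a, b):
    return a + b   # hand the sum back to the caller instead of printing it

A = add(7, 3)      # nothing is printed by the call itself
print(A)           # A now holds the sum rather than None
```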
+
+
+
+
+
+
+
+
+
+
Selecting Characters From Strings
+
+
+
If the variable s refers to a string, then
+s[0] is the string’s first character and s[-1]
+is its last. Write a function called outer that returns a
+string made up of just the first and last characters of its input. A
+call to your function should look like this:
Write a function rescale that takes an array as input
+and returns a corresponding array of values scaled to lie in the range
+0.0 to 1.0. (Hint: If L and H are the lowest
+and highest values in the original array, then the replacement for a
+value v should be (v-L) / (H-L).)
Run the commands help(numpy.arange) and
+help(numpy.linspace) to see how to use these functions to
+generate regularly-spaced values, then use those values to test your
+rescale function. Once you’ve successfully tested your
+function, add a docstring that explains what it does.
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def rescale(input_array):
+    """Takes an array as input, and returns a corresponding array scaled so
+    that 0 corresponds to the minimum and 1 to the maximum value of the input array.
+
+    Examples:
+    >>> rescale(numpy.arange(10.0))
+    array([ 0. , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
+    0.55555556, 0.66666667, 0.77777778, 0.88888889, 1. ])
+    >>> rescale(numpy.linspace(0, 100, 5))
+    array([ 0. , 0.25, 0.5 , 0.75, 1. ])
+    """
+    L = numpy.amin(input_array)
+    H = numpy.amax(input_array)
+    output_array = (input_array - L) / (H - L)
+    return output_array
+
+
+
+
+
+
+
+
+
+
+
Defining Defaults
+
+
+
Rewrite the rescale function so that it scales data to
+lie between 0.0 and 1.0 by default, but will
+allow the caller to specify lower and upper bounds if they want. Compare
+your implementation to your neighbor’s: do the two functions always
+behave the same way?
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def rescale(input_array, low_val=0.0, high_val=1.0):
+    """rescales input array values to lie between low_val and high_val"""
+    L = numpy.amin(input_array)
+    H = numpy.amax(input_array)
+    intermed_array = (input_array - L) / (H - L)
+    output_array = intermed_array * (high_val - low_val) + low_val
+    return output_array
+
+
+
+
+
+
+
+
+
+
+
Variables Inside and Outside Functions
+
+
+
What does the following piece of code display when run — and why?
+
+
PYTHON
+
+
f = 0
+k = 0
+
+def f2k(f):
+    k = ((f - 32) * (5.0 / 9.0)) + 273.15
+    return k
+
+print(f2k(8))
+print(f2k(41))
+print(f2k(32))
+
+print(k)
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
259.81666666666666
+278.15
+273.15
+0
+
+
k is 0 because the k inside the function
+f2k doesn’t know about the k defined outside
+the function. When the f2k function is called, it creates a
+local variable
+k. The function returns the value of its local k but does not
+alter the k defined outside the function. Therefore the original
+value of k remains unchanged. Beware that a local
+k is created because the statements inside f2k
+assign a new value to it. If k were only
+read, it would simply retrieve the global k
+value.
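The difference between reading and assigning can be seen side by side in a short sketch. The function names `read_global` and `assign_local` are made up for this illustration.

```python
k = 0

def read_global():
    # k is only read here, so Python looks it up in the global scope
    return k + 1

def assign_local():
    # assigning to k creates a new local variable; the global k is untouched
    k = 100
    return k

print(read_global())
print(assign_local())
print(k)
```

The last line still prints the original global value, because the assignment inside assign_local only affected that function's own k.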
+
+
+
+
+
+
+
+
+
+
Mixing Default and Non-Default Parameters
+
+
+
Given the following code:
+
+
PYTHON
+
+
def numbers(one, two=2, three, four=4):
+    n = str(one) + str(two) + str(three) + str(four)
+    return n
+
+print(numbers(1, three=3))
+
+
what do you expect will be printed? What is actually printed? What
+rule do you think Python is following?
+
1234
+
one2three4
+
1239
+
SyntaxError
+
Given that, what does the following piece of code display when
+run?
+
+
PYTHON
+
+
def func(a, b=3, c=6):
+    print('a: ', a, 'b: ', b, 'c:', c)
+
+func(-1, 2)
+
+
a: b: 3 c: 6
+
a: -1 b: 3 c: 6
+
a: -1 b: 2 c: 6
+
a: b: -1 c: 2
+
+
+
+
+
+
+
+
+
Attempting to define the numbers function results in
+4. SyntaxError. The defined parameters two and
+four are given default values. Because one and
+three are not given default values, they are required to be
+included as arguments when the function is called and must be placed
+before any parameters that have default values in the function
+definition.
+
The given call to func displays
+a: -1 b: 2 c: 6. -1 is assigned to the first parameter
+a, 2 is assigned to the next parameter b, and
+c is not passed a value, so it uses its default value
+6.
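One way to make the numbers function valid is to list the parameters without defaults first. The reordered signature below is an assumption made for illustration, not part of the exercise:

```python
def numbers(one, three, two=2, four=4):
    # Parameters without defaults (one, three) come before those with defaults
    n = str(one) + str(two) + str(three) + str(four)
    return n

# three is passed by name; two and four fall back to their defaults
print(numbers(1, three=3))
```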
+
+
+
+
+
+
+
+
+
+
Readable Code
+
+
+
Revise a function you wrote for one of the previous exercises to try
+to make the code more readable. Then, collaborate with one of your
+neighbors to critique each other’s functions and discuss how your
+function implementations could be further improved to make them more
+readable.
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
Define a function using
+def function_name(parameter).
+
The body of a function must be indented.
+
Call a function using function_name(value).
+
Numbers are stored as integers or floating-point numbers.
+
Variables defined within a function can only be seen and used within
+the body of the function.
+
Variables created outside of any function are called global
+variables.
+
Within a function, we can access global variables.
+
Variables created within a function override global variables if
+their names match.
+
Use help(thing) to view help for something.
+
Put docstrings in functions to provide help for that function.
+
Specify default values for parameters when defining a function using
+name=value in the parameter list.
+
Parameters can be passed by matching based on name, by position, or
+by omitting them (in which case the default value is used).
+
Put code whose parameters change frequently in a function, then call
+it with different parameter values to customize its behavior.
+
+
diff --git a/instructor/08-data_analysis.html b/instructor/08-data_analysis.html
new file mode 100644
index 0000000..3421e46
--- /dev/null
+++ b/instructor/08-data_analysis.html
@@ -0,0 +1,493 @@
+
+Python for Official Statistics: Data Analysis
+
+
+
+
+
+
+
+ Pre-Alpha
+
+ This lesson is in the pre-alpha phase, which means that it is in early development, but has not yet been taught.
+
+
+
+
+
+
diff --git a/instructor/10-errors_exceptions.html b/instructor/10-errors_exceptions.html
new file mode 100644
index 0000000..26bcb19
--- /dev/null
+++ b/instructor/10-errors_exceptions.html
@@ -0,0 +1,1186 @@
+
+Python for Official Statistics: Errors and Exceptions
+
+
+
+
+
+
+
+ Pre-Alpha
+
+ This lesson is in the pre-alpha phase, which means that it is in early development, but has not yet been taught.
+
+
+
+
identify different errors and correct bugs associated with them
+
+
+
+
+
+
Every programmer encounters errors, both those who are just
+beginning, and those who have been programming for years. Encountering
+errors and exceptions can be very frustrating at times, and can make
+coding feel like a hopeless endeavour. However, understanding what the
+different types of errors are and when you are likely to encounter them
+can help a lot. Once you know why you get certain types of
+errors, they become much easier to fix.
+
Errors in Python have a very specific form, called a traceback. Let’s examine one:
+
+
PYTHON
+
+
# This code has an intentional error. You can type it directly or
+# use it for reference to understand the error message below.
+def favorite_ice_cream():
+    ice_creams = [
+        'chocolate',
+        'vanilla',
+        'strawberry'
+    ]
+    print(ice_creams[3])
+
+favorite_ice_cream()
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+IndexError Traceback (most recent call last)
+<ipython-input-1-70bd89baa4df> in <module>()
+ 9 print(ice_creams[3])
+ 10
+----> 11 favorite_ice_cream()
+
+<ipython-input-1-70bd89baa4df> in favorite_ice_cream()
+ 7 'strawberry'
+ 8 ]
+----> 9 print(ice_creams[3])
+ 10
+ 11 favorite_ice_cream()
+
+IndexError: list index out of range
+
+
This particular traceback has two levels. You can determine the
+number of levels by looking for the number of arrows on the left hand
+side. In this case:
+
The first shows code from the cell above, with an arrow pointing
+to Line 11 (which is favorite_ice_cream()).
+
The second shows some code in the function
+favorite_ice_cream, with an arrow pointing to Line 9 (which
+is print(ice_creams[3])).
+
The last level is the actual place where the error occurred. The
+other level(s) show what function the program executed to get to the
+next level down. So, in this case, the program first called the function
+favorite_ice_cream. Inside this function, the program
+encountered an error on Line 9, when it tried to run the code
+print(ice_creams[3]).
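The levels of a traceback can also be inspected from code. The sketch below is only an illustration, assuming the same three-item list as above; it catches the error and uses the standard-library traceback module to list each level, outermost call first.

```python
import traceback


def favorite_ice_cream():
    ice_creams = ['chocolate', 'vanilla', 'strawberry']
    print(ice_creams[3])  # index 3 does not exist in a three-item list


try:
    favorite_ice_cream()
except IndexError as error:
    # Each entry is one level of the traceback, outermost call first.
    levels = traceback.extract_tb(error.__traceback__)
    for level in levels:
        print(level.name, 'line', level.lineno)
    # The last level is where the error actually occurred.
    innermost = levels[-1]
```

The final entry names the function in which the error was raised, which matches the bottom-most level of the printed traceback.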
+
+
+
+
+
+
Long Tracebacks
+
+
+
Sometimes, you might see a traceback that is very long -- sometimes
+they might even be 20 levels deep! This can make it seem like something
+horrible happened, but the length of the error message does not reflect
+severity; rather, it indicates that your program called many functions
+before it encountered the error. Most of the time, the actual place
+where the error occurred is at the bottom-most level, so you can skip
+down the traceback to the bottom.
+
+
+
+
So what error did the program actually encounter? In the last line of
+the traceback, Python helpfully tells us the category or type of error
+(in this case, it is an IndexError) and a more detailed
+error message (in this case, it says “list index out of range”).
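The two parts of that last line, the error type and the message, are also available on the exception object itself. A minimal sketch, reusing the ice cream list from above as an assumed example:

```python
ice_creams = ['chocolate', 'vanilla', 'strawberry']

try:
    ice_creams[3]
except Exception as error:
    error_type = type(error).__name__   # the category of error
    error_message = str(error)          # the more detailed message
    print(error_type + ':', error_message)
```

This prints the same text as the last line of the traceback: the type, a colon, and the message.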
+
If you encounter an error and don’t know what it means, it is still
+important to read the traceback closely. That way, if you fix the error,
+but encounter a new one, you can tell that the error changed.
+Additionally, sometimes knowing where the error occurred is
+enough to fix it, even if you don’t entirely understand the message.
+
If you do encounter an error you don’t recognize, try looking at the
+official
+documentation on errors. However, note that you may not always be
+able to find the error there, as it is possible to create custom errors.
+In that case, hopefully the custom error message is informative enough
+to help you figure out what went wrong. Libraries like pandas and NumPy
+have these custom errors, but the procedure to figure them out is the
+same: go to the last line of the traceback, and read the error type and
+message there. The documentation for these libraries will often provide
+the information you need about any functions you are using. There are
+also large communities of users for data libraries that can help.
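To make the idea of a custom error concrete, here is a small sketch. The class NegativeCountError and the function check_count are hypothetical names invented for this illustration; they do not come from any library.

```python
class NegativeCountError(ValueError):
    """Hypothetical custom error for counts that fall below zero."""


def check_count(count):
    # Raise the custom error with an informative message.
    if count < 0:
        raise NegativeCountError(f'count must not be negative, got {count}')
    return count


try:
    check_count(-5)
except NegativeCountError as error:
    caught_name = type(error).__name__
    caught_message = str(error)
    print(caught_name + ':', caught_message)
```

A custom error will not appear in the official list of built-in exceptions, but its name and message are read from the last line of the traceback in exactly the same way.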
+
+
+
+
+
+
Reading Error Messages
+
+
+
Read the Python code and the resulting traceback below, and answer
+the following questions:
+
How many levels does the traceback have?
+
What is the function name where the error occurred?
+
On which line number in this function did the error occur?
+
What is the type of error?
+
What is the error message?
+
+
PYTHON
+
+
# This code has an intentional error. Do not type it directly;
+# use it for reference to understand the error message below.
+def print_message(day):
+    messages = [
+        'Hello, world!',
+        'Today is Tuesday!',
+        'It is the middle of the week.',
+        'Today is Donnerstag in German!',
+        'Last day of the week!',
+        'Hooray for the weekend!',
+        'Aw, the weekend is almost over.'
+    ]
+    print(messages[day])
+
+def print_sunday_message():
+    print_message(7)
+
+print_sunday_message()
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+IndexError Traceback (most recent call last)
+<ipython-input-7-3ad455d81842> in <module>
+ 16 print_message(7)
+ 17
+---> 18 print_sunday_message()
+ 19
+
+<ipython-input-7-3ad455d81842> in print_sunday_message()
+ 14
+ 15 def print_sunday_message():
+---> 16 print_message(7)
+ 17
+ 18 print_sunday_message()
+
+<ipython-input-7-3ad455d81842> in print_message(day)
+ 11 'Aw, the weekend is almost over.'
+ 12 ]
+---> 13 print(messages[day])
+ 14
+ 15 def print_sunday_message():
+
+IndexError: list index out of range
+
+
+
+
+
+
+
+
+
+
3 levels
+
print_message
+
13
+
IndexError
+
+list index out of range. You can then infer that
+7 is not the right index to use with
+messages.
+
+
+
+
+
+
+
+
+
+
Better errors on newer Pythons
+
+
+
Newer versions of Python have improved error printouts. If you are
+debugging errors, it is often helpful to use the latest Python version,
+even if you support older versions of Python.
+
+
+
+
Type Errors
+
+
One of the most common types of errors in Python is the type
+error. These errors occur when you try to perform an operation on
+an object in Python that cannot support it. This happens easily when
+working with large datasets where values are expected to be of a
+particular type, such as strings or integers. When we write a function expecting integers,
+we will not get an error until we encounter an operation that cannot
+handle strings. For example:
File "<ipython-input-3-6bb841ea1423>", line 3
+ letter=my_string["e"]
+ ^
+TypeError: string indices must be integers
+
+
We get this error because we are trying to use an index to access
+part of our string, which requires an integer. Instead, we entered a
+character and received a type error. This is fixed by replacing “e” with
+2.
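The code that produced this error is not shown above, so the following is only a sketch of the same situation. The value of my_string is an assumption made for illustration.

```python
my_string = 'Hello'  # hypothetical value; the original string is not shown

try:
    letter = my_string['e']   # a character is not a valid index
except TypeError as error:
    print('TypeError:', error)

letter = my_string[2]         # an integer index works
print(letter)
```

The exact wording of the message can differ slightly between Python versions, but the error type is TypeError in each case.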
+
In the case of datasets, we often see type errors when a mathematical
+operation, such as taking a mean, is performed on a column that contains
+characters, either as a result of formatting or introduced through
+error. As a result, correcting the error can involve simply removing the
+characters from the strings using regular expressions, or if the
+characters have resulted in incorrect data, removing those observations
+from the dataset.
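As a rough sketch of the first approach, the example below uses a made-up list of ages in which one entry carries a unit. The data and variable names are assumptions for illustration, and the regular expression simply keeps digits and the decimal point.

```python
import re

# Hypothetical column of ages read in as text; one entry carries a unit.
ages = ['34', '27 years', '41']

try:
    mean_age = sum(ages) / len(ages)
except TypeError as error:
    print('TypeError:', error)

# Keep only digits and the decimal point, then convert to numbers.
cleaned_ages = [float(re.sub(r'[^0-9.]', '', age)) for age in ages]
mean_age = sum(cleaned_ages) / len(cleaned_ages)
print(mean_age)
```

Whether stripping characters is appropriate depends on the data: if the stray characters signal a genuinely incorrect value, removing the observation may be the safer choice.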
+
Syntax Errors
+
+
When you forget a colon at the end of a line, accidentally add one
+space too many when indenting under an if statement, or
+forget a parenthesis, you will encounter a syntax error. This means that
+Python couldn’t figure out how to read your program. This is similar to
+forgetting punctuation in English: for example, this text is difficult
+to read there is no punctuation there is also no capitalization why is
+this hard because you have to figure out where each sentence ends you
+also have to figure out where each sentence begins to some extent it
+might be ambiguous if there should be a sentence break or not
+
People can typically figure out what is meant by text with no
+punctuation, but people are much smarter than computers. If Python
+doesn’t know how to read the program, it will give up and inform you
+with an error. For example:
Here, Python tells us that there is a SyntaxError on
+line 1, and even puts a little arrow in the place where there is an
+issue. In this case the problem is that the function definition is
+missing a colon at the end.
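A missing colon can be reproduced safely by handing source text to the built-in compile function, which checks syntax without running the code. The function text below is a hypothetical stand-in for the example discussed here.

```python
# Hypothetical source text with a missing colon after the parentheses.
source = (
    "def some_function()\n"
    "    msg = 'hello, world!'\n"
    "    return msg\n"
)

try:
    compile(source, '<example>', 'exec')
except SyntaxError as error:
    error_name = type(error).__name__
    error_line = error.lineno   # line on which Python gave up
    print(error_name, 'on line', error_line)
```

The lineno attribute carries the same line number that the printed error message reports.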
+
Actually, the function above has two issues with syntax. If
+we fix the problem with the colon, we see that there is also an
+IndentationError, which means that the lines in the
+function definition do not all have the same indentation:
Both SyntaxError and IndentationError
+indicate a problem with the syntax of your program, but an
+IndentationError is more specific: it always means
+that there is a problem with how your code is indented.
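The relationship between the two error types can be checked directly. In this sketch, the source text is a hypothetical function whose last line has one space too many.

```python
# Hypothetical source text: the colon is present, but the last line
# is indented by five spaces instead of four.
source = (
    "def some_function():\n"
    "    msg = 'hello, world!'\n"
    "    print(msg)\n"
    "     return msg\n"
)

try:
    compile(source, '<example>', 'exec')
except SyntaxError as error:   # this also catches IndentationError
    error_name = type(error).__name__
    error_line = error.lineno
    print(error_name, 'on line', error_line)

# IndentationError is a more specific kind of SyntaxError.
print(issubclass(IndentationError, SyntaxError))
```

Because IndentationError is a subclass of SyntaxError, code that handles SyntaxError handles both.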
+
+
+
+
+
+
Tabs and Spaces
+
+
+
Some indentation errors are harder to spot than others. In
+particular, mixing spaces and tabs can be difficult to spot because they
+are both whitespace. In the
+example below, the first two lines in the body of the function
+some_function are indented with tabs, while the third line
+is indented with spaces. If you’re working in a Jupyter notebook, be sure to copy
+and paste this example rather than trying to type it in manually because
+Jupyter automatically replaces tabs with spaces.
Visually it is impossible to spot the error. Fortunately, Python does
+not allow you to mix tabs and spaces.
+
+
ERROR
+
+
File "<ipython-input-5-653b36fbcd41>", line 4
+ return msg
+ ^
+TabError: inconsistent use of tabs and spaces in indentation
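Since tabs are easy to lose when copying and pasting, one way to reproduce this error reliably is to write the tabs explicitly as \t inside a string and compile it. The function text is a hypothetical reconstruction, and it assumes the third line is indented with eight spaces.

```python
# Hypothetical source text: two lines indented with a tab (\t),
# and a third line indented with eight spaces.
source = (
    "def some_function():\n"
    "\tmsg = 'hello, world!'\n"
    "\tprint(msg)\n"
    "        return msg\n"
)

try:
    compile(source, '<example>', 'exec')
except IndentationError as error:   # TabError is a subclass
    error_name = type(error).__name__
    print(error_name + ':', error)
```

Configuring your editor to insert spaces when the Tab key is pressed avoids this class of error altogether.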
+
+
+
+
+
Variable Name Errors
+
+
Another very common type of error is called a NameError,
+and occurs when you try to use a variable that does not exist. For
+example:
+
+
PYTHON
+
+
print(a)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-7-9d7b17ad5387> in <module>()
+----> 1 print(a)
+
+NameError: name 'a' is not defined
+
+
Variable name errors come with some of the most informative error
+messages, which are usually of the form “name ‘the_variable_name’ is not
+defined”.
+
Why does this error message occur? That’s a harder question to
+answer, because it depends on what your code is supposed to do. However,
+there are a few very common reasons why you might have an undefined
+variable. The first is that you meant to use a string, but forgot to put quotes around
+it:
+
+
PYTHON
+
+
print(hello)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-8-9553ee03b645> in <module>()
+----> 1 print(hello)
+
+NameError: name 'hello' is not defined
+
+
The second reason is that you might be trying to use a variable that
+does not yet exist. In the following example, count should
+have been defined (e.g., with count = 0) before the for
+loop:
+
+
PYTHON
+
+
for number in range(10):
+    count = count + number
+print('The count is:', count)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-9-dd6a12d7ca5c> in <module>()
+ 1 for number in range(10):
+----> 2 count = count + number
+ 3 print('The count is:', count)
+
+NameError: name 'count' is not defined
+
+
Finally, the third possibility is that you made a typo when you were
+writing your code. Let’s say we fixed the error above by adding the line
+Count = 0 before the for loop. Frustratingly, this actually
+does not fix the error. Remember that variables are case-sensitive, so the variable
+count is different from Count. We still get
+the same error, because we still have not defined
+count:
+
+
PYTHON
+
+
Count = 0
+for number in range(10):
+    count = count + number
+print('The count is:', count)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-10-d77d40059aea> in <module>()
+ 1 Count = 0
+ 2 for number in range(10):
+----> 3 count = count + number
+ 4 print('The count is:', count)
+
+NameError: name 'count' is not defined
+
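For comparison, a version that runs without error defines count, spelled in lowercase, before the loop:

```python
count = 0   # same spelling and case as the name used inside the loop
for number in range(10):
    count = count + number
print('The count is:', count)
```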
+
Index Errors
+
+
Next up are errors having to do with containers (like lists and
+strings) and the items within them. If you try to access an item in a
+list or a string that does not exist, then you will get an error. This
+makes sense: if you asked someone what day they would like to get
+coffee, and they answered “caturday”, you might be a bit annoyed. Python
+gets similarly annoyed if you try to ask it for an item that doesn’t
+exist:
---------------------------------------------------------------------------
+IndexError Traceback (most recent call last)
+<ipython-input-11-d817f55b7d6c> in <module>()
+ 3 print('Letter #2 is', letters[1])
+ 4 print('Letter #3 is', letters[2])
+----> 5 print('Letter #4 is', letters[3])
+
+IndexError: list index out of range
+
+
Here, Python is telling us that there is an IndexError
+in our code, meaning we tried to access a list index that did not
+exist.
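The code behind this traceback is not shown above; the sketch below assumes a three-item list named letters, which is consistent with the lines visible in the traceback. It also shows how len gives the last valid position.

```python
letters = ['a', 'b', 'c']   # assumed contents

try:
    print('Letter #4 is', letters[3])
except IndexError as error:
    print('IndexError:', error)

# Valid positions run from 0 to len(letters) - 1.
last_position = len(letters) - 1
print('Last valid index:', last_position, 'holds', letters[last_position])
```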
+
File Errors
+
+
The last type of error we’ll cover today is the most common type of
+error when using Python with data: errors associated with reading and
+writing files. If you try to read a file
+that does not exist, you will receive a FileNotFoundError
+telling you so. If you attempt to write to a file that was opened
+read-only, Python 3 raises an io.UnsupportedOperation error.
+More generally, problems with input and output manifest as
+OSErrors, which may show up as a more specific subclass;
+you can see the
+list in the Python docs. They all have a unique UNIX
+errno, which you can see in the error message.
+
+
PYTHON
+
+
file_handle = open('myfile.txt', 'r')
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+FileNotFoundError Traceback (most recent call last)
+<ipython-input-14-f6e1ac4aee96> in <module>()
+----> 1 file_handle = open('myfile.txt', 'r')
+
+FileNotFoundError: [Errno 2] No such file or directory: 'myfile.txt'
+
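The errno shown in the message is also available as an attribute of the error. The sketch below looks for myfile.txt inside a fresh temporary folder, an assumption made so that the file is guaranteed not to exist.

```python
import errno
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    missing_path = os.path.join(folder, 'myfile.txt')
    try:
        file_handle = open(missing_path, 'r')
    except OSError as error:   # FileNotFoundError is a subclass of OSError
        error_name = type(error).__name__
        error_number = error.errno
        print(error_name, error_number, error.strerror)

# The number matches the named constant for "no such file or directory".
print(error_number == errno.ENOENT)
```

Catching OSError handles the whole family of file errors, while the type name and errno tell you which specific problem occurred.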
+
One reason for receiving this error is that you specified an
+incorrect path to the file. For example, if I am currently in a folder
+called myproject, and I have a file in
+myproject/writing/myfile.txt, but I try to open
+myfile.txt, this will fail. The correct path would be
+writing/myfile.txt. It is also possible that the file name
+or its path contains a typo. There may also be specific settings based
+on your organization if you are using shared, networked, or cloud-based
+drives. It is best to check with your IT administrators if you are still
+encountering issues reading in a file after troubleshooting.
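The path mix-up described above can be sketched with pathlib. The folder layout is recreated inside a temporary directory purely for illustration; the names myproject, writing, and myfile.txt follow the example in the text.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    # Recreate the layout myproject/writing/myfile.txt.
    project = Path(folder) / 'myproject'
    (project / 'writing').mkdir(parents=True)
    (project / 'writing' / 'myfile.txt').write_text('some notes\n')

    wrong_path = project / 'myfile.txt'
    right_path = project / 'writing' / 'myfile.txt'

    # Checking that a path exists before opening it helps locate the mistake.
    wrong_exists = wrong_path.exists()
    right_exists = right_path.exists()
    contents = right_path.read_text()

print(wrong_exists, right_exists)
```

Printing the full path you are trying to open is often the quickest way to spot a missing folder or a typo.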
+
A related issue can occur if you open a file with the wrong mode, for
+example the “write” flag instead of the “read” flag. Python will not give you an error if you try to open a
+file for writing when the file does not exist. However, if you meant to
+open a file for reading, but accidentally opened it for writing, and
+then try to read from it, you will get an
+UnsupportedOperation error telling you that the file was
+not opened for reading:
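A minimal sketch of this situation, using a temporary folder so that no real files are touched (the file name is an assumption):

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'myfile.txt')
    # Opening for writing creates the file, so no error is raised here.
    with open(path, 'w') as file_handle:
        try:
            file_handle.read()   # reading from a write-only handle
        except io.UnsupportedOperation as error:
            error_name = type(error).__name__
            print(error_name + ':', error)
```

Note that opening an existing file with the “write” flag also empties it, so this mistake can cost data as well as raise an error.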
If you are getting a read or write error on a file or folder that you
+are able to open and/or edit with other programs, you may need to
+contact an IT administrator to check the permissions granted to you and
+any programs you are using.
+
These are the most common errors with files, though many others
+exist. If you get an error that you’ve never seen before, searching the
+Internet for that error type often reveals common reasons why you might
+get that error.
+
+
+
+
+
+
Identifying Syntax Errors
+
+
+
Read the code below, and (without running it) try to identify what
+the errors are.
+
Run the code, and read the error message. Is it a
+SyntaxError or an IndentationError?
+
Fix the error.
+
Repeat steps 2 and 3, until you have fixed all the errors.
+
+
PYTHON
+
+
def another_function
+  print('Syntax errors are annoying.')
+   print('But at least Python tells us about them!')
+  print('So they are usually not too hard to fix.')
+
+
+
+
+
+
+
+
+
+
SyntaxError for missing (): at end of first
+line, IndentationError for mismatch between second and
+third lines. A fixed version is:
+
+
PYTHON
+
+
def another_function():
+    print('Syntax errors are annoying.')
+    print('But at least Python tells us about them!')
+    print('So they are usually not too hard to fix.')
+
+
+
+
+
+
+
+
+
+
+
Identifying Variable Name Errors
+
+
+
Read the code below, and (without running it) try to identify what
+the errors are.
+
Run the code, and read the error message. What type of
+NameError do you think this is? In other words, is it a
+string with no quotes, a misspelled variable, or a variable that should
+have been defined but was not?
+
Fix the error.
+
Repeat steps 2 and 3, until you have fixed all the errors.
+
+
PYTHON
+
+
for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (Number % 3) == 0:
+        message = message + a
+    else:
+        message = message + 'b'
+print(message)
+
+
+
+
+
+
+
+
+
+
3 NameErrors for number being misspelled,
+for message not defined, and for a not being
+in quotes.
+
Fixed version:
+
+
PYTHON
+
+
message = ''
+for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (number % 3) == 0:
+        message = message + 'a'
+    else:
+        message = message + 'b'
+print(message)
+
+
+
+
+
+
+
+
+
+
+
Identifying Index Errors
+
+
+
Read the code below, and (without running it) try to identify what
+the errors are.
+
Run the code, and read the error message. What type of error is
+it?
+
Fix the error.
+
+
PYTHON
+
+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[4])
+
+
+
+
+
+
+
+
+
+
IndexError; the last entry is seasons[3],
+so seasons[4] doesn’t make sense. A fixed version is:
+
+
PYTHON
+
+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[-1])
+
+
+
+
+
+
A Final Note About Correcting Errors
+
+
There are many helpful answers online for common error messages;
+however, when working with official statistics, we also need to exercise
+some caution. Be wary of any answers that ask you to
+download a package from someone’s personal GitHub repository or other
+file sharing service. Try to find the type of error first and understand
+what the issue is before downloading anything claiming to fix the error.
+If the error is the result of an issue with a version of a package,
+check if there are any security vulnerabilities with that version, and
+use a package manager to move between package versions.
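Before changing versions, it helps to know which version is currently installed. The sketch below uses the standard-library importlib.metadata module (available from Python 3.8); the helper installed_version is a hypothetical name, and the packages listed are only examples.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


for name in ['numpy', 'pandas']:
    print(name, installed_version(name))
```

The reported version can then be compared against the release notes or security advisories for that package before deciding whether to upgrade or downgrade.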
+
+
diff --git a/instructor/404.html b/instructor/404.html
new file mode 100644
index 0000000..c1d20b6
--- /dev/null
+++ b/instructor/404.html
@@ -0,0 +1,445 @@
+
+Python for Official Statistics: Page not found
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Python for Official Statistics
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Page not found
+
+
Our apologies!
+
+
We cannot seem to find the page you are looking for. Here are some
+tips that may help:
+
+
diff --git a/instructor/CODE_OF_CONDUCT.html b/instructor/CODE_OF_CONDUCT.html
new file mode 100644
index 0000000..6cc6dee
--- /dev/null
+++ b/instructor/CODE_OF_CONDUCT.html
@@ -0,0 +1,458 @@
+
+Python for Official Statistics: Contributor Code of Conduct
+
+
+
+
to Share—copy and redistribute the material in any
+medium or format
+
to Adapt—remix, transform, and build upon the
+material
+
for any purpose, even commercially.
+
The licensor cannot revoke these freedoms as long as you follow the
+license terms.
+
Under the following terms:
+
Attribution—You must give appropriate credit
+(mentioning that your work is derived from work that is Copyright (c)
+The Carpentries and, where practical, linking to https://carpentries.org/), provide a link to the
+license, and indicate if changes were made. You may do so in any
+reasonable manner, but not in any way that suggests the licensor
+endorses you or your use.
+
No additional restrictions—You may not apply
+legal terms or technological measures that legally restrict others from
+doing anything the license permits. With the understanding
+that:
+
Notices:
+
You do not have to comply with the license for elements of the
+material in the public domain or where your use is permitted by an
+applicable exception or limitation.
+
No warranties are given. The license may not give you all of the
+permissions necessary for your intended use. For example, other rights
+such as publicity, privacy, or moral rights may limit how you use the
+material.
+
Software
+
+
Except where otherwise noted, the example programs and other software
+provided by The Carpentries are made available under the OSI-approved MIT
+license.
+
Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+“Software”), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
Trademark
+
+
“The Carpentries”, “Software Carpentry”, “Data Carpentry”, and
+“Library Carpentry” and their respective logos are registered trademarks
+of Community Initiatives.
How do I find reliable and safe resources or code online?
+
+
+
+
+
+
+
+
Objectives
+
+
identify basic concepts in programming
+
+
+
+
+
+
+
Programming in Python
+
+
+
+
In most general terms, programming is the process of writing
+instructions for a computer. In this course we will be using Python as
+the language to communicate with the computer.
+
+
Strictly speaking, Python is an interpreted language, rather than a
+compiled language, meaning we are not communicating directly with the
+computer when we use Python. When we run Python code, our Python source
+code is first translated into byte code, which is then executed by the
+Python virtual machine.
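If you are curious, the byte code can be viewed with the standard-library dis module. This is an optional illustration only; the statement being compiled is an arbitrary example, and the exact instructions printed differ between Python versions.

```python
import dis

# Translate one line of source code into a code object (byte code).
code_object = compile('weight_lb = 2.2 * weight_kg', '<example>', 'exec')

# Print a human-readable listing of the byte code instructions.
dis.dis(code_object)
```

You will not need to read byte code in this course; the point is simply that Python performs this translation step for you every time code is run.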
+
+
Programming is a wide topic including a variety of techniques and
+tools. In this course we’ll be focusing on programming for statistical
+analysis.
+
+
IDEs
+
+
IDE stands for Integrated Development Environment. IDEs are where you
+will write, edit, and debug Python scripts, so you want to choose one
+that makes you feel comfortable and includes the functionality that you
+need. Some open-source IDEs for Python include JupyterLab and Visual Studio
+Code.
+
+
+
Packages
+
+
Packages, or libraries, are extensions to the statistical programming
+language. They contain code, data, and documentation in a standardised
+collection format that can be installed by users, typically via a
+centralised software repository. A typical Python workflow will use base
+Python (the core operations and functions provided by your Python
+installation) as well as specialised data analysis and scientific
+packages like NumPy, SciPy and Pandas.
+
+
Best Practices
+
+
+
+
Let’s review some basic concepts that any programmer should always
+keep in mind.
+
+
Documentation
+
+
Have you ever returned to a task and tried to read a note that you
+quickly scrawled for yourself the last time you were working on it? Have
+you ever inherited a project from a colleague and found you have no idea
+what remains to be done?
+
It can be very challenging to return to your own work or a
+colleague’s and this goes doubly for programming. Documentation is one
+way we can reduce the burden on future selves and our colleagues.
+
+
Inline Documentation
+
+
As a new programmer, inline documentation can be the most helpful.
+Inline documentation refers to writing comments on the same line as your
+code. For example, if we wrote a line of code to sum 1+1, we might
+document it as follows:
+
+
PYTHON
+
+
1 + 1  # adding the numbers 1 and 1 together.
+
+
Although this is a very simple line of code and it might seem like
+overkill to document it in this way, these types of comments can be very
+helpful in jogging your memory when returning to a project. Inline
+comments can also help you to break multi-step programs into digestible
+and readable pieces.
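As a slightly longer sketch, here is a multi-step calculation in which each step carries its own inline comment. The data and variable names are made up for illustration.

```python
# Hypothetical household sizes from a small survey.
household_sizes = [2, 4, 1, 3, 5]

total_people = sum(household_sizes)       # add up everyone counted
n_households = len(household_sizes)       # number of households surveyed
mean_size = total_people / n_households   # average people per household
print(mean_size)
```

Each comment says what the step is for rather than restating the code, which is what makes it useful when you return to the project later.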
+
+
+
External Documentation
+
+
Sometimes you require more detail than you can comfortably fit in
+your inline documentation. In this case it can be helpful to create
+separate files to document your project. This type of documentation will
+typically focus on the goals, scope, and any special instructions
+relating to your project rather than the details of your code. The most
+common type of external documentation is a README file. It is best
+practice to create a basic README file for any project. A basic README
+should include:
+
+
a brief description of the project,
+
any special instructions for installation or use,
+
the authors and any references.
+
+
README files are just text files and it is best practice to save
+your README file as a README.md markdown document. This
+file format is automatically recognised by code repositories like
+GitHub, so your README contents are displayed alongside your code
+repository.
+
+
+
DocStrings
+
+
In chapter 7: functions we’ll learn
+about documentation specific to functions known as DocStrings.
+
+
+
Getting Help
+
+
+
+
Later on, in chapter 10: Errors
+and Exceptions we will cover errors in more detail. However, before
+we get there it’s very likely you’ll need some assistance writing Python
+code.
+
+
Built-in Help
+
+
There is a help
+function built into base Python. You can use it to investigate
+built-in functions, data types, and more. For example, say we want to
+know more about the print() function in Python:
+
+
PYTHON
+
+
help(print)
+
+
+
OUTPUT
+
+
Help on built-in function print in module builtins:
+
+print(...)
+ print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
+
+ Prints the values to a stream, or to sys.stdout by default.
+ Optional keyword arguments:
+ file: a file-like object (stream); defaults to the current sys.stdout.
+ sep: string inserted between values, default a space.
+ end: string appended after the last value, default a newline.
+-- More --
+
+
+
+
Finding Resources online
+
+
Stack Overflow is a valuable
+resource for programmers of all levels. It can be daunting to post your
+own question! Fortunately, chances are someone else has already asked a
+similar question!
It can also be helpful to do a general search for a particular topic
+or error message. It’s very likely the first few results will be from
+Stack Overflow, followed by a few from official documentation and then
+you may start seeing results from personal blogs or third parties. These
+third party results can sometimes be valuable but we should be cautious!
+Here are a few things to keep in mind when you are looking for online
+resources:
+
+
Don’t download or install anything unless you are certain of what it
+is and why you need it.
+
Don’t copy or run code unless you fully understand what it
+does.
+
Python is an open-source language; official documentation and
+resources will not be behind a paywall.
+
You may not find a resource or solution to fit your exact needs. Try
+to be flexible and adapt online solutions to fit your needs.
+
+
+
+
+
+
+
Key Points
+
+
+
+
Python is an interpreted language.
+
Code is commonly developed inside an integrated development
+environment.
+
A typical Python workflow uses base Python and additional Python
+packages developed for statistical programming purposes.
+
In-line and external documentation helps ensure that your code is
+readable.
+
You can find help through the built-in help function and external
+resources.
Can I change the value associated with a variable after I create
+it?
+
+
+
+
+
+
+
+
Objectives
+
+
Assign values to variables.
+
+
+
+
+
+
+
Variables
+
+
+
+
Any Python interpreter can be used as a calculator:
+
+
PYTHON
+
+
3 + 5 * 4
+
+
+
OUTPUT
+
+
23
+
+
This is great but not very interesting. To do anything useful with
+data, we need to assign its value to a variable. In Python, we
+can assign a value to a variable, using the equals sign
+=. For example, we can track the weight of a patient who
+weighs 60 kilograms by assigning the value 60 to a variable
+weight_kg:
+
+
PYTHON
+
+
weight_kg = 60
+
+
From now on, whenever we use weight_kg, Python will
+substitute the value we assigned to it. In layperson’s terms, a
+variable is a name for a value.
Variable names cannot start with a digit and are case-sensitive. For
+example:
+
weight0 is a valid variable name, whereas
+0weight is not
+
+weight and Weight are different
+variables
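These naming rules can be checked with the string method isidentifier, which reports whether a piece of text has the right form for a name. This is a small illustrative sketch; note that isidentifier does not check for reserved words such as for or if.

```python
# Check which of these would be accepted as variable names.
results = {}
for name in ['weight0', '0weight', 'weight', 'Weight']:
    results[name] = name.isidentifier()
    print(name, results[name])

# Names that differ only by case are different names.
names_differ = 'weight' != 'Weight'
print(names_differ)
```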
+
Types of data
+
+
+
+
Python knows various types of data. Three common ones are:
+
+
integer numbers
+
floating point numbers, and
+
strings.
+
+
In the example above, variable weight_kg has an integer
+value of 60. If we want to more precisely track the weight
+of our patient, we can use a floating point value by executing:
+
+
PYTHON
+
+
weight_kg = 60.3
+
+
To create a string, we add single or double quotes around some text.
+To identify and track a patient throughout our study, we can assign each
+person a unique identifier by storing it in a string:
+
+
PYTHON
+
+
patient_id = '001'
+
+
Using Variables in Python
+
+
+
+
Once we have data stored with variable names, we can make use of it
+in calculations. We may want to store our patient’s weight in pounds as
+well as kilograms:
+
+
PYTHON
+
+
weight_lb = 2.2 * weight_kg
+
+
We might decide to add a prefix to our patient identifier:
+
+
PYTHON
+
+
patient_id = 'inflam_' + patient_id
+
+
Built-in Python functions
+
+
+
+
To carry out common tasks with data and variables in Python, the
+language provides us with several built-in functions. To display information to
+the screen, we use the print function:
+
+
PYTHON
+
+
print(weight_lb)
+print(patient_id)
+
+
+
OUTPUT
+
+
132.66
+inflam_001
+
+
When we want to make use of a function, referred to as calling the
+function, we follow its name by parentheses. The parentheses are
+important: if you leave them off, the function doesn’t actually run!
+Sometimes you will include values or variables inside the parentheses
+for the function to use. In the case of print, we use the
+parentheses to tell the function what value we want to display. We will
+learn more about how functions work and how to create our own in later
+episodes.
+
We can display multiple things at once using only one
+print call:
+
+
PYTHON
+
+
print(patient_id, 'weight in kilograms:', weight_kg)
+
+
+
OUTPUT
+
+
inflam_001 weight in kilograms: 60.3
+
+
We can also call a function inside of another function call. For example,
+Python has a built-in function called type that tells you a
+value’s data type:
+
+
PYTHON
+
+
print(type(60.3))
+print(type(patient_id))
+
+
+
OUTPUT
+
+
<class 'float'>
+<class 'str'>
+
+
Moreover, we can do arithmetic with variables right inside the
+print function:
+
+
PYTHON
+
+
print('weight in pounds:', 2.2 * weight_kg)
+
+
+
OUTPUT
+
+
weight in pounds: 132.66
+
+
The above command, however, did not change the value of
+weight_kg:
+
+
PYTHON
+
+
print(weight_kg)
+
+
+
OUTPUT
+
+
60.3
+
+
To change the value of the weight_kg variable, we have
+to assign weight_kg a new value using the
+equals = sign:
+
+
PYTHON
+
+
weight_kg = 65.0
+print('weight in kilograms is now:', weight_kg)
+
+
+
OUTPUT
+
+
weight in kilograms is now: 65.0
+
+
+
+
+
+
+
Variables as Sticky Notes
+
+
+
A variable in Python is analogous to a sticky note with a name
+written on it: assigning a value to a variable is like putting that
+sticky note on a particular value.
+
Using this analogy, we can investigate how assigning a value to one
+variable does not change values of other, seemingly
+related, variables. For example, let’s store the subject’s weight in
+pounds in its own variable:
+
+
PYTHON
+
+
# There are 2.2 pounds per kilogram
+weight_lb = 2.2 * weight_kg
+print('weight in kilograms:', weight_kg, 'and in pounds:', weight_lb)
+
+
+
OUTPUT
+
+
weight in kilograms: 65.0 and in pounds: 143.0
+
+
Everything in a line of code following the ‘#’ symbol is a comment that is ignored by Python.
+Comments allow programmers to leave explanatory notes for other
+programmers or their future selves.
+
Similar to above, the expression 2.2 * weight_kg is
+evaluated to 143.0, and then this value is assigned to the
+variable weight_lb (i.e. the sticky note
+weight_lb is placed on 143.0). At this point,
+each variable is “stuck” to completely distinct and unrelated
+values.
+
Let’s now change weight_kg:
+
+
PYTHON
+
+
weight_kg = 100.0
+print('weight in kilograms is now:', weight_kg, 'and weight in pounds is still:', weight_lb)
+
+
+
OUTPUT
+
+
weight in kilograms is now: 100.0 and weight in pounds is still: 143.0
+
+
Since weight_lb doesn’t “remember” where its value comes
+from, it is not updated when we change weight_kg.
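If we do want the weight in pounds to reflect the new weight in kilograms, one option is to run the calculation again. The following is a minimal sketch of that idea; the helper name weight_lb_before is introduced only for this illustration.

```python
weight_kg = 65.0
weight_lb = 2.2 * weight_kg      # calculated from the current value of weight_kg

weight_kg = 100.0                # this assignment does not touch weight_lb
weight_lb_before = weight_lb     # still the value calculated from 65.0

weight_lb = 2.2 * weight_kg      # re-run the calculation to refresh weight_lb

print('before recalculating:', weight_lb_before)
print('after recalculating:', weight_lb)
```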
+
+
+
+
+
+
+
+
+
Check Your Understanding
+
+
+
What values do the variables mass and age
+have after each of the following statements? Test your answer by
+executing the lines.
+
+
PYTHON
+
+
mass = 47.5
+age = 122
+mass = mass * 2.0
+age = age - 20
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
`mass` holds a value of 47.5, `age` does not exist
+`mass` still holds a value of 47.5, `age` holds a value of 122
+`mass` now has a value of 95.0, `age`'s value is still 122
+`mass` still has a value of 95.0, `age` now holds 102
+
+
+
+
+
+
+
+
+
+
+
Sorting Out References
+
+
+
Python allows you to assign multiple values to multiple variables in
+one line by separating the variables and values with commas. What does
+the following program print out?
+
+
PYTHON
+
+
first, second = 'Grace', 'Hopper'
+third, fourth = second, first
+print(third, fourth)
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
Hopper Grace
+
+
+
+
+
+
+
+
+
+
+
Seeing Data Types
+
+
+
What are the data types of the following variables?
Explain what a library is and what libraries are used for.
+
Import a Python library and use the functions it contains.
+
Read tabular data from a file into a program.
+
Select individual values and subsections from data.
+
Perform operations on arrays of data.
+
+
+
+
+
+
+
Words are useful, but what’s more useful are the sentences and
+stories we build with them. Similarly, while a lot of powerful, general
+tools are built into Python, specialized tools built up from these basic
+units live in libraries that can be
+called upon when needed.
+
Loading data into Python
+
+
+
+
To begin processing the clinical trial inflammation data, we need to
+load it into Python. Python can work with many different file types.
+Text files can be loaded into Python by using the base Python
+function
+
+
PYTHON
+
+
open("filename.txt", "r")
+
+
where “r” opens the file for reading only. If you want to write to the
+file instead, you can use “w”, which replaces any existing contents.
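As a minimal sketch of both modes, the example below writes a line of text to a file and then reads it back. The file name example.txt and the temporary folder are assumptions made for this illustration, and the with statement (not covered above) is used so that the file is closed automatically.

```python
import os
import tempfile

# Hypothetical file location inside a temporary directory
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, "example.txt")

# "w" opens the file for writing (creating or overwriting it)
with open(file_path, "w") as handle:
    handle.write("patient_001\n")

# "r" opens the file for reading only
with open(file_path, "r") as handle:
    contents = handle.read()

print(contents)
```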
+
However, our patient data is in a .csv file, which is more commonly
+loaded by using a library. Python has hundreds of thousands of libraries
+to choose from to help carry out your work. Importing a library is like
+getting a piece of lab equipment out of a storage locker and setting it
+up on the bench. Libraries provide additional functionality to the basic
+Python package, much like a new piece of equipment adds functionality to
+a lab space. Just like in the lab, importing too many libraries can
+sometimes complicate and slow down your programs - so we only import
+what we need for each program. There are a couple of common Python
+libraries for loading and working with data.
+
pandas
+
+
+
+
The first library we will present is called pandas. pandas is a
+Python library containing a set of functions and specialised data
+structures that have been designed to help Python programmers to perform
+data analysis tasks in a structured way.
+
Most of the things that pandas can do can be done with basic Python,
+but the collected set of pandas functions and data structures makes
+data analysis tasks more consistent in terms of syntax and therefore
+aids readability.
+
Remember to write the library name with a lower case ‘p’: that is the
+name of the package, and Python is case sensitive.
+
+
Importing the pandas library
+
+
Importing the pandas library is done in exactly the same way as for
+any other library. In almost all examples of Python code using the
+pandas library, it will have been imported and given an alias of
+pd. We will follow the same convention.
+
+
PYTHON
+
+
import pandas as pd
+
+
+
+
Pandas data structures
+
+
There are two main data structures used by pandas: the Series
+and the Dataframe. The Series equates in general to a vector or a list.
+The Dataframe is equivalent to a table. Each column in a pandas
+Dataframe is a pandas Series data structure.
+
We will mainly be looking at the Dataframe.
+
We can easily create a pandas Dataframe by reading a .csv file.
+
+
+
Reading a csv file
+
+
When we read a csv dataset in base Python we did so by opening the
+dataset, reading and processing a record at a time and then closing the
+dataset after we had read the last record. Reading datasets in this way
+is slow and places all of the responsibility for extracting individual
+data items from the records on the programmer.
+
The main advantage of this approach, however, is that you only have
+to store one dataset record in memory at a time. This means that if you
+have the time, you can process datasets of any size.
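The record-at-a-time approach described above can be sketched with Python's built-in csv module. The file name example.csv, its contents, and the variable names are all made up for this illustration; the point is only that each record is processed and then discarded before the next one is read.

```python
import csv
import os
import tempfile

# Write a tiny, made-up dataset so the example is self-contained
file_path = os.path.join(tempfile.mkdtemp(), "example.csv")
with open(file_path, "w", newline="") as handle:
    handle.write("patient,day_1,day_2\n")
    handle.write("inflam_001,0,1\n")
    handle.write("inflam_002,0,2\n")

# Read one record at a time; only the current record is held in memory
day_2_total = 0
with open(file_path, "r", newline="") as handle:
    reader = csv.reader(handle)
    header = next(reader)          # the first record holds the column names
    for record in reader:
        # Extracting and converting each item is the programmer's job
        day_2_total = day_2_total + int(record[2])

print(header)
print('total for day_2:', day_2_total)
```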
+
In Pandas, csv files are read as complete datasets. You do not have
+to explicitly open and close the dataset. All of the dataset records are
+assembled into a Dataframe. If your dataset has column headers in the
+first record then these can be used as the Dataframe column names. You
+can explicitly state this in the parameters to the call, but pandas is
+usually able to infer that there is a header row and use it
+automatically.
+
To tell Python that we’d like to start using pandas, we need to import it:
+
+
PYTHON
+
+
import pandas as pd
+
+
Often, libraries are given an alias or a short form name; in this
+case pandas is given the alias “pd”. Aliases for common data analysis
+libraries include:
+
+
PYTHON
+
+
import pandas as pd
+import numpy as np
+import matplotlib.pyplot as plt
+import seaborn as sns
+
+
Once we’ve imported the library, we can ask the library to read our
+data file for us:
+
+
PYTHON
+
+
pd.read_csv("filename.csv")
+
+
pandas is a commonly used library for working with and analysing
+data. However, we will be working with a different package for the
+remainder of this course. If you would like to learn more about data
+manipulation and analysis using pandas, we recommend checking out Data Analysis and
+Visualization with Python for Social Scientists.
+
+
numpy
+
+
+
+
The second package that we will present is called NumPy, which stands for Numerical
+Python. In general, you should use this library when you want to do
+fancy things with lots of numbers, especially if you have matrices or
+arrays. NumPy arrays are typically lighter weight than Python lists, with better
+performance, particularly when working with large datasets.
+
We will be using this package to work with our clinical trial
+inflammation data.
+
To tell Python that we’d like to start using NumPy, we need to import it:
+
+
PYTHON
+
+
import numpy as np
+
+
Now that we have imported the library, we can ask the library (by
+using the alias np) to read our data file for us:
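As a minimal, self-contained sketch of such a call, the example below writes a small made-up file (example.csv, a hypothetical name, not the course's inflammation data) to a temporary directory and reads it with np.loadtxt. The fname and delimiter parameters are the same ones used with inflammation-01.csv later in this episode.

```python
import os
import tempfile

import numpy as np

# Create a small, made-up comma-separated file to load
file_path = os.path.join(tempfile.mkdtemp(), "example.csv")
with open(file_path, "w") as handle:
    handle.write("0,1,2\n")
    handle.write("3,4,5\n")

# Ask the library (through its alias np) to read the file
print(np.loadtxt(fname=file_path, delimiter=','))
```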
The expression np.loadtxt(...) is a function call that asks Python
+to run the function
+loadtxt which belongs to the np library. Dot
+notation in Python is mostly used to access an object's
+attributes (properties) or to invoke its methods:
+object_name.property gives you the value of that property, and
+object_name.method() invokes that method on object_name.
+
As an example, John Smith is the John that belongs to the Smith
+family. We could use the dot notation to write his name
+smith.john, just as loadtxt is a function that
+belongs to the np library.
+
np.loadtxt has two parameters: the name of the file we
+want to read and the delimiter
+that separates values on a line. These both need to be character strings
+(or strings for short), so we put
+them in quotes.
+
Since we haven’t told it to do anything else with the function’s
+output, the notebook displays it.
+In this case, that output is the data we just loaded. By default, only a
+few rows and columns are shown (with ... to omit elements
+when displaying big arrays). Note that, to save space when displaying
+NumPy arrays, Python does not show us trailing zeros, so
+1.0 becomes 1..
+
Our call to np.loadtxt read our file but didn’t save the
+data in memory. To do that, we need to assign the array to a variable.
+In a similar manner to how we assign a single value to a variable, we
+can also assign an array of values to a variable using the same syntax.
+Let’s re-run np.loadtxt and save the returned data:
+
+
PYTHON
+
+
data = np.loadtxt(fname='inflammation-01.csv', delimiter=',')
+
+
This statement doesn’t produce any output because we’ve assigned the
+output to the variable data. If we want to check that the
+data have been loaded, we can print the variable’s value:
Now that the data are in memory, we can manipulate them. First, let’s
+ask what type of thing
+data refers to:
+
+
PYTHON
+
+
print(type(data))
+
+
+
OUTPUT
+
+
<class 'numpy.ndarray'>
+
+
The output tells us that data currently refers to an
+N-dimensional array, the functionality for which is provided by the
+NumPy library. These data correspond to arthritis patients’
+inflammation. The rows are the individual patients, and the columns are
+their daily inflammation measurements.
+
+
+
+
+
+
Data Type
+
+
+
A NumPy array contains one or more elements of the same type. The
+type function will only tell you that a variable is a NumPy
+array but won’t tell you the type of thing inside the array. We can find
+out the type of the data contained in the NumPy array.
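One way to do this is through the dtype attribute that every NumPy array has. The sketch below uses a small made-up array named values rather than the inflammation data.

```python
import numpy as np

# A small, made-up array of floating-point numbers
values = np.array([0.0, 1.5, 3.0])

print(type(values))   # the type of the container
print(values.dtype)   # the type of the elements stored inside it
```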
With the following command, we can see the array’s shape:
+
+
PYTHON
+
+
print(data.shape)
+
+
+
OUTPUT
+
+
(60, 40)
+
+
The output tells us that the data array variable
+contains 60 rows and 40 columns. When we created the variable
+data to store our arthritis data, we did not only create
+the array; we also created information about the array, called members or attributes. This extra
+information describes data in the same way an adjective
+describes a noun. data.shape is an attribute of
+data which describes the dimensions of data.
+We use the same dotted notation for the attributes of variables that we
+use for the functions in libraries because they have the same
+part-and-whole relationship.
+
If we want to get a single number from the array, we must provide an
+index in square brackets after the
+variable name, just as we do in math when referring to an element of a
+matrix. Our inflammation data has two dimensions, so we will need to use
+two indices to refer to one specific value:
+
+
PYTHON
+
+
print('first value in data:', data[0, 0])
+
+
+
OUTPUT
+
+
first value in data: 0.0
+
+
+
PYTHON
+
+
print('middle value in data:', data[29, 19])
+
+
+
OUTPUT
+
+
middle value in data: 16.0
+
+
The expression data[29, 19] accesses the element at row
+30, column 20. While this expression may not surprise you,
+data[0, 0] might. Programming languages like Fortran,
+MATLAB and R start counting at 1 because that’s what human beings have
+done for thousands of years. Languages in the C family (including C++,
+Java, Perl, and Python) count from 0 because it represents an offset
+from the first value in the array (the second value is offset by one
+index from the first value). This is closer to the way that computers
+represent arrays (if you are interested in the historical reasons behind
+counting indices from zero, you can read Mike
+Hoye’s blog post). As a result, if we have an M×N array in Python,
+its indices go from 0 to M-1 on the first axis and 0 to N-1 on the
+second. It takes a bit of getting used to, but one way to remember the
+rule is that the index is how many steps we have to take from the start
+to get the item we want.
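The index range for an M×N array can be checked with a small made-up example. In the sketch below, the array grid and the other variable names are hypothetical; it has M = 3 rows and N = 4 columns, so the valid indices run from 0 to 2 on the first axis and 0 to 3 on the second.

```python
import numpy as np

# A made-up 3x4 array holding the numbers 0 to 11
grid = np.arange(12).reshape(3, 4)

first_value = grid[0, 0]     # first row, first column
last_value = grid[2, 3]      # row index M-1, column index N-1

try:
    grid[3, 0]               # row index 3 does not exist in a 3-row array
except IndexError as error:
    out_of_range_message = str(error)

print(first_value, last_value)
print(out_of_range_message)
```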
+
+
+
+
+
+
In the Corner
+
+
+
What may also surprise you is that when Python displays an array, it
+shows the element with index [0, 0] in the upper left
+corner rather than the lower left. This is consistent with the way
+mathematicians draw matrices but different from the Cartesian
+coordinates. The indices are (row, column) instead of (column, row) for
+the same reason, which can be confusing when plotting data.
+
+
+
+
Slicing data
+
+
+
+
An index like [30, 20] selects a single element of an
+array, but we can select whole sections as well. For example, we can
+select the first ten days (columns) of values for the first four
+patients (rows) like this:
The slice 0:4 means,
+“Start at index 0 and go up to, but not including, index 4”. Again, the
+up-to-but-not-including takes a bit of getting used to, but the rule is
+that the difference between the upper and lower bounds is the number of
+values in the slice.
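A quick way to check this rule is to look at the shape of a slice. The sketch below uses a made-up array named example rather than the inflammation data; applied to the course's data array, the equivalent selection would presumably be data[0:4, 0:10].

```python
import numpy as np

# A made-up array with 6 rows and 12 columns
example = np.arange(72).reshape(6, 12)

# Rows 0 to 3 and columns 0 to 9: the upper bounds 4 and 10 are not included
section = example[0:4, 0:10]

# 4 - 0 = 4 rows and 10 - 0 = 10 columns
print(section.shape)
```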
We also don’t have to include the upper and lower bound on the slice.
+If we don’t include the lower bound, Python uses 0 by default; if we
+don’t include the upper, the slice runs to the end of the axis, and if
+we don’t include either (i.e., if we use ‘:’ on its own), the slice
+includes everything:
+
+
PYTHON
+
+
small = data[:3, 36:]
+print('small is:')
+print(small)
+
+
The above example selects rows 0 through 2 and columns 36 through to
+the end of the array.
+
+
OUTPUT
+
+
small is:
+[[ 2. 3. 0. 0.]
+ [ 1. 1. 0. 1.]
+ [ 2. 2. 1. 1.]]
Understand the properties and behaviours of lists and
+dictionaries
+
Access values in lists and dictionaries
+
Create and access values from nested lists and dictionaries
+
+
+
+
+
+
+
Values can also be stored in other Python data types such as lists,
+dictionaries, sets and tuples. Storing objects in a list is a fast and
+versatile way to apply transformations across a sequence of values.
+Storing objects in a dictionary as key-value pairs is useful for
+extracting specific values, i.e. performing lookup operations.
+
Create and access lists
+
+
+
+
Lists have the following properties and behaviours:
+
+
A single list can store different primitive object types and even
+other lists
+
Lists are ordered and have a 0-based index
+
Lists can be appended to using the methods append() or
+insert()
+
+
Values inside a list can be removed using the methods
+remove() or pop()
+
+
Two lists can be concatenated with the operator +
+
+
Values inside a list can be conditionally iterated through
+
A list is mutable i.e. the values inside a list can be modified in
+place
+
+
To create a list, values are contained within square brackets
+i.e. [] and individually separated by commas. The function
+list() can also be used to create a list of values from an
+iterable object like a string, set or tuple.
+
+
PYTHON
+
+
# Create a list of integers using []
+list_1 = [1, 3, 5, 7]
+print(list_1)
+
+
+
OUTPUT
+
+
[1, 3, 5, 7]
+
+
+
PYTHON
+
+
# Unlike atomic vectors in R, a list can contain multiple primitive object types
+list_2 = [1, "one", 1.0, True]
+print(list_2)
+
+
+
OUTPUT
+
+
[1, 'one', 1.0, True]
+
+
+
PYTHON
+
+
# You can also use list() on an iterable object to convert it into a list
+string = 'abcdefg'
+list_3 = list(string)
+print(list_3)
+
+
+
OUTPUT
+
+
['a', 'b', 'c', 'd', 'e', 'f', 'g']
+
+
Because lists have a 0-based index, we can access individual values
+by their list index position. For 0-based indexes, the first value
+always starts at position 0 i.e. the first element has an index of 0.
+Accessing multiple values by their index positions is also referred to
+as slicing or subsetting a list.
+
Note that we can use negative numbers as indices in Python. When we
+do so, the index -1 gives us the last element in the list,
+-2 gives us the second to last element in the list, and so
+on.
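A short sketch of negative indexing, using a made-up list called letters:

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

last_letter = letters[-1]        # last element
second_to_last = letters[-2]     # second to last element
last_three = letters[-3:]        # negative indices also work in slices

print(last_letter, second_to_last, last_three)
```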
# A quirk of slicing syntax is that the end index is not included
+# To extract from index 0 to 2, we need to slice from [0:2+1] or [0:3]
+
+# Extract the first three values from list_3
+print('first 3 values:', list_3[0:3])
+
+# Start from index 0 and extract values from each subsequent second position
+print('every second value:', list_3[0::2])
+
+# Start from index 1, end at index 3 and extract from each subsequent second position
+print('every second value from index 1 to 3:', list_3[1:4:2])
+
+
+
OUTPUT
+
+
first 3 values: ['a', 'b', 'c']
+every second value: ['a', 'c', 'e', 'g']
+every second value from index 1 to 3: ['b', 'd']
+
+
Change list values
+
+
+
+
Data which can be modified in place is called mutable, while data
+which cannot be modified is called immutable. Strings and numbers are
+immutable in that when we want to change the value of a string or number
+variable, we can only replace the old value with a completely new
+value.
+
+
PYTHON
+
+
string = 'abcde'
+string[0] = 'b'  # Produces a TypeError as strings are immutable
+
+# TypeError: 'str' object does not support item assignment
+
+
In contrast, lists are mutable and we can modify them after they have
+been created. We can change individual values, append new values, or
+reorder the whole list through sorting.
+
+
PYTHON
+
+
list_4 = ['apple', 'pear', 'plum']
+print('original list_4:', list_4)
+
+# Change the first value i.e. modify the list in place
+list_4[0] = 'banana'
+print('modified list_4:', list_4)
+
+# Add new value to list using the method .insert(index number, value)
+list_4.insert(1, 'apple') # Index 1 refers to the second position
+print('appended list_4:', list_4)
# Sorting a list also modifies it in place
+list_5 = [2, 1, 3, 7]
+list_5.sort()
+print('list_5:', list_5)
+
+
+
OUTPUT
+
+
list_5: [1, 2, 3, 7]
+
+
However, be careful when modifying data in-place. If two variables
+refer to the same list, and you modify the list value, it will change
+for both variables!
+
+
PYTHON
+
+
# When we assign list_6 to list_5, it means both list_6 and list_5 point to the
+# same list object, not that list_6 is a copy of list_5.
+
+list_6 = list_5
+print('list_5:', list_5)
+print('list_6:', list_6)
+
+# Change the first value in list_6 from 1 to 2
+list_6[0] = 2
+
+print('modified list_6:', list_6)
+print('unmodified list_5:', list_5)
+
+# Warning: list_5 and list_6 have both been modified in place!
Because of this behaviour, code which modifies data in place should
+be handled with care. You can avoid this behaviour by explicitly
+creating a copy of the original list and modifying only the copy.
+This is why creating a copy of the original data object can be useful in
+Python.
+
+
PYTHON
+
+
list_5 = [1, 2, 3, 7]
+list_7 = list_5.copy()
+print('list_5:', list_5)
+print('list_7:', list_7)
+
+# As list_7 is a completely new object copied from list_5, modifying list_7 does
+# not affect list_5.
+
+list_7[0] = 2
+print('modified list_7:', list_7)
+print('unmodified list_5:', list_5)
There are a lot of functions and methods which can be applied to
+lists, such as len(), max(),
+index() and so forth. Mathematical operators do not perform
+element-wise arithmetic on lists of integers. The + operator does work
+on two lists, but not as addition.
+
Note that + concatenates two lists into a single longer
+list, rather than outputting the sum of two lists of numbers.
+
+
PYTHON
+
+
list_8 = [1, 2, 3]
+list_9 = [4, 5, 6]
+
+list_8 + list_9 # This concatenates the lists and does not sum the two lists together
+
+
+
OUTPUT
+
+
[1, 2, 3, 4, 5, 6]
+
+
In your spare time after this workshop, you can search for different
+list functions and methods and test them out yourselves.
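If an element-wise sum of two lists is what you actually need, one possible approach in base Python is to pair up the values with zip() and add each pair. This is only a sketch: list_a, list_b and the other names are made up for the illustration, and the list comprehension syntax used here has not been introduced in this episode.

```python
list_a = [1, 2, 3]
list_b = [4, 5, 6]

# + joins the two lists end to end
concatenated = list_a + list_b

# zip() pairs up values at matching positions so they can be added together
elementwise_sum = [a + b for a, b in zip(list_a, list_b)]

print(concatenated)
print(elementwise_sum)
```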
+
Nested lists
+
+
+
+
We have previously mentioned that lists can be used to store other
+Python object types, including lists. This means that we can create
+nested lists in Python i.e. lists containing lists containing values.
+This property is useful when we have a collection of values that we want
+to access or transform as a subgroup.
+
To create a nested list, we also use [] or
+list() to contain one or more lists of values of
+interest.
+
+
PYTHON
+
+
veg_stock = [
+ ['lettuce', 'lettuce', 'tomato', 'zucchini'],
+ ['lettuce', 'lettuce', 'carrot', 'zucchini'],
+ ['lettuce', 'basil', 'tomato', 'zucchini']
+ ]
+
+# Check that veg_stock is a list object
+print(type(veg_stock))
+
+# Check that the first value in veg_stock is itself a list
+print(veg_stock[0], 'has type', type(veg_stock[0]))
+
+
+
OUTPUT
+
+
<class 'list'>
+['lettuce', 'lettuce', 'tomato', 'zucchini'] has type <class 'list'>
+
+
To extract a sub-list within the veg_stock list
+object, we refer to its index like we would with any other value inside
+a list, e.g. veg_stock[0] points to the first sub-list and veg_stock[1]
+to the second sub-list within the veg_stock list.
+
To access an individual string value inside a sub-list, we make use
+of a second index, which points to an individual value inside the
+sub-list.
+
+
PYTHON
+
+
print(veg_stock[0]) # Access the first sub-list
+print(veg_stock[0][0]) # Access the first value in the first sub-list
+
+print(type(veg_stock[0])) # The first value in veg_stock is a list
+print(type(veg_stock[0][0])) # The first value in the first list in veg_stock is a string
In general, however, when we are analysing a large collection of
+values, the best practice is to structure those values in columns and
+rows as a tabular Pandas data frame object. This is covered in another
+Carpentries Course called Python
+for Social Sciences.
+
Lists are still incredibly versatile and useful when you have a
+collection of values that need to be efficiently accessed or
+transformed. For example, data frame column names are commonly extracted
+and stored inside a list, so that the same transformation can then be
+mapped across multiple columns.
+
Create and access dictionaries
+
+
+
+
A dictionary is a Python data type that is particularly suited for
+enabling quick lookup operations on unstructured data sets.
+
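A dictionary can therefore be thought of as a list-like collection where
+every item or value is looked up by a unique key (i.e. a self-defined
+index of unique strings or numbers) rather than by its position. Since
+Python 3.7, dictionaries preserve the order in which items were inserted,
+but values are still accessed by key. The index values are called keys
+and a dictionary contains key-value pairs with the format
+{key: value(s)}.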
+
Dictionaries can be created by listing individual key-value pairs
+inside {} or using dict().
+
+
PYTHON
+
+
# A key-value pair can contain single or multiple values
+# Keys are treated as case sensitive and unique
+# Multiple values are first stored inside a list
+
+teams = {
+'data science': ['Mei Ling', 'Paul', 'Gwen', 'Suresh'],
+'user design': ['Amy', 'Linh', 'Sasha'],
+'software dev': ['David', 'Prya'],
+'comms': 'Taylor'
+ }
+
+
When using dict(), we need to indicate which key is
+associated with which value. This can be done using a list of tuples,
+direct association, i.e. using =, or using
+zip(), which pairs up the items of two iterables (such as lists)
+into tuples.
+
+
PYTHON
+
+
# To use dict(), key-value pairs can be stored inside tuples
+ds_emp_status = dict([
+ ('Mei Ling', 'full time'),
+ ('Paul', 'full time'),
+ ('Gwen', 'part time'),
+ ('Suresh', 'part time')
+ ])
+
+# Key-value pairs can also be assigned by direct association
+# With this approach keys are written without quotes, so they must be valid Python names
+ud_emp_status = dict(
+ Amy='full time',
+ Linh='full time',
+ Sasha='casual'
+ )
+
+# zip() can also be used if each key has only one value
+sd_emp_status = dict(zip(
+ ['David', 'Prya'],
+ ['full time', 'full time']
+ ))
+
+
To access a specific value inside a dictionary, we need to specify
+its key using []. This is similar to slicing or subsetting
+a list by specifying its index using [].
+
+
PYTHON
+
+
# Access the values associated with the key 'data science'
+print(teams['data science'])
+
+print('The object teams is of type', type(teams))
+print('The dict value', teams['data science'], 'is of type', type(teams['data science']))
+
+
+
OUTPUT
+
+
['Mei Ling', 'Paul', 'Gwen', 'Suresh']
+The object teams is of type <class 'dict'>
+The dict value ['Mei Ling', 'Paul', 'Gwen', 'Suresh'] is of type <class 'list'>
+
+
We can also access a value from a dictionary using the
+get() method.
+
+
PYTHON
+
+
print(teams.get('user design'))
+
+# get() also enables us to return an alternate string when the key is not found
+# This prevents our code from returning an error message that halts the analysis
+
+print(teams.get('data engineering', 'WARNING: key does not exist'))
+
+
+
OUTPUT
+
+
['Amy', 'Linh', 'Sasha']
+WARNING: key does not exist
+
+
To access data inside a dictionary, we can also perform the following
+other actions:
+
+
Check whether a key exists in a dictionary using the keyword
+in
+
+
Retrieve unique dictionary keys using dict.keys()
+
+
Retrieve dictionary values using dict.values()
+
+
Retrieve dictionary items using dict.items()
+
+
+
+
PYTHON
+
+
# Check whether a key exists in a dictionary
+print('data science' in teams)
+print('Data Science' in teams)  # Keys are case sensitive
+
+# Retrieve all dictionary keys
+print(teams.keys())
+print(sd_emp_status.keys())
+
+# Retrieve all dictionary values
+print(sd_emp_status.values())
+
+# Retrieve all dictionary key-value pairs
+print(sd_emp_status.items())
To add a new key-value pair to an existing dictionary, we can create
+a new key and directly attach a new value to it using = or
+alternatively use the method update().
+
+
PYTHON
+
+
print('original dict items:', sd_emp_status.items())
+
+# Add new key-value pair using direct assignment
+sd_emp_status['Mohammad'] = 'full time'
+
+# Add new key-value pair using update({'key': 'value'})
+sd_emp_status.update({'Carrie': 'part time'})
+
+print('updated dict items:', sd_emp_status.items())
Because keys are unique, a dictionary cannot contain two keys with
+the same name. This means that adding an item using a key that is
+already present in the dictionary will cause the previous value to be
+overwritten.
+
+
PYTHON
+
+
print('original dict items:', sd_emp_status.items())
+
+# As the key 'Carrie' already exists, its value will be overwritten
+sd_emp_status['Carrie'] = 'full time'
+print('updated dict items:', sd_emp_status.items())
To remove a key-value pair from an existing dictionary, we can use the
+del keyword or the method pop(). Using
+pop() also enables us to return an alternate string if we
+try to remove a non-existing key, which prevents our code from returning
+an error message that halts the analysis.
+
+
PYTHON
+
+
print('original dict items:', sd_emp_status.items())
+
+# Delete dictionary keys using del and pop()
+del sd_emp_status['Mohammad']
+sd_emp_status.pop('Carrie')
+sd_emp_status.pop('Anuradha', 'WARNING: key does not exist') # Does not generate an error
+
+print('modified dict items:', sd_emp_status.items())
Similar to lists, dictionaries can be nested as we can also store
+dictionaries as values inside a key-value pair using {}.
+Nested dictionaries are useful when we need to store unstructured data
+in a complex structure. For example, JSON data is commonly used for
+transmitting data in web applications and often exists in a nested
+structure that can be stored using nested dictionaries in Python.
+
+
PYTHON
+
+
# Individual dictionaries are enclosed in {} and separated by a comma
+nested_dict = {
+'dict_1': {  # The first key's value is a dictionary of key-value pairs
+'key_1a': 'value_1a',
+'key_1b': 'value_1b'
+ },
+'dict_2': {  # The second key's value is another dictionary of key-value pairs
+'key_2a': 'value_2a',
+'key_2b': 'value_2b'
+ }
+ }
+
+print(nested_dict)
Similar to working with nested lists, to extract a value from a
+sub-dictionary, we specify both the main dictionary and
+sub-dictionary keys using [].
+
+
PYTHON
+
+
# Extract the value for key 2a in dict_2
+print('original value:', nested_dict['dict_2']['key_2a'])
+
+# Adding or updating a value can be done through the same approach
+nested_dict['dict_2']['key_2a'] = "modified_value_2a"
+
+print('modified value:', nested_dict['dict_2']['key_2a'])
+
+
+
OUTPUT
+
+
original value: value_2a
+modified value: modified_value_2a
+
+
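The earlier remark about JSON can be illustrated with Python's built-in json module, which converts JSON text into nested dictionaries and lists. The JSON string below is made up for this sketch and does not come from the course data.

```python
import json

# A made-up JSON string, as might be received from a web application
json_text = '{"patient": {"id": "inflam_001", "weights_kg": [60.3, 65.0]}}'

# json.loads() turns the text into nested Python dictionaries and lists
record = json.loads(json_text)

print(type(record))
print(record['patient']['id'])
print(record['patient']['weights_kg'][1])
```

Values are then accessed with the same chained [] syntax used for nested_dict above.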
Optional: converting lists and dictionaries to Pandas data
+frames
+
+
+
+
Lists and dictionaries can be easily converted into a tabular Pandas
+data frame format. This can be useful when you need to create a small
+data set for unit testing purposes.
+
+
PYTHON
+
+
# Import pandas library
+import pandas as pd
+
+# Create a dictionary with each key-value pair representing a data frame column
+data = {
+'col_1': [3, 2, 1, 0],
+'col_2': ['a', 'b', 'c', 'd']
+ }
+
+df = pd.DataFrame.from_dict(data)
+
+print(df) # Outputs data as a tabular Pandas data frame
+print(type(df))
+
+
+
OUTPUT
+
+
   col_1 col_2
+0      3     a
+1      2     b
+2      1     c
+3      0     d
+<class 'pandas.core.frame.DataFrame'>
+
+
+
+
+
+
+
Key Points
+
+
+
+
Lists can contain any Python object including other lists
+
Lists are ordered i.e. indexed and can therefore be sliced by index
+number
+
Unlike strings and integers, the values inside a list can be
+modified in place
+
A list which contains other lists is referred to as a nested
+list
+
Dictionaries store values that are accessed by key rather than by
+position and are defined using key-value pairs
+
Dictionary keys are unique
+
A dictionary which contains other dictionaries is referred to as a
+nested dictionary
+
Values inside nested lists and dictionaries can be accessed by an
+additional index
In the episode about visualizing
+data, we will see Python code that plots values of interest from our
+first inflammation dataset (inflammation-01.csv) and
+reveals some suspicious features.
+
We have a dozen data sets right now and potentially more on the way
+if Dr. Maverick can keep up their surprisingly fast clinical trial rate.
+We want to create plots for all of our data sets with a single
+statement. To do that, we’ll have to teach the computer how to repeat
+things.
+
An example task that we might want to repeat is accessing numbers in
+a list, which we will do by printing each number on a line of its
+own.
+
+
PYTHON
+
+
odds = [1, 3, 5, 7]
+
+
In Python, a list is basically an ordered
+collection of elements, and every element has a unique number associated
+with it — its index. This means that we can access elements in a list
+using their indices. For example, we can get the first number in the
+list odds, by using odds[0]. One way to print
+each number is to use four print statements:
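A sketch of what those four statements might look like, reusing the odds list defined above:

```python
odds = [1, 3, 5, 7]

# One print statement per element, each with a hard-coded index
print(odds[0])
print(odds[1])
print(odds[2])
print(odds[3])
```

This works for a list with exactly four elements, but the approach has several drawbacks: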
Not scalable. Imagine you need to print a list
+that has hundreds of elements. It might be easier to type them in
+manually.
+
Difficult to maintain. If we want to decorate
+each printed element with an asterisk or any other character, we would
+have to change four lines of code. While this might not be a problem for
+small lists, it would definitely be a problem for longer ones.
+
Fragile. If we use it with a list that has more
+elements than what we initially envisioned, it will only display part of
+the list’s elements. A shorter list, on the other hand, will cause an
+error because it will be trying to display elements of the list that do
+not exist.
---------------------------------------------------------------------------
+IndexError Traceback (most recent call last)
+<ipython-input-3-7974b6cdaf14> in <module>()
+ 3 print(odds[1])
+ 4 print(odds[2])
+----> 5 print(odds[3])
+
+IndexError: list index out of range
This is shorter — certainly shorter than something that prints every
+number in a hundred-number list — and more robust as well:
+
+
PYTHON
+
+
odds = [1, 3, 5, 7, 9, 11]
+for num in odds:
+    print(num)
+
+
+
OUTPUT
+
+
1
+3
+5
+7
+9
+11
+
+
The improved version uses a for
+loop to repeat an operation — in this case, printing — once for each
+thing in a sequence. The general form of a loop is:
+
+
PYTHON
+
+
for variable in collection:
+    # do things using variable, such as print
+
+
Using the odds example above, the loop might look like this:
+
where each number (num) in the variable
+odds is looped through and printed one number after
+another. The other numbers in the diagram denote which loop cycle the
+number was printed in (1 being the first loop cycle, and 6 being the
+final loop cycle).
+
We can call the loop
+variable anything we like, but there must be a colon at the end of
+the line starting the loop, and we must indent anything we want to run
+inside the loop. Unlike many other languages, there is no command to
+signify the end of the loop body (e.g., end for);
+everything indented after the for statement belongs to the
+loop.
+
+
+
+
+
+
What’s in a name?
+
+
+
In the example above, the loop variable was given the name
+num as a mnemonic; it is short for ‘number’. We can choose
+any name we want for variables. We might just as easily have chosen the
+name banana for the loop variable, as long as we use the
+same name when we invoke the variable inside the loop:
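For instance, a sketch of the same loop with banana as the loop variable (the list is the six-element odds list used above):

```python
odds = [1, 3, 5, 7, 9, 11]

# The loop behaves exactly as before; only the variable's name has changed
for banana in odds:
    print(banana)
```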
It is a good idea to choose variable names that are meaningful,
+otherwise it would be more difficult to understand what the loop is
+doing.
+
+
+
+
Here’s another loop that repeatedly updates a variable:
+
+
PYTHON
+
+
length = 0
+names = ['Curie', 'Darwin', 'Turing']
+for value in names:
+    length = length + 1
+print('There are', length, 'names in the list.')
+
+
+
OUTPUT
+
+
There are 3 names in the list.
+
+
It’s worth tracing the execution of this little program step by step.
+Since there are three names in names, the statement on line
+4 will be executed three times. The first time around,
+length is zero (the value assigned to it on line 1) and
+value is Curie. The statement adds 1 to the
+old value of length, producing 1, and updates
+length to refer to that new value. The next time around,
+value is Darwin and length is 1,
+so length is updated to be 2. After one more update,
+length is 3; since there is nothing left in
+names for Python to process, the loop finishes and the
+print function on line 5 tells us our final answer.
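One way to watch this trace happen is to print the variables inside the loop. The sketch below is the same program with one extra line of output per cycle; the trace list is a hypothetical helper added only to record each step and is not part of the original example:

```python
length = 0
names = ['Curie', 'Darwin', 'Turing']
trace = []  # hypothetical helper: records (value, length) after each update

for value in names:
    length = length + 1
    trace.append((value, length))
    print('value is', value, 'and length is now', length)

print('There are', length, 'names in the list.')
```

The three lines printed inside the loop correspond to the three updates described above.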
+
Note that a loop variable
+is a variable that is being used to record progress in a loop. It still
+exists after the loop is over, and we can re-use variables previously
+defined as loop variables as
+well:
+
+
PYTHON
+
+
name = 'Rosalind'
+for name in ['Curie', 'Darwin', 'Turing']:
+    print(name)
+print('after the loop, name is', name)
+
+
+
OUTPUT
+
+
Curie
+Darwin
+Turing
+after the loop, name is Turing
+
+
Note also that finding the length of an object is such a common
+operation that Python actually has a built-in function to do it called
+len:
+
+
PYTHON
+
+
print(len([0, 1, 2, 3]))
+
+
+
OUTPUT
+
+
4
+
+
len is much faster than any function we could write
+ourselves, and much easier to read than a two-line loop; it will also
+give us the length of many other data types we haven’t seen yet, so we
+should always use it when we can.
+
+
+
+
+
+
From 1 to N
+
+
+
Python has a built-in function called range that
+generates a sequence of numbers. range can accept 1, 2, or 3
+parameters.
+
+
If one parameter is given, range generates a sequence
+of that length, starting at zero and incrementing by 1. For example,
+range(3) produces the numbers 0, 1, 2.
+
If two parameters are given, range starts at the first
+and ends just before the second, incrementing by one. For example,
+range(2, 5) produces 2, 3, 4.
+
If range is given 3 parameters, it starts at the first
+one, ends just before the second one, and increments by the third one.
+For example, range(3, 10, 2) produces
+3, 5, 7, 9.
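The three forms can be checked directly. In this small sketch each range is wrapped in list so that the generated numbers are visible when printed; the variable names are illustrative:

```python
one_parameter = list(range(3))            # stop only
two_parameters = list(range(2, 5))        # start, stop
three_parameters = list(range(3, 10, 2))  # start, stop, step

print(one_parameter)     # [0, 1, 2]
print(two_parameters)    # [2, 3, 4]
print(three_parameters)  # [3, 5, 7, 9]
```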
+
+
Using range, write a loop that prints the first 3 natural
+numbers:
+
+
OUTPUT
+
+
1
+2
+3
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
for number in range(1, 4):
+    print(number)
+
+
+
+
+
+
+
+
+
+
+
Understanding the loops
+
+
+
Given the following loop:
+
+
PYTHON
+
+
word = 'oxygen'
+for letter in word:
+    print(letter)
+
+
How many times is the body of the loop executed?
+
+
3 times
+
4 times
+
5 times
+
6 times
+
+
+
+
+
+
+
+
+
+
The body of the loop is executed 6 times.
+
+
+
+
+
+
+
+
+
+
Computing Powers With Loops
+
+
+
Exponentiation is built into Python:
+
+
PYTHON
+
+
print(5**3)
+
+
+
OUTPUT
+
+
125
+
+
Write a loop that calculates the same result as 5 ** 3
+using multiplication (and without exponentiation).
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
result = 1
+for number in range(0, 3):
+    result = result * 5
+print(result)
+
+
+
+
+
+
+
+
+
+
+
Summing a List
+
+
+
Write a loop that calculates the sum of elements in a list by adding
+each element and printing the final value, so
+[124, 402, 36] prints 562.
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
numbers = [124, 402, 36]
+summed = 0
+for num in numbers:
+    summed = summed + num
+print(summed)
+
+
+
+
+
+
+
+
+
+
+
Computing the Value of a Polynomial
+
+
+
The built-in function enumerate takes a sequence (e.g.,
+a list) and generates a new sequence of the
+same length. Each element of the new sequence is a pair composed of the
+index (0, 1, 2,…) and the value from the original sequence:
+
+
PYTHON
+
+
for idx, val in enumerate(a_list):
+    # Do something using idx and val
+
+
The code above loops through a_list, assigning the index
+to idx and the value to val.
+
Suppose you have encoded a polynomial as a list of coefficients in
+the following way: the first element is the constant term, the second
+element is the coefficient of the linear term, the third is the
+coefficient of the quadratic term, etc.
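As an illustration of this encoding, the sketch below evaluates one polynomial by writing out every term by hand. The values of x and coefs are made up for the example:

```python
x = 5
coefs = [2, 4, 3]  # hypothetical polynomial: 2 + 4*x + 3*x**2

# Each coefficient is multiplied by x raised to the power of its index
y = coefs[0] * x**0 + coefs[1] * x**1 + coefs[2] * x**2
print(y)  # 97
```

Writing the terms out like this only works for a polynomial of known length, which is what the loop in the exercise is meant to avoid.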
Write a loop using enumerate(coefs) which computes the
+value y of any polynomial, given x and
+coefs.
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
y = 0
+for idx, coef in enumerate(coefs):
+    y = y + coef * x**idx
+
+
+
+
+
+
Making Choices with Conditional Logic
+
+
+
+
How can we use Python to automatically recognize different situations
+we encounter with our data and take a different action for each? In this
+lesson, we’ll learn how to write code that runs only when certain
+conditions are true.
+
+
Conditionals
+
+
We can ask Python to take different actions, depending on a
+condition, with an if statement:
+
+
PYTHON
+
+
num = 37
+if num > 100:
+    print('greater')
+else:
+    print('not greater')
+print('done')
+
+
+
OUTPUT
+
+
not greater
+done
+
+
The second line of this code uses the keyword if to tell
+Python that we want to make a choice. If the test that follows the
+if statement is true, the body of the if
+(i.e., the set of lines indented underneath it) is executed, and
+“greater” is printed. If the test is false, the body of the
+else is executed instead, and “not greater” is printed.
+Only one or the other is ever executed before continuing on with program
+execution to print “done”:
+
Conditional
+statements don’t have to include an else. If there
+isn’t one, Python simply does nothing if the test is false:
+
+
PYTHON
+
+
num = 53
+print('before conditional...')
+if num > 100:
+    print(num, 'is greater than 100')
+print('...after conditional')
+
+
+
OUTPUT
+
+
before conditional...
+...after conditional
+
+
We can also chain several tests together using elif,
+which is short for “else if”. The following Python code uses
+elif to print the sign of a number.
+
+
PYTHON
+
+
num = -3
+
+if num > 0:
+    print(num, 'is positive')
+elif num == 0:
+    print(num, 'is zero')
+else:
+    print(num, 'is negative')
+
+
+
OUTPUT
+
+
-3 is negative
+
+
Note that to test for equality we use a double equals sign
+== rather than a single equals sign = which is
+used to assign values.
+
+
+
+
+
+
Comparing in Python
+
+
+
Along with the > and == operators we
+have already used for comparing values in our conditionals, there are a
+few more options to know about:
+
+
+>: greater than
+
+<: less than
+
+==: equal to
+
+!=: does not equal
+
+>=: greater than or equal to
+
+<=: less than or equal to
+
+
+
+
+
We can also combine tests using and and or.
+and is only true if both parts are true:
+
+
PYTHON
+
+
if (1 > 0) and (-1 >= 0):
+    print('both parts are true')
+else:
+    print('at least one part is false')
+
+
+
OUTPUT
+
+
at least one part is false
+
+
while or is true if at least one part is true:
+
+
PYTHON
+
+
if (1 < 0) or (1 >= 0):
+    print('at least one test is true')
+
+
+
OUTPUT
+
+
at least one test is true
+
+
+
+
+
+
+
+True and False
+
+
+
True and False are special words in Python
+called booleans, which represent truth values. A statement
+such as 1 < 0 returns the value False,
+while -1 < 0 returns the value True.
+
+
+
+
+
+
Checking Our Data
+
+
Now that we’ve seen how conditionals work, we can use them to check
+for the suspicious features we saw in our inflammation data. We are
+about to use functions provided by the numpy module again.
+Therefore, if you’re working in a new Python session, make sure to load
+the module with:
+
+
PYTHON
+
+
import numpy
+
+
From the first couple of plots, we saw that maximum daily
+inflammation exhibits a strange behavior and rises by one unit a day.
+Wouldn’t it be a good idea to detect such behavior and report it as
+suspicious? Let’s do that! However, instead of checking every single day
+of the study, let’s merely check if maximum inflammation in the
+beginning (day 0) and in the middle (day 20) of the study are equal to
+the corresponding day numbers.
We also saw a different problem in the third dataset; the minima per
+day were all zero (looks like a healthy person snuck into our study). We
+can also check for this with an elif condition:
+
+
PYTHON
+
+
elif numpy.sum(numpy.amin(data, axis=0)) == 0:
+    print('Minima add up to zero!')
+
+
And if neither of these conditions is true, we can use
+else to give the all-clear:
In this way, we have asked Python to do something different depending
+on the condition of our data. Here we printed messages in all cases, but
+we could also imagine not using the else catch-all so that
+messages are only printed when something is wrong, freeing us from
+having to manually examine every plot for features we’ve seen
+before.
Which of the following would be printed if you were to run this code?
+Why did you pick this answer?
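The code this question refers to can be reconstructed from the three conditions named in the answer. A sketch under that assumption:

```python
# Three chained tests; Python stops at the first one that is true
if 4 > 5:
    print('A')
elif 4 == 5:
    print('B')
elif 4 < 5:
    print('C')
```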
+
+
A
+
B
+
C
+
B and C
+
+
+
+
+
+
+
+
+
+
C gets printed because the first two conditions,
+4 > 5 and 4 == 5, are not true, but
+4 < 5 is true. In this case, only one of these
+conditions can be true at a time, but in other scenarios multiple
+elif conditions could be met. In these scenarios, only the
+action associated with the first true elif condition will
+occur, starting from the top of the conditional section.
+
This contrasts with the case of multiple if statements,
+where every action can occur as long as their condition is met.
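The difference can be seen by running the same three tests both ways. In this sketch the value of num and the two result lists are illustrative assumptions, chosen so that more than one test is true:

```python
num = 15  # hypothetical value that satisfies all three tests

# Chained with elif: only the first true branch runs
chained = []
if num > 10:
    chained.append('greater than 10')
elif num > 5:
    chained.append('greater than 5')
elif num > 0:
    chained.append('positive')

# Separate if statements: every true test triggers its action
separate = []
if num > 10:
    separate.append('greater than 10')
if num > 5:
    separate.append('greater than 5')
if num > 0:
    separate.append('positive')

print('elif chain:', chained)     # one entry
print('separate ifs:', separate)  # three entries
```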
+
+
+
+
+
+
+
+
+
+
+
What Is Truth?
+
+
+
True and False booleans are not the only
+values in Python that are true and false. In fact, any value
+can be used in an if or elif. After reading
+and running the code below, explain what the rule is for which values
+are considered true and which are considered false.
+
+
PYTHON
+
+
if '':
+    print('empty string is true')
+if 'word':
+    print('word is true')
+if []:
+    print('empty list is true')
+if [1, 2, 3]:
+    print('non-empty list is true')
+if 0:
+    print('zero is true')
+if 1:
+    print('one is true')
+
+
+
+
+
+
+
+
+
+
That’s Not Not What I Meant
+
+
+
Sometimes it is useful to check whether some condition is
+not true. The Boolean operator not can do this
+explicitly. After reading and running the code below, write some
+if statements that use not to test the rule
+that you formulated in the previous challenge.
+
+
PYTHON
+
+
if not '':
+    print('empty string is not true')
+if not 'word':
+    print('word is not true')
+if not not True:
+    print('not not True is true')
+
+
+
+
+
+
+
+
+
+
Close Enough
+
+
+
Write some conditions that print True if the variable
+a is within 10% of the variable b and
+False otherwise. Compare your implementation with your
+partner’s. Do you get the same answer for all possible pairs of
+numbers?
a = 5
+b = 5.1
+
+if abs(a - b) <= 0.1 * abs(b):
+    print('True')
+else:
+    print('False')
+
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
print(abs(a - b) <= 0.1 * abs(b))
+
+
This works because the Booleans True and
+False have string representations which can be printed.
+
+
+
+
+
+
+
+
+
+
In-Place Operators
+
+
+
Python (and most other languages in the C family) provides in-place operators that
+work like this:
+
+
PYTHON
+
+
x = 1  # original value
+x += 1  # add one to x, assigning result back to x
+x *= 3  # multiply x by 3
+print(x)
+
+
+
OUTPUT
+
+
6
+
+
Write some code that sums the positive and negative numbers in a list
+separately, using in-place operators. Do you think the result is more or
+less readable than writing the same without in-place operators?
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
positive_sum = 0
+negative_sum = 0
+test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
+for num in test_list:
+    if num > 0:
+        positive_sum += num
+    elif num == 0:
+        pass
+    else:
+        negative_sum += num
+print(positive_sum, negative_sum)
+
+
Here pass means “don’t do anything”. In this particular
+case, it’s not actually needed, since if num == 0 neither
+sum needs to change, but it illustrates the use of elif and
+pass.
+
+
+
+
+
+
+
+
+
+
Sorting a List Into Buckets
+
+
+
In our data folder, large data sets are stored in files
+whose names start with “inflammation-”, and small data sets in files
+whose names start with “small-”. We also have some other files that we
+do not care about at this point. We’d like to break all these files into
+three lists called large_files, small_files,
+and other_files, respectively.
+
Add code to the template below to do this. Note that the string
+method startswith
+returns True if and only if the string it is called on
+starts with the string passed as an argument, that is:
+
+
PYTHON
+
+
'String'.startswith('Str')
+
+
+
OUTPUT
+
+
True
+
+
But
+
+
PYTHON
+
+
'String'.startswith('str')
+
+
+
OUTPUT
+
+
False
+
+
Use the following Python code as your starting point:
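A sketch of such a starting point, followed by one possible way to finish it. The file names in filenames are made up for illustration, and the list method append is used to add each name to its bucket:

```python
# Hypothetical list of file names to sort
filenames = ['inflammation-01.csv',
             'myscript.py',
             'inflammation-02.csv',
             'small-01.csv',
             'small-02.csv']
large_files = []
small_files = []
other_files = []

# One possible completion: test each prefix in turn
for filename in filenames:
    if filename.startswith('inflammation-'):
        large_files.append(filename)
    elif filename.startswith('small-'):
        small_files.append(filename)
    else:
        other_files.append(filename)

print('large_files:', large_files)
print('small_files:', small_files)
print('other_files:', other_files)
```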
Write a loop that counts the number of vowels in a character
+string.
+
Test it on a few individual words and full sentences.
+
Once you are done, compare your solution to your neighbor’s. Did you
+make the same decisions about how to handle the letter ‘y’ (which some
+people think is a vowel, and some do not)?
+
+
+
Solution
+
+
vowels = 'aeiouAEIOU'
+sentence = 'Mary had a little lamb.'
+count = 0
+for char in sentence:
+    if char in vowels:
+        count += 1
+
+print('The number of vowels in this string is ' + str(count))
+
+
+
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
+
Use for variable in sequence to process the elements of
+a sequence one at a time.
+
The body of a for loop must be indented.
+
Use len(thing) to determine the length of something
+that contains other values.
+
Use if condition to start a conditional statement,
+elif condition to provide additional tests, and
+else to provide a default.
+
The bodies of the branches of conditional statements must be
+indented.
+
Use == to test for equality.
+
+X and Y is only true if both X and
+Y are true.
+
+X or Y is true if either X or
+Y, or both, are true.
+
Zero, the empty string, and the empty list are considered false; all
+other numbers, strings, and lists are considered true.
What are functions, and how can I use them in Python?
+
How can I define new functions?
+
What’s the difference between defining and calling a function?
+
What happens when I call a function?
+
+
+
+
+
+
+
+
Objectives
+
+
Identify what a function is.
+
Create new functions.
+
Set default values for function parameters.
+
Explain why we should divide programs into small, single-purpose
+functions.
+
+
+
+
+
+
+
At this point, we’ve seen that code can have Python make decisions
+about what it sees in our data. What if we want to convert some of our
+data, like taking a temperature in Fahrenheit and converting it to
+Celsius? We could write something like this for converting a single
+number:
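For instance, a minimal sketch of a one-off conversion; the variable names and the value 99 are illustrative assumptions:

```python
fahrenheit_val = 99  # hypothetical temperature in degrees Fahrenheit

# Subtract the 32-degree offset, then scale by 5/9
celsius_val = (fahrenheit_val - 32) * (5 / 9)
print(celsius_val)
```

Converting a second temperature would mean writing the same formula out again with a different variable.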
But we would be in trouble as soon as we had to do this more than a
+couple times. Cutting and pasting it is going to make our code get very
+long and very repetitive, very quickly. We’d like a way to package our
+code so that it is easier to reuse, a shorthand way of re-executing
+longer pieces of code. In Python we can use ‘functions’. Let’s start by
+defining a function fahr_to_celsius that converts
+temperatures from Fahrenheit to Celsius:
+
+
PYTHON
+
+
def explicit_fahr_to_celsius(temp):
+    # Assign the converted value to a variable
+    converted = ((temp - 32) * (5/9))
+    # Return the value of the new variable
+    return converted
+
+def fahr_to_celsius(temp):
+    # Return the converted value directly with the return statement,
+    # without creating a new variable. This function does the same
+    # thing as the previous one but is more concise; the previous
+    # version is more explicit about how return works.
+    return ((temp - 32) * (5/9))
+
+
The function definition opens with the keyword def
+followed by the name of the function (fahr_to_celsius) and
+a parenthesized list of parameter names (temp). The body of the function — the statements
+that are executed when it runs — is indented below the definition line.
+The body concludes with a return keyword followed by the
+return value.
+
When we call the function, the values we pass to it are assigned to
+those variables so that we can use them inside the function. Inside the
+function, we use a return
+statement to send a result back to whoever asked for it.
+
Let’s try running our function.
+
+
PYTHON
+
+
fahr_to_celsius(32)
+
+
This command should call our function, using “32” as the input and
+return the function value.
+
In fact, calling our own function is no different from calling any
+other function:
+
+
PYTHON
+
+
print('freezing point of water:', fahr_to_celsius(32), 'C')
+print('boiling point of water:', fahr_to_celsius(212), 'C')
+
+
+
OUTPUT
+
+
freezing point of water: 0.0 C
+boiling point of water: 100.0 C
+
+
We’ve successfully called the function that we defined, and we have
+access to the value that we returned.
+
Composing Functions
+
+
+
+
Now that we’ve seen how to turn Fahrenheit into Celsius, we can also
+write the function to turn Celsius into Kelvin:
+
+
PYTHON
+
+
def celsius_to_kelvin(temp_c):
+    return temp_c + 273.15
+
+print('freezing point of water in Kelvin:', celsius_to_kelvin(0.))
+
+
+
OUTPUT
+
+
freezing point of water in Kelvin: 273.15
+
+
What about converting Fahrenheit to Kelvin? We could write out the
+formula, but we don’t need to. Instead, we can compose the two functions we have
+already created:
+
+
PYTHON
+
+
def fahr_to_kelvin(temp_f):
+    temp_c = fahr_to_celsius(temp_f)
+    temp_k = celsius_to_kelvin(temp_c)
+    return temp_k
+
+print('boiling point of water in Kelvin:', fahr_to_kelvin(212.0))
+
+
+
OUTPUT
+
+
boiling point of water in Kelvin: 373.15
+
+
This is our first taste of how larger programs are built: we define
+basic operations, then combine them in ever-larger chunks to get the
+effect we want. Real-life functions will usually be larger than the ones
+shown here — typically half a dozen to a few dozen lines — but they
+shouldn’t ever be much longer than that, or the next person who reads it
+won’t be able to understand what’s going on.
+
Variable Scope
+
+
+
+
In composing our temperature conversion functions, we created
+variables inside of those functions, temp,
+temp_c, temp_f, and temp_k. We
+refer to these variables as local variables because they no
+longer exist once the function is done executing. If we try to access
+their values outside of the function, we will encounter an error:
+
+
PYTHON
+
+
print('Again, temperature in Kelvin was:', temp_k)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-1-eed2471d229b> in <module>
+----> 1 print('Again, temperature in Kelvin was:', temp_k)
+
+NameError: name 'temp_k' is not defined
+
+
If you want to reuse the temperature in Kelvin after you have
+calculated it with fahr_to_kelvin, you can store the result
+of the function call in a variable:
+
+
PYTHON
+
+
temp_kelvin = fahr_to_kelvin(212.0)
+print('temperature in Kelvin was:', temp_kelvin)
+
+
+
OUTPUT
+
+
temperature in Kelvin was: 373.15
+
+
The variable temp_kelvin, being defined outside any
+function, is said to be global.
+
Inside a function, one can read the value of such global
+variables:
+
+
PYTHON
+
+
def print_temperatures():
+    print('temperature in Fahrenheit was:', temp_fahr)
+    print('temperature in Kelvin was:', temp_kelvin)
+
+temp_fahr = 212.0
+temp_kelvin = fahr_to_kelvin(temp_fahr)
+
+print_temperatures()
+
+
+
OUTPUT
+
+
temperature in Fahrenheit was: 212.0
+temperature in Kelvin was: 373.15
+
+
By giving our functions human-readable names, we can more easily read
+and understand what is happening in our code. Even
+better, if at some later date we want to use any of those pieces of
+code again, we can do so in a single line.
+
Testing and Documenting
+
+
+
+
Once we start putting things in functions so that we can re-use them,
+we need to start testing that those functions are working correctly. To
+see how to do this, let’s write a function to offset a dataset so that
+its mean value shifts to a user-defined value:
We could test this on our actual data, but since we don’t know what
+the values ought to be, it will be hard to tell if the result was
+correct. Instead, let’s use NumPy to create a matrix of 0’s and then
+offset its values to have a mean value of 3:
+
+
PYTHON
+
+
z = numpy.zeros((2,2))
+print(offset_mean(z, 3))
+
+
+
OUTPUT
+
+
[[ 3. 3.]
+ [ 3. 3.]]
+
+
That looks right, so let’s try offset_mean on our real
+data:
+
+
PYTHON
+
+
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
+print(offset_mean(data, 0))
It’s hard to tell from the default output whether the result is
+correct, but there are a few tests that we can run to reassure us:
+
+
PYTHON
+
+
print('original min, mean, and max are:', numpy.amin(data), numpy.mean(data), numpy.amax(data))
+offset_data = offset_mean(data, 0)
+print('min, mean, and max of offset data are:',
+ numpy.amin(offset_data),
+ numpy.mean(offset_data),
+ numpy.amax(offset_data))
+
+
+
OUTPUT
+
+
original min, mean, and max are: 0.0 6.14875 20.0
+min, mean, and max of offset data are: -6.14875 2.84217094304e-16 13.85125
+
+
That seems almost right: the original mean was about 6.1, so the
+lower bound from zero is now about -6.1. The mean of the offset data
+isn’t quite zero — we’ll explore why not in the challenges — but it’s
+pretty close. We can even go further and check that the standard
+deviation hasn’t changed:
+
+
PYTHON
+
+
print('std dev before and after:', numpy.std(data), numpy.std(offset_data))
+
+
+
OUTPUT
+
+
std dev before and after: 4.61383319712 4.61383319712
+
+
Those values look the same, but we probably wouldn’t notice if they
+were different in the sixth decimal place. Let’s do this instead:
+
+
PYTHON
+
+
print('difference in standard deviations before and after:',
+ numpy.std(data) - numpy.std(offset_data))
+
+
+
OUTPUT
+
+
difference in standard deviations before and after: -3.5527136788e-15
+
+
Again, the difference is very small. It’s still possible that our
+function is wrong, but it seems unlikely enough that we should probably
+get back to doing our analysis.
+
Documentation
+
+
+
+
We have one more task first, though: we should write some documentation for our function
+to remind ourselves later what it’s for and how to use it.
+
The usual way to put documentation in software is to add comments like this:
+
+
PYTHON
+
+
# offset_mean(data, target_mean_value):
+# return a new array containing the original data with its mean offset to match the desired value.
+def offset_mean(data, target_mean_value):
+    return (data - numpy.mean(data)) + target_mean_value
+
+
There’s a better way, though. If the first thing in a function is a
+string that isn’t assigned to a variable, that string is attached to the
+function as its documentation:
+
+
PYTHON
+
+
def offset_mean(data, target_mean_value):
+    """Return a new array containing the original data
+       with its mean offset to match the desired value."""
+    return (data - numpy.mean(data)) + target_mean_value
+
+
This is better because we can now ask Python’s built-in help system
+to show us the documentation for the function:
+
+
PYTHON
+
+
help(offset_mean)
+
+
+
OUTPUT
+
+
Help on function offset_mean in module __main__:
+
+offset_mean(data, target_mean_value)
+ Return a new array containing the original data with its mean offset to match the desired value.
+
+
A string like this is called a docstring. We don’t need to use
+triple quotes when we write one, but if we do, we can break the string
+across multiple lines:
+
+
PYTHON
+
+
def offset_mean(data, target_mean_value):
+    """Return a new array containing the original data
+       with its mean offset to match the desired value.
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3], 0)
+    array([-1., 0., 1.])
+    """
+    return (data - numpy.mean(data)) + target_mean_value
+
+help(offset_mean)
+
+
+
OUTPUT
+
+
Help on function offset_mean in module __main__:
+
+offset_mean(data, target_mean_value)
+ Return a new array containing the original data
+ with its mean offset to match the desired value.
+
+ Examples
+ --------
+ >>> offset_mean([1, 2, 3], 0)
+ array([-1., 0., 1.])
+
+
Defining Defaults
+
+
+
+
We have passed parameters to functions in two ways: directly, as in
+type(data), and by name, as in
+numpy.loadtxt(fname='something.csv', delimiter=','). In
+fact, we can pass the filename to loadtxt without the
+fname=:
Traceback (most recent call last):
+ File "<stdin>", line 1, in <module>
+ File "/Users/username/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1041, in loa
+dtxt
+ dtype = np.dtype(dtype)
+ File "/Users/username/anaconda3/lib/python3.6/site-packages/numpy/core/_internal.py", line 199, in
+_commastring
+ newitem = (dtype, eval(repeats))
+ File "<string>", line 1
+ ,
+ ^
+SyntaxError: unexpected EOF while parsing
+
+
To understand what’s going on, and make our own functions easier to
+use, let’s re-define our offset_mean function like
+this:
+
+
PYTHON
+
+
def offset_mean(data, target_mean_value=0.0):
+    """Return a new array containing the original data
+       with its mean offset to match the desired value, (0 by default).
+
+    Examples
+    --------
+    >>> offset_mean([1, 2, 3])
+    array([-1., 0., 1.])
+    """
+    return (data - numpy.mean(data)) + target_mean_value
+
+
The key change is that the second parameter is now written
+target_mean_value=0.0 instead of just
+target_mean_value. If we call the function with two
+arguments, it works as it did before:
But we can also now call it with just one parameter, in which case
+target_mean_value is automatically assigned the default value of 0.0:
+
+
PYTHON
+
+
more_data = 5 + numpy.zeros((2, 2))
+print('data before mean offset:')
+print(more_data)
+print('offset data:')
+print(offset_mean(more_data))
+
+
+
OUTPUT
+
+
data before mean offset:
+[[ 5. 5.]
+ [ 5. 5.]]
+offset data:
+[[ 0. 0.]
+ [ 0. 0.]]
+
+
This is handy: if we usually want a function to work one way, but
+occasionally need it to do something else, we can allow people to pass a
+parameter when they need to but provide a default to make the normal
+case easier. The example below shows how Python matches values to
+parameters:
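A minimal sketch of such an example, assuming a function display whose three parameters default to 1, 2, and 3 (consistent with the output of display(c=77) further on):

```python
def display(a=1, b=2, c=3):
    # Show which value ended up in which parameter
    print('a:', a, 'b:', b, 'c:', c)

print('no parameters:')
display()          # a: 1 b: 2 c: 3
print('one parameter:')
display(55)        # a: 55 b: 2 c: 3
print('two parameters:')
display(55, 66)    # a: 55 b: 66 c: 3
```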
As this example shows, parameters are matched up from left to right,
+and any that haven’t been given a value explicitly get their default
+value. We can override this behavior by naming the value as we pass it
+in:
+
+
PYTHON
+
+
print('only setting the value of c')
+display(c=77)
+
+
+
OUTPUT
+
+
only setting the value of c
+a: 1 b: 2 c: 77
+
+
With that in hand, let’s look at the help for
+numpy.loadtxt:
+
+
PYTHON
+
+
help(numpy.loadtxt)
+
+
+
OUTPUT
+
+
Help on function loadtxt in module numpy.lib.npyio:
+
+loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, use
+cols=None, unpack=False, ndmin=0, encoding='bytes')
+ Load data from a text file.
+
+ Each row in the text file must have the same number of values.
+
+ Parameters
+ ----------
+...
+
+
There’s a lot of information here, but the most important part is the
+first couple of lines:
This tells us that loadtxt has one parameter called
+fname that doesn’t have a default value, and nine others
+that do. If we call the function like this:
+
+
PYTHON
+
+
numpy.loadtxt('inflammation-01.csv', ',')
+
+
then the filename is assigned to fname (which is what we
+want), but the delimiter string ',' is assigned to
+dtype rather than delimiter, because
+dtype is the second parameter in the list. However
+',' isn’t a known dtype so our code produced
+an error message when we tried to run it. When we call
+loadtxt we don’t have to provide fname= for
+the filename because it’s the first item in the list, but if we want the
+',' to be assigned to the variable delimiter,
+we do have to provide delimiter= for the second
+parameter since delimiter is not the second parameter in
+the list.
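The same matching rule can be seen without NumPy. The function load_table below is a hypothetical stand-in, not part of NumPy: it copies only the first few parameter names from the signature above and simply reports how its arguments were bound:

```python
def load_table(fname, dtype=float, comments='#', delimiter=None):
    """Hypothetical stand-in: return the value bound to each parameter."""
    return {'fname': fname, 'dtype': dtype,
            'comments': comments, 'delimiter': delimiter}

# Passed by position, ',' lands in the second parameter, dtype
positional = load_table('inflammation-01.csv', ',')
print('dtype:', positional['dtype'], '| delimiter:', positional['delimiter'])

# Passed by name, ',' reaches delimiter and dtype keeps its default
named = load_table('inflammation-01.csv', delimiter=',')
print('dtype:', named['dtype'], '| delimiter:', named['delimiter'])
```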
+
Readable functions
+
+
+
+
Consider these two functions:
+
+
PYTHON
+
+
def s(p):
+    a = 0
+    for v in p:
+        a += v
+    m = a / len(p)
+    d = 0
+    for v in p:
+        d += (v - m) * (v - m)
+    return numpy.sqrt(d / (len(p) - 1))
+
+def std_dev(sample):
+    sample_sum = 0
+    for value in sample:
+        sample_sum += value
+
+    sample_mean = sample_sum / len(sample)
+
+    sum_squared_devs = 0
+    for value in sample:
+        sum_squared_devs += (value - sample_mean) * (value - sample_mean)
+
+    return numpy.sqrt(sum_squared_devs / (len(sample) - 1))
+
+
The functions s and std_dev are
+computationally equivalent (they both calculate the sample standard
+deviation), but to a human reader, they look very different. You
+probably found std_dev much easier to read and understand
+than s.
+
As this example illustrates, both documentation and a programmer’s
+coding style combine to determine how easy it is for others to
+read and understand the programmer’s code. Choosing meaningful variable
+names and using blank spaces to break the code into logical “chunks” are
+helpful techniques for producing readable code. This is useful
+not only for sharing code with others, but also for the original
+programmer. If you need to revisit code that you wrote months ago and
+haven’t thought about since then, you will appreciate the value of
+readable code!
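Readable names also make a function easier to check. As a sketch, the version of std_dev below swaps numpy.sqrt for math.sqrt so that it runs with the standard library alone, and compares its result with statistics.stdev on a made-up sample:

```python
import math
import statistics

def std_dev(sample):
    """Sample standard deviation; uses math.sqrt in place of numpy.sqrt."""
    sample_sum = 0
    for value in sample:
        sample_sum += value

    sample_mean = sample_sum / len(sample)

    sum_squared_devs = 0
    for value in sample:
        sum_squared_devs += (value - sample_mean) * (value - sample_mean)

    return math.sqrt(sum_squared_devs / (len(sample) - 1))

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
print(std_dev(sample))
print(statistics.stdev(sample))
```

The two printed values should agree to within floating-point round-off.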
+
+
+
+
+
+
Combining Strings
+
+
+
“Adding” two strings produces their concatenation:
+'a' + 'b' is 'ab'. Write a function called
+fence that takes two parameters called
+original and wrapper and returns a new string
+that has the wrapper character at the beginning and end of the original.
+A call to your function should look like this:
+
+
PYTHON
+
+
print(fence('name', '*'))
+
+
+
OUTPUT
+
+
*name*
+
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def fence(original, wrapper):
+    return wrapper + original + wrapper
+
+
+
+
+
+
+
+
+
+
+
Return versus print
+
+
+
Note that return and print are not
+interchangeable. print is a Python function that
+prints data to the screen. It enables us, the users, to see
+the data. The return statement, on the other hand, makes data
+visible to the program. Let’s have a look at the following function:
+
+
PYTHON
+
+
def add(a, b):
+    print(a + b)
+
+
Question: What will we see if we execute the
+following commands?
+
+
PYTHON
+
+
A = add(7, 3)
+print(A)
+
+
+
+
+
+
+
+
+
+
Python will first execute the function add with
+a = 7 and b = 3, and, therefore, print
+10. However, because the function add does not
+have a line that starts with return (no return
+“statement”), it will, by default, return nothing, which in Python
+is called None. Therefore, None will be
+assigned to A, and the last line (print(A))
+will print None. As a result, we will see:
+
+
OUTPUT
+
+
10
+None
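For comparison, a sketch of the same function written with return instead of print (an illustrative variant, not part of the exercise):

```python
def add(a, b):
    # Hand the sum back to the caller instead of printing it
    return a + b

A = add(7, 3)
print(A)  # 10
```

Here nothing is printed inside the function, and A holds the value 10 rather than None.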
+
+
+
+
+
+
+
+
+
+
+
Selecting Characters From Strings
+
+
+
If the variable s refers to a string, then
+s[0] is the string’s first character and s[-1]
+is its last. Write a function called outer that returns a
+string made up of just the first and last characters of its input. A
+call to your function should look like this:
Write a function rescale that takes an array as input
+and returns a corresponding array of values scaled to lie in the range
+0.0 to 1.0. (Hint: If L and H are the lowest
+and highest values in the original array, then the replacement for a
+value v should be (v-L) / (H-L).)
Run the commands help(numpy.arange) and
+help(numpy.linspace) to see how to use these functions to
+generate regularly-spaced values, then use those values to test your
+rescale function. Once you’ve successfully tested your
+function, add a docstring that explains what it does.
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def rescale(input_array):
+    """Takes an array as input, and returns a corresponding array scaled so
+    that 0 corresponds to the minimum and 1 to the maximum value of the input array.
+
+    Examples:
+    >>> rescale(numpy.arange(10.0))
+    array([ 0. , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
+     0.55555556, 0.66666667, 0.77777778, 0.88888889, 1. ])
+    >>> rescale(numpy.linspace(0, 100, 5))
+    array([ 0. , 0.25, 0.5 , 0.75, 1. ])
+    """
+    L = numpy.amin(input_array)
+    H = numpy.amax(input_array)
+    output_array = (input_array - L) / (H - L)
+    return output_array
+
+
+
+
+
+
+
+
+
+
+
Defining Defaults
+
+
+
Rewrite the rescale function so that it scales data to
+lie between 0.0 and 1.0 by default, but will
+allow the caller to specify lower and upper bounds if they want. Compare
+your implementation to your neighbor’s: do the two functions always
+behave the same way?
+
+
+
+
+
+
+
+
+
+
PYTHON
+
+
def rescale(input_array, low_val=0.0, high_val=1.0):
+    """rescales input array values to lie between low_val and high_val"""
+    L = numpy.amin(input_array)
+    H = numpy.amax(input_array)
+    intermed_array = (input_array - L) / (H - L)
+    output_array = intermed_array * (high_val - low_val) + low_val
+    return output_array
+
+
+
+
+
+
+
+
+
+
+
Variables Inside and Outside Functions
+
+
+
What does the following piece of code display when run — and why?
+
+
PYTHON
+
+
f = 0
+k = 0
+
+def f2k(f):
+    k = ((f - 32) * (5.0 / 9.0)) + 273.15
+    return k
+
+print(f2k(8))
+print(f2k(41))
+print(f2k(32))
+
+print(k)
+
+
+
+
+
+
+
+
+
+
+
OUTPUT
+
+
259.81666666666666
+278.15
+273.15
+0
+
+
k is 0 because the k inside the function
+f2k doesn’t know about the k defined outside
+the function. When the f2k function is called, it creates a
+local variable
+k. The function returns the value of its local k, which is what
+the three calls print, but it does not alter the k outside of its
+local scope. Therefore the original
+value of k remains unchanged. Beware that a local
+k is created because the statements inside f2k
+assign a new value to it. If k were only
+read, the function would simply retrieve the global k
+value.
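The difference between reading a global variable and assigning to a name inside a function can be sketched as follows. The variable offset and the functions read_global and shadow_global are hypothetical names chosen for illustration only.

```python
offset = 273.15  # global variable


def read_global(celsius):
    # offset is only read here, so Python looks it up in the global scope.
    return celsius + offset


def shadow_global(celsius):
    # Assigning to offset creates a new local variable; the global is untouched.
    offset = 0.0
    return celsius + offset


print(read_global(10.0))    # uses the global offset
print(shadow_global(10.0))  # uses the local offset
print(offset)               # the global value is unchanged
```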
+
+
+
+
+
+
+
+
+
+
Mixing Default and Non-Default Parameters
+
+
+
Given the following code:
+
+
PYTHON
+
+
def numbers(one, two=2, three, four=4):
+    n = str(one) + str(two) + str(three) + str(four)
+    return n
+
+print(numbers(1, three=3))
+
+
what do you expect will be printed? What is actually printed? What
+rule do you think Python is following?
+
+
1. 1234
+
2. one2three4
+
3. 1239
+
4. SyntaxError
+
+
Given that, what does the following piece of code display when
+run?
+
+
PYTHON
+
+
def func(a, b=3, c=6):
+    print('a: ', a, 'b: ', b, 'c:', c)
+
+func(-1, 2)
+
+
+
1. a: b: 3 c: 6
+
2. a: -1 b: 3 c: 6
+
3. a: -1 b: 2 c: 6
+
4. a: b: -1 c: 2
+
+
+
+
+
+
+
+
+
+
Attempting to define the numbers function results in
+4. SyntaxError. The defined parameters two and
+four are given default values. Because one and
+three are not given default values, they are required to be
+included as arguments when the function is called and must be placed
+before any parameters that have default values in the function
+definition.
+
The given call to func displays
+a: -1 b: 2 c: 6. -1 is assigned to the first parameter
+a, 2 is assigned to the next parameter b, and
+c is not passed a value, so it uses its default value
+6.
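One possible way to make the numbers definition valid, sketched here under the assumption that the parameter order is free to change, is to list the required parameters before those with default values. Keyword arguments still let the caller pass them in any order.

```python
def numbers(one, three, two=2, four=4):
    # Required parameters (one, three) come first; defaults follow.
    n = str(one) + str(two) + str(three) + str(four)
    return n


print(numbers(1, three=3))    # 1234
print(numbers(1, 3, four=9))  # 1239
```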
+
+
+
+
+
+
+
+
+
+
Readable Code
+
+
+
Revise a function you wrote for one of the previous exercises to try
+to make the code more readable. Then, collaborate with one of your
+neighbors to critique each other’s functions and discuss how your
+function implementations could be further improved to make them more
+readable.
+
+
+
+
+
+
+
+
+
Key Points
+
+
+
+
Define a function using
+def function_name(parameter).
+
The body of a function must be indented.
+
Call a function using function_name(value).
+
Numbers are stored as integers or floating-point numbers.
+
Variables defined within a function can only be seen and used within
+the body of the function.
+
Variables created outside of any function are called global
+variables.
+
Within a function, we can access global variables.
+
Variables created within a function override global variables if
+their names match.
+
Use help(thing) to view help for something.
+
Put docstrings in functions to provide help for that function.
+
Specify default values for parameters when defining a function using
+name=value in the parameter list.
+
Parameters can be passed by matching based on name, by position, or
+by omitting them (in which case the default value is used).
+
Put code whose parameters change frequently in a function, then call
+it with different parameter values to customize its behavior.
identify different errors and correct bugs associated with them
+
+
+
+
+
+
+
Every programmer encounters errors, both those who are just
+beginning, and those who have been programming for years. Encountering
+errors and exceptions can be very frustrating at times, and can make
+coding feel like a hopeless endeavour. However, understanding what the
+different types of errors are and when you are likely to encounter them
+can help a lot. Once you know why you get certain types of
+errors, they become much easier to fix.
+
Errors in Python have a very specific form, called a traceback. Let’s examine one:
+
+
PYTHON
+
+
# This code has an intentional error. You can type it directly or
+# use it for reference to understand the error message below.
+def favorite_ice_cream():
+    ice_creams = [
+        'chocolate',
+        'vanilla',
+        'strawberry'
+    ]
+    print(ice_creams[3])
+
+favorite_ice_cream()
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+IndexError Traceback (most recent call last)
+<ipython-input-1-70bd89baa4df> in <module>()
+ 9 print(ice_creams[3])
+ 10
+----> 11 favorite_ice_cream()
+
+<ipython-input-1-70bd89baa4df> in favorite_ice_cream()
+ 7 'strawberry'
+ 8 ]
+----> 9 print(ice_creams[3])
+ 10
+ 11 favorite_ice_cream()
+
+IndexError: list index out of range
+
+
This particular traceback has two levels. You can determine the
+number of levels by looking for the number of arrows on the left hand
+side. In this case:
+
+
The first shows code from the cell above, with an arrow pointing
+to Line 11 (which is favorite_ice_cream()).
+
The second shows some code in the function
+favorite_ice_cream, with an arrow pointing to Line 9 (which
+is print(ice_creams[3])).
+
+
The last level is the actual place where the error occurred. The
+other level(s) show what function the program executed to get to the
+next level down. So, in this case, the program first performed a function call to the function
+favorite_ice_cream. Inside this function, the program
+encountered an error on Line 9, when it tried to run the code
+print(ice_creams[3]).
+
+
+
+
+
+
Long Tracebacks
+
+
+
Sometimes, you might see a traceback that is very long, sometimes
+even 20 levels deep! This can make it seem like something
+horrible happened, but the length of the error message does not reflect
+its severity; rather, it indicates that your program called many functions
+before it encountered the error. Most of the time, the actual place
+where the error occurred is at the bottom-most level, so you can skip
+down the traceback to the bottom.
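As a rough illustration of how nested function calls produce traceback levels, the sketch below uses the standard-library traceback module to count the levels and look at the bottom-most one. The functions load_value, process, and run_analysis are hypothetical.

```python
import traceback


def load_value():
    values = [1, 2, 3]
    return values[5]  # intentional error: index 5 does not exist


def process():
    return load_value()


def run_analysis():
    return process()


try:
    run_analysis()
except IndexError as err:
    # Each entry corresponds to one level of the traceback.
    levels = traceback.extract_tb(err.__traceback__)
    print('Number of levels:', len(levels))
    # The last entry is the bottom-most level, where the error was raised.
    bottom = levels[-1]
    print('Error raised in', bottom.name, 'on line', bottom.lineno)
    print(type(err).__name__ + ':', err)
```

Each additional function call adds one level, which is why a long traceback says more about how deeply the calls were nested than about how serious the error is.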
+
+
+
+
So what error did the program actually encounter? In the last line of
+the traceback, Python helpfully tells us the category or type of error
+(in this case, it is an IndexError) and a more detailed
+error message (in this case, it says “list index out of range”).
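The two pieces of information on the last line of a traceback, the error type and the error message, can also be inspected in code. This is a minimal sketch that reuses the ice_creams list from the example above; the variable names error_type and error_message are assumptions made for illustration.

```python
ice_creams = ['chocolate', 'vanilla', 'strawberry']

try:
    print(ice_creams[3])
except IndexError as err:
    error_type = type(err).__name__  # the category of error
    error_message = str(err)         # the more detailed message
    print(error_type + ': ' + error_message)
```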
+
If you encounter an error and don’t know what it means, it is still
+important to read the traceback closely. That way, if you fix the error,
+but encounter a new one, you can tell that the error changed.
+Additionally, sometimes knowing where the error occurred is
+enough to fix it, even if you don’t entirely understand the message.
+
If you do encounter an error you don’t recognize, try looking at the
+official
+documentation on errors. However, note that you may not always be
+able to find the error there, as it is possible to create custom errors.
+In that case, hopefully the custom error message is informative enough
+to help you figure out what went wrong. Libraries like pandas and numpy
+have these custom errors, but the procedure to figure them out is the
+same: go to the last level of the traceback, and read the error
+message given there. The documentation for these libraries will often provide
+the information you need about any functions you are using. There are
+also large communities of users for data libraries that can help as
+well!
+
+
+
+
+
+
Reading Error Messages
+
+
+
Read the Python code and the resulting traceback below, and answer
+the following questions:
+
+
How many levels does the traceback have?
+
What is the function name where the error occurred?
+
On which line number in this function did the error occur?
+
What is the type of error?
+
What is the error message?
+
+
+
PYTHON
+
+
# This code has an intentional error. Do not type it directly;
+# use it for reference to understand the error message below.
+def print_message(day):
+    messages = [
+        'Hello, world!',
+        'Today is Tuesday!',
+        'It is the middle of the week.',
+        'Today is Donnerstag in German!',
+        'Last day of the week!',
+        'Hooray for the weekend!',
+        'Aw, the weekend is almost over.'
+    ]
+    print(messages[day])
+
+def print_sunday_message():
+    print_message(7)
+
+print_sunday_message()
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+IndexError Traceback (most recent call last)
+<ipython-input-7-3ad455d81842> in <module>
+ 16 print_message(7)
+ 17
+---> 18 print_sunday_message()
+ 19
+
+<ipython-input-7-3ad455d81842> in print_sunday_message()
+ 14
+ 15 def print_sunday_message():
+---> 16 print_message(7)
+ 17
+ 18 print_sunday_message()
+
+<ipython-input-7-3ad455d81842> in print_message(day)
+ 11 'Aw, the weekend is almost over.'
+ 12 ]
+---> 13 print(messages[day])
+ 14
+ 15 def print_sunday_message():
+
+IndexError: list index out of range
+
+
+
+
+
+
+
+
+
+
+
3 levels
+
print_message
+
13
+
IndexError
+
+list index out of range. You can then infer that
+7 is not the right index to use with
+messages.
+
+
+
+
+
+
+
+
+
+
+
Better errors on newer Pythons
+
+
+
Newer versions of Python have improved error printouts. If you are
+debugging errors, it is often helpful to use the latest Python version,
+even if you support older versions of Python.
+
+
+
+
Type Errors
+
+
+
+
One of the most common types of error in Python is the type
+error. These errors occur when you try to perform an operation on
+an object in Python that cannot support it. This happens easily when
+working with large datasets where values are expected to be of a particular type, such as
+strings or integers. When we write a function expecting integers,
+we will not get an error until we encounter an operation that cannot
+handle strings. For example:
File "<ipython-input-3-6bb841ea1423>", line 3
+ letter=my_string["e"]
+ ^
+TypeError: string indices must be integers
+
+
We get this error because we are trying to use an index to access
+part of our string, which requires an integer. Instead, we entered a
+character and received a type error. This is fixed by replacing “e” with
+2.
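The fix can be sketched as follows. The contents of my_string are an assumption, chosen so that the letter e sits at index 2; the string used in the original example is not shown above.

```python
my_string = "the cat"  # hypothetical string; 'e' is at index 2

try:
    letter = my_string["e"]  # indexing with a string raises a TypeError
except TypeError as err:
    print("TypeError:", err)

letter = my_string[2]  # indexing with an integer works
print(letter)

# If the position is not known in advance, str.index can look it up.
position = my_string.index("e")
print(position)
```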
+
In the case of datasets, we often see type errors when a mathematical
+operation, such as taking a mean, is performed on a column that contains
+characters, either as a result of formatting or introduced through
+error. As a result, correcting the error can involve simply removing the
+characters from the strings using regular expressions, or if the
+characters have resulted in incorrect data, removing those observations
+from the dataset.
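A minimal sketch of this kind of cleaning, using only the standard library: the values in raw_values are invented for illustration, and the regular expression simply drops every character that is not a digit or a decimal point. Real data usually needs a more careful rule.

```python
import re
from statistics import mean

# Hypothetical column that should be numeric but contains stray characters.
raw_values = ['12', '15 kg', '9', 'n/a', '20*']

cleaned = []
for value in raw_values:
    digits = re.sub(r'[^0-9.]', '', value)  # keep digits and decimal points only
    if digits:
        cleaned.append(float(digits))
    # Entries with nothing left (such as 'n/a') are dropped as unusable.

print(cleaned)
print(mean(cleaned))
```

Calling mean on raw_values directly would fail with a TypeError, because the values are strings rather than numbers.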
+
Syntax Errors
+
+
+
+
When you forget a colon at the end of a line, accidentally add one
+space too many when indenting under an if statement, or
+forget a parenthesis, you will encounter a syntax error. This means that
+Python couldn’t figure out how to read your program. This is similar to
+forgetting punctuation in English: for example, this text is difficult
+to read there is no punctuation there is also no capitalization why is
+this hard because you have to figure out where each sentence ends you
+also have to figure out where each sentence begins to some extent it
+might be ambiguous if there should be a sentence break or not
+
People can typically figure out what is meant by text with no
+punctuation, but people are much smarter than computers. If Python
+doesn’t know how to read the program, it will give up and inform you
+with an error. For example:
Here, Python tells us that there is a SyntaxError on
+line 1, and even puts a little arrow in the place where there is an
+issue. In this case the problem is that the function definition is
+missing a colon at the end.
+
Actually, the function above has two issues with syntax. If
+we fix the problem with the colon, we see that there is also an
+IndentationError, which means that the lines in the
+function definition do not all have the same indentation:
Both SyntaxError and IndentationError
+indicate a problem with the syntax of your program, but an
+IndentationError is more specific: it always means
+that there is a problem with how your code is indented.
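The relationship between the two error types can be checked directly. In this sketch the two source strings are hypothetical snippets, and the built-in compile function is used so that the errors can be caught and inspected instead of stopping the program.

```python
# A function definition missing its colon.
missing_colon = "def some_function()\n    return 1\n"
# A function body whose lines are not indented consistently.
bad_indent = "def some_function():\n    msg = 'hello'\n      return msg\n"

error_names = []
for source in (missing_colon, bad_indent):
    try:
        compile(source, '<example>', 'exec')
    except SyntaxError as err:
        # IndentationError is caught here too, because it is a kind of SyntaxError.
        error_names.append(type(err).__name__)

print(error_names)
print(issubclass(IndentationError, SyntaxError))
```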
+
+
+
+
+
+
Tabs and Spaces
+
+
+
Some indentation errors are harder to spot than others. In
+particular, mixing spaces and tabs can be difficult to spot because they
+are both whitespace. In the
+example below, the first two lines in the body of the function
+some_function are indented with tabs, while the third line
+is indented with spaces. If you’re working in a Jupyter notebook, be sure to copy
+and paste this example rather than trying to type it in manually because
+Jupyter automatically replaces tabs with spaces.
Visually it is impossible to spot the error. Fortunately, Python does
+not allow you to mix tabs and spaces.
+
+
ERROR
+
+
File "<ipython-input-5-653b36fbcd41>", line 4
+ return msg
+ ^
+TabError: inconsistent use of tabs and spaces in indentation
+
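Because tabs are easy to lose when copying text, one way to reproduce this error reliably is to write the tab characters explicitly as \t inside a string and hand it to compile. This is only a sketch; the body of some_function is an assumption based on the description above.

```python
# Two lines indented with a tab (\t), then one line indented with eight spaces.
source = (
    "def some_function():\n"
    "\tmsg = 'hello, world!'\n"
    "\tprint(msg)\n"
    "        return msg\n"
)

try:
    compile(source, '<example>', 'exec')
    error_name = None
except TabError as err:
    error_name = type(err).__name__
    print(error_name + ':', err.msg)
```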
+
+
+
+
Variable Name Errors
+
+
+
+
Another very common type of error is called a NameError,
+and occurs when you try to use a variable that does not exist. For
+example:
+
+
PYTHON
+
+
print(a)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-7-9d7b17ad5387> in <module>()
+----> 1 print(a)
+
+NameError: name 'a' is not defined
+
+
Variable name errors come with some of the most informative error
+messages, which are usually of the form “name ‘the_variable_name’ is not
+defined”.
+
Why does this error message occur? That’s a harder question to
+answer, because it depends on what your code is supposed to do. However,
+there are a few very common reasons why you might have an undefined
+variable. The first is that you meant to use a string, but forgot to put quotes around
+it:
+
+
PYTHON
+
+
print(hello)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-8-9553ee03b645> in <module>()
+----> 1 print(hello)
+
+NameError: name 'hello' is not defined
+
+
The second reason is that you might be trying to use a variable that
+does not yet exist. In the following example, count should
+have been defined (e.g., with count = 0) before the for
+loop:
+
+
PYTHON
+
+
for number in range(10):
+    count = count + number
+print('The count is:', count)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-9-dd6a12d7ca5c> in <module>()
+ 1 for number in range(10):
+----> 2 count = count + number
+ 3 print('The count is:', count)
+
+NameError: name 'count' is not defined
+
+
Finally, the third possibility is that you made a typo when you were
+writing your code. Let’s say we fixed the error above by adding the line
+Count = 0 before the for loop. Frustratingly, this actually
+does not fix the error. Remember that variables are case-sensitive, so the variable
+count is different from Count. We still get
+the same error, because we still have not defined
+count:
+
+
PYTHON
+
+
Count = 0
+for number in range(10):
+    count = count + number
+print('The count is:', count)
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+NameError Traceback (most recent call last)
+<ipython-input-10-d77d40059aea> in <module>()
+ 1 Count = 0
+ 2 for number in range(10):
+----> 3 count = count + number
+ 4 print('The count is:', count)
+
+NameError: name 'count' is not defined
+
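For comparison, a version that runs without a NameError defines count, spelled exactly as it is used in the loop, before the loop starts:

```python
count = 0  # defined before the loop, with the same capitalization used below
for number in range(10):
    count = count + number
print('The count is:', count)
```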
+
Index Errors
+
+
+
+
Next up are errors having to do with containers (like lists and
+strings) and the items within them. If you try to access an item in a
+list or a string that does not exist, then you will get an error. This
+makes sense: if you asked someone what day they would like to get
+coffee, and they answered “caturday”, you might be a bit annoyed. Python
+gets similarly annoyed if you try to ask it for an item that doesn’t
+exist:
---------------------------------------------------------------------------
+IndexError Traceback (most recent call last)
+<ipython-input-11-d817f55b7d6c> in <module>()
+ 3 print('Letter #2 is', letters[1])
+ 4 print('Letter #3 is', letters[2])
+----> 5 print('Letter #4 is', letters[3])
+
+IndexError: list index out of range
+
+
Here, Python is telling us that there is an IndexError
+in our code, meaning we tried to access a list index that did not
+exist.
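One way to avoid this error is to compare the index with the length of the list before using it. The sketch below assumes a three-item list called letters; the variable names requested and last_valid_index are hypothetical.

```python
letters = ['a', 'b', 'c']  # hypothetical list with three items

# Valid positions run from 0 to len(letters) - 1.
last_valid_index = len(letters) - 1

requested = 3
if requested <= last_valid_index:
    result = letters[requested]
else:
    result = None
    print('Index', requested, 'is out of range; the last valid index is', last_valid_index)

# A negative index counts from the end, so -1 always gives the last item.
print(letters[-1])
```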
+
File Errors
+
+
+
+
The last type of error we’ll cover today is the most common type of
+error when using Python with data: errors associated with reading and
+writing files, such as FileNotFoundError. If you try to read a file
+that does not exist, you will receive a FileNotFoundError
+telling you so. If you attempt to write to a file that was opened
+read-only, Python 3 raises an UnsupportedOperation error.
+More generally, problems with input and output manifest as
+OSErrors, which may show up as a more specific subclass;
+you can see the
+list in the Python docs. They all have a unique UNIX
+errno, which you can see in the error message.
+
+
PYTHON
+
+
file_handle = open('myfile.txt', 'r')
+
+
+
ERROR
+
+
---------------------------------------------------------------------------
+FileNotFoundError Traceback (most recent call last)
+<ipython-input-14-f6e1ac4aee96> in <module>()
+----> 1 file_handle = open('myfile.txt', 'r')
+
+FileNotFoundError: [Errno 2] No such file or directory: 'myfile.txt'
+
+
One reason for receiving this error is that you specified an
+incorrect path to the file. For example, if I am currently in a folder
+called myproject, and I have a file in
+myproject/writing/myfile.txt, but I try to open
+myfile.txt, this will fail. The correct path would be
+writing/myfile.txt. It is also possible that the file name
+or its path contains a typo. There may also be specific settings based
+on your organization if you are using shared, networked, or cloud-based
+drives. It is best to check with your IT administrators if you are still
+encountering issues reading in a file after troubleshooting.
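When troubleshooting a path, it can help to print the folder Python is currently working in and to check whether the file exists before opening it. This sketch uses the standard-library pathlib module; the path writing/myfile.txt comes from the example above and is not expected to exist on your machine.

```python
from pathlib import Path

file_path = Path('writing') / 'myfile.txt'  # relative to the current folder

print('Current working directory:', Path.cwd())

if file_path.exists():
    text = file_path.read_text()
else:
    text = None
    # resolve() shows the full location Python was actually looking in.
    print('No file found at', file_path.resolve())
```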
+
A related issue can occur if you mix up the “read” and
+“write” flags. Python will not give you an error if you try to open a
+file for writing when the file does not exist. However, if you meant to
+open a file for reading, but accidentally opened it for writing, and
+then try to read from it, you will get an
+UnsupportedOperation error telling you that the file was
+not opened for reading:
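A minimal sketch of this situation is shown here. It writes to a temporary folder so that no existing file is overwritten; the use of tempfile and the file name are assumptions made for illustration.

```python
import io
import os
import tempfile

# Work in a temporary folder so that no real file is touched.
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')

file_handle = open(path, 'w')  # opened for writing by mistake
try:
    file_handle.read()         # reading from a write-only handle fails
except io.UnsupportedOperation as err:
    error_name = type(err).__name__
    print(error_name + ':', err)
finally:
    file_handle.close()
```

Note that opening an existing file with the 'w' flag empties it, so this mistake can also destroy data.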
If you are getting a read or write error on a file or folder that you
+are able to open and/or edit with other programs, you may need to
+contact an IT administrator to check the permissions granted to you and
+any programs you are using.
+
These are the most common errors with files, though many others
+exist. If you get an error that you’ve never seen before, searching the
+Internet for that error type often reveals common reasons why you might
+get that error.
+
+
+
+
+
+
Identifying Syntax Errors
+
+
+
+
Read the code below, and (without running it) try to identify what
+the errors are.
+
Run the code, and read the error message. Is it a
+SyntaxError or an IndentationError?
+
Fix the error.
+
Repeat steps 2 and 3, until you have fixed all the errors.
+
+
+
PYTHON
+
+
def another_function
+  print('Syntax errors are annoying.')
+   print('But at least Python tells us about them!')
+  print('So they are usually not too hard to fix.')
+
+
+
+
+
+
+
+
+
+
SyntaxError for missing (): at end of first
+line, IndentationError for mismatch between second and
+third lines. A fixed version is:
+
+
PYTHON
+
+
def another_function():
+    print('Syntax errors are annoying.')
+    print('But at least Python tells us about them!')
+    print('So they are usually not too hard to fix.')
+
+
+
+
+
+
+
+
+
+
+
Identifying Variable Name Errors
+
+
+
+
Read the code below, and (without running it) try to identify what
+the errors are.
+
Run the code, and read the error message. What type of
+NameError do you think this is? In other words, is it a
+string with no quotes, a misspelled variable, or a variable that should
+have been defined but was not?
+
Fix the error.
+
Repeat steps 2 and 3, until you have fixed all the errors.
+
+
+
PYTHON
+
+
for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (Number % 3) == 0:
+        message = message + a
+    else:
+        message = message + 'b'
+print(message)
+
+
+
+
+
+
+
+
+
+
3 NameErrors for number being misspelled,
+for message not defined, and for a not being
+in quotes.
+
Fixed version:
+
+
PYTHON
+
+
message = ''
+for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (number % 3) == 0:
+        message = message + 'a'
+    else:
+        message = message + 'b'
+print(message)
+
+
+
+
+
+
+
+
+
+
+
Identifying Index Errors
+
+
+
+
Read the code below, and (without running it) try to identify what
+the errors are.
+
Run the code, and read the error message. What type of error is
+it?
+
Fix the error.
+
+
+
PYTHON
+
+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[4])
+
+
+
+
+
+
+
+
+
+
IndexError; the last entry is seasons[3],
+so seasons[4] doesn’t make sense. A fixed version is:
+
+
PYTHON
+
+
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is ', seasons[-1])
+
+
+
+
+
+
A Final Note About Correcting Errors
+
+
+
+
You can find many helpful answers online for most error messages;
+however, when working with official statistics, we also need to exercise
+some caution. Be aware and be wary of any answers that ask you to
+download a package from someone’s personal GitHub repository or other
+file sharing service. Try to find the type of error first and understand
+what the issue is before downloading anything claiming to fix the error.
+If the error is the result of an issue with a version of a package,
+check if there are any security vulnerabilities with that version, and
+use a package manager to move between package versions.
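Before changing package versions, it helps to know which version is currently installed. The sketch below uses the standard-library importlib.metadata module (available from Python 3.8); the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    # Returns the installed version as a string, or None if the package is absent.
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


print(installed_version('numpy'))
print(installed_version('a-package-that-is-not-installed'))
```

The reported version can then be compared with the versions named in the package's release notes or security advisories.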
+
+
+
+
diff --git a/instructor/index.html b/instructor/index.html
new file mode 100644
index 0000000..b8b7fe1
--- /dev/null
+++ b/instructor/index.html
@@ -0,0 +1,555 @@
+
+Python for Official Statistics: Summary and Schedule
+
+
+
+
+
+
+
+ Pre-Alpha
+
+ This lesson is in the pre-alpha phase, which means that it is in early development, but has not yet been taught.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Python for Official Statistics
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Summary and Schedule
+
+
+
Python for Official Statistics will teach participants the basics of
+Python for its use in creating Official Statistics. Participants will
+learn basic programming principles, and employ them in the manipulation
+of data and data structures.
+What basic data types can I work with in Python? How can I create a
+new variable in Python? How do I use a function? Can I change
+the value associated with a variable after I create it?
+
+What are functions, and how can I use them in Python? How can I
+define new functions? What’s the difference between defining and
+calling a function? What happens when I call a function?
+
+
argument
+
+A value given to a function or program when it runs. The term is often
+used interchangeably (and inconsistently) with parameter.
+
+
assertion
+
+An expression which is supposed to be true at a particular point in a
+program. Programmers typically put assertions in their code to check for
+errors; if the assertion fails (i.e., if the expression evaluates as
+false), the program halts and produces an error message. See also: invariant, precondition, postcondition.
+
+
assign
+
+To give a value a name by associating a variable with it.
+
+
body
+
+(of a function): the statements that are executed when a function runs.
+
+
call stack
+
+A data structure inside a running program that keeps track of active
+function calls.
+
+
case-insensitive
+
+Treating text as if upper and lower case characters of the same letter
+were the same. See also: case-sensitive.
+
+
case-sensitive
+
+Treating text as if upper and lower case characters of the same letter
+are different. See also: case-insensitive.
+
+
comment
+
+A remark in a program that is intended to help human readers understand
+what is going on, but is ignored by the computer. Comments in Python, R,
+and the Unix shell start with a # character and run to the
+end of the line; comments in SQL start with --, and other
+languages have other conventions.
+
+
compose
+
+To apply one function to the result of another, such as
+f(g(x)).
+
+
conditional statement
+
+A statement in a program that might or might not be executed depending
+on whether a test is true or false.
+
+
comma-separated values
+
+(CSV) A common textual representation for tables in which the values in
+each row are separated by commas.
+
+
default value
+
+A value to use for a parameter if nothing is
+specified explicitly.
+
+
defensive programming
+
+The practice of writing programs that check their own operation to catch
+errors as early as possible.
+
+
delimiter
+
+A character or characters used to separate individual values, such as
+the commas between columns in a CSV file.
+
+
docstring
+
+Short for “documentation string”, this refers to textual documentation
+embedded in Python programs. Unlike comments, docstrings are preserved
+in the running program and can be examined in interactive sessions.
+
+
documentation
+
+Human-language text written to explain what software does, how it works,
+or how to use it.
+
+
dotted notation
+
+A two-part notation used in many programming languages in which
+thing.component refers to the component
+belonging to thing.
+
+
empty string
+
+A character string containing no characters, often thought of as the
+“zero” of text.
+
+
encapsulation
+
+The practice of hiding something’s implementation details so that the
+rest of a program can worry about what it does rather than
+how it does it.
+
+
floating-point number
+
+A number containing a fractional part and an exponent. See also: integer.
+
+
for loop
+
+A loop that is executed once for each value in some kind of set, list,
+or range. See also: while loop.
+
+
function
+
+A named group of instructions that is executed when the function’s name
+is used in the code. Occurrence of a function name in the code is a function call. Functions may process input arguments and return the result back. Functions may
+also be used for logically grouping together pieces of code. In such
+cases, they don’t need to return any meaningful value and can be written
+without the return
+statement completely. Such functions return a special value
+None, which is a way of saying “nothing” in Python.
+
+
function call
+
+A use of a function in another piece of software.
+
+
global variable
+
+A variable defined outside of a function. It can be used in global
+statements, and read inside functions.
+
+
heat map
+
+A graphical representation of two-dimensional data in which colors,
+ranging on a scale of hue or intensity, represent the data values.
+
+
Integrated Development
+Environment (IDE)
+
+An application in which you write, edit, and debug your code.
+
+
immutable
+
+Unchangeable. The value of immutable data cannot be altered after it has
+been created. See also: mutable.
+
+
in-place operator
+
+An operator such as += that provides a shorthand notation
+for the common case in which the variable being assigned to is also an
+operand on the right hand side of the assignment. For example, the
+statement x += 3 means the same thing as
+x = x + 3.
+
+
index
+
+A subscript that specifies the location of a single value in a
+collection, such as a single pixel in an image.
+
+
inner loop
+
+A loop that is inside another loop. See also: outer loop.
+
+
invariant
+
+An expression whose value doesn’t change during the execution of a
+program, typically used in an assertion. See
+also: precondition, postcondition.
+
+
library
+
+A family of code units (functions, classes, variables) that implement a
+set of related tasks.
+
+
local variable
+
+A variable defined inside of a function, that exists only in the scope
+of that function, meaning it cannot be accessed by code outside of the
+function.
+
+
loop variable
+
+The variable that keeps track of the progress of the loop.
+
+
method
+
+A function which is tied to a particular object.
+Each of an object’s methods typically implements one of the things it
+can do, or one of the questions it can answer.
+
+
mutable
+
+Changeable. The value of mutable data can be altered after it has been
+created. See also: immutable.
+
+
notebook
+
+Interactive computational environment accessed via your web browser, in
+which you can write and execute Python code and combine it with
+explanatory text, mathematics and visualizations. Examples are IPython
+or Jupyter notebooks.
+
+
object
+
+A collection of conceptually related variables (members) and functions using those variables (methods).
+
+
outer loop
+
+A loop that contains another loop. See also: inner
+loop.
+
+
parameter
+
+A variable named in the function’s declaration that is used to hold a
+value passed into the call. The term is often used interchangeably (and
+inconsistently) with argument.
+
+
pipe
+
+A connection from the output of one program to the input of another.
+When two or more programs are connected in this way, they are called a
+“pipeline”.
+
+
postcondition
+
+A condition that a function (or other block of code) guarantees is true
+once it has finished running. Postconditions are often represented using
+assertions.
+
+
precondition
+
+A condition that must be true in order for a function (or other block of
+code) to run correctly.
+
+
regression
+
+To re-introduce a bug that was once fixed.
+
+
return statement
+
+A statement that causes a function to stop executing and return a value
+to its caller immediately.
+
+
RGB
+
+An additive model that represents
+colors as combinations of red, green, and blue. Each color’s value is
+typically in the range 0..255 (i.e., a one-byte integer).
+
+
sequence
+
+A collection of information that is presented in a specific order. For
+example, in Python, a string is a sequence of
+characters, while a list is a sequence of values of any type.
+
+
shape
+
+An array’s dimensions, represented as a vector. For example, a 5×3
+array’s shape is (5,3).
+
+
silent failure
+
+Failing without producing any warning messages. Silent failures are hard
+to detect and debug.
+
+
slice
+
+A regular subsequence of a larger sequence, such as the first five
+elements or every second element.
+
+
stack frame
+
+A data structure that provides storage for a function’s local variables.
+Each time a function is called, a new stack frame is created and put on
+the top of the call stack. When the function
+returns, the stack frame is discarded.
+
+
standard input
+
+A process’s default input stream. In interactive command-line
+applications, it is typically connected to the keyboard; in a pipe, it receives data from the standard output of the preceding process.
+
+
standard output
+
+A process’s default output stream. In interactive command-line
+applications, data sent to standard output is displayed on the screen;
+in a pipe, it is passed to the standard input of the next process.
+
+
string
+
+Short for “character string”, a sequence of zero
+or more characters.
+
+
syntax
+
+The rules that define how code must be written for a computer to
+understand.
+
+
syntax error
+
+A programming error that occurs when statements are in an order or
+contain characters not expected by the programming language.
+
+
tab completion
+
+A feature of command-line interpreters, in which the program
+automatically fills in partially typed commands upon pressing the
+Tab key.
+
+
test oracle
+
+A program, device, data set, or human being against which the results of
+a test can be compared.
+
+
test-driven development
+
+The practice of writing unit tests before writing the code they
+test.
+
+
traceback
+
+The sequence of function calls that led to an error.
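+
+A short sketch that captures a traceback as text using the standard
+traceback module; the functions inner and outer are invented for
+illustration:
+
```python
import traceback

def inner():
    return 1 / 0  # raises ZeroDivisionError

def outer():
    return inner()

try:
    outer()
except ZeroDivisionError:
    # format_exc() returns the traceback text for the exception being handled
    report = traceback.format_exc()
    print(report)
```
+
+The printed report lists the call to outer first and the call to inner
+last, which is the order in which the calls were made.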
+
+
type
+
+The classification of something in a program (for example, the contents
+of a variable) as a kind of number (e.g. floating-point, integer), string, or something else.
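+
+In Python, the built-in type() function reports this classification.
+The variable names below are arbitrary:
+
```python
count = 3
ratio = 0.75
label = "control"

# type() reports the classification of each value
print(type(count))
print(type(ratio))
print(type(label))
```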
+
+
type of error
+
+Indicates the nature of an error in a program. For example, in Python,
+an IOError refers to problems with file input/output. See also: syntax error.
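+
+As an illustration with a different error type, the sketch below
+triggers a ValueError on purpose and inspects what kind of error was
+raised:
+
```python
try:
    number = int("twelve")
except ValueError as error:
    # The exception's class name tells us what kind of problem occurred
    error_type = type(error).__name__
    print(error_type, "-", error)
```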
+
+
variable
+
+A value that has a name associated with it.
+
+
while loop
+
+A loop that keeps executing as long as some condition is true. See also:
+for loop.
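+
+A minimal example, with arbitrary starting values:
+
```python
# Halve a value until it drops below 1, counting the steps taken
value = 20
steps = 0
while value >= 1:
    value = value / 2
    steps += 1

print(steps, value)
```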
+
+
argument
+
+A value given to a function or program when it runs. The term is often
+used interchangeably (and inconsistently) with parameter.
+
+
assertion
+
+An expression which is supposed to be true at a particular point in a
+program. Programmers typically put assertions in their code to check for
+errors; if the assertion fails (i.e., if the expression evaluates as
+false), the program halts and produces an error message. See also: invariant, precondition, postcondition.
+
+
assign
+
+To give a value a name by associating a variable with it.
+
+
body
+
+(of a function): the statements that are executed when a function runs.
+
+
call stack
+
+A data structure inside a running program that keeps track of active
+function calls.
+
+
case-insensitive
+
+Treating text as if upper and lower case characters of the same letter
+were the same. See also: case-sensitive.
+
+
case-sensitive
+
+Treating text as if upper and lower case characters of the same letter
+are different. See also: case-insensitive.
+
+
comment
+
+A remark in a program that is intended to help human readers understand
+what is going on, but is ignored by the computer. Comments in Python, R,
+and the Unix shell start with a # character and run to the
+end of the line; comments in SQL start with --, and other
+languages have other conventions.
+
+
compose
+
+To apply one function to the result of another, such as
+f(g(x)).
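+
+For instance, with two small functions invented for this example:
+
```python
def double(x):
    return 2 * x

def increment(x):
    return x + 1

# increment is applied first, then double is applied to its result
composed = double(increment(4))
print(composed)
```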
+
+
conditional statement
+
+A statement in a program that might or might not be executed depending
+on whether a test is true or false.
+
+
comma-separated values
+
+(CSV) A common textual representation for tables in which the values in
+each row are separated by commas.
+
+
default value
+
+A value to use for a parameter if nothing is
+specified explicitly.
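+
+A brief sketch; round_to is a hypothetical helper wrapping Python's
+built-in round():
+
```python
def round_to(value, digits=2):
    # digits falls back to 2 when the caller does not supply it
    return round(value, digits)

print(round_to(3.14159))     # uses the default
print(round_to(3.14159, 4))  # overrides the default
```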
+
+
defensive programming
+
+The practice of writing programs that check their own operation to catch
+errors as early as possible.
+
+
delimiter
+
+A character or characters used to separate individual values, such as
+the commas between columns in a CSV file.
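+
+The standard csv module can split rows on a delimiter. This sketch uses
+an in-memory string with made-up data in place of a real file:
+
```python
import csv
import io

# A small in-memory stand-in for a CSV file
text = "name,height_cm\nAda,170\nGrace,165\n"

# The csv module splits each row on the delimiter (a comma by default)
rows = list(csv.reader(io.StringIO(text), delimiter=","))
print(rows)
```
+
+Note that every value is read as a string; converting "170" to a number
+is a separate step.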
+
+
docstring
+
+Short for “documentation string”, this refers to textual documentation
+embedded in Python programs. Unlike comments, docstrings are preserved
+in the running program and can be examined in interactive sessions.
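+
+A minimal example; the function celsius_to_kelvin is invented for
+illustration:
+
```python
def celsius_to_kelvin(temp_c):
    """Convert a temperature from Celsius to Kelvin."""
    # This comment is ignored by Python; the docstring above is kept
    return temp_c + 273.15

# The docstring is stored on the function and can be read at run time
print(celsius_to_kelvin.__doc__)
```
+
+In an interactive session, help(celsius_to_kelvin) displays the same
+text.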
+
+
documentation
+
+Human-language text written to explain what software does, how it works,
+or how to use it.
+
+
dotted notation
+
+A two-part notation used in many programming languages in which
+thing.component refers to the component
+belonging to thing.
+
+
empty string
+
+A character string containing no characters, often thought of as the
+“zero” of text.
+
+
encapsulation
+
+The practice of hiding something’s implementation details so that the
+rest of a program can worry about what it does rather than
+how it does it.
+
+
floating-point number
+
+A number containing a fractional part and an exponent. See also: integer.
+
+
for loop
+
+A loop that is executed once for each value in some kind of set, list,
+or range. See also: while loop.
+
+
function
+
+A named group of instructions that is executed when the function is
+called by name. Each such use of the function’s name in the code is a
+function call. Functions may take input arguments, process them, and
+return a result. Functions may also be used to group related pieces of
+code. In such cases, they don’t need to return any meaningful value and
+can be written without a return statement. Such functions return the
+special value None, which is a way of saying “nothing” in Python.
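+
+A short sketch of a function written without a return statement; the
+name greet is arbitrary:
+
```python
def greet(name):
    # Groups code together; there is no return statement
    print("Hello,", name)

result = greet("Ada")
print(result)
```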
+
+
function call
+
+A use of a function in another piece of software.
+
+
global variable
+
+A variable defined outside of a function. In Python, it can be read
+inside functions, and a function can reassign it after declaring it in a
+global statement.
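+
+A minimal sketch, assuming the code runs at the top level of a script;
+the names counter, read_counter and increment_counter are invented:
+
```python
counter = 0  # global variable: defined outside any function

def read_counter():
    # Reading a global inside a function needs no declaration
    return counter

def increment_counter():
    # Reassigning a global inside a function requires a global statement
    global counter
    counter += 1

increment_counter()
print(read_counter())
```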
+
+
heat map
+
+A graphical representation of two-dimensional data in which colors,
+ranging on a scale of hue or intensity, represent the data values.
+
+
Integrated Development Environment (IDE)
+
+An application in which you write, edit, and debug your code.
+
+
immutable
+
+Unchangeable. The value of immutable data cannot be altered after it has
+been created. See also: mutable.
+
+
in-place operator
+
+An operator such as += that provides a shorthand notation
+for the common case in which the variable being assigned to is also an
+operand on the right hand side of the assignment. For example, the
+statement x += 3 means the same thing as
+x = x + 3.
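+
+The same shorthand exists for other arithmetic operators, for example:
+
```python
x = 5
x += 3      # same as x = x + 3
print(x)

total = 10
total *= 2  # same as total = total * 2
print(total)
```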
+
+
index
+
+A subscript that specifies the location of a single value in a
+collection, such as a single pixel in an image.
+
+
inner loop
+
+A loop that is inside another loop. See also: outer loop.
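+
+A small sketch of a pair of nested loops, with arbitrary ranges:
+
```python
pairs = []
for row in range(2):          # outer loop
    for col in range(3):      # inner loop: runs in full for each row
        pairs.append((row, col))

print(pairs)
```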
+
+
invariant
+
+An expression whose value doesn’t change during the execution of a
+program, typically used in an assertion. See
+also: precondition, postcondition.
+
+
library
+
+A family of code units (functions, classes, variables) that implement a
+set of related tasks.
+
+
local variable
+
+A variable defined inside of a function, that exists only in the scope
+of that function, meaning it cannot be accessed by code outside of the
+function.
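+
+A minimal illustration; area_of_square is a hypothetical function:
+
```python
def area_of_square(side):
    area = side * side  # area is local to area_of_square
    return area

print(area_of_square(4))

try:
    print(area)  # area does not exist outside the function
except NameError as error:
    message = str(error)
    print("NameError:", message)
```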
+
+
loop variable
+
+The variable that keeps track of the progress of the loop.
+
+
method
+
+A function which is tied to a particular object.
+Each of an object’s methods typically implements one of the things it
+can do, or one of the questions it can answer.
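+
+For example, Python lists provide methods that are reached with dotted
+notation (the list contents are arbitrary):
+
```python
words = ["pear", "apple", "fig"]

# append and sort are methods of the list object: things the list can do
words.append("kiwi")
words.sort()

# count is a method that answers a question about the list
n_apples = words.count("apple")
print(words, n_apples)
```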
+
+
mutable
+
+Changeable. The value of mutable data can be altered after it has been
+created. See also: immutable.
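+
+A brief sketch contrasting a mutable list with an immutable tuple; the
+values are arbitrary:
+
```python
scores = [90, 85]        # lists are mutable
scores[0] = 95           # changing an element in place works

point = (1, 2)           # tuples are immutable
try:
    point[0] = 10        # attempting to change an element fails
except TypeError as error:
    failure = str(error)
    print("TypeError:", failure)

print(scores, point)
```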
+
+