Showing 24 changed files with 1,397 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
title: 'Spring 2024 Midterm Exam' | ||
instructors: Sam Lau | ||
context: This exam was administered in-person. The exam was closed-notes, except students were allowed to bring a single two-sided notes sheet. No calculators were allowed. Students had **80 minutes** to take this exam. | ||
show_solution: true | ||
problems: | ||
- sp24-midterm/sp24-mid-q01 | ||
- sp24-midterm/sp24-mid-q02 | ||
- sp24-midterm/sp24-mid-q03 | ||
- sp24-midterm/sp24-mid-q04 | ||
- sp24-midterm/sp24-mid-q05 | ||
- sp24-midterm/sp24-mid-q06 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
# BEGIN PROB | ||
|
||
Fill in Python code below so that the last line of each part evaluates to each desired result using the tables `h`, `o`, and `j` as shown on the Reference Sheet. | ||
|
||
# BEGIN SUBPROB | ||
|
||
Find the median duration of outages that happened in the early morning (before 8am). | ||
|
||
```python | ||
o.loc[__(a)__,__(b)__].median() | ||
``` | ||
|
||
# BEGIN SOLN | ||
|
||
**Answer:** | ||
|
||
(a): `o['time'].dt.hour < 8` | ||
|
||
(b): `'duration'` | ||
|
||
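The answer can be checked on a small made-up table. This is only a sketch: the rows below are invented, and the `time` and `duration` column names are assumed from the solution rather than taken from the Reference Sheet.

```python
import pandas as pd

# Hypothetical stand-in for the outages table `o` (invented rows).
o = pd.DataFrame({
    'time': pd.to_datetime(['2024-01-01 03:15', '2024-01-01 07:59',
                            '2024-01-01 08:00', '2024-01-02 06:30']),
    'duration': [30, 50, 500, 70],
})

# Boolean mask: True where the outage started strictly before 8am.
early = o['time'].dt.hour < 8

# Select the `duration` column for those rows, then take the median.
median_early = o.loc[early, 'duration'].median()
```

The outage at exactly 08:00 is excluded because its hour is 8, which is not less than 8.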
# END SOLN | ||
|
||
# END SUBPROB | ||
|
||
|
||
|
||
# BEGIN SUBPROB | ||
|
||
A Series containing the mean outage duration for outages that happened on the weekend and outages that happened on weekdays. | ||
|
||
*Hint: If `s` is a Series of timestamps, `s.dt.dayofweek` returns a Series of integers where 0 is Monday and 6 is Sunday.* | ||
|
||
```python | ||
(o.assign(__(a)__) | ||
.groupby(__(b)__)[__(c)__].mean()) | ||
``` | ||
|
||
# BEGIN SOLN | ||
|
||
**Answer:** | ||
|
||
(a): `is_weekend=o['time'].dt.dayofweek >= 5` | ||
|
||
(b): `'is_weekend'`, (c): `'duration'` | ||
|
||
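A minimal sketch of the same pipeline on invented data, assuming `o` has the `time` and `duration` columns used in the solution. The four dates are a Friday, Saturday, Sunday, and Monday.

```python
import pandas as pd

# Hypothetical stand-in for `o`: 2024-03-01 is a Friday.
o = pd.DataFrame({
    'time': pd.to_datetime(['2024-03-01', '2024-03-02',
                            '2024-03-03', '2024-03-04']),
    'duration': [10, 20, 40, 30],
})

# dayofweek is 0 for Monday through 6 for Sunday, so >= 5 marks Sat/Sun.
means = (o.assign(is_weekend=o['time'].dt.dayofweek >= 5)
          .groupby('is_weekend')['duration'].mean())
```

The result is a Series indexed by `False` (weekdays) and `True` (weekend), one mean per group.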
# END SOLN | ||
|
||
# END SUBPROB | ||
|
||
|
||
|
||
# BEGIN SUBPROB | ||
|
||
A DataFrame containing the proportion of 4-digit address numbers for each unique street in `h`. | ||
|
||
```python | ||
def foo(x): | ||
lengths = __(a)__ | ||
return (lengths == 4).mean() | ||
|
||
h.groupby(__(b)__).__(c)__(foo) | ||
``` | ||
|
||
# BEGIN SOLN | ||
|
||
**Answer:** | ||
|
||
(a): `x.astype(str).str.len()` | ||
|
||
(b): `'street'` | ||
|
||
(c): `agg` | ||
|
||
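To see why `agg` yields a DataFrame here, the sketch below runs the solution on a toy `h`. The `number` column name and all rows are assumptions for illustration; the real columns are on the Reference Sheet.

```python
import pandas as pd

# Hypothetical stand-in for `h`; `number` is an assumed column name.
h = pd.DataFrame({
    'number': [1234, 56, 7890, 1000],
    'street': ['Gilman Dr', 'Gilman Dr', 'Mission Blvd', 'Mission Blvd'],
})

def foo(x):
    # x is one column of one group; count the digits of each value.
    lengths = x.astype(str).str.len()
    # The mean of a boolean Series is the proportion of True values.
    return (lengths == 4).mean()

# agg calls foo once per (group, column) and returns one row per street.
prop = h.groupby('street').agg(foo)
```

Because `agg` applies `foo` to every non-grouping column, the output keeps a column per input column, which is what makes it a DataFrame rather than a Series.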
# END SOLN | ||
|
||
# END SUBPROB | ||
|
||
|
||
|
||
# BEGIN SUBPROB | ||
|
||
What does the following code compute? | ||
|
||
```python | ||
a = h.merge(j, left_index=True, right_on='hid', how='left') | ||
a.loc[a['oid'].isna(), 'hid'].shape[0] | ||
``` | ||
|
||
( ) The number of addresses with exactly one outage. | ||
( ) The number of addresses with at least one outage. | ||
( ) The number of addresses with no outages. | ||
( ) The total number of addresses affected by all power outages. | ||
( ) The number of power outages. | ||
( ) The number of power outages that affected exactly one address. | ||
( ) The number of power outages that affected at least one address. | ||
( ) The number of power outages that affected no addresses. | ||
( ) 0 | ||
( ) The code will raise an error. | ||
( ) None of the above. | ||
|
||
|
||
|
||
# BEGIN SOLN | ||
|
||
**Answer:** The number of addresses with no outages. | ||
|
||
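A small sketch of why the left merge counts addresses with no outages. The tables are invented; only the `hid` and `oid` column names come from the problem.

```python
import pandas as pd

# Hypothetical addresses (index plays the role of the address id).
h = pd.DataFrame({'street': ['A St', 'B St', 'C St']}, index=[1, 2, 3])
# Hypothetical join table: address 2 never appears, so it has no outages.
j = pd.DataFrame({'hid': [1, 1, 3], 'oid': [10, 11, 10]})

# A left merge keeps every row of h; unmatched addresses get NaN in oid.
a = h.merge(j, left_index=True, right_on='hid', how='left')

# Each address without an outage contributes exactly one NaN row.
no_outage = a.loc[a['oid'].isna(), 'hid'].shape[0]
```

Addresses with outages contribute one row per outage, none of which has a missing `oid`, so only the unmatched addresses are counted.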
# END SOLN | ||
|
||
# END SUBPROB | ||
|
||
# END PROB |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
# BEGIN PROB | ||
|
||
# BEGIN SUBPROB | ||
|
||
Consider the following code: | ||
|
||
```python | ||
whoa = (h.merge(j, left_index=True, right_on='hid', how='left') | ||
.merge(o, left_on='oid', right_index=True, how='right') | ||
.reset_index(drop=True)) | ||
``` | ||
|
||
Consider the following variables: | ||
|
||
```python | ||
a = j['hid'] <= 50 | ||
b = j['hid'] > 50 | ||
c = j['oid'] <= 100 | ||
d = j['oid'] > 100 | ||
e = (j[j['hid'] <= 50] | ||
.groupby('hid') | ||
.filter(lambda x: all(x['oid'] > 100)) | ||
['hid'] | ||
.nunique()) | ||
f = (j[j['oid'] <= 100] | ||
.groupby('oid') | ||
.filter(lambda x: all(x['hid'] > 50)) | ||
['oid'] | ||
.nunique()) | ||
g = len(set(h.index) - set(j['hid'])) | ||
i = len(set(o.index) - set(j['oid'])) | ||
``` | ||
|
||
Write a **single expression** that evaluates to the number of rows in `whoa`. In your code, you may only use the variables `a`, `b`, `c`, `d`, `e`, `f`, `g`, `i` as defined above, arithmetic and bitwise operators (`+`, `-`, `/`, `*`, `&`, `|`), and the `np.sum()` function. **You may not use any other variables or functions.** Your code might not need to use all of the variables defined above. | ||
|
||
|
||
# BEGIN SOLN | ||
|
||
**Answer:** `np.sum(a & c) + f + i` | ||
|
||
We know that `h` has the numbers 1-50 as unique integers in its index, and `o` has the numbers 1-100 as unique integers in its index. However, the `hid` and `oid` columns in `j` have values outside these ranges. To approach this problem, it's easiest to come up with smaller versions of `h`, `j`, and `o`, then perform the join by hand. For example, consider the following example `h`, `j`, and `o` tables: | ||
|
||
<center> | ||
| **hid** | | ||
|---------| | ||
| 1 | | ||
| 2 | | ||
| 3 | | ||
</center> | ||
|
||
<center> | ||
| **hid** | **oid** | | ||
|---------|---------| | ||
| 1 | 1 | | ||
| 2 | 1 | | ||
| 2 | 10 | | ||
| 2 | 11 | | ||
| 10 | 3 | | ||
| 11 | 3 | | ||
</center> | ||
|
||
<center> | ||
| **oid** | | ||
|---------| | ||
| 1 | | ||
| 2 | | ||
| 3 | | ||
</center> | ||
|
||
In this example, `whoa` would look like the following (omitting other columns besides `hid` and `oid` for brevity): | ||
|
||
<center> | ||
| **hid** | **oid** | | ||
|---------|---------| | ||
| 1 | 1 | | ||
| 2 | 1 | | ||
| NaN | 2 | | ||
| NaN | 3 | | ||
</center> | ||
|
||
There are 3 cases where rows will be kept for `whoa`: | ||
|
||
1. When both `hid` and `oid` match in the three tables (when `a` and `c` are both true). In the example above, this corresponds to the first two rows of `whoa`. | ||
2. When the `oid` in `o` doesn't appear at all in `j` (calculated by `i`). In the example above, this corresponds to the third row of `whoa`. | ||
3. When the `oid` in `o` does appear in `j`, but none of the `hid` values appear in `h` (calculated by `f`). In the example above, this corresponds to the fourth row of `whoa`. | ||
|
||
Therefore, the number of rows in `whoa` is: | ||
|
||
```python | ||
np.sum(a & c) + f + i | ||
``` | ||
|
||
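The small example above can be reproduced in code as a sanity check. This is a sketch under stated assumptions: the `street` and `duration` columns are invented placeholders, and the thresholds 3 and 3 stand in for 50 and 100 because the toy `h` and `o` each have index values 1 through 3.

```python
import numpy as np
import pandas as pd

# Toy tables from the worked example; extra columns are placeholders.
h = pd.DataFrame({'street': ['A St', 'B St', 'C St']}, index=[1, 2, 3])
j = pd.DataFrame({'hid': [1, 2, 2, 2, 10, 11],
                  'oid': [1, 1, 10, 11, 3, 3]})
o = pd.DataFrame({'duration': [30, 45, 60]}, index=[1, 2, 3])

whoa = (h.merge(j, left_index=True, right_on='hid', how='left')
         .merge(o, left_on='oid', right_index=True, how='right')
         .reset_index(drop=True))

# Same definitions as in the problem, with toy thresholds.
a = j['hid'] <= 3
c = j['oid'] <= 3
f = (j[j['oid'] <= 3]
     .groupby('oid')
     .filter(lambda x: all(x['hid'] > 3))
     ['oid']
     .nunique())
i = len(set(o.index) - set(j['oid']))

predicted = np.sum(a & c) + f + i
```

Here `np.sum(a & c)` counts the two fully matched rows, `f` counts outage 3 (present in `j` but only with unknown addresses), and `i` counts outage 2 (absent from `j`).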
# END SOLN | ||
|
||
# END SUBPROB | ||
|
||
# END PROB |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
# BEGIN PROB | ||
|
||
Consider the following code which defines a DataFrame named `df`: | ||
|
||
```python | ||
def hour(df): return df.assign(hour=df['time'].dt.hour) | ||
def is_morning(df): return df.assign(is_morning=df['hour'] < 12) | ||
|
||
df = (h.merge(j, left_index=True, right_on='hid', how='inner') | ||
.merge(o, left_on='oid', right_index=True, how='inner') | ||
.reset_index(drop=True) | ||
.pipe(hour) | ||
.pipe(is_morning)) | ||
``` | ||
|
||
The first few rows of `df` are shown below. | ||
|
||
<center><img src="../../assets/images/sp24-midterm/df.png" width=750></center> | ||
|
||
Suppose we define a DataFrame `p` and functions `a`, `b`, `c`, and `d` as follows: | ||
|
||
```python | ||
p = df.pivot_table(index='street', columns='hour', values='duration', | ||
aggfunc='count', fill_value=0) | ||
|
||
def a(n): return p[n].sum() | ||
def b(s): return p.loc[s].sum() | ||
def c(): return p.sum().sum() | ||
def d(s, n): return p.loc[s, n] | ||
``` | ||
|
||
Write a single expression to compute each of the probabilities below. **Your code can only use the functions `a`, `b`, `c`, `d`, and arithmetic operators (`+`, `-`, `/`, `*`).** | ||
|
||
# BEGIN SUBPROB | ||
|
||
The probability that a randomly selected row from `df` has the street `Mission Blvd`. | ||
|
||
# BEGIN SOLN | ||
|
||
**Answer:** `b('Mission Blvd') / c()` | ||
|
||
# END SOLN | ||
|
||
# END SUBPROB | ||
|
||
|
||
|
||
# BEGIN SUBPROB | ||
|
||
The probability that a randomly selected row from `df` has the street `Gilman Dr` given that its hour is `21`. | ||
|
||
# BEGIN SOLN | ||
|
||
**Answer:** `d('Gilman Dr', 21) / a(21)` | ||
|
||
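As a rough check, the conditional probability can be computed both from the pivot table and directly from the rows. The `df` below is an invented five-row stand-in that keeps only the three columns the pivot table uses.

```python
import pandas as pd

# Hypothetical stand-in for `df` (invented rows).
df = pd.DataFrame({
    'street': ['Gilman Dr', 'Gilman Dr', 'Mission Blvd',
               'Mission Blvd', 'Mission Blvd'],
    'hour': [21, 12, 21, 21, 12],
    'duration': [10, 20, 30, 40, 50],
})
p = df.pivot_table(index='street', columns='hour', values='duration',
                   aggfunc='count', fill_value=0)

def a(n): return p[n].sum()        # rows with hour n
def d(s, n): return p.loc[s, n]    # rows with street s and hour n

via_pivot = d('Gilman Dr', 21) / a(21)

# Direct computation: restrict to hour 21, then take the street proportion.
at_21 = df[df['hour'] == 21]
direct = (at_21['street'] == 'Gilman Dr').mean()
```

Conditioning on the hour shrinks the denominator from all rows, `c()`, to the column total, `a(21)`.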
# END SOLN | ||
|
||
# END SUBPROB | ||
|
||
|
||
|
||
# BEGIN SUBPROB | ||
|
||
The probability that a randomly selected row from `df` either has the street `Mission Blvd` or the hour `12`. | ||
|
||
# BEGIN SOLN | ||
|
||
**Answer:** `(b('Mission Blvd') + a(12) - d('Mission Blvd', 12)) / c()` | ||
|
||
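The subtraction of `d('Mission Blvd', 12)` avoids counting rows that satisfy both conditions twice. A minimal sketch on an invented `df`, compared against a direct count:

```python
import pandas as pd

# Hypothetical stand-in for `df` (invented rows).
df = pd.DataFrame({
    'street': ['Gilman Dr', 'Gilman Dr', 'Mission Blvd',
               'Mission Blvd', 'Mission Blvd'],
    'hour': [21, 12, 21, 21, 12],
    'duration': [10, 20, 30, 40, 50],
})
p = df.pivot_table(index='street', columns='hour', values='duration',
                   aggfunc='count', fill_value=0)

def a(n): return p[n].sum()
def b(s): return p.loc[s].sum()
def c(): return p.sum().sum()
def d(s, n): return p.loc[s, n]

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B).
via_pivot = (b('Mission Blvd') + a(12) - d('Mission Blvd', 12)) / c()

# Direct computation with a boolean "or" over the rows.
direct = ((df['street'] == 'Mission Blvd') | (df['hour'] == 12)).mean()
```

In this toy data, 3 rows are on Mission Blvd, 2 rows have hour 12, and 1 row has both, so 4 of the 5 rows qualify.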
# END SOLN | ||
|
||
# END SUBPROB | ||
|
||
# END PROB |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
# BEGIN PROB | ||
|
||
# BEGIN SUBPROB | ||
|
||
Consider the following pivot table, created using the `df` table from the previous question, which shows the average duration of power outages split by street name and whether the outage happened before 12pm. | ||
|
||
<center><img src="../../assets/images/sp24-midterm/q4a.png" width=750></center> | ||
|
||
Given only the information in this pivot table and the Reference Sheet, is it possible to observe Simpson's paradox for this data if we don't split by street? In other words, is it possible that the average duration of power outages before 12pm is lower than the average duration of power outages after 12pm? | ||
|
||
( ) Yes | ||
( ) No | ||
( ) Need more information to determine | ||
|
||
# BEGIN SOLN | ||
|
||
**Answer:** Yes | ||
|
||
Notice that the overall average of the durations when `is_morning=True` is a weighted average of the values in the `is_morning=True` column of the pivot table, so it must lie between 44.93 and 59.29. Likewise, the overall average when `is_morning=False` must lie between 40.62 and 52.78. Because these two ranges overlap, the overall average when `is_morning=False` can be higher than the overall average when `is_morning=True`, so it is possible for Simpson's paradox to happen. | ||
|
||
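One way to make the weighted-average argument concrete is sketched below. The pairing of the quoted extreme values into two streets and the outage counts are assumptions chosen for illustration; they are not read from the actual pivot table.

```python
import numpy as np

# Two hypothetical streets built from the extreme values quoted above.
# Within each street, the morning mean exceeds the afternoon mean.
morning_street_means = np.array([44.93, 59.29])
afternoon_street_means = np.array([40.62, 52.78])

# Hypothetical outage counts: mornings are dominated by the first street,
# afternoons by the second.
morning_counts = np.array([99, 1])
afternoon_counts = np.array([1, 99])

# The overall mean is the count-weighted average of the per-street means.
morning_overall = np.average(morning_street_means, weights=morning_counts)
afternoon_overall = np.average(afternoon_street_means, weights=afternoon_counts)
```

With these weights the overall morning mean is pulled toward 44.93 and the overall afternoon mean toward 52.78, so the ordering reverses once the streets are pooled.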
|
||
# END SOLN | ||
|
||
# END SUBPROB | ||
|
||
|
||
|
||
# BEGIN SUBPROB | ||
|
||
Consider the following pivot table created using the `o` table, which shows the average duration of power outages split by whether the outage happened on the weekend and whether the outage happened before 12pm. | ||
|
||
<center><img src="../../assets/images/sp24-midterm/q4b.png" width=750></center> | ||
|
||
Given only the information in this pivot table and the Reference Sheet, is it possible to observe Simpson's paradox for this data if we don't split by `is_weekend`? In other words, is it possible that the average duration of power outages before 12pm is lower than the average duration of power outages after 12pm? | ||
|
||
( ) Yes | ||
( ) No | ||
( ) Need more information to determine | ||
|
||
# BEGIN SOLN | ||
|
||
**Answer:** No | ||
|
||
By the same logic as the previous part, the overall average when `is_morning=True` must lie between 53.09 and 58.64, and the overall average when `is_morning=False` must lie between 43.40 and 51.67. These two ranges do not overlap: the largest possible `is_morning=False` average (51.67) is below the smallest possible `is_morning=True` average (53.09). Therefore Simpson's paradox cannot happen, since the overall average when `is_morning=False` will never be greater than the overall average when `is_morning=True`. | ||
|
||
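The range comparison used in both parts can be written as a tiny helper. This is a sketch: `reversal_possible` is a hypothetical name, and it assumes the group counts are unknown and can be arbitrary.

```python
def reversal_possible(morning_range, afternoon_range):
    """Return True if some afternoon weighted average could exceed
    some morning weighted average, given (min, max) cell values."""
    morning_lo, _ = morning_range
    _, afternoon_hi = afternoon_range
    # A weighted average stays within the min and max of its inputs,
    # so a reversal needs the afternoon max to exceed the morning min.
    return afternoon_hi > morning_lo

# Ranges quoted in the two solutions.
part_a = reversal_possible((44.93, 59.29), (40.62, 52.78))
part_b = reversal_possible((53.09, 58.64), (43.40, 51.67))
```

The first call corresponds to the split by street and the second to the split by `is_weekend`.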
|
||
# END SOLN | ||
|
||
# END SUBPROB | ||
|
||
# END PROB |