
Commit

added sp24 midterm
ylesia-wu committed Aug 15, 2024
1 parent 29fd485 commit 48fbbab
Showing 24 changed files with 1,397 additions and 1 deletion.
Binary file added assets/images/sp24-midterm/df.png
Binary file added assets/images/sp24-midterm/h.png
Binary file added assets/images/sp24-midterm/j.png
Binary file added assets/images/sp24-midterm/o.png
Binary file added assets/images/sp24-midterm/q4a.png
Binary file added assets/images/sp24-midterm/q4b.png
Binary file added assets/images/sp24-midterm/q5.png
Binary file added docs/assets/images/sp24-midterm/df.png
Binary file added docs/assets/images/sp24-midterm/h.png
Binary file added docs/assets/images/sp24-midterm/j.png
Binary file added docs/assets/images/sp24-midterm/o.png
Binary file added docs/assets/images/sp24-midterm/q4a.png
Binary file added docs/assets/images/sp24-midterm/q4b.png
Binary file added docs/assets/images/sp24-midterm/q5.png
14 changes: 13 additions & 1 deletion docs/index.html
@@ -87,14 +87,26 @@ <h3>
 <tbody>
+<tr>
+<th scope="row">
+Spring 2024
+</th>
+<td>
+Sam Lau
+</td>
+<td>
+<a href='sp24-midterm/index.html'>Midterm 🆕</a> <br>
+<!-- <a href='sp24-final/index.html'>Final 🆕</a> -->
+</td>
+</tr>
 <tr>
 <th scope="row">
 Winter 2024
 </th>
 <td>
 Suraj Rampure
 </td>
 <td>
 <a href='wi24-midterm/index.html'>Midterm</a> <br>
-<a href='wi24-final/index.html'>Final 🆕</a>
+<a href='wi24-final/index.html'>Final</a>
 </td>
 </tr>
 <tr>
833 changes: 833 additions & 0 deletions docs/sp24-midterm/index.html

Large diffs are not rendered by default.

11 changes: 11 additions & 0 deletions pages/exams/sp24-midterm.yml
@@ -0,0 +1,11 @@
title: 'Spring 2024 Midterm Exam'
instructors: Sam Lau
context: This exam was administered in person. The exam was closed-notes, except that students were allowed to bring a single two-sided notes sheet. No calculators were allowed. Students had **80 minutes** to take this exam.
show_solution: true
problems:
- sp24-midterm/sp24-mid-q01
- sp24-midterm/sp24-mid-q02
- sp24-midterm/sp24-mid-q03
- sp24-midterm/sp24-mid-q04
- sp24-midterm/sp24-mid-q05
- sp24-midterm/sp24-mid-q06
111 changes: 111 additions & 0 deletions problems/sp24-midterm/sp24-mid-q01.md
@@ -0,0 +1,111 @@
# BEGIN PROB

Fill in the Python code below so that the last line of each part evaluates to the desired result, using the tables `h`, `o`, and `j` shown on the Reference Sheet.

# BEGIN SUBPROB

Find the median duration of outages that happened in the early morning (before 8am).

```python
o.loc[__(a)__,__(b)__].median()
```

# BEGIN SOLN

**Answer:**

(a): `o['time'].dt.hour < 8`

(b): `'duration'`
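As a quick sanity check, the sketch below runs the answer on a tiny, made-up version of `o` (hypothetical rows; only the `time` and `duration` columns are assumed). It shows that `.dt.hour < 8` keeps an outage at 7:59am but drops one at exactly 8:00am.

```python
import pandas as pd

# Hypothetical toy version of `o`; the real table is on the Reference Sheet.
o = pd.DataFrame({
    'time': pd.to_datetime(['2024-01-01 03:15', '2024-01-02 07:59',
                            '2024-01-03 08:00', '2024-01-04 22:30']),
    'duration': [30, 50, 90, 10],
})

# The boolean mask keeps rows whose hour is 0 through 7 (before 8am);
# 'duration' selects the column whose median we take.
early_median = o.loc[o['time'].dt.hour < 8, 'duration'].median()
print(early_median)
```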

# END SOLN

# END SUBPROB



# BEGIN SUBPROB

A Series containing the mean outage duration for outages that happened on the weekend and outages that happened on weekdays.

*Hint: If `s` is a Series of timestamps, `s.dt.dayofweek` returns a Series of integers where 0 is Monday and 6 is Sunday.*

```python
(o.assign(__(a)__)
.groupby(__(b)__)[__(c)__].mean())
```

# BEGIN SOLN

**Answer:**

(a): `is_weekend=o['time'].dt.dayofweek >= 5`

(b): `'is_weekend'`

(c): `'duration'`
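The sketch below applies this answer to a tiny, made-up `o` (hypothetical dates and durations). June 1 and 2, 2024 fall on a Saturday and a Sunday, so `dayofweek >= 5` marks them as weekend outages, and `groupby` returns one mean for `False` (weekdays) and one for `True` (weekends).

```python
import pandas as pd

# Hypothetical toy version of `o` with only the columns this part uses.
o = pd.DataFrame({
    'time': pd.to_datetime(['2024-06-01 10:00',    # Saturday
                            '2024-06-02 10:00',    # Sunday
                            '2024-06-03 10:00',    # Monday
                            '2024-06-05 10:00']),  # Wednesday
    'duration': [60, 40, 20, 10],
})

# dayofweek: 0 is Monday, ..., 5 is Saturday, 6 is Sunday.
result = (o.assign(is_weekend=o['time'].dt.dayofweek >= 5)
           .groupby('is_weekend')['duration'].mean())
print(result)
```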

# END SOLN

# END SUBPROB



# BEGIN SUBPROB

A DataFrame containing the proportion of 4-digit address numbers for each unique street in `h`.

```python
def foo(x):
    lengths = __(a)__
    return (lengths == 4).mean()

h.groupby(__(b)__).__(c)__(foo)
```

# BEGIN SOLN

**Answer:**

(a): `x.astype(str).str.len()`

(b): `'street'`

(c): `agg`
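To see why `agg` works here, the sketch below uses a made-up `h` with hypothetical column names `number` (the address number) and `street`. When given a function, `groupby(...).agg(foo)` calls `foo` on each remaining column within each street's group, so the result is a DataFrame with one row per street.

```python
import pandas as pd

# Hypothetical toy `h`; the real column names are on the Reference Sheet.
h = pd.DataFrame({
    'number': [1234, 56, 9876, 4321, 7],
    'street': ['Gilman Dr', 'Gilman Dr',
               'Mission Blvd', 'Mission Blvd', 'Mission Blvd'],
})

def foo(x):
    # x is one column of one street's group. Converting the numbers to
    # strings lets .str.len() count the digits in each address number.
    lengths = x.astype(str).str.len()
    return (lengths == 4).mean()

result = h.groupby('street').agg(foo)
print(result)
```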

# END SOLN

# END SUBPROB



# BEGIN SUBPROB

What does the following code compute?

```python
a = h.merge(j, left_index=True, right_on='hid', how='left')
a.loc[a['oid'].isna(), 'hid'].shape[0]
```

( ) The number of addresses with exactly one outage.
( ) The number of addresses with at least one outage.
( ) The number of addresses with no outages.
( ) The total number of addresses affected by all power outages.
( ) The number of power outages.
( ) The number of power outages that affected exactly one address.
( ) The number of power outages that affected at least one address.
( ) The number of power outages that affected no addresses.
( ) 0
( ) The code will raise an error.
( ) None of the above.



# BEGIN SOLN

**Answer:** The number of addresses with no outages.
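The sketch below reproduces this on made-up tables (hypothetical `h` and `j` with three addresses and three address-outage pairs). Because the merge is a left merge, an address with no outages still appears once, with `NaN` in `oid`, so counting those rows counts addresses with no outages. Address 1, which had two outages, is not counted.

```python
import pandas as pd

# Hypothetical toy tables: h is indexed by address id (hid),
# and j links addresses to outages through 'hid' and 'oid'.
h = pd.DataFrame({'street': ['Gilman Dr', 'Mission Blvd', 'Main St']},
                 index=[1, 2, 3])
j = pd.DataFrame({'hid': [1, 1, 2], 'oid': [10, 11, 10]})

a = h.merge(j, left_index=True, right_on='hid', how='left')
# Address 3 has no match in j, so it gets a single row with NaN in 'oid'.
num_no_outages = a.loc[a['oid'].isna(), 'hid'].shape[0]
print(num_no_outages)
```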

# END SOLN

# END SUBPROB

# END PROB
97 changes: 97 additions & 0 deletions problems/sp24-midterm/sp24-mid-q02.md
@@ -0,0 +1,97 @@
# BEGIN PROB

# BEGIN SUBPROB

Consider the following code:

```python
whoa = (h.merge(j, left_index=True, right_on='hid', how='left')
.merge(o, left_on='oid', right_index=True, how='right')
.reset_index(drop=True))
```

Consider the following variables:

```python
a = j['hid'] <= 50
b = j['hid'] > 50
c = j['oid'] <= 100
d = j['oid'] > 100
e = (j[j['hid'] <= 50]
.groupby('hid')
.filter(lambda x: all(x['oid'] > 100))
['hid']
.nunique())
f = (j[j['oid'] <= 100]
.groupby('oid')
.filter(lambda x: all(x['hid'] > 50))
['oid']
.nunique())
g = len(set(h.index) - set(j['hid']))
i = len(set(o.index) - set(j['oid']))
```

Write a **single expression** that evaluates to the number of rows in `whoa`. In your code, you may only use the variables `a`, `b`, `c`, `d`, `e`, `f`, `g`, `i` as defined above, arithmetic and bitwise operators (`+`, `-`, `/`, `*`, `&`, `|`), and the `np.sum()` function. **You may not use any other variables or functions.** Your code might not need to use all of the variables defined above.


# BEGIN SOLN

**Answer:** `np.sum(a & c) + f + i`

We know that the index of `h` contains the unique integers 1 through 50, and the index of `o` contains the unique integers 1 through 100. However, the `hid` and `oid` columns in `j` contain values outside these ranges. The easiest way to approach this problem is to make smaller versions of `h`, `j`, and `o` and perform the joins by hand. For example, consider the following `h`, `j`, and `o` tables:

<center>
| **hid** |
|---------|
| 1 |
| 2 |
| 3 |
</center>

<center>
| **hid** | **oid** |
|---------|---------|
| 1 | 1 |
| 2 | 1 |
| 2 | 10 |
| 2 | 11 |
| 10 | 3 |
| 11 | 3 |
</center>

<center>
| **oid** |
|---------|
| 1 |
| 2 |
| 3 |
</center>

In this example, `whoa` would look like the following (showing only the `hid` and `oid` columns for brevity):

<center>
| **hid** | **oid** |
|---------|---------|
| 1 | 1 |
| 2 | 1 |
| NaN | 2 |
| NaN | 3 |
</center>

There are 3 cases in which a row appears in `whoa`:

1. A row of `j` has a `hid` that appears in `h` and an `oid` that appears in `o` (when `a` and `c` are both true). Each such row of `j` produces one row of `whoa`. In the example above, this corresponds to the first two rows of `whoa`.
2. An `oid` in `o` doesn't appear in `j` at all (counted by `i`). Each such outage produces one row with a missing `hid`. In the example above, this corresponds to the third row of `whoa`.
3. An `oid` in `o` does appear in `j`, but none of its `hid` values appear in `h` (counted by `f`). Each such outage also produces one row with a missing `hid`. In the example above, this corresponds to the fourth row of `whoa`.

Therefore, the number of rows in `whoa` is:

```python
np.sum(a & c) + f + i
```
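The worked example can be checked in pandas. The sketch below builds the toy `h`, `j`, and `o` tables from above (the `street` and `duration` columns are placeholders added so the tables aren't empty) and replaces the cutoffs 50 and 100 with 3, since the toy `h` and `o` only contain IDs 1 through 3.

```python
import numpy as np
import pandas as pd

# Toy tables from the worked example (hypothetical, not the exam data).
h = pd.DataFrame({'street': ['A St', 'B St', 'C St']}, index=[1, 2, 3])
o = pd.DataFrame({'duration': [10, 20, 30]}, index=[1, 2, 3])
j = pd.DataFrame({'hid': [1, 2, 2, 2, 10, 11],
                  'oid': [1, 1, 10, 11, 3, 3]})

whoa = (h.merge(j, left_index=True, right_on='hid', how='left')
         .merge(o, left_on='oid', right_index=True, how='right')
         .reset_index(drop=True))

# Same variables as the exam, with cutoffs 50 and 100 replaced by 3.
a = j['hid'] <= 3
c = j['oid'] <= 3
f = (j[j['oid'] <= 3]
     .groupby('oid')
     .filter(lambda x: all(x['hid'] > 3))
     ['oid']
     .nunique())
i = len(set(o.index) - set(j['oid']))

print(len(whoa), np.sum(a & c) + f + i)
```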

# END SOLN

# END SUBPROB

# END PROB
74 changes: 74 additions & 0 deletions problems/sp24-midterm/sp24-mid-q03.md
@@ -0,0 +1,74 @@
# BEGIN PROB

Consider the following code which defines a DataFrame named `df`:

```python
def hour(df): return df.assign(hour=df['time'].dt.hour)
def is_morning(df): return df.assign(is_morning=df['hour'] < 12)

df = (h.merge(j, left_index=True, right_on='hid', how='inner')
.merge(o, left_on='oid', right_index=True, how='inner')
.reset_index(drop=True)
.pipe(hour)
.pipe(is_morning))
```

The first few rows of `df` are shown below.

<center><img src="../../assets/images/sp24-midterm/df.png" width=750></center>

Suppose we define a DataFrame `p` and functions `a`, `b`, `c`, and `d` as follows:

```python
p = df.pivot_table(index='street', columns='hour', values='duration',
aggfunc='count', fill_value=0)

def a(n): return p[n].sum()
def b(s): return p.loc[s].sum()
def c(): return p.sum().sum()
def d(s, n): return p.loc[s, n]
```

Write a single expression to compute each of the probabilities below. **Your code can only use the functions `a`, `b`, `c`, `d`, and arithmetic operators (`+`, `-`, `/`, `*`).**

# BEGIN SUBPROB

The probability that a randomly selected row from `df` has the street `Mission Blvd`.

# BEGIN SOLN

**Answer:** `b('Mission Blvd') / c()`

# END SOLN

# END SUBPROB



# BEGIN SUBPROB

The probability that a randomly selected row from `df` has the street `Gilman Dr` given that its hour is `21`.

# BEGIN SOLN

**Answer:** `d('Gilman Dr', 21) / a(21)`

# END SOLN

# END SUBPROB



# BEGIN SUBPROB

The probability that a randomly selected row from `df` either has the street `Mission Blvd` or the hour `12`.

# BEGIN SOLN

**Answer:** `(b('Mission Blvd') + a(12) - d('Mission Blvd', 12)) / c()`
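The sketch below checks all three answers against direct computations on a small, made-up `df` (hypothetical streets, hours, and durations). The third answer uses inclusion-exclusion: rows that are on `Mission Blvd` *and* at hour 12 are counted in both `b('Mission Blvd')` and `a(12)`, so `d('Mission Blvd', 12)` is subtracted once.

```python
import pandas as pd

# Hypothetical toy df with only the columns the pivot table needs.
df = pd.DataFrame({
    'street': ['Mission Blvd', 'Mission Blvd', 'Gilman Dr',
               'Gilman Dr', 'Gilman Dr', 'Mission Blvd'],
    'hour': [12, 21, 21, 12, 21, 9],
    'duration': [30, 45, 60, 15, 20, 50],
})

p = df.pivot_table(index='street', columns='hour', values='duration',
                   aggfunc='count', fill_value=0)

def a(n): return p[n].sum()       # number of rows with hour n
def b(s): return p.loc[s].sum()   # number of rows with street s
def c(): return p.sum().sum()     # total number of rows
def d(s, n): return p.loc[s, n]   # number of rows with street s and hour n

p_mission = b('Mission Blvd') / c()
p_gilman_given_21 = d('Gilman Dr', 21) / a(21)
p_mission_or_12 = (b('Mission Blvd') + a(12) - d('Mission Blvd', 12)) / c()

# Direct computation of the union for comparison.
direct_union = ((df['street'] == 'Mission Blvd') | (df['hour'] == 12)).mean()
print(p_mission, p_gilman_given_21, p_mission_or_12, direct_union)
```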

# END SOLN

# END SUBPROB

# END PROB
51 changes: 51 additions & 0 deletions problems/sp24-midterm/sp24-mid-q04.md
@@ -0,0 +1,51 @@
# BEGIN PROB

# BEGIN SUBPROB

Consider the following pivot table, created using the `df` table from Question 3, which shows the average duration of power outages split by street name and by whether the outage happened before 12pm.

<center><img src="../../assets/images/sp24-midterm/q4a.png" width=750></center>

Given only the information in this pivot table and the Reference Sheet, is it possible to observe Simpson's paradox for this data if we don't split by street? In other words, is it possible that the average duration of power outages before 12pm is lower than the average duration of power outages after 12pm?

( ) Yes
( ) No
( ) Need more information to determine

# BEGIN SOLN

**Answer:** Yes

Notice that the overall average duration when `is_morning=True` is a weighted average of the values in the `is_morning=True` column of the pivot table, weighted by the number of rows in each cell. This means that the overall average when `is_morning=True` must be between 44.93 and 59.29. Likewise, the overall average when `is_morning=False` must be between 40.62 and 52.78. Since these ranges overlap, the overall average when `is_morning=False` can be higher than the overall average when `is_morning=True`, so Simpson's paradox is possible.
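To see that a reversal is actually reachable, the sketch below uses only the smallest and largest value in each column of the pivot table together with made-up outage counts (hypothetical; the exam doesn't give counts). Putting most morning outages on the street averaging 44.93 and most afternoon outages on the street averaging 52.78 makes the overall morning average lower than the overall afternoon average. Any other streets in the table are treated as having negligible counts.

```python
import numpy as np

# Extremes of each pivot-table column (other streets are ignored here).
morning_vals = np.array([44.93, 59.29])    # is_morning=True
afternoon_vals = np.array([40.62, 52.78])  # is_morning=False

# Hypothetical counts per cell; the exam does not provide these.
morning_counts = np.array([99, 1])
afternoon_counts = np.array([1, 99])

overall_morning = np.average(morning_vals, weights=morning_counts)
overall_afternoon = np.average(afternoon_vals, weights=afternoon_counts)
print(overall_morning, overall_afternoon)
```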


# END SOLN

# END SUBPROB



# BEGIN SUBPROB

Consider the following pivot table created using the `o` table, which shows the average duration of power outages split by whether the outage happened on the weekend and whether the outage happened before 12pm.

<center><img src="../../assets/images/sp24-midterm/q4b.png" width=750></center>

Given only the information in this pivot table and the Reference Sheet, is it possible to observe Simpson's paradox for this data if we don't split by `is_weekend`? In other words, is it possible that the average duration of power outages before 12pm is lower than the average duration of power outages after 12pm?

( ) Yes
( ) No
( ) Need more information to determine

# BEGIN SOLN

**Answer:** No

By the same logic as the previous part, the overall average when `is_morning=True` must be between 53.09 and 58.64, and the overall average when `is_morning=False` must be between 43.40 and 51.67. These ranges don't overlap, so the overall average when `is_morning=False` can never be greater than the overall average when `is_morning=True`, and Simpson's paradox cannot happen.
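The sketch below sweeps every split of outages between the two `is_weekend` groups. Which row each value belongs to is an assumption here (the pivot table image isn't reproduced), but it doesn't affect the result: the morning and afternoon splits are independent, so only each column's extremes matter.

```python
import numpy as np

# The two cell averages in each is_morning column (row order assumed).
morning_vals = np.array([53.09, 58.64])    # is_morning=True
afternoon_vals = np.array([43.40, 51.67])  # is_morning=False

# Share of outages in the first is_weekend group, from 0 to 1.
shares = np.linspace(0, 1, 101)
overall_morning = shares * morning_vals[0] + (1 - shares) * morning_vals[1]
overall_afternoon = shares * afternoon_vals[0] + (1 - shares) * afternoon_vals[1]

# The splits are independent, so compare the lowest possible morning
# average with the highest possible afternoon average.
reversal_possible = overall_morning.min() < overall_afternoon.max()
print(reversal_possible)
```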


# END SOLN

# END SUBPROB

# END PROB