pallavisprabhu authored and pallavisprabhu committed Nov 26, 2024
1 parent f3516d8, commit 8291ca7
Showing 12 changed files with 975 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
title: 'Summer 2024 Final Exam' | ||
instructors: Nishant Kheterpal | ||
context: This exam was administered in-person. The exam was closed-notes, except students were provided a copy of the <a href='https://drive.google.com/file/d/1ky0Np67HS2O4LO913P-ing97SJG0j27n/view'>DSC 10 Reference Sheet</a>. No calculators were allowed. Students had **3 hours** to take this exam. | ||
show_solution: true | ||
data_info: su24-final/data-info | ||
problems: | ||
- su24-final/q01 | ||
- su24-final/q02 | ||
- su24-final/q03 | ||
- su24-final/q04 | ||
- su24-final/q05 | ||
- su24-final/q06 | ||
- su24-final/q07 | ||
- su24-final/q08 | ||
- su24-final/q09 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
In this exam, you’ll work with a data set representing the results of the Tour de France, a | ||
multi-stage, weeks-long cycling race. The Tour de France takes place over many days each | ||
year, and on each day, the riders compete in individual races called `stages`. Each `stage` is | ||
a standalone race, and the winner of the entire tour is determined by who performs the best | ||
across all of the individual `stages` combined. Each row represents one stage of the Tour (or | ||
equivalently, one day of racing). This dataset will be called `stages`. | ||
|
||
The columns of `stages` are as follows: | ||
- `"Stage" (int):` The stage number for the respective year. | ||
- `"Date" (str):` The day that the stage took place, formatted as "YYYY-MM-DD". | ||
- `"Distance" (float):` The distance of the stage in kilometers. | ||
- `"Origin" (str):` The name of the city in which the stage starts. | ||
- `"Destination" (str):` The name of the city in which the stage ends. | ||
- `"Type" (str):` The type of the stage. | ||
- `"Winner" (str):` The name of the rider who won the stage. | ||
- `"Winner Country" (str):` The country that the winning rider of the stage is from. | ||
|
||
The first few rows of `stages` are shown below, though `stages` has many more rows than | ||
pictured. | ||
|
||
<center><img src='../assets/images/su24-final/tour_df.png' width=800></center> | ||
<br> | ||
|
||
Throughout this exam, we will refer to `stages` repeatedly. | ||
Assume that we have already run `import babypandas as bpd` and `import numpy as np`. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
# BEGIN PROB | ||
|
||
\[(23 pts)\] | ||
|
||
# BEGIN SUBPROB | ||
|
||
Fill in the blanks so that the expression below evaluates to the | ||
*proportion* of stages won by the country with the most stage wins. | ||
|
||
stages.groupby(__(i)__).__(ii)__.get("Type").__(iii)__ / stages.shape[0] | ||
|
||
`(i)` : | ||
|
||
`(ii)` : | ||
|
||
`(iii)` : | ||
|
||
# BEGIN SOLUTION | ||
|
||
# END SOLUTION | ||
|
||
# END SUBPROB | ||
|
||
# BEGIN SUBPROB | ||
|
||
The distance of a stage alone does not encapsulate its difficulty, as | ||
riders feel more tired as the tour goes on. Because of this, we want to | ||
consider "real distance," a measurement of the length of a stage that | ||
takes into account how far into the tour the riders are. The "real | ||
distance" is calculated with the following process: | ||
|
||
(i) Add one to the stage number. | ||
|
||
(ii) Take the square root of the result of (i). | ||
|
||
(iii) Multiply the result of (ii) by the raw distance of the stage. | ||
|
||
Complete the implementation of the function `real_distance`, which takes | ||
in `stages` (a DataFrame), `stage` (a string, the name of the column | ||
containing stage numbers), and `distance` (a string, the name of the | ||
column containing stage distances). `real_distance` returns a Series | ||
containing all of the "real distances" of the stages, as calculated | ||
above. | ||
|
||
def real_distance(stages, stage, distance): | ||
    ________ | ||
|
||
::: responsebox | ||
`return stages.get(distance) * np.sqrt(stages.get(stage) + 1)` | ||
::: | ||
|
||
# BEGIN SOLUTION | ||
|
||
# END SOLUTION | ||
|
||
# END SUBPROB | ||
|
||
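As a minimal sketch of the three steps above, the one-line body can be tried on a small stand-in table. The snippet assumes a plain dictionary of NumPy arrays in place of the `stages` DataFrame (a dictionary also has a `.get` method), and the stage numbers and distances are made up for illustration, so the result is a NumPy array rather than a Series.

```python
import numpy as np

def real_distance(stages, stage, distance):
    # (i) add one to the stage number, (ii) take the square root,
    # (iii) multiply by the raw distance of the stage.
    return stages.get(distance) * np.sqrt(stages.get(stage) + 1)

# Hypothetical stand-in for the `stages` DataFrame: made-up values only.
toy_stages = {
    "Stage": np.array([0, 3, 8]),
    "Distance": np.array([100.0, 50.0, 10.0]),
}

real = real_distance(toy_stages, "Stage", "Distance")
print(real)
```

Because the arithmetic is element-wise, the same expression should carry over unchanged when `stages.get(...)` returns a Series.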
# BEGIN SUBPROB | ||
|
||
Sometimes, stages are repeated in different editions of the Tour de | ||
France, meaning that there are some pairs of `"Origin"` and | ||
`"Destination"` that appear more than once in `stages`. Fill in the | ||
blanks so that the expression below evaluates to the number of times the most common | ||
`"Origin"` and `"Destination"` pair in the `stages` DataFrame appears. | ||
|
||
``` | ||
stages.groupby(__(i)__).__(ii)__.sort_values(by = "Date").get("Type").iloc[__(iii)__] | ||
``` | ||
|
||
`(i)` : | ||
|
||
`(ii)` : | ||
|
||
`(iii)` : | ||
|
||
# BEGIN SOLUTION | ||
|
||
# END SOLUTION | ||
|
||
# END SUBPROB | ||
|
||
# BEGIN SUBPROB | ||
|
||
Fill in the blanks so that the value of `mystery_three` is the | ||
`"Destination"` of the longest stage before Stage 12. | ||
|
||
mystery = stages[stages.get(__(i)__) < 12] | ||
mystery_two = mystery.sort_values(by = "Distance", ascending = __(ii)__) | ||
mystery_three = mystery_two.get(__(iii)__).iloc[-1] | ||
|
||
`(i)` : | ||
|
||
`(ii)` : | ||
|
||
`(iii)` : | ||
|
||
# BEGIN SOLUTION | ||
|
||
# END SOLUTION | ||
|
||
# END SUBPROB | ||
|
||
# END PROB |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
# BEGIN PROB | ||
|
||
Suppose we run the following code to simulate the winners of the Tour de | ||
France. | ||
|
||
evenepoel_wins = 0 | ||
vingegaard_wins = 0 | ||
pogacar_wins = 0 | ||
for i in np.arange(4): | ||
    result = np.random.multinomial(1, [0.3, 0.3, 0.4]) | ||
    if result[0] == 1: | ||
        evenepoel_wins = evenepoel_wins + 1 | ||
    elif result[1] == 1: | ||
        vingegaard_wins = vingegaard_wins + 1 | ||
    elif result[2] == 1: | ||
        pogacar_wins = pogacar_wins + 1 | ||
|
||
# BEGIN SUBPROB | ||
|
||
What is the probability that `pogacar_wins` is equal to 4 when the code | ||
finishes running? Do not simplify your answer. | ||
|
||
# BEGIN SOLUTION | ||
|
||
# END SOLUTION | ||
|
||
# END SUBPROB | ||
|
||
# BEGIN SUBPROB | ||
|
||
What is the probability that `evenepoel_wins` is at least 1 when the | ||
code finishes running? Do not simplify your answer. | ||
|
||
# BEGIN SOLUTION | ||
|
||
# END SOLUTION | ||
|
||
# END SUBPROB | ||
|
||
# BEGIN SOLUTION | ||
|
||
# END SOLUTION | ||
|
||
# END SUBPROB | ||
|
||
# END PROB |
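One hedged way to check answers to the two questions above: each of the 4 iterations independently picks exactly one winner, so both probabilities follow from the multiplication rule and the complement rule. The sketch below computes them and compares them against a simulation. It uses Python's `random` module rather than `np.random.multinomial`, and the trial count and seed are arbitrary choices.

```python
import random

probs = [0.3, 0.3, 0.4]  # Evenepoel, Vingegaard, Pogacar
num_iterations = 4

# Pogacar must win all 4 independent iterations.
p_pogacar_all = probs[2] ** num_iterations
# Complement rule: Evenepoel fails to win each iteration with probability 0.7.
p_evenepoel_at_least_one = 1 - (1 - probs[0]) ** num_iterations

rng = random.Random(0)  # arbitrary seed, for reproducibility only
trials = 50_000
pogacar_all_count = 0
evenepoel_some_count = 0
for _ in range(trials):
    # One run of the loop from the problem: 4 winners, one per iteration.
    winners = rng.choices([0, 1, 2], weights=probs, k=num_iterations)
    if winners.count(2) == num_iterations:
        pogacar_all_count += 1
    if winners.count(0) >= 1:
        evenepoel_some_count += 1

sim_pogacar_all = pogacar_all_count / trials
sim_evenepoel_at_least_one = evenepoel_some_count / trials
print(p_pogacar_all, sim_pogacar_all)
print(p_evenepoel_at_least_one, sim_evenepoel_at_least_one)
```

The simulated proportions should land close to the exact values, which is a quick sanity check rather than a proof.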
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
# BEGIN PROB | ||
|
||
\[(12 pts)\] We want to estimate the mean distance of Tour de France | ||
stages by bootstrapping 10,000 times and constructing a 90% confidence | ||
interval for the mean. In this question, suppose `random_stages` is a | ||
random sample of size 500 drawn with replacement from `stages`. Identify | ||
the line numbers with errors in the code below. In the adjacent box, | ||
point out the error by describing the mistake in fewer than 10 words or | ||
writing a code snippet (correct only the part you think is wrong). You | ||
may or may not need all the spaces provided below to identify errors. | ||
|
||
line 1: means = np.array([]) | ||
line 2: | ||
line 3: for i in 10000: | ||
line 4:     resample = random_stages.sample(10000) | ||
line 5:     resample_mean = resample.get("Distance").mean() | ||
line 6:     np.append(means, resample_mean) | ||
line 7: | ||
line 8: left_bound = np.percentile(means, 0) | ||
line 9: right_bound = np.percentile(means, 90) | ||
|
||
`a) ` | ||
|
||
`b) ` | ||
|
||
`c) ` | ||
|
||
`d) ` | ||
|
||
`e) ` | ||
|
||
`f) ` | ||
|
||
# BEGIN SOLUTION | ||
|
||
# END SOLUTION | ||
|
||
# END PROB |
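For comparison with the code in this problem, here is a minimal sketch of a bootstrap 90% confidence interval for a mean. It is not the official solution: it works on a NumPy array of made-up distances standing in for `random_stages.get("Distance")`, and it resamples with NumPy's `Generator.choice` instead of a DataFrame's `sample` method. The points it illustrates are looping over `np.arange(10000)`, resampling with replacement at the original sample size, reassigning the result of `np.append`, and using the 5th and 95th percentiles.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

# Hypothetical stand-in for random_stages.get("Distance"):
# 500 made-up stage distances in km.
sample_distances = rng.normal(loc=200, scale=50, size=500)

means = np.array([])
for i in np.arange(10000):
    # Resample with replacement, keeping the original sample size (500).
    resample = rng.choice(sample_distances,
                          size=sample_distances.shape[0],
                          replace=True)
    resample_mean = resample.mean()
    # np.append returns a new array, so the result must be reassigned.
    means = np.append(means, resample_mean)

# The middle 90% of the bootstrapped means.
left_bound = np.percentile(means, 5)
right_bound = np.percentile(means, 95)
print(left_bound, right_bound)
```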
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,153 @@ | ||
# BEGIN PROB | ||
|
||
\[(16.5 pts)\] | ||
|
||
Below is a density histogram representing the distribution of randomly | ||
sampled stage distances. | ||
|
||
::: center | ||
![image](final_images/histogram.png) | ||
::: | ||
|
||
# BEGIN SUBPROB | ||
|
||
Which statement below correctly describes the relationship between the | ||
mean and the median of the sampled stage distances? | ||
|
||
( ) The mean is significantly larger than the median. | ||
|
||
( ) The mean is significantly smaller than the median. | ||
|
||
( ) The mean is approximately equal to the median. | ||
|
||
( ) It is impossible to know the relationship between the mean and the | ||
median. | ||
|
||
# BEGIN SOLUTION | ||
|
||
# END SOLUTION | ||
|
||
# END SUBPROB | ||
|
||
# BEGIN SUBPROB | ||
|
||
Assume there are 100 stages in the random sample that generated this | ||
plot. If there are 5 stages in the bin `[275, 300)`, approximately how | ||
many stages are in the bin `[200, 225)`? | ||
|
||
# BEGIN SOLUTION | ||
|
||
# END SOLUTION | ||
|
||
# END SUBPROB | ||
|
||
# BEGIN SUBPROB | ||
|
||
Assume the mean distance is 200 km and the standard deviation is 50 km. | ||
At least what proportion of stage distances are guaranteed to lie | ||
between 0 km and 400 km? Do not simplify your answer. | ||
|
||
::: responsebox | ||
Using Chebyshev's inequality, we know at least $1 - \frac{1}{z^2}$ | ||
of the data lies within $z$ SDs of the mean. Here, 0 km and 400 km are each 4 SDs from the mean, so $z = 4$ and at least | ||
$1 - \frac{1}{16} = \frac{15}{16}$ of the data lie in that range. | ||
::: | ||
|
||
# BEGIN SOLUTION | ||
|
||
# END SOLUTION | ||
|
||
# END SUBPROB | ||
|
||
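The arithmetic behind the Chebyshev bound in the response above can be sketched in a few lines. The variable names are illustrative only; the numbers come from the problem statement.

```python
mean_km = 200
sd_km = 50
low_km, high_km = 0, 400

# Number of SDs from the mean to the nearer endpoint of the interval.
z = min(mean_km - low_km, high_km - mean_km) / sd_km

# Chebyshev's inequality: at least 1 - 1/z^2 of the data lies within
# z SDs of the mean, whatever the shape of the distribution.
chebyshev_lower_bound = 1 - 1 / z ** 2
print(z, chebyshev_lower_bound)
```

Taking the nearer endpoint keeps the bound valid for intervals that are not symmetric about the mean; here both endpoints are the same number of SDs away.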
# BEGIN SUBPROB | ||
|
||
Again, assume the mean stage distance is 200 km and the standard | ||
deviation is 50 km. Now, suppose we take a random sample of size 25 from | ||
the stage distances, calculate the mean stage distance of this sample, | ||
and repeat this process 500 times. What proportion of the means that we | ||
calculate will fall between 190 km and 210 km? Do not simplify your | ||
answer. | ||
|
||
::: responsebox | ||
We know about 68% of values lie within 1 standard deviation of | ||
the mean of any normal distribution. By the Central Limit Theorem, the distribution of means of | ||
samples of size 25 from this dataset is approximately normal with mean | ||
200 km and SD $\frac{50}{\sqrt{25}} = 10$ km, so 190 km to 210 km contains about 68% | ||
of the sample means. | ||
::: | ||
|
||
# BEGIN SOLUTION | ||
|
||
# END SOLUTION | ||
|
||
# END SUBPROB | ||
|
||
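As a rough check of the 68% figure in the response above, the sketch below models the distribution of sample means as exactly normal, which the Central Limit Theorem only guarantees approximately. It uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

population_mean_km = 200
population_sd_km = 50
sample_size = 25

# SD of the sample mean: population SD divided by sqrt(sample size).
sd_of_sample_mean = population_sd_km / math.sqrt(sample_size)

# Normal approximation to the distribution of the sample mean.
sample_mean_dist = NormalDist(mu=population_mean_km, sigma=sd_of_sample_mean)
proportion = sample_mean_dist.cdf(210) - sample_mean_dist.cdf(190)
print(sd_of_sample_mean, proportion)
```

The computed proportion is close to 0.68, matching the "within 1 SD" rule of thumb for normal distributions.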
# BEGIN SUBPROB | ||
|
||
(3.5 pts) Assume the mean distance is 200 km and the standard deviation | ||
is 50 km. Suppose we use the Central Limit Theorem to generate a 95% | ||
confidence interval for the true mean distance of all Tour de France | ||
stages, and get the interval $[190\text{ km}, 210\text{ km}]$. Which of | ||
the following interpretations of this confidence interval are correct? | ||
|
||
[ ] 95% of Tour de France stage distances fall between 190 km and 210 | ||
km. | ||
|
||
[ ] There is a 95% chance that the true mean distance of all Tour de | ||
France stages is between 190 km and 210 km. | ||
|
||
[ ] We are 95% confident that the true mean distance of all Tour de | ||
France stages is between 190 km and 210 km. | ||
|
||
[ ] Our sample is of size 100. | ||
|
||
[ ] Our sample is of size 25. | ||
|
||
[ ] If we collected many original samples and constructed many 95% | ||
confidence intervals, then exactly 95% of those intervals would contain the true mean | ||
distance. | ||
|
||
[ ] If we collected many original samples and constructed many 95% | ||
confidence intervals, then roughly 95% of those intervals would contain the true mean | ||
distance. | ||
|
||
# BEGIN SOLUTION | ||
|
||
# END SOLUTION | ||
|
||
# END SUBPROB | ||
|
||
# BEGIN SUBPROB | ||
|
||
Suppose we take 500 random samples of size 100 from the stage distances, | ||
calculate their means, and draw a histogram of the distribution of these | ||
sample means. We label this Histogram A. Then, we take 500 random | ||
samples of size 1000 from the stage distances, calculate their means, | ||
and draw a histogram of the distribution of these sample means. We label | ||
this Histogram B. Fill in the blanks so that the sentence below | ||
correctly describes how Histogram B looks in comparison to Histogram A. | ||
|
||
::: center | ||
"Relative to Histogram A, Histogram B would appear [ (i) | ||
]{.underline} and shifted [ (ii) ]{.underline} due to the [ | ||
(iii) ]{.underline} mean and the [ (iv) ]{.underline} standard | ||
deviation." | ||
::: | ||
|
||
(i): ( ) thinner ( ) wider ( ) the same width ( ) unknown | ||
|
||
(ii): ( ) left ( ) right ( ) not at all ( ) unknown | ||
|
||
(iii): ( ) larger ( ) smaller ( ) unchanged ( ) unknown | ||
|
||
(iv): ( ) larger ( ) smaller ( ) unchanged ( ) unknown | ||
|
||
# BEGIN SOLUTION | ||
|
||
# END SOLUTION | ||
|
||
# END SUBPROB | ||
|
||
# END PROB |
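The comparison between Histogram A and Histogram B in the last subproblem can be explored with a small simulation. This is a sketch under stated assumptions: the population of distances is made up (5,000 values with mean near 200 km and SD near 50 km), samples are drawn with replacement, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

# Hypothetical population of stage distances in km (made-up values).
population = rng.normal(loc=200, scale=50, size=5000)

def sample_means(sample_size, repetitions=500):
    # Each row of `samples` is one random sample drawn with replacement.
    samples = rng.choice(population, size=(repetitions, sample_size))
    return samples.mean(axis=1)

means_a = sample_means(100)    # the means behind Histogram A
means_b = sample_means(1000)   # the means behind Histogram B

mean_a, sd_a = means_a.mean(), means_a.std()
mean_b, sd_b = means_b.mean(), means_b.std()
print(mean_a, sd_a)
print(mean_b, sd_b)
```

In this setup both collections of sample means center near the population mean, while the SD of the size-1000 means is noticeably smaller, consistent with the SD of a sample mean scaling as the population SD divided by the square root of the sample size.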