forked from guindilla/coursera-statistics-002
unit1-quiz.R
## Question 1: Consider the table below describing a data set of individuals who have registered to volunteer at a public school. Which of the choices below lists categorical variables?
# phone number and name
## Question 2: A study is designed to test the effect of type of light on exam performance of students. 180 students are randomly assigned to three classrooms: one that is dimly lit, another with yellow lighting, and a third with white fluorescent lighting, and given the same exam. Which of the following correctly identifies the variables used in the study as explanatory and response?
# explanatory: type of light (categorical with 3 levels)
# response: exam performance
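# A minimal sketch of the random assignment described in Question 2. The equal
# group size of 60 and the level labels are assumptions for illustration; the
# question only says 180 students are randomly assigned to three classrooms.

```r
set.seed(1)  # arbitrary seed, only so the shuffle is reproducible

# Assumed labels for the three lighting conditions
light_levels <- c("dim", "yellow", "white fluorescent")

# 60 copies of each level, shuffled: a balanced random assignment of 180 students
assignment <- sample(rep(light_levels, each = 60))

# The explanatory variable is categorical with 3 levels
light <- factor(assignment, levels = light_levels)
table(light)
```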
## Question 3: Past research suggests that students who study with fewer distractions (internet, cell phone, etc.) tend to get higher grades. Which of the following is the best scenario for being able to generalize this finding to the population of all students?
# WRONG: There is at least one student in the sample from each year and major that is represented in the general student body.
## Question 3: True or False: If subjects are randomly assigned to treatments, conclusions can be generalized to the population.
# False
## Question 4: An extraneous variable that is related to the explanatory and response variables and that prevents us from deducing causal relationships based on observational studies is called a
# confounding variable
## Question 5: As part of a statistics project, Andrea would like to collect data on household size in her city. To do so, she asks each person in her statistics class for the size of their household, and reports that her sample is a simple random sample. However, this is not a simple random sample. Which of the following is the best reasoning for why this is not a random sample that is appropriate for this research question?
# Andrea did not use any randomization; she took a convenience sample.
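# For contrast with a convenience sample, a simple random sample could be drawn
# roughly as follows. The sampling frame of 10,000 household IDs and the sample
# size of 30 are made-up values, not from the question.

```r
set.seed(2)  # arbitrary seed for reproducibility

# Hypothetical sampling frame: one ID per household in the city
household_ids <- seq_len(10000)

# Simple random sample: every household has the same chance of being selected,
# drawn without replacement
srs <- sample(household_ids, size = 30)
head(srs)
```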
## Question 6: True or False: Stratified sampling allows for controlling for possible confounders in the sampling stage, while blocking allows for controlling for such variables during random assignment.
# WRONG: False
# True
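# A rough sketch of stratified sampling in base R: split the frame by a stratum
# variable, then draw a simple random sample within each stratum. The frame,
# the stratum labels, and the sizes are hypothetical.

```r
set.seed(3)  # arbitrary seed for reproducibility

# Hypothetical sampling frame with three strata of different sizes
frame <- data.frame(
  id      = 1:300,
  stratum = rep(c("A", "B", "C"), times = c(150, 100, 50))
)

# Draw 10 units at random from within each stratum, then stack the results
strat_sample <- do.call(
  rbind,
  lapply(split(frame, frame$stratum), function(d) d[sample(nrow(d), 10), ])
)
table(strat_sample$stratum)
```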
## Question 7: Which of the below data sets has the highest standard deviation? You do not need to calculate the exact standard deviations to answer this question.
# WRONG: 0,1,1,1,1,1,2
## Question 8: True or False: The statistic mean/median (mean divided by median) can be used as a measure of skewness (either right or left). If this statistic is less than 1, the distribution is most likely left skewed.
# True
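# A small illustration of the mean/median ratio on two made-up data sets, one
# with a long left tail and one with a long right tail. The values are invented
# for this sketch and are not from the quiz.

```r
# Made-up data with a long left tail (one unusually small value)
left_skewed <- c(1, 7, 8, 8, 9, 9, 10)

# Made-up data with a long right tail (one unusually large value)
right_skewed <- c(1, 2, 2, 3, 3, 4, 20)

# The mean is pulled toward the tail, so the ratio falls below 1 for the
# left-skewed data and rises above 1 for the right-skewed data
ratio_left  <- mean(left_skewed) / median(left_skewed)
ratio_right <- mean(right_skewed) / median(right_skewed)
c(left = ratio_left, right = ratio_right)
```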
## Question 8: The distribution of housing prices in a country where 25% of the houses cost below $350,000, 50% of the houses cost below $450,000, 75% of the houses cost below $1,000,000 and there are a meaningful number of houses that cost more than $6,000,000 is most likely
# right skewed
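# The quartiles given in the question already point to right skew: the median
# sits much closer to the first quartile than to the third. A quick check using
# only the numbers stated in the question:

```r
q1  <- 350000    # 25% of houses cost below this
med <- 450000    # 50% of houses cost below this
q3  <- 1000000   # 75% of houses cost below this

lower_spread <- med - q1   # distance from Q1 to the median
upper_spread <- q3 - med   # distance from the median to Q3

# The upper half of the middle 50% is far more stretched out than the lower half
upper_spread > lower_spread
```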
## Question 9: Two distributions (A and B) are shown on the box plot below. Which of the following statements is not supported by the plot?
# Both distributions are unimodal.
## Question 10: True or False: You are going to collect income data from a right-skewed distribution of incomes of politicians. If you take a large enough sample from that distribution, the sample mean and the sample median will always have the same value.
# False
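# A hedged simulation of why the answer is False. A log-normal distribution is
# used here as a stand-in for right-skewed incomes; that choice and its
# parameters are assumptions, not part of the question.

```r
set.seed(10)  # arbitrary seed for reproducibility

# Large sample from an assumed right-skewed income distribution
incomes <- rlnorm(1e5, meanlog = 11, sdlog = 1)

# Even with a large sample, the mean stays above the median because the
# long right tail pulls the mean upward
sample_mean   <- mean(incomes)
sample_median <- median(incomes)
sample_mean > sample_median
```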
## Question 10: The midrange is defined as the average of the maximum and the minimum.
## True or False: This statistic is robust to outliers.
# False
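# A quick check of why the midrange is not robust, using made-up values: adding
# a single outlier moves the midrange a lot while the median barely changes.

```r
# Midrange as defined in the question: average of the maximum and the minimum
midrange <- function(x) (max(x) + min(x)) / 2

x         <- c(1, 2, 3, 4, 5)   # made-up data
x_outlier <- c(x, 100)          # same data plus one extreme value

midrange(x)          # (5 + 1) / 2
midrange(x_outlier)  # (100 + 1) / 2
median(x)
median(x_outlier)
```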
## Question 11: It is relatively common for fish to be mislabeled in supermarkets and even in restaurants. The table below shows the results of a study where a random sample of 156 fish for sale were collected and genetically tested. The researchers classified each sample as being labeled properly or being mislabeled. What fraction of smoked fish in the sample were mislabeled? Choose the closest answer.
# 78%
## Question 12: In 1948, Austin Bradford Hill designed a study to test a new treatment for tuberculosis; at the beginning of the study there was no evidence on whether it would be any better or worse than bed rest. He randomly assigned some of the patients who volunteered for the study to receive the treatment, Streptomycin, an antibiotic. The other patients received only bed rest and served as the control group. Hill then observed the patients’ outcomes: which patients died and which recovered. The results of the study are shown below.
## We use the following simulation to test whether there is a difference between the recovery rates under the two treatments: we write “died” on 18 index cards and “survived” on 89 index cards to indicate whether or not a patient died. Next, we shuffle the cards and deal them into two groups of 52 and 55, for control and treatment, respectively. We then calculate the simulated difference between the recovery rates in the Streptomycin and control groups (p̂Streptomycin − p̂Control) and record this value. We repeat this simulation 100 times. The histogram below shows the distribution of the simulated differences between the recovery rates in these 100 simulations.
## Which of the following is correct? Choose all that apply (there are multiple correct answers).
# WRONG: The alternative hypothesis is that the Streptomycin treatment is more effective than bed rest.
# Based on this study we can conclude a causal relationship between Streptomycin and better tuberculosis recovery rate.
# Streptomycin treatment appears to be effective in treating tuberculosis since the observed difference in recovery rates would be considered unusual based on the simulation results.
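# The card-shuffling simulation in Question 12 can be sketched in R as follows.
# The card counts (18 died, 89 survived) and group sizes (52 control, 55
# treatment) come from the question; the seed and variable names are arbitrary.
# The observed difference is not reproduced here because the results table is
# not included in this file.

```r
set.seed(1948)  # arbitrary seed for reproducibility

# One "card" per patient: 18 died, 89 survived
cards <- c(rep("died", 18), rep("survived", 89))

# Shuffle the cards, deal 52 to control and 55 to treatment, and record the
# difference in recovery rates (treatment minus control); repeat 100 times
sim_diff <- replicate(100, {
  shuffled  <- sample(cards)
  control   <- shuffled[1:52]
  treatment <- shuffled[53:107]
  mean(treatment == "survived") - mean(control == "survived")
})

# Under the null hypothesis of no difference, these values center near 0
summary(sim_diff)
```

# An observed difference that falls far in the tail of sim_diff would be
# considered unusual under the null hypothesis of no treatment effect.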