-
Notifications
You must be signed in to change notification settings - Fork 0
/
win.rmd
298 lines (198 loc) · 10.8 KB
/
win.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
---
title: "Geog533 Lab 5"
author: "Winnie Ngare"
output:
html_document:
toc: yes
html_notebook:
toc: yes
toc_float: yes
---
Complete the following exercises in Chapter 5 (Inferential Statistics) of the textbook [R] pages 172-176.
## Question 1
This is Exercise 1 of the textbook.
A planner wishes to estimate average household size for a community within 0.2. The planner desires a 95% confidence level. A small survey indicates that the standard deviation of household size is 2.0. How large should the sample be?
```{r}
width <-0.2
sd <- 2
C.I <- 95
n <- {qnorm(0.975)*sd/width}^2
print(n)
```
## Question 2
This is Exercise 3 of the textbook.
The tolerable level of a certain pollutant is 16 mg/l. A researcher takes a sample of size n = 50, and finds that the mean level of the pollutant is 18.5 mg/l, with a standard deviation of 7 mg/l. Construct a 95% confidence interval around the sample mean, and determine whether the tolerable level is within this interval.
```{r}
library(MASS)
x <- mvrnorm(n=50,mu=18.5,Sigma = 49,empirical = TRUE)
result <- t.test(x,mu=16)
print(result)
if(result$p.value>0.05){print ("we cannot reject the null hypothesis")
}else{print("we reject the null hypothesis")}
print("True mean of the pollutant is above the tolerable level of 16")
###therefore true mean is not equal to 16.
```
## Question 3
This is Exercise 5 of the textbook.
The proportion of people changing residence in the USA each year is 0.165. A researcher believes that the proportion may be different in the town of Amherst. She surveys 50 individuals in the town of Amherst and finds that the proportion who moved last year is 0.24. Is there evidence to conclude that the town has a mobility rate that is different from the national average? Use α = 0.05 and find a 90% confidence interval around the sample proportion, and state your conclusion.
```{r}
n=50
p <- 0.24 ####therefore, x <- 0.24*50
prop95 <- prop.test(x=12,n=50,p=0.165,alternative ="two.sided")
print(prop95)
if(prop95$p.value>0.05){print ("we cannot reject the null hypothesis")
}else{print("we reject the null hypothesis")}
print("The town's mobility rate is significantly the same as the national average.")
prop90 <- prop.test(x=12,n=50,p=0.165,alternative ="two.sided",conf.level = 0.90)
print(prop90)
if(prop90$p.value>0.05){print ("we cannot reject the null hypothesis")
}else{print("we reject the null hypothesis")}
print("The town's mobility rate is significantly the same as the national average.")
```
## Question 4
This is Exercise 7 of the textbook.
A survey of the white and nonwhite population in a local area reveals the following annual trip frequencies to the nearest state park:
<center>$\bar{x_{1}}=4.1$, $s_{1}^{2} = 14.3$, $n_{1} = 20$</center>
<center>$\bar{x_{2}}=3.1$, $s_{2}^{2} = 12.0$, $n_{1} = 16$</center>
where the subscript ‘1’ denotes the white population and the subscript ‘2’ denotes the nonwhite population.
<ol type="a">
<li>Assume that the variances are equal, and test the null hypothesis that there is no difference between the park-going frequencies of whites and nonwhites. </li>
<li>Repeat the exercise, assuming that the variances are unequal. </li>
<li>Find the p-value associated with the tests in parts (a) and (b). </li>
<li>Associated with the test in part (a), find a 95% confidence interval for the difference in means. </li>
<li>Repeat parts (a)–(d), assuming sample sizes of n<sub>1</sub> = 24 and n<sub>2</sub> = 12. </li>
</ol>
```{r}
###variances are equal
white <- mvrnorm(n=20,mu=4.1,Sigma = 14.3,empirical = TRUE)
n.white <- mvrnorm(n=16,mu=3.1,Sigma = 12,empirical = TRUE)
a<- t.test(white,n.white,var.equal = TRUE)
a
if(a$p.value>0.05){print ("we cannot reject the null hypothesis")
}else{print("we reject the null hypothesis")}
print("Therefore there is no difference between the park-going frequencies of whites and nonwhites.")
###variances are unequal
white <- mvrnorm(n=20,mu=4.1,Sigma = 14.3,empirical = TRUE)
n.white <- mvrnorm(n=16,mu=3.1,Sigma = 12,empirical = TRUE)
b <- t.test(white,n.white,var.equal = FALSE)
print(b)
if(b$p.value>0.05){print ("we cannot reject the null hypothesis")
}else{print("we reject the null hypothesis")}
print("Therefore there is no difference between the park-going frequencies of whites and nonwhites.")
###the p-values;
print(a$p.value)
print(b$p.value)
### 95% confidence interval for part a;
print(a$conf.int)
###part e
####equal variances
white <- mvrnorm(n=24,mu=4.1,Sigma = 14.3,empirical = TRUE)
n.white <- mvrnorm(n=12,mu=3.1,Sigma = 12,empirical = TRUE)
ae<- t.test(white,n.white,var.equal = TRUE)
ae
if(ae$p.value>0.05){print ("we cannot reject the null hypothesis")
}else{print("we reject the null hypothesis")}
print("Therefore there is no difference between the park-going frequencies of whites and nonwhites.")
####unequal variances;
white <- mvrnorm(n=24,mu=4.1,Sigma = 14.3,empirical = TRUE)
n.white <- mvrnorm(n=12,mu=3.1,Sigma = 12,empirical = TRUE)
be <- t.test(white,n.white,var.equal = FALSE)
print(be)
if(be$p.value>0.05){print ("we cannot reject the null hypothesis")
}else{print("we reject the null hypothesis")}
print("Therefore there is no difference between the park-going frequencies of whites and nonwhites.")
####p.values
print(ae$p.value)
print(be$p.value)
####95% confidence interval
print(ae$conf.int)
```
## Question 5
This is Exercise 9 of the textbook.
A researcher suspects that the level of a particular stream’s pollutant is higher than the allowable limit of 4.2 mg/l. A sample of n = 17 reveals a mean pollutant level of = 6.4 mg/l, with a standard deviation of 4.4 mg/l. Is there sufficient evidence that the stream’s pollutant level exceeds the allowable limit? What is the p-value?
```{r}
x <- mvrnorm(n=17,mu=6.4,Sigma = 4.4^2,empirical = TRUE)
result <- t.test(x,mu=4.2,alternative = "greater")
print(result)
if(result$p.value>0.05){print ("we cannot reject the null hypothesis")
}else{print("we reject the null hypothesis")}
print("Therefore the stream's pollutant is higher than the allowable limit.")
```
## Question 6
This is Exercise 13 of the textbook.
Suppose we want to know whether the mean length of unemployment differs among the residents of two local communities. Sample information is as follows:
Community A: sample mean = 3.4 months, s = 1.1 month, n = 52
Community B: sample mean = 2.8 months, s = 0.8 month, n = 62
Set up the null and alternative hypotheses. Use α = 0.05. Choose a particular test, and show the rejection regions on a diagram. Calculate the test statistic, and decide whether to reject the null hypothesis. (Do not assume that the two standard deviations are equal to one another – therefore a pooled estimate of s should not be found.)
```{r}
a <- mvrnorm(n=52, mu=3.4, Sigma = 1.1^2,empirical = TRUE )
b <- mvrnorm(n=62, mu=2.8, Sigma = 0.8^2,empirical = TRUE )
results <- t.test(a,b,var.equal = FALSE)
results
if(result$p.value>0.05){print ("we cannot reject the null hypothesis")
}else{print("we reject the null hypothesis")}
print("mean length of unemployment differs among residents")
q95 <- qnorm(0.975) # 95% confidence interval
curve(dnorm,from = -3,to = 3,main="95% confidence interval")
x <- c(-q95,seq(-q95,q95,by = 0.01),q95)
y <- c(0,dnorm(seq(-q95,q95,by = 0.01)),0)
polygon(x,y,col="blue")####the unshaded area is the rejection region.
```
## Question 7
This is Exercise 15 of the textbook.
Find the 90% and 95% confidence intervals for the following mean stream link lengths:
100, 426, 322, 466, 112, 155, 388, 1155, 234, 324, 556, 221, 18, 133, 177, 441.
```{r}
x <- c(100,426,322,466,112,155,388,1155,234,324,556,221,18,133,177,441)
q90 <- t.test(x,conf.level =0.90)
print(q90)
print(q90$conf.int)
q95 <- t.test(x,conf.level =0.95)
q95
print(q95$conf.int)
```
## Question 8
This is Exercise 16 of the textbook.
A researcher surveys 50 individuals in Smithville and 40 in Amherst, finding that 30% of Smithville residents moved last year, while only 22% of Amherst residents did. Is there enough evidence to conclude that mobility rates in the two communities differ? Use a two-tailed alternative, and α = 0.10. Again, find the p-value and a 90% confidence interval for the difference in proportions.
```{r}
###smithville;30%=15residents,...amherst;22%=11 residents
mov <- prop.test(x=c(15,8.8),n=c(50,40),alternative = "two.sided",conf.level = 0.90)
mov
if(mov$p.value>0.05){print ("we cannot reject the null hypothesis")
}else{print("we reject the null hypothesis")}
print("mobility rates in the two communities are the same")
```
## Question 9
This is Exercise 17 of the textbook.
A survey of two towns is carried out to see whether there are differences in levels of education. Town A has a mean of 12.4 years of education among its residents; Town B has a mean of 14.4 years. Fifteen residents were surveyed in each town. The sample standard deviation was 3.0 in Town A, and 4.0 in Town B. Is there a significant difference in education between the two towns?
<ol type="a">
<li>Assume the variances are equal. </li>
<li>Assume the variances are not equal. </li>
</ol>
In each case, state the null and alternative hypotheses, and test the null hypothesis, using α = 0.05. Find the p-values and a 95% confidence interval for the difference.
```{r}
###equal variances
a<- mvrnorm(n=15, mu=12.4, Sigma = 9,empirical = TRUE )
b<- mvrnorm(n=15, mu=14.4, Sigma = 16,empirical = TRUE )
ab1<- t.test(a,b,var.equal = TRUE)
print(ab1)
if(ab1$p.value>0.05){print ("we cannot reject the null hypothesis,therefore there are no differences in education")
}else{print("we reject the null hypothesis;Therefore there are differences in education")}
###when variances are not equal;
a<- mvrnorm(n=15, mu=12.4, Sigma =9,empirical = TRUE )
b<- mvrnorm(n=15, mu=14.4, Sigma = 16,empirical = TRUE )
ab2 <- t.test(a,b,var.equal = FALSE)
print(ab2)
if(ab2$p.value>0.05){print ("we cannot reject the null hypothesis,therefore there are no differences in education")
}else{print("we reject the null hypothesis;Therefore there are differences in education")}
```
## Question 10
This is Exercise 20 of the textbook.
A survey of n = 50 people reveals that the proportion of residents in a community who take the bus to work is 0.15. Is this significantly different from the statewide average of 0.10? Use a Type I error probability of 0.05.
```{r}
prop95 <- prop.test(x=7.5,n=50,p=0.10,alternative ="two.sided")
print(prop95)
if(prop95$p.value>0.05){print ("we cannot reject the null hypothesis")
}else{print("we reject the null hypothesis")}
print("True p is significantly equal to the statewide average of 0.10")
```