-
Notifications
You must be signed in to change notification settings - Fork 0
/
readme.Rmd
240 lines (202 loc) · 11.6 KB
/
readme.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
---
title: "Analysis of US change in performance on the PISA assessment 2000-2018 as an application of slope"
output:
html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning=F,message=F)
```
> Analysis by [Matt Wilkins, PhD](https://www.mattwilkinsbio.com)
Data downloaded from the [National Center for Education Statistics](https://nces.ed.gov/surveys/pisa/idepisa/report.aspx?p=1%C3%81RMS%C3%811%C3%8120183%C3%8020153%C3%8020123%C3%8020093%C3%8020063%C3%8020033%C3%8020003%C3%81PVMATH%C3%81TOTAL%C3%81IN3%C3%80AUS%C3%80AUT%C3%80BEL%C3%80CAN%C3%80CHL%C3%80COL%C3%80CZE%C3%80DNK%C3%80EST%C3%80FIN%C3%80FRA%C3%80DEU%C3%80GRC%C3%80HUN%C3%80ISL%C3%80IRL%C3%80ISR%C3%80ITA%C3%80JPN%C3%80KOR%C3%80LVA%C3%80LUX%C3%80MEX%C3%80NLD%C3%80NZL%C3%80NOR%C3%80POL%C3%80PRT%C3%80SVK%C3%80SVN%C3%80ESP%C3%80SWE%C3%80CHE%C3%80TUR%C3%80GBR%C3%80USA%C3%80RUS%C3%80QCN%C3%80SGP%C3%80ARE%C3%80VNM%C3%81MN%C3%82MN%C3%81Y%C3%82J%C3%810%C3%810%C3%8137%C3%81N&Lang=1033)
##### An interactive plot of [PISA (Programme for International Student Assessment)](https://nces.ed.gov/surveys/pisa/index.asp) statistics for the US, 41 other countries, and the OECD average, showing slopes and intercepts from a linear model fit.
<div style="position: relative; overflow: hidden; overflow-y: hidden; padding-top:101%;">
<iframe src="PISA_interactive.html"
style="border: 0; position: absolute; top:0; left:0; width: 100%; height:100%; overflow: hidden;"
title="Data visual showing PISA combined subject scores as ranks for the 39 jurisdictions"/>
</div>
### See a step by step analysis (with R code) below:
1. First, we'll plot math, reading, and science scores for available PISA data from 2000-2018 for the US and a few "jurisdictions" of interest.
Below, I include the [R](https://www.r-project.org/) code to wrangle data and generate figures.
<details>
<summary style="color: #7A7A7A"> Click for code </summary>
```{r manage-data, results='hide'}
require(remotes)
install_github("galacticpolymath/galacticPubs") #custom theme stuff
require(pacman)
p_load(ggplot2,readxl,dplyr,galacticPubs,reshape2,ggiraph,widgetframe,htmlwidgets)
d<-read_xlsx("data/pisa-data_2000-2018_tidy.xlsx")
#Rename Long name(s)
d$jurisdiction <- recode(d$jurisdiction,`International Average (OECD Countries)`="Avg of OECD Countries")
# Add ranks for each subject-year
d$sub_yr<-paste(d$subject,d$year)
d2<-lapply(unique(d$sub_yr),function(i) {
d_i <- subset(d,sub_yr==i)
d_i$rank<-rank(-d_i$average,na.last="keep",ties.method="average")
d_i
}) %>% bind_rows()
d2$year <- as.numeric(d2$year)
#define which jurisdictions to compare
for_comparison<-c("Avg of OECD Countries","United States","Canada","Shanghai - China","Russian Federation")
#plot score by year, grouped by subject
G<-d2 %>% subset(.,jurisdiction%in%for_comparison) %>%
ggplot(.,aes(x=year,y=average,col=jurisdiction))+
geom_point()+geom_smooth(method="lm",se=F,formula='y~x')+
facet_grid(~subject)+ggGalactic(font.cex =.8)+
theme(strip.text=element_text(size=16))+
scale_colour_manual(values=gpPal[[1]]$hex[c(6,5,2,4,1,3)])+
scale_linetype_manual(values=c(3,1,1,1,2))+ylab("PISA average")
```
</details>
```{r plot-it, echo=F, fig.width=12, fig.height=4}
G
```
##### US scores are about the same as the average for [the 38 OECD peer countries](https://en.wikipedia.org/wiki/OECD). Meanwhile, Shanghai, China is far above everyone else; our Northern neighbor, Canada, is consistently performing higher; and Russia has overtaken the US in math and is rapidly closing the gap in reading.
----
2. Let's see what it looks like if we combine the subject scores into a single value.
<details>
<summary style="color: #7A7A7A"> Click for code </summary>
```{r analysis-2, results='hide' }
require(reshape2)
# Sum all the scores for each country for each year
d3 = melt(d2[,c("year","jurisdiction","average")],id=c("year","jurisdiction")) %>% dcast(.,formula=jurisdiction+year~variable,fun.aggregate=sum)
names(d3)[3]<-"combined_average"
#Now plot the result
G2= d3 %>% subset(.,jurisdiction%in%for_comparison) %>%
ggplot(.,aes(x=year,y=combined_average,col=jurisdiction))+ylim(1350,1800)+
geom_point()+geom_smooth(method="lm",se=F,formula='y~x')+
ggGalactic(font.cex =.8)+
theme(strip.text=element_text(size=16))+
scale_colour_manual(values=gpPal[[1]]$hex[c(6,5,2,4,1,3)])+
scale_linetype_manual(values=c(3,1,1,1,2))+ylab("Mean Combined PISA score")
```
</details>
``` {r echo=F, fig.width=12, fig.height=6}
G2
```
##### Because the science test wasn't offered initially (or in the same years), this reduces the range of our data. But the combined US PISA scores are incredibly flat for the last decade.
-----
3. Next, let's zoom in on the US line and calculate the slope.
<details>
<summary style="color: #7A7A7A"> Click for code </summary>
```{r analysis-3, results='hide' }
us<-subset(d3,jurisdiction=="United States")
us
#fit a line (it's just 4 points, but hey, we're not testing for significance)
mod<-lm(combined_average~year,data=us,na.rm=T)
mod_form<-paste0("y = ",round(coef(mod)[2],2),"x + ",round(coef(mod)[1],2))
#make graph and add formula to it
G3 <- us %>% ggplot(.,aes(x=year,y=combined_average,col=jurisdiction))+
geom_point(size=3)+geom_smooth(method="lm",se=F,formula='y~x')+
ggGalactic(font.cex =.8)+ylim(1350,1800)+
theme(strip.text=element_text(size=16))+
scale_colour_manual(values=gpPal[[1]]$hex[1])+
ylab("Mean Combined PISA score")+annotate("text",label=mod_form,col=gpColors("hydro"),size=10,x=2013,y=1530)
```
</details>
```{r plot-3, echo=F, fig.width=12, fig.height=6}
# Data: combined average scores for the US
us
G3
```
##### US average combined PISA scores have declined by almost a point per year between 2009 and 2018. That is to say, the aggregate scores have not changed meaningfully, despite [numerous expensive, ongoing educational reforms across the country](https://time.com/5775795/education-reform-failed-america/).
----
#### But maybe raw points are not very meaningful. How have ranks changed?
These are the subset of 41 countries (and the OECD average) included in the ranks:
<details>
<summary style="color: #7A7A7A"> Click for code & country/jurisdiction list </summary>
```{r results='show'}
#recalculate ranks based on combined PISA scores (all subject scores added together)
d4 <- lapply(sort(unique(d3$year)),function(i) {
d_i <- subset(d3,year==i)
d_i$rank<-rank(-d_i$combined_average,na.last="keep",ties.method="average")
d_i
}) %>% bind_rows()
d4$jurisdiction %>% unique()
```
</details>
And here is the plot of country ranks by combined PISA averages for science, math, and reading.
<details>
<summary style="color: #7A7A7A"> Click for code </summary>
```{r analysis-4-ranks, results='hide'}
#get models for all countries & make hover text summary
hoverText<-sapply(unique(d4$jurisdiction),function(x){
d_x<-subset(d4,jurisdiction==x&complete.cases(rank))
if(nrow(d_x)<3){mod_form=status= "Not enough data yet"
}else{
mod_x<-lm(-rank~year,data=d_x,na.rm=T)
mod_form<-paste0("y = ",round(coef(mod_x)[2],2),"x + ",round(coef(mod_x)[1],2))
#make status statement based on slope
#Use slope of .25 as threshold (improving, declining or staying the same within 4 yrs)
status<-{if(coef(mod_x)[2]>.25){"improving"
}else if(coef(mod_x)[2]<(-.25)){"declining"
}else if(coef(mod_x)[2]>-.25 & coef(mod_x)[2]<.25){"steady"}}
}
maxRank<-max(d_x$rank,na.rm=T)
maxYear<-d_x$year[which.max(d_x$rank)]
minRank<-min(d_x$rank,na.rm=T)
minYear<-d_x$year[which.min(d_x$rank)]
#switched max/min in the output due to interpretation of 1 as a higher rank
return(assign(x,paste0(toupper(x),"\nStatus: ",status,"\n",mod_form,"\nHighest: ",minRank," (",minYear,")\nLowest: ",maxRank," (",maxYear,")")))
})
#Add formulas to the data frame
d4$hoverText<-hoverText[match(d4$jurisdiction,names(hoverText))]
#Make certain countries stand out a little
focal_jur<-c("United States","Canada","Avg of OECD Countries")
d4$colCode<-0
#change focal jurisdictions to number codes 1=US, 2=Canada, 3=avg oecd
for(i in 1:length(focal_jur)){d4$colCode[which(d4$jurisdiction==focal_jur[i])]<-i}
d4$colCode<-as.factor(ifelse(is.na(d4$rank),NA,d4$colCode))
#make a literal vector of colors, as well. We'll still use colCode to style shapes,etc
colVec<-ifelse(d4$colCode==0,"gray80","gray30") #make 0s light gray, and the highlighted countries darker gray
#I seem to need yet another vector of just the country colors
colVec_jur<-ifelse(d4$jurisdiction %>% unique()%in%focal_jur,"gray30","gray80")
names(colVec_jur)<-d4$jurisdiction %>% unique()
######
#make graph and add formula to it
G4 <- d4 %>% subset(.,!jurisdiction%in%focal_jur) %>% ggplot(.,aes(x=year,y=rank,group=jurisdiction),stroke=.5)+
#First add nonfocal countries in a less obtrusive color
geom_point_interactive(show.legend=F,size=1.5,color="gray80",shape=19,
aes(data_id=jurisdiction,tooltip=paste0(toupper(jurisdiction), "\nYear: ",year,"\nRank: ",rank)),
hover_css="stroke: #6812d1;fill: #6812d1;")+
geom_smooth_interactive(show.legend=F,method="lm",se=F,size=.5,formula='y~x',color="gray80",size=.8,
aes(data_id=jurisdiction,tooltip=hoverText),
hover_css="stroke: #6812d1;")+
#Add custom styling for Galactic Polymath
ggGalactic(font.cex =.6)+
#Scale axes
scale_x_continuous(breaks=seq(2006,2018,3),limits=c(2006,2018))+scale_y_reverse(limits=c(44,0),breaks=c(1,seq(45,1,-5)))+
#Now add on your focal countries so they look distinct
geom_point_interactive(inherit.aes=F, data=subset(d4,jurisdiction%in%focal_jur&complete.cases(rank)),
show.legend=T,size=1.85,color="gray30",
aes(shape=colCode,data_id=jurisdiction,x=year,y=rank,
tooltip=paste0(toupper(jurisdiction), "\nYear: ",year,"\nRank: ",rank)),
hover_css="stroke: #6812d1;fill: #6812d1;")+
geom_smooth_interactive(inherit.aes=F, data=subset(d4,jurisdiction%in%focal_jur&complete.cases(rank)),
show.legend=T,method="lm",se=F,size=1,formula='y~x',size=.8,color="gray30",
aes(x=year,y=rank,data_id=jurisdiction,tooltip=hoverText,linetype=colCode),
hover_css="stroke: #6812d1;")+
scale_shape_manual(name="Focal Jurisdictions:",values=c(24,22,23),labels=sort(focal_jur,decreasing = T))+
scale_linetype_manual(name="Focal Jurisdictions:",values=1:3,labels=sort(focal_jur,decreasing = T))+
ylab("Overall PISA Rank")+
labs(title="Change in PISA scores for 41 countries and the OECD average",
subtitle="Mouse over a trendline to get the line equation and other info")+
theme(legend.position="bottom",legend.background = element_rect(fill="gray94",color="transparent" ),
legend.key=element_rect("gray98"),legend.key.width=unit(22,"pt"),
legend.text=element_text(size=8),
plot.title=element_text(size=11),plot.subtitle=element_text(size=9,color="#6812d1"),
axis.title=element_text(size=10))+ xlab("")
# Data: combined average scores for the US
girafe(print(G4),width_svg=6,height_svg=6,options = list(
opts_hover_inv(css = "opacity:0.5;"),
opts_hover(css = "stroke-width:2;")
))
```
</details>
```{r plot-4, results="show",echo=T}
w<-girafe(print(G4),width_svg=6,height_svg=6,options = list(
opts_hover_inv(css = "opacity:0.5;"),
opts_hover(css = "stroke-width:2;")
))
w
w %>% frameableWidget() %>% saveWidget(.,"interactive_PISA_ranks.html")
```