-
Notifications
You must be signed in to change notification settings - Fork 0
A simple linear regression analysis of US performance on the Program for International Student Assessment (PISA) exam.
License
galacticpolymath/PISA-analysis
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
--- title: "Analysis of US change in performance on the PISA assessment 2000-2018 as an application of slope" output: html_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, warning=F,message=F) ``` > Analysis by [Matt Wilkins, PhD](https://www.mattwilkinsbio.com) Data downloaded from the [National Center for Education Statistics](https://nces.ed.gov/surveys/pisa/idepisa/report.aspx?p=1%C3%81RMS%C3%811%C3%8120183%C3%8020153%C3%8020123%C3%8020093%C3%8020063%C3%8020033%C3%8020003%C3%81PVMATH%C3%81TOTAL%C3%81IN3%C3%80AUS%C3%80AUT%C3%80BEL%C3%80CAN%C3%80CHL%C3%80COL%C3%80CZE%C3%80DNK%C3%80EST%C3%80FIN%C3%80FRA%C3%80DEU%C3%80GRC%C3%80HUN%C3%80ISL%C3%80IRL%C3%80ISR%C3%80ITA%C3%80JPN%C3%80KOR%C3%80LVA%C3%80LUX%C3%80MEX%C3%80NLD%C3%80NZL%C3%80NOR%C3%80POL%C3%80PRT%C3%80SVK%C3%80SVN%C3%80ESP%C3%80SWE%C3%80CHE%C3%80TUR%C3%80GBR%C3%80USA%C3%80RUS%C3%80QCN%C3%80SGP%C3%80ARE%C3%80VNM%C3%81MN%C3%82MN%C3%81Y%C3%82J%C3%810%C3%810%C3%8137%C3%81N&Lang=1033) ##### An interactive plot of [PISA (Programme for International Student Assessment)](https://nces.ed.gov/surveys/pisa/index.asp) statistics for the US, 41 other countries, and the OECD average, showing slopes and intercepts from a linear model fit. <div style="position: relative; overflow: hidden; overflow-y: hidden; padding-top:101%;"> <iframe src="PISA_interactive.html" style="border: 0; position: absolute; top:0; left:0; width: 100%; height:100%; overflow: hidden;" title="Data visual showing PISA combined subject scores as ranks for the 39 jurisdictions"/> </div> ### See a step by step analysis (with R code) below: 1. First, we'll plot math, reading, and science scores for available PISA data from 2000-2018 for the US and a few "jurisdictions" of interest. Below, I include the [R](https://www.r-project.org/) code to wrangle data and generate figures. <details> <summary style="color: #7A7A7A"> Click for code </summary> ```{r manage-data, results='hide'} require(remotes) install_github("galacticpolymath/galacticPubs") #custom theme stuff require(pacman) p_load(ggplot2,readxl,dplyr,galacticPubs,reshape2,ggiraph,widgetframe,htmlwidgets) d<-read_xlsx("data/pisa-data_2000-2018_tidy.xlsx") #Rename Long name(s) d$jurisdiction <- recode(d$jurisdiction,`International Average (OECD Countries)`="Avg of OECD Countries") # Add ranks for each subject-year d$sub_yr<-paste(d$subject,d$year) d2<-lapply(unique(d$sub_yr),function(i) { d_i <- subset(d,sub_yr==i) d_i$rank<-rank(-d_i$average,na.last="keep",ties.method="average") d_i }) %>% bind_rows() d2$year <- as.numeric(d2$year) #define which jurisdictions to compare for_comparison<-c("Avg of OECD Countries","United States","Canada","Shanghai - China","Russian Federation") #plot score by year, grouped by subject G<-d2 %>% subset(.,jurisdiction%in%for_comparison) %>% ggplot(.,aes(x=year,y=average,col=jurisdiction))+ geom_point()+geom_smooth(method="lm",se=F,formula='y~x')+ facet_grid(~subject)+ggGalactic(font.cex =.8)+ theme(strip.text=element_text(size=16))+ scale_colour_manual(values=gpPal[[1]]$hex[c(6,5,2,4,1,3)])+ scale_linetype_manual(values=c(3,1,1,1,2))+ylab("PISA average") ``` </details> ```{r plot-it, echo=F, fig.width=12, fig.height=4} G ``` ##### US scores are about the same as the average for [the 38 OECD peer countries](https://en.wikipedia.org/wiki/OECD). Meanwhile, Shanghai, China is far above everyone else; our Northern neighbor, Canada, is consistently performing higher; and Russia has overtaken the US in math and is rapidly closing the gap in reading. ---- 2. Let's see what it looks like if we combine the subject scores into a single value. <details> <summary style="color: #7A7A7A"> Click for code </summary> ```{r analysis-2, results='hide' } require(reshape2) # Sum all the scores for each country for each year d3 = melt(d2[,c("year","jurisdiction","average")],id=c("year","jurisdiction")) %>% dcast(.,formula=jurisdiction+year~variable,fun.aggregate=sum) names(d3)[3]<-"combined_average" #Now plot the result G2= d3 %>% subset(.,jurisdiction%in%for_comparison) %>% ggplot(.,aes(x=year,y=combined_average,col=jurisdiction))+ylim(1350,1800)+ geom_point()+geom_smooth(method="lm",se=F,formula='y~x')+ ggGalactic(font.cex =.8)+ theme(strip.text=element_text(size=16))+ scale_colour_manual(values=gpPal[[1]]$hex[c(6,5,2,4,1,3)])+ scale_linetype_manual(values=c(3,1,1,1,2))+ylab("Mean Combined PISA score") ``` </details> ``` {r echo=F, fig.width=12, fig.height=6} G2 ``` ##### Because the science test wasn't offered initially (or in the same years), this reduces the range of our data. But the combined US PISA scores are incredibly flat for the last decade. ----- 3. Next, let's zoom in on the US line and calculate the slope. <details> <summary style="color: #7A7A7A"> Click for code </summary> ```{r analysis-3, results='hide' } us<-subset(d3,jurisdiction=="United States") us #fit a line (it's just 4 points, but hey, we're not testing for significance) mod<-lm(combined_average~year,data=us,na.rm=T) mod_form<-paste0("y = ",round(coef(mod)[2],2),"x + ",round(coef(mod)[1],2)) #make graph and add formula to it G3 <- us %>% ggplot(.,aes(x=year,y=combined_average,col=jurisdiction))+ geom_point(size=3)+geom_smooth(method="lm",se=F,formula='y~x')+ ggGalactic(font.cex =.8)+ylim(1350,1800)+ theme(strip.text=element_text(size=16))+ scale_colour_manual(values=gpPal[[1]]$hex[1])+ ylab("Mean Combined PISA score")+annotate("text",label=mod_form,col=gpColors("hydro"),size=10,x=2013,y=1530) ``` </details> ```{r plot-3, echo=F, fig.width=12, fig.height=6} # Data: combined average scores for the US us G3 ``` ##### US average combined PISA scores have declined by almost a point per year between 2009 and 2018. That is to say, the aggregate scores have not changed meaningfully, despite [numerous expensive, ongoing educational reforms across the country](https://time.com/5775795/education-reform-failed-america/). ---- #### But maybe raw points are not very meaningful. How have ranks changed? These are the subset of 41 countries (and the OECD average) included in the ranks: <details> <summary style="color: #7A7A7A"> Click for code & country/jurisdiction list </summary> ```{r results='show'} #recalculate ranks based on combined PISA scores (all subject scores added together) d4 <- lapply(sort(unique(d3$year)),function(i) { d_i <- subset(d3,year==i) d_i$rank<-rank(-d_i$combined_average,na.last="keep",ties.method="average") d_i }) %>% bind_rows() d4$jurisdiction %>% unique() ``` </details> And here is the plot of country ranks by combined PISA averages for science, math, and reading. <details> <summary style="color: #7A7A7A"> Click for code </summary> ```{r analysis-4-ranks, results='hide'} #get models for all countries & make hover text summary hoverText<-sapply(unique(d4$jurisdiction),function(x){ d_x<-subset(d4,jurisdiction==x&complete.cases(rank)) if(nrow(d_x)<3){mod_form=status= "Not enough data yet" }else{ mod_x<-lm(-rank~year,data=d_x,na.rm=T) mod_form<-paste0("y = ",round(coef(mod_x)[2],2),"x + ",round(coef(mod_x)[1],2)) #make status statement based on slope #Use slope of .25 as threshold (improving, declining or staying the same within 4 yrs) status<-{if(coef(mod_x)[2]>.25){"improving" }else if(coef(mod_x)[2]<(-.25)){"declining" }else if(coef(mod_x)[2]>-.25 & coef(mod_x)[2]<.25){"steady"}} } maxRank<-max(d_x$rank,na.rm=T) maxYear<-d_x$year[which.max(d_x$rank)] minRank<-min(d_x$rank,na.rm=T) minYear<-d_x$year[which.min(d_x$rank)] #switched max/min in the output due to interpretation of 1 as a higher rank return(assign(x,paste0(toupper(x),"\nStatus: ",status,"\n",mod_form,"\nHighest: ",minRank," (",minYear,")\nLowest: ",maxRank," (",maxYear,")"))) }) #Add formulas to the data frame d4$hoverText<-hoverText[match(d4$jurisdiction,names(hoverText))] #Make certain countries stand out a little focal_jur<-c("United States","Canada","Avg of OECD Countries") d4$colCode<-0 #change focal jurisdictions to number codes 1=US, 2=Canada, 3=avg oecd for(i in 1:length(focal_jur)){d4$colCode[which(d4$jurisdiction==focal_jur[i])]<-i} d4$colCode<-as.factor(ifelse(is.na(d4$rank),NA,d4$colCode)) #make a literal vector of colors, as well. We'll still use colCode to style shapes,etc colVec<-ifelse(d4$colCode==0,"gray80","gray30") #make 0s light gray, and the highlighted countries darker gray #I seem to need yet another vector of just the country colors colVec_jur<-ifelse(d4$jurisdiction %>% unique()%in%focal_jur,"gray30","gray80") names(colVec_jur)<-d4$jurisdiction %>% unique() ###### #make graph and add formula to it G4 <- d4 %>% subset(.,!jurisdiction%in%focal_jur) %>% ggplot(.,aes(x=year,y=rank,group=jurisdiction),stroke=.5)+ #First add nonfocal countries in a less obtrusive color geom_point_interactive(show.legend=F,size=1.5,color="gray80",shape=19, aes(data_id=jurisdiction,tooltip=paste0(toupper(jurisdiction), "\nYear: ",year,"\nRank: ",rank)), hover_css="stroke: #6812d1;fill: #6812d1;")+ geom_smooth_interactive(show.legend=F,method="lm",se=F,size=.5,formula='y~x',color="gray80",size=.8, aes(data_id=jurisdiction,tooltip=hoverText), hover_css="stroke: #6812d1;")+ #Add custom styling for Galactic Polymath ggGalactic(font.cex =.6)+ #Scale axes scale_x_continuous(breaks=seq(2006,2018,3),limits=c(2006,2018))+scale_y_reverse(limits=c(44,0),breaks=c(1,seq(45,1,-5)))+ #Now add on your focal countries so they look distinct geom_point_interactive(inherit.aes=F, data=subset(d4,jurisdiction%in%focal_jur&complete.cases(rank)), show.legend=T,size=1.85,color="gray30", aes(shape=colCode,data_id=jurisdiction,x=year,y=rank, tooltip=paste0(toupper(jurisdiction), "\nYear: ",year,"\nRank: ",rank)), hover_css="stroke: #6812d1;fill: #6812d1;")+ geom_smooth_interactive(inherit.aes=F, data=subset(d4,jurisdiction%in%focal_jur&complete.cases(rank)), show.legend=T,method="lm",se=F,size=1,formula='y~x',size=.8,color="gray30", aes(x=year,y=rank,data_id=jurisdiction,tooltip=hoverText,linetype=colCode), hover_css="stroke: #6812d1;")+ scale_shape_manual(name="Focal Jurisdictions:",values=c(24,22,23),labels=sort(focal_jur,decreasing = T))+ scale_linetype_manual(name="Focal Jurisdictions:",values=1:3,labels=sort(focal_jur,decreasing = T))+ ylab("Overall PISA Rank")+ labs(title="Change in PISA scores for 41 countries and the OECD average", subtitle="Mouse over a trendline to get the line equation and other info")+ theme(legend.position="bottom",legend.background = element_rect(fill="gray94",color="transparent" ), legend.key=element_rect("gray98"),legend.key.width=unit(22,"pt"), legend.text=element_text(size=8), plot.title=element_text(size=11),plot.subtitle=element_text(size=9,color="#6812d1"), axis.title=element_text(size=10))+ xlab("") # Data: combined average scores for the US girafe(print(G4),width_svg=6,height_svg=6,options = list( opts_hover_inv(css = "opacity:0.5;"), opts_hover(css = "stroke-width:2;") )) ``` </details> ```{r plot-4, results="show",echo=T} w<-girafe(print(G4),width_svg=6,height_svg=6,options = list( opts_hover_inv(css = "opacity:0.5;"), opts_hover(css = "stroke-width:2;") )) w w %>% frameableWidget() %>% saveWidget(.,"interactive_PISA_ranks.html") ```
About
A simple linear regression analysis of US performance on the Program for International Student Assessment (PISA) exam.
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published