This document contains the proposed outline for the second edition of OpenIntro's Introductory Statistics with Randomization and Simulation.
- Case study (one that can later be extended to MLR or a 2-by-2 table)
- Taxonomy of data
- Overview of data collection principles
- Observational studies and sampling strategies
- Experimental design and causality
- Revisit the case study using the newly introduced terminology
- Categorical vs. categorical: segmented plots and contingency tables
  - Conditional probability from contingency tables
  - Bayes’ Theorem (law of total probability?)
- Numerical vs. categorical: side-by-side box plots, comparing distributions
  - Mention univariate summaries: center, skew, shape, spread
  - Mention conditional probabilities as well
- Visual summaries of data: scatterplots, side-by-side box plots, histograms, density plots, box plots (lead with multivariate, follow with univariate)
- Describing distributions: correlation, central tendency, variability, skew, modality
- Numerical vs. numerical: simple linear regression (SLR)
  - Correlation
  - Line fitting, residuals, and correlation
  - Fitting a line by least squares regression
  - Types of outliers in linear regression
- Numerical response vs. predictors of any type: multiple linear regression (MLR)
  - Introduction to multiple regression
  - Parallel slopes
  - Hint at interaction, planes, and parallel planes, but do not quantify them
  - Visualization of higher-dimensional models (rgl demo)
- Logistic regression
  - Binary response vs. numerical or other predictors
  - Interpretation on three scales (probability, odds, and log-odds)
  - “Parallel” logistic curves?
- Understanding inference through simulation
  - Randomization case study: gender discrimination
  - Randomization case study: opportunity cost
  - Hypothesis testing
  - Confidence intervals
  - Simulation case studies
- Inference for a single proportion
  - Simulation
  - Exact (if we include coverage of probability)
  - Central Limit Theorem (CLT) and the normal approximation
- Difference of two proportions
- Testing for goodness of fit using chi-square (special topic, include simulation version)
- Testing for independence in two-way tables (special topic)
- One-sample means
  - Bootstrap (for means and medians)
  - t-distribution
- Paired data
- Difference of two means
- Comparing many means with ANOVA (special topic, include simulation version)
- Inference for linear regression
  - Bootstrap for regression coefficients
  - t-distribution for regression coefficients
  - Model comparison: Occam’s razor, R^2 vs. R^2_adj
  - Checking model assumptions using graphs
    - L-I-N-E (linearity, independence, normality of residuals, equal variance)
- Inference for multiple regression
  - Plot residuals vs. fitted values instead of residuals vs. x
- Inference for logistic regression
(Keep the same content as before, minus the probability material that was moved to categorical EDA.)