Econometrics, statistics, and data science: Reinstein notes with a Micro, Behavioural, and Experimental focus
- Focus on the practical tools I use and the challenges I (David Reinstein) face
- Microeconomics and behavioral economics, with a focus on charitable giving and 'returns to education'-type straightforward problems. (Limited to no focus on structural approaches.)
- Where can we add value to real econometric practice?
\
Data:
- Observational (esp. web-scraped and API data, and national surveys/administrative data)
- Experimental: esp. with multiple crossed arms, and where the 'cleanest design' may not be possible
\
Assume familiarity with most basic statistical concepts like 'bias', 'consistency', and 'null hypothesis testing.' However, I will focus on some concepts that seem to often be misunderstood and mis-applied.
(Folder: bayesian; notes: bayes_notes)
- DAGs and Potential outcomes
- Organizing a project
- Dynamic documents (esp. Rmd/bookdown)
- Style and consistency
  - Indenting, snake_case, etc.
- Using functions, variable lists, etc., for clean, concise, readable code (see the sketch below)
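A minimal sketch of this idea; all names here (controls, run_spec, treatment, gift_amount, df) are hypothetical placeholders, not from any particular project:

```r
# Define a control set and a small helper once, then reuse them, so each
# specification stays short, consistent, and easy to read.
controls <- c("age", "income", "education")

run_spec <- function(outcome, data) {
  fml <- reformulate(c("treatment", controls), response = outcome)
  lm(fml, data = data)
}

# Usage, assuming a data frame df with these columns:
# summary(run_spec("gift_amount", df))
```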
'Identification' of causal effects where a control strategy is not credible (Identification). Essentially, a 'control strategy' means: "control for all or most of the reasonable determinants of the independent variable, so as to make the remaining unobservable component very small, minimizing the potential for bias in the coefficient of interest." All of the controls must themselves be exogenous; otherwise this itself can lead to bias. There is some discussion of how to validate this approach; see, e.g., [@oster2019unobservable].
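A rough sketch of an Oster-style coefficient-stability check, using the commonly cited approximation with delta = 1 and the R_max = 1.3 x R-squared heuristic; all variable names and the data frame df are hypothetical placeholders:

```r
# Compare the coefficient of interest with and without observable controls, then
# compute a bias-adjusted coefficient under stated assumptions about unobservables.
m_short <- lm(y ~ treat, data = df)            # no controls
m_long  <- lm(y ~ treat + x1 + x2, data = df)  # with observable controls

b_short  <- unname(coef(m_short)["treat"])
b_long   <- unname(coef(m_long)["treat"])
r2_short <- summary(m_short)$r.squared
r2_long  <- summary(m_long)$r.squared

# Approximation for the bias-adjusted coefficient, assuming delta = 1
# (unobservables as important as observables) and R_max = 1.3 * R-squared
# of the controlled model (a commonly used heuristic):
r2_max <- min(1.3 * r2_long, 1)
b_star <- b_long - (b_short - b_long) * (r2_max - r2_long) / (r2_long - r2_short)
c(b_short = b_short, b_long = b_long, b_star = b_star)
```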
Note: These are organized in an Airtable database here. Many of these are also covered in my 'Research and Writing' book
- Peer effects: self-selection, common environment, simultaneity/reflection (Manski paper) (Identification)
- Random effects estimators show a lack of robustness (Specification); clustering standard errors is more standard practice (see the sketch at the end of this list)
- OLS/IV estimators are not the 'mean effect' in the presence of heterogeneity
- Power calculations/underpowered studies
- Selection bias due to attrition
- Selection bias due to missing variables -- imputing these as a solution
- Signs of p-hacking and specification-hunting
- Weak diagnostic/identification tests
- Dropping zeroes in a "loglinear" model is problematic
- P_augmented may overstate the type-1 error rate
- Impact size from a regression of "log(1 + gift amount)"
- Lagged dependent variable and fixed effects --> 'Nickell bias'
- Weak IV bias
- Bias from selecting instruments and estimating using the same data
- Endogenous control: are the control variables you use endogenous? (E.g., because FDI may itself affect GDP per capita)
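As referenced in the random-effects item above, a minimal sketch of the more standard clustered-standard-errors approach; y, x, cluster_id, and df are hypothetical placeholders:

```r
# OLS point estimates with cluster-robust standard errors.
library(sandwich)
library(lmtest)

m <- lm(y ~ x, data = df)
coeftest(m, vcov = vcovCL(m, cluster = ~ cluster_id))
```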
- Missing data
- Choice of control variables and interactions
- Which outcome variable/variables
- Logs and exponentials
- Nonlinear modeling (and interpreting coefficients)
- 'Testing for nonlinear terms'
  - Quadratic regressions are not diagnostic regarding U-shapedness [Simonsohn18] (see the sketch after this list)
- OLS does not identify the ATE
- Modeling heterogeneity: the limits of quantile regression
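A very simplified sketch of the 'two lines' style check referenced above (cf. Simonsohn 2018); the names (y, x, df) and the median breakpoint are placeholder assumptions, and the actual two-lines procedure chooses the breakpoint differently:

```r
# Rather than relying on a quadratic term, fit separate slopes below and above
# a candidate break point.
x0 <- median(df$x)
lo <- lm(y ~ x, data = subset(df, x <  x0))
hi <- lm(y ~ x, data = subset(df, x >= x0))
summary(lo); summary(hi)
# A U-shape requires both segment slopes to be individually significant with
# opposite signs, not merely a significant quadratic term.
```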
"While the classical statistical framework is not terribly clear about when one should ""accept"" a null hypothesis, we clearly should distinguish strong evidence for a small or zero effect from the evidence and consequent imprecise estimates. If our technique and identification strategy is valid, and we find estimates with confidence intervals closely down around zero, we may have some confidence that any effect, if it exists, is small, at least in this context. To more robustly assert a ""zero or minimal effect"" one would want to find these closely bounded around zero under a variety of conditions for generalizability.
In general it is important to distinguish a lack of statistical power from a “tight” and informative null result; essentially by considering confidence intervals (or Bayesian credible intervals). See, e.g., Harms and Lakens (2018), “Making 'null effects' informative: statistical techniques and inferential frameworks”." Harms-lakens-18
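A minimal sketch of one way to operationalize this, comparing the confidence interval to a smallest effect size of interest (SESOI); Harms and Lakens discuss fuller equivalence-testing frameworks, and the names and SESOI value below are hypothetical:

```r
# An informative ('tight') null: the whole CI sits inside +/- SESOI.
m     <- lm(y ~ treat, data = df)
sesoi <- 0.1                                   # smallest effect size of interest
ci    <- confint(m, "treat", level = 0.95)
ci[1] > -sesoi && ci[2] < sesoi                # TRUE suggests an informative null
```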
- Confidence intervals and Bayesian credible intervals
- Comparing relative parameters
E.g., "the treatment had a heterogeneous effect... we see a statistically significant positive effect for women but not for men". This doesn't cut it: we need to see a statistical test for the difference in these effects. (And also see caveat about multiple hypothesis testing and ex-post fishing).
See [@verkaik2016]
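A minimal sketch of the point above, testing the difference directly with an interaction term; y, treat, female, and df are hypothetical placeholders:

```r
# The treat:female coefficient (and its test) addresses the *difference* in
# effects, rather than comparing two separate within-group significance tests.
m_het <- lm(y ~ treat * female, data = df)
summary(m_het)
```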
- 'Moderators': confusion with nonlinearity
Moderators: heterogeneity mixed with nonlinearity/corners.
In the presence of nonlinearity (e.g., diminishing returns), if the outcome 'starts' at a higher level for one group (e.g., women), it is hard to disentangle a heterogeneous response to the treatment from 'the diminishing returns kicking in.' Related: https://datacolada.org/57, "[57] Interactions in Logit Regressions: Why Positive May Mean Negative."
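A small simulated sketch of this point for the logit case; all names and parameter values are illustrative assumptions. With no interaction on the log-odds scale, the interaction in predicted probabilities can still be clearly negative, purely because the group starting at a higher baseline is closer to the probability ceiling:

```r
set.seed(1)
n <- 1e4
treat  <- rbinom(n, 1, 0.5)
female <- rbinom(n, 1, 0.5)
# True model: positive effects, *no* interaction on the latent (log-odds) scale
y <- rbinom(n, 1, plogis(-1 + 2 * treat + 2 * female))

m <- glm(y ~ treat * female, family = binomial)
coef(m)["treat:female"]                  # approximately zero on the log-odds scale

# Double difference in predicted probabilities: clearly negative
pr <- function(t, f) plogis(sum(coef(m) * c(1, t, f, t * f)))
(pr(1, 1) - pr(0, 1)) - (pr(1, 0) - pr(0, 0))
```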
- MHT (multiple hypothesis testing)
(Or get to this in the experimetrics section)
Where a particular assumption is critical to identification and inference, failure to reject a violation of that assumption is not sufficient to give us confidence that it is satisfied and that the results are credible. At several points authors cite insignificant statistical tests as evidence in support of a substantive model, or as evidence that they need not worry about certain confounds. Although the problem of induction is difficult, I find this approach inadequate. Where a negative finding is presented as an important result, the authors should also show that the parameter estimate is tightly bounded around zero. Where it is cited as evidence that a confound can be ignored, they should show that the confound's effect can be statistically bounded as small enough that it should not reasonably cause an issue (e.g., using Lee or Manski bounds for selective attrition/hurdles).
IV not credible (Identification). Note that for an instrument to be valid it needs both to be exogenously determined (i.e., not selected in a way related to the outcome of interest) and to have no direct effect on the outcome (only an indirect effect through the endogenous variable).
- Exogeneity vs. exclusion
- Very hard to 'powerfully test'
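A minimal 2SLS sketch; y, x_endog, w, z_instr, and df are hypothetical placeholders. The standard diagnostics speak to instrument strength (and overidentification, where applicable); the exclusion restriction itself cannot be tested this way:

```r
# Two-stage least squares with first-stage (weak instrument) diagnostics.
library(AER)

m_iv <- ivreg(y ~ x_endog + w | z_instr + w, data = df)
summary(m_iv, diagnostics = TRUE)
```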
"Conditional on positive"/"intensive margin" analysis ignores selection
"Conditional on positive"/"intensive margin" analysis ignores selection Identification See Angrist and Pischke on "Good CoP, bad CoP". See also bounding approaches such as [@Lee2018] AngristJ.D.2008a,
- Bounding approaches (Lee, Manski, etc)
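A rough sketch of Lee-style trimming bounds for selective attrition (cf. the bounding references above). It assumes the treatment weakly increases the share observed; df, treat, observed, and y are hypothetical placeholders, and covariates and inference are ignored:

```r
# Trim the treated group's observed outcomes by the excess retention share.
q_t <- mean(df$observed[df$treat == 1])
q_c <- mean(df$observed[df$treat == 0])
p   <- (q_t - q_c) / q_t                       # excess share retained under treatment

y_t <- sort(df$y[df$treat == 1 & df$observed == 1])
y_c <- df$y[df$treat == 0 & df$observed == 1]
k   <- floor(p * length(y_t))

lower <- mean(head(y_t, length(y_t) - k)) - mean(y_c)  # trim the top p share of treated
upper <- mean(tail(y_t, length(y_t) - k)) - mean(y_c)  # trim the bottom p share
c(lower = lower, upper = upper)
```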
FE/DiD does not rule out a correlated dynamic unobservable, causing a bias
- Lagged dependent variable and fixed effects --> 'Nickell bias'
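A small simulation sketch of this bias (all parameter values are illustrative assumptions): the within (fixed-effects) estimator with a lagged dependent variable is biased downward when T is small.

```r
set.seed(1)
N <- 500; T <- 5; rho <- 0.5
panel <- do.call(rbind, lapply(seq_len(N), function(i) {
  alpha <- rnorm(1)                              # individual fixed effect
  y <- numeric(T + 1)
  for (t in 2:(T + 1)) y[t] <- alpha + rho * y[t - 1] + rnorm(1)
  data.frame(id = i, t = 1:T, y = y[-1], y_lag = y[-(T + 1)])
}))

# Within estimator: demean by individual, then regress y on its lag
panel <- transform(panel,
  y_dm    = ave(y,     id, FUN = function(v) v - mean(v)),
  ylag_dm = ave(y_lag, id, FUN = function(v) v - mean(v)))
coef(lm(y_dm ~ ylag_dm - 1, data = panel))   # typically well below the true rho = 0.5
```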
- The 'harm to science' from running underpowered studies
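A minimal sketch of an a-priori power calculation; the effect size (d = 0.2) and the power target are illustrative assumptions, not recommendations:

```r
# Solve for the required n per group in a two-sample t-test.
library(pwr)
pwr.t.test(d = 0.2, sig.level = 0.05, power = 0.80)
```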
(Experimental) Study design: Identifying meaningful and useful (causal) relationships and parameters
- Sugden and Sitzia critique here, give more motivation
- Ruling out alternative hypotheses, etc
- The hazards of specification-searching
Need to adjust significance tests for augmenting data/sequential analysis/'peeking' (Statistics/econometrics; new-statistics). See [@sagarin_2014], http://www.paugmented.com/, and http://andrewgelman.com/2014/02/13/stopping-rules-bayesian-analysis/ (also resubmit_letterJpube.tex).
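A small simulation sketch of why such an adjustment is needed (all values are illustrative assumptions): repeatedly testing as data accrue and stopping at the first p < .05 inflates the type-1 error rate well above the nominal 5%.

```r
set.seed(1)
peek_at <- seq(20, 200, by = 20)     # sample sizes at which we peek
one_run <- function() {
  x <- rnorm(max(peek_at))           # null is true: no effect
  any(sapply(peek_at, function(n) t.test(x[1:n])$p.value < 0.05))
}
mean(replicate(2000, one_run()))     # substantially above the nominal 0.05
```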
Yet ...
P_augmented may overstate the type-1 error rate (Statistics/econometrics; response to referees, new-statistics).
A process involving stopping "whenever the nominal
Considering the calculations in \ref{sagarin2014}, it is clear that
(Links back to power analyses)
- Models to address publication biases