Confounders #4
Comments
What does the acronym "DAG" stand for? I'm sure it'll be obvious when you say it. Thanks for the link to the paper btw, very useful.
I should've been explicit - a DAG is a Directed Acyclic Graph
They are widely used in the causal inference literature to describe causal relationships between variables. It's been argued (somewhere!) that DAGs should be used more often in observational studies (such as MAPS) where we want to control for confounders of the y~x relationship in a more principled way. "More principled" means not throwing in every potential confounder z such that y~z and x~z, and not using stepwise variable selection. Here's a nice overview: https://doi.org/10.1093/ije/dyw341
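To make the "acyclic" part concrete, here is a minimal sketch using Python's standard-library `graphlib`. The three-node graph reuses the x, y, z labels from above; the arrows themselves are purely illustrative assumptions, not a claim about the MAPS data.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(predecessors):
    """Return True if the directed graph has no cycles.

    `predecessors` maps each node to the set of nodes with an arrow into it.
    """
    try:
        # prepare() raises CycleError if the graph contains a cycle.
        TopologicalSorter(predecessors).prepare()
    except CycleError:
        return False
    return True


# Illustrative only: z -> x, z -> y, x -> y (z confounds y~x).
confounded = {"x": {"z"}, "y": {"x", "z"}}

# Not a DAG: x -> y and y -> x form a cycle.
feedback_loop = {"x": {"y"}, "y": {"x"}}

print(is_acyclic(confounded))
print(is_acyclic(feedback_loop))
```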
Sorry - I thought I'd responded to this! I'll look into that a bit more.
This is a really interesting point: year of birth as a proxy for accessibility/popularity of computers. I don't think this is available in the dataset though?
I have no idea because the public data dictionary has been deleted and I've not yet been approved for the collaborator sections of the OSF page! Is there any other variable that can act as a proxy?
Never mind, it looks as though they only recruited children born between 1990 and 1992, so birth year is unlikely to be an issue: http://www.bristol.ac.uk/alspac/researchers/cohort-profile/
Given the large number of variables (84) in the dataset, should we consider something like specification curve analysis (SCA)? I came across it recently in Amy Orben's paper: https://www.nature.com/articles/s41562-018-0506-1 It looks like it can put the magnitude of effects in context and seems quite appropriate for such a large dataset.
Hi Andrew. I think the SCA approach is pretty much what the MAPS project is trying to do as a whole. And anyway we don't really have the time to design and run multiple analyses since we've only a few weeks left! If we just iterated through all confounder combinations it would take forever, so we'd have to be selective, and that process itself takes time. Even if we did do it, we'd only be reducing it all down to a single odds ratio to pass on to MAPS anyway, and we wouldn't really have the opportunity to explore the variation in results, which is a key purpose of the SCA approach. I hope you'll agree it's best to settle on a single analysis plan and run with that.
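For a rough sense of scale on "it would take forever", here is a back-of-the-envelope sketch. The 84 comes from the thread; treating two of those variables as exposure and outcome, and the time per model fit, are assumptions made purely for illustration.

```python
import math

N_VARIABLES = 84                 # dataset size quoted in this thread
N_CANDIDATES = N_VARIABLES - 2   # assumption: one exposure, one outcome
SECONDS_PER_FIT = 0.01           # assumption: a very fast model fit

# Every candidate is either in or out of the adjustment set.
n_models = 2 ** N_CANDIDATES
years = n_models * SECONDS_PER_FIT / (60 * 60 * 24 * 365)
print(f"{n_models:.2e} adjustment sets, roughly {years:.1e} years of fitting")

# Being selective: adjustment sets with at most three covariates.
n_small = sum(math.comb(N_CANDIDATES, k) for k in range(4))
print(f"{n_small} adjustment sets with three or fewer covariates")
```

Even the restricted version needs someone to decide which subsets are worth running, which is the selection effort mentioned above.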
Yup, this approach sounds sensible to me.
This is where it gets spicy. The wonderful thing about synthetic data is that you can’t p-hack - you have to specify your study design based on the data schema, and not the data values, since you can’t rely on the results.
An obvious and important confounder is pre-existing depression at or before age 16. We have variables dep_band_10, dep_band_13, and dep_band_15, which are caregiver-reported (e.g. by mum or dad) depression of the child at ages 10, 13, and 15. We should consider at least dep_band_15.
After that, it’s a free-for-all. We can’t use everything, but we want to be sure we’re adjusting for relevant confounders. We can’t (and shouldn’t) use univariate associations or step-wise variable selection, etc. Causal inference would be helpful here – see https://doi.org/10.1007/s10654-019-00494-6
A first step might be for somebody to draw a DAG representing beliefs about causal relationships. We can reason about confounders from that starting point.
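As a starting point for that discussion, here is a minimal sketch of how a drawn DAG could be written down and queried for candidate confounders. Only `dep_band_15` is a real variable name from the dataset; `computer_use`, `depression_later`, `household_income`, `sleep`, and every arrow are hypothetical placeholders, not agreed causal beliefs. The check is a simple common-cause heuristic (a node that causes the exposure and also reaches the outcome by a path that avoids the exposure), not a full back-door analysis.

```python
# Hypothetical DAG: each key lists the variables it directly causes.
# Only dep_band_15 is from the dataset; all other names and all edges
# are illustrative assumptions.
EDGES = {
    "dep_band_15": ["computer_use", "depression_later"],
    "household_income": ["computer_use", "depression_later"],
    "computer_use": ["sleep", "depression_later"],
    "sleep": ["depression_later"],
}


def ancestors(edges, target, blocked=frozenset()):
    """Nodes with a directed path to `target` that avoids `blocked` nodes."""
    parents = {}
    for cause, effects in edges.items():
        for effect in effects:
            parents.setdefault(effect, set()).add(cause)
    found, stack = set(), [target]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in found and parent not in blocked:
                found.add(parent)
                stack.append(parent)
    return found


def common_causes(edges, exposure, outcome):
    """Causes of the exposure that also reach the outcome without passing through it."""
    causes_of_exposure = ancestors(edges, exposure)
    causes_of_outcome = ancestors(edges, outcome, blocked={exposure})
    return causes_of_exposure & causes_of_outcome


candidates = common_causes(EDGES, "computer_use", "depression_later")
print(sorted(candidates))
```

In this toy graph `sleep` is excluded because it sits on the path from exposure to outcome (a mediator), which is the kind of distinction stepwise selection would not make.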