
Confounders #4

Open
wjchulme opened this issue Mar 31, 2019 · 9 comments

Comments

@wjchulme
Owner

This is where it gets spicy. The wonderful thing about synthetic data is that you can’t p-hack: you have to specify your study design based on the data schema, not the data values, since results computed on the synthetic values can’t be relied on.

An obvious and important confounder is pre-existing depression at or before age 16. We have the variables dep_band_10, dep_band_13, and dep_band_15, which are caregiver-reported (e.g. by mum or dad) depression of the child at ages 10, 13, and 15. We should consider at least dep_band_15.

After that, it’s a free-for-all. We can’t use everything, but we want to be sure we’re adjusting for the relevant confounders. We can’t (and shouldn’t) rely on univariate associations, stepwise variable selection, etc. Causal inference would be helpful here – see https://doi.org/10.1007/s10654-019-00494-6

A first step might be for somebody to draw a DAG representing beliefs about causal relationships. We can reason about confounders from that starting point.

@jspickering

What does the acronym "DAG" stand for? I'm sure it'll be obvious when you say it.

Thanks for the link to the paper btw, very useful.

@wjchulme
Owner Author

wjchulme commented Apr 3, 2019

I should've been explicit - a DAG is a Directed Acyclic Graph

  • graph as in graph theory - edges and nodes
  • acyclic as in we don't allow any cycles - you can't get back to a node once you've left it
  • directed as in the relationships between nodes are not symmetrical - A to B =/= B to A

They are widely used in the causal inference literature to describe causal relationships between different variables.

It's been argued (somewhere!) that DAGs should be used more often in observational studies (such as MAPS) where we want to control for confounders of the y~x relationship in a more principled way. "More principled" = not throwing in any potential confounder z such that y~z and x~z, and not using stepwise variable selection.
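To make the "read confounders off the DAG" idea concrete, here's a minimal stdlib-only sketch. The edges and most node names are hypothetical illustrations (only the dep_band variables come from the schema; birth_year may not even be available, as discussed below):

```python
# Toy DAG as an adjacency dict: node -> set of nodes it points to.
# Edges encode *beliefs* about causation, not anything estimated from data.
dag = {
    "dep_band_15":   {"computer_use", "depression_18"},  # prior depression
    "birth_year":    {"computer_use"},                   # access to tech
    "computer_use":  {"depression_18"},                  # exposure -> outcome
    "depression_18": set(),
}

def ancestors(dag, node):
    """All nodes with a directed path into `node`."""
    found = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for parent, children in dag.items():
            if current in children and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found

# A classical confounder is a common cause of exposure and outcome:
exposure, outcome = "computer_use", "depression_18"
confounders = (ancestors(dag, exposure) & ancestors(dag, outcome)) - {exposure}
print(sorted(confounders))  # -> ['birth_year', 'dep_band_15']
```

Note this toy only finds common causes; the full back-door criterion (as implemented in tools like dagitty) also deals properly with colliders and mediators.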

Here's a nice overview: https://doi.org/10.1093/ije/dyw341

@jspickering

jspickering commented Apr 17, 2019

Sorry - I thought I'd responded to this! I'll look into that a bit more.

I think year of birth might also be an important confounder. Children of the 90s presumably covers the whole span of 1990-1999, but we might have to check. I was born in 1991, and my computer usage drastically changed with the times as things like Neopets, MySpace, Bebo, Facebook, Twitter cropped up. I didn't have a smartphone at 16, but probably a lot of kids born in 1999 did. Do we know if "computer" usage covers laptops/PCs only?

I'm just thinking out loud here really! See comment below, born between 1990-1992 only.

@wjchulme
Owner Author

This is a really interesting point: year of birth as a proxy for the accessibility/popularity of computers. I don't think this is available in the dataset though?

@jspickering

I have no idea because the public data dictionary has been deleted and I've not yet been approved for the collaborator sections of the OSF page! Is there any other variable that can act as a proxy?

@jspickering

Never mind, it looks as though they only recruited children born between 1990 and 1992, so birth year is unlikely to be an issue: http://www.bristol.ac.uk/alspac/researchers/cohort-profile/

@ajstewartlang
Contributor

Given the large number of variables (84) in the dataset, should we consider something like specification curve analysis? I came across it recently in Amy Orben's paper:

https://www.nature.com/articles/s41562-018-0506-1

It looks like it can put the magnitude of effects in context and seems quite appropriate for such a large dataset.

@wjchulme
Owner Author

Hi Andrew. I think the SCA approach is pretty much what the MAPS project is trying to do as a whole. And anyway we don't really have the time to design and run multiple analyses since we've only a few weeks left! If we just iterated through all confounder combinations it would take forever, so we'd have to be selective, and that process itself takes time. Even if we did do it, we'd only be reducing it all down to a single odds-ratio to pass on to MAPS anyway, and we wouldn't really have the opportunity to explore the variation in results, a key purpose of the SCA approach.
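For a sense of scale on "it would take forever": with k candidate confounders there are 2^k possible adjustment sets, each a separate model to fit. A quick stdlib illustration (the candidate list here is hypothetical apart from the dep_band variables):

```python
from itertools import combinations

# 84 variables, as in the MAPS dataset: exhaustive enumeration is hopeless.
print(2 ** 84)  # ~1.9e25 possible adjustment sets

# Even a short pre-selected candidate list multiplies the work quickly.
# Names are illustrative; only the dep_band variables are in the schema.
candidates = ["dep_band_10", "dep_band_13", "dep_band_15", "birth_year"]
adjustment_sets = [set(c)
                   for r in range(len(candidates) + 1)
                   for c in combinations(candidates, r)]
print(len(adjustment_sets))  # 2**4 == 16 models to fit and compare
```

This is just the confounder dimension; a full specification curve would also multiply in choices of outcome coding, exposure coding, and subgroup, which is why SCA needs its own design effort.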

I hope you'll agree it's best to agree on a single analysis plan and run with that.

@ajstewartlang
Contributor

Yup, this approach sounds sensible to me.
