
Confounders #4

Open
wjchulme opened this issue Mar 31, 2019 · 9 comments

Comments

@wjchulme
Owner

This is where it gets spicy. The wonderful thing about synthetic data is that you can’t p-hack: you have to specify your study design based on the data schema, not the data values, since results computed on the synthetic values can’t be relied on.

An obvious and important confounder is pre-existing depression at or before age 16. We have the variables dep_band_10, dep_band_13, and dep_band_15, which are caregiver-reported (e.g. by mum or dad) depression of the child at ages 10, 13, and 15. We should consider at least dep_band_15.

After that, it’s a free-for-all. We can’t use everything, but we want to be sure we’re adjusting for the relevant confounders. We can’t (and shouldn’t) rely on univariate associations, stepwise variable selection, etc. Causal inference would be helpful here – see https://doi.org/10.1007/s10654-019-00494-6

A first step might be for somebody to draw a DAG representing beliefs about causal relationships. We can reason about confounders from that starting point.

@jspickering

What does the acronym "DAG" stand for? I'm sure it'll be obvious when you say it.

Thanks for the link to the paper btw, very useful.

@wjchulme
Owner Author

wjchulme commented Apr 3, 2019

I should've been explicit - a DAG is a Directed Acyclic Graph

  • graph as in graph theory - edges and nodes
  • acyclic as in we don't allow any cycles - you can't get back to a node once you've left it
  • directed as in the relationships between nodes are not symmetrical - A to B =/= B to A

They are widely used in the causal inference literature to describe causal relationships between different variables.

It's been argued (somewhere!) that DAGs should be used more often in observational studies (such as MAPS) where we want to control for confounders of the y~x relationship in a more principled way. "More principled" = not throwing in any potential confounder z such that y~z and x~z, and not using stepwise variable selection.
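To make the "read confounders off the DAG" idea concrete, here's a minimal stdlib-only sketch. The edges and most node names are hypothetical illustrations (only the dep_band variables come from the schema; birth_year may not even be available, as discussed below):

```python
# Toy DAG as an adjacency dict: node -> set of nodes it points to.
# Edges encode *beliefs* about causation, not anything estimated from data.
dag = {
    "dep_band_15":   {"computer_use", "depression_18"},  # prior depression
    "birth_year":    {"computer_use"},                   # access to tech
    "computer_use":  {"depression_18"},                  # exposure -> outcome
    "depression_18": set(),
}

def ancestors(dag, node):
    """All nodes with a directed path into `node`."""
    found = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for parent, children in dag.items():
            if current in children and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found

# A classical confounder is a common cause of exposure and outcome:
exposure, outcome = "computer_use", "depression_18"
confounders = (ancestors(dag, exposure) & ancestors(dag, outcome)) - {exposure}
print(sorted(confounders))  # -> ['birth_year', 'dep_band_15']
```

Note this toy only finds common causes; the full back-door criterion (as implemented in tools like dagitty) also deals properly with colliders and mediators.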

Here's a nice overview: https://doi.org/10.1093/ije/dyw341

@jspickering

jspickering commented Apr 17, 2019

Sorry - I thought I'd responded to this! I'll look into that a bit more.

I think year of birth might also be an important confounder. Children of the 90s presumably covers the whole span of 1990-1999, but we might have to check. I was born in 1991, and my computer usage drastically changed with the times as things like Neopets, MySpace, Bebo, Facebook, Twitter cropped up. I didn't have a smartphone at 16, but probably a lot of kids born in 1999 did. Do we know if "computer" usage covers laptops/PCs only?

I'm just thinking out loud here really! See comment below, born between 1990-1992 only.

@wjchulme
Owner Author

This is a really interesting point: year of birth as a proxy for the accessibility/popularity of computers. I don't think this is available in the dataset though?

@jspickering

I have no idea because the public data dictionary has been deleted and I've not yet been approved for the collaborator sections of the OSF page! Is there any other variable that can act as a proxy?

@jspickering

Never mind, it looks as though they only recruited children born between 1990 and 1992, so birth year is unlikely to be an issue: http://www.bristol.ac.uk/alspac/researchers/cohort-profile/

@ajstewartlang
Contributor

Given the large number of variables (84) in the dataset, should we consider something like specification curve analysis? I came across it recently in Amy Orben's paper:

https://www.nature.com/articles/s41562-018-0506-1

It looks like it can put the magnitude of effects in context and seems quite appropriate for such a large dataset.

@wjchulme
Owner Author

Hi Andrew. I think the SCA approach is pretty much what the MAPS project is trying to do as a whole. And anyway we don't really have the time to design and run multiple analyses since we've only a few weeks left! If we just iterated through all confounder combinations it would take forever, so we'd have to be selective, and that process itself takes time. Even if we did do it, we'd only be reducing it all down to a single odds-ratio to pass on to MAPS anyway, and we wouldn't really have the opportunity to explore the variation in results, a key purpose of the SCA approach.
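For a sense of scale on "it would take forever": with k candidate confounders there are 2^k possible adjustment sets, each a separate model to fit. A quick stdlib illustration (the candidate list here is hypothetical apart from the dep_band variables):

```python
from itertools import combinations

# 84 variables, as in the MAPS dataset: exhaustive enumeration is hopeless.
print(2 ** 84)  # ~1.9e25 possible adjustment sets

# Even a short pre-selected candidate list multiplies the work quickly.
# Names are illustrative; only the dep_band variables are in the schema.
candidates = ["dep_band_10", "dep_band_13", "dep_band_15", "birth_year"]
adjustment_sets = [set(c)
                   for r in range(len(candidates) + 1)
                   for c in combinations(candidates, r)]
print(len(adjustment_sets))  # 2**4 == 16 models to fit and compare
```

This is just the confounder dimension; a full specification curve would also multiply in choices of outcome coding, exposure coding, and subgroup, which is why SCA needs its own design effort.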

I hope you'll agree it's best to agree on a single analysis plan and run with that.

@ajstewartlang
Contributor

Yup, this approach sounds sensible to me.
