Statistical modelling approach #2
I'm pretty keen on Bayesian. I've done it before but I'm no expert!
I have very limited experience with Bayesian approaches - when I have tried Bayesian models in the past (and really that's been limited to tinkering with the brms package in R) I've often worried about where to get my priors from.
I may be wrong, but I think @OliJimbo is a pretty good Bayesian
Appropriate choice of priors is obviously the biggie for Bayes, but this shouldn't put us off - there are 13,734 observations, which is a healthy enough sample size that even modest effect sizes will probably dominate an agreed set of sceptical priors.

What might put us off is that the submission guidelines say "if you're using JAGS/BUGS or Stan to run a Bayesian analysis, please contact us". I presume this is because we need to submit 'analysis chunks' (discrete functions that load data, transform data, deal with missing values, etc.) which might not be particularly compatible with a Stan workflow.

Can I suggest one or more of @ajstewartlang, @jspickering, and @OliJimbo (or anyone else familiar with brms/Stan) look into this and give some opinionated feedback on the pros/cons?
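For concreteness, an agreed set of sceptical priors could be written down in brms something like the sketch below. The `set_prior` calls are real brms syntax, but the scales (0.5 and 2) are placeholder choices for us to argue over, not recommendations:

```r
# Sketch of sceptical priors for a brms model.
# Scales are placeholders for discussion, not agreed values.
library(brms)

sceptical_priors <- c(
  set_prior("normal(0, 0.5)", class = "b"),         # slopes shrunk towards zero
  set_prior("normal(0, 2)",   class = "Intercept")  # thresholds, weakly informative
)
```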
I'm just reading through the instructions now - shall we just use the brms package for any Bayesian stuff we want to do? That seems like the most straightforward option, as it will allow the Bayesian chunk to sit within our final R script.
It looks like our outcome measure is dep_score (child's depression score on the CIS-R) - an ordinal variable with possible values of NA, 0, 1, 2, 3, 4. I'm assuming here that an NA means the child wasn't measured (so it is a genuine missing data point). Would a cumulative link mixed model be a good starting point to explore how our predictors predict this dep_score outcome?
Or I guess a straight cumulative link model (i.e., no mixed part) would work too - @wjchulme ?
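A straight cumulative link model along those lines might look like this in brms. This is a sketch under assumptions: `dat` and the predictor `comp_use` are placeholder names (we won't know the real ones until we get the data), and actually fitting it requires Stan to compile behind the scenes:

```r
# Hypothetical sketch of a cumulative (proportional-odds) model for the
# ordinal outcome. 'dat' and 'comp_use' are placeholder names.
library(brms)

dat$dep_score <- factor(dat$dep_score, levels = 0:4, ordered = TRUE)

fit <- brm(
  dep_score ~ comp_use,
  data   = dat,
  family = cumulative(link = "logit")  # or link = "probit"
)
summary(fit)
```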
Thanks, Andrew. I doubt we can gain much from a mixed model (with a random-effect component), since from what I can tell there's no data on clusters/hierarchies to exploit.

I've just had a chance to look through the data this afternoon. As you say, the primary outcome variable is ordinal, so it makes sense to use ordinal models. In my experience, the choice of link function is both important (it materially affects the results) and arbitrary (you can rarely make a case for one function over another based on some underlying theory of the process at play). I'll leave it to the psychologists amongst us to decide if one makes more sense than another! Failing that, I'm sure a Normal distribution (probit link) is as good as any.

We've been asked to produce odds ratios to describe the relationship between computer use and depression - as I've said above, these can be recovered from an ordinal model for a given yes/no definition of depression.
In both approaches we develop an ordinal model - it's the definition of depression that differs, and with it the way we calculate our odds ratio. This second approach is probably the most faithful to the information we have available. But let's start with the ordinal model and take it from there
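To illustrate the recovery step with made-up numbers: under a cumulative logit model, pick any yes/no cut-point for "depressed" and the odds ratio falls out of the fitted probabilities (and, under proportional odds, is exactly exp(beta)). The thresholds and slope below are invented for illustration only:

```r
# Made-up parameters from a hypothetical cumulative logit model;
# dichotomise depression as dep_score >= 2.
thresholds <- c(-0.5, 0.8, 1.6, 2.3)  # cut-points between the 5 levels
beta       <- 0.3                     # exposure (computer use) coefficient

# P(dep_score >= 2 | x) = 1 - P(dep_score <= 1 | x)
p_dep <- function(x) 1 - plogis(thresholds[2] - beta * x)

odds <- function(p) p / (1 - p)
odds(p_dep(1)) / odds(p_dep(0))  # equals exp(0.3) ~ 1.35 under proportional odds
```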
Thanks Will - an ordinal model sounds good. How do we decide on what predictors/explanatory variables to add? I could see a case being made for quite a lot of those that are present in the dataset. Do we start with a model with just a few 'common sense' predictors?
Obviously there are a lot to choose from - we can restrict to variables that behave nicely (few missing values, sufficient variance, low collinearity, plausibility), but the use of DAGs here may also help us decide based on some underlying mechanisms. But I'll leave this discussion for issue #4.
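The 'behave nicely' screen could start with something as simple as the sketch below. `dat` is a placeholder for the data frame and the thresholds (20% missingness, |r| > 0.8) are arbitrary choices to discuss:

```r
# Hypothetical screening pass over candidate predictors ('dat' is a placeholder).
missing_prop <- colMeans(is.na(dat))
usable <- names(missing_prop)[missing_prop < 0.2]  # e.g. drop >20% missing

# Flag highly correlated pairs among numeric candidates (possible collinearity)
num_vars <- usable[sapply(dat[usable], is.numeric)]
cor_mat  <- cor(dat[num_vars], use = "pairwise.complete.obs")
which(abs(cor_mat) > 0.8 & upper.tri(cor_mat), arr.ind = TRUE)
```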
Hello, sorry for the late jump-in! I think we should do both frequentist and Bayesian - that said, I do have experience with Bayesian (my thesis), but unfortunately only in JASP. Agree with the use of the ordinal model as well.
If you're able to put something together @lanabojanic that would be fantastic
I've been having a play around this afternoon. I've realised that the "interesting detail" (see my comment above) only applies to the dep_band_* variables, which aren't the outcome variables we're interested in. So all that I said about marginalising over those given probabilities doesn't make any sense! We can ignore that.
Agree with the use of ordinal models, and this is easily justifiable with the recent tutorials on the subject. I'm currently working on ordinal regression for my thesis using brms - it really hates the default priors, which place too much probability on log(0) (i.e. -Inf), so even a prior encoding a general hypothesised direction (i.e. negative or positive) would be fine. We could even do a sensitivity analysis to check. I admit that I haven't had a look at the structure yet, but will take a look ASAP!

I also concur with using both frequentist and Bayesian methods! The dataset is so large that the false-positive rate might be inflated (if using significance testing in isolation!), so some sort of Bayes factors or model-selection methods would be a good idea!

Oli
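The sensitivity analysis could be as simple as refitting the same model under a few prior scales and checking the coefficient of interest is stable. A sketch only - the formula and variable names (`dep_score`, `comp_use`, `dat`) are placeholders, and each refit needs Stan:

```r
# Hypothetical prior-sensitivity sketch: same model, increasingly sceptical priors.
library(brms)

scales <- c(0.25, 0.5, 1, 2)
fits <- lapply(scales, function(s) {
  brm(dep_score ~ comp_use, data = dat,
      family = cumulative("logit"),
      prior  = set_prior(sprintf("normal(0, %s)", s), class = "b"))
})
lapply(fits, fixef)  # compare the comp_use posterior across prior choices
```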
Firstly, Bayesian - yes or no? It might be a good learning opportunity for people with limited experience of Bayesian inference, including me. But I don't have a strong feeling about this either way and am happy to go with the majority view.
As for the actual model, an odds ratio is required for the final output, so the outcome variable is necessarily binary - depression at 18, yes/no. Logistic regression is the obvious candidate, but it's possible to recover an odds ratio from any model that can (be coerced to) provide the probability of the outcome with and without the exposure - including models with non-binary outcomes.
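The recovery step is just odds arithmetic: given predicted probabilities of depression with and without the exposure (invented numbers below, purely for illustration), the odds ratio falls out directly:

```r
# Invented probabilities for illustration only.
p_exposed   <- 0.25  # P(depressed | exposed)
p_unexposed <- 0.20  # P(depressed | unexposed)

odds <- function(p) p / (1 - p)
odds(p_exposed) / odds(p_unexposed)  # (0.25/0.75) / (0.20/0.80) = 1.33...
```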
So it will depend on how depression is represented in the dataset, which we won't know until we get access. There's a variable for a clinical diagnosis of depression at age 18, has_dep_diag, but also some other variables relating to depressive symptoms at 18.
I can see where the path of least resistance will take us, but if anyone has a desire to do the non-obvious thing then let's hear it! Either way, it would be helpful to get a consensus on these issues as early as possible.