fit z-curve (mixture model) with all z-values rather than only statistically significant ones #16

Open
Yefeng0920 opened this issue Oct 23, 2023 · 4 comments
Labels
documentation Improvements or additions to documentation

Comments

@Yefeng0920

@FBartos @gaborcsardi I would be grateful if you could tell me how to fit a collection of z-values without truncation at 1.96. As I understand it, z-curve uses only the statistically significant z-values to fit the mixture model. How can I use all z-values, regardless of statistical significance? I ask because I want to test the following: for a dataset without publication bias (which Registered Reports can guarantee), the EDR derived from a mixture model fitted to only the statistically significant z-values should be similar to the EDR from a model fitted to all z-values.

Best,
Yefeng

@FBartos FBartos added the documentation Improvements or additions to documentation label Oct 25, 2023
@FBartos
Owner

FBartos commented Oct 25, 2023

Hi Yefeng,

You can use the control argument of the zcurve() function to specify a, the lower bound of the fitting range. See the following example:

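library(zcurve)
set.seed(1)  # make the simulated z-values reproducible
z <- rnorm(100)  # simulated z-values, not truncated at 1.96
# a = 0 moves the lower bound of the fitting range to zero, so all z-values are used
fit <- zcurve(z = z, control = list(a = 0))
summary(fit)
plot(fit)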

See ?control_EM for more details.

Hope this helps!
Frantisek
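
To run the comparison asked about in the original question, one option is to fit the same bias-free data twice, once with the default fitting range and once with `a = 0`, and compare the printed EDR estimates. The following is a minimal sketch, not a recommended workflow: the simulated z-values (`rnorm(1000, mean = 2)`) are a hypothetical stand-in for Registered Report results, and `bootstrap = FALSE` is used only to skip confidence intervals and keep the run short.

```r
# Hypothetical bias-free data: every study is reported, whatever its z-value.
set.seed(1)
z <- rnorm(1000, mean = 2)

# The fits need the zcurve package; skip them if it is not installed.
if (requireNamespace("zcurve", quietly = TRUE)) {
  # Default fitting range: only statistically significant z-values are used.
  fit_sig <- zcurve::zcurve(z = z, bootstrap = FALSE)

  # Lower bound moved to 0: all z-values enter the fit.
  fit_all <- zcurve::zcurve(z = z, bootstrap = FALSE, control = list(a = 0))

  # Compare the EDR rows of the two summaries.
  print(summary(fit_sig))
  print(summary(fit_all))
}
```

Without publication bias, the two EDR estimates would be expected to be close, up to sampling error.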

@Yefeng0920
Author

Hi Frantisek @FBartos,
This is quite useful. Let me check my understanding of the so-called folded truncated distribution. The raw z-values are first converted to absolute values (magnitudes) and then restricted to a certain range. By default, the range runs from qnorm(0.05/2, lower.tail = FALSE) to 5. Finally, a mixture model is fitted to the truncated values using EM estimation. Fitting only the nominally statistically significant z-values is what lets the model account for publication bias, although I do not fully understand why this is the case. Have I understood the whole process correctly?
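
The fold-and-truncate step described above can be sketched in base R. This is only an illustration of the preprocessing as described in this comment, not the package's internal code; `z_raw` is hypothetical simulated data, and how zcurve treats values above the upper bound is not covered here.

```r
set.seed(1)
z_raw <- rnorm(1000, mean = 1.5)  # hypothetical raw z-values, both signs

# Default fitting range as described above.
a <- qnorm(0.05 / 2, lower.tail = FALSE)  # about 1.96
b <- 5

# Fold: keep only the magnitude of each z-value.
z_folded <- abs(z_raw)

# Truncate: keep only the values inside the fitting range [a, b].
z_fit <- z_folded[z_folded >= a & z_folded <= b]

length(z_fit)  # number of z-values that would enter the fit
range(z_fit)   # lies within [a, b]
```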

@FBartos
Owner

FBartos commented Oct 27, 2023

Yes, that's correct.
In short: under selection for statistical significance, estimating the model from only the statistically significant results with a truncated likelihood yields estimates that are unaffected by publication bias. We then use the locations of the truncated distributions to extrapolate to the statistically non-significant results (which we do not use for estimation because selection might make them non-representative).
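
The two steps above, a truncated likelihood and then extrapolation, can be illustrated with a small base R sketch. Everything here is an assumption for illustration: the components are unit-variance folded normals, the means `mu` and weights `w_sig` are made-up values rather than fitted ones, and the reweighting is one way to express the extrapolation, not necessarily how the package computes it.

```r
a <- qnorm(0.05 / 2, lower.tail = FALSE)  # significance threshold
b <- 5                                    # upper bound of the fitting range

# Probability that |Z| falls in [lower, upper] when Z ~ N(mu, 1).
prob_between <- function(lower, upper, mu) {
  (pnorm(upper, mu) - pnorm(lower, mu)) + (pnorm(-lower, mu) - pnorm(-upper, mu))
}

# Folded normal density renormalized to [a, b]: this is the truncated
# likelihood contribution of one |z| from one mixture component.
trunc_folded_density <- function(x, mu) {
  dens <- (dnorm(x, mu) + dnorm(-x, mu)) / prob_between(a, b, mu)
  ifelse(x >= a & x <= b, dens, 0)
}

# Sanity check: the truncated density integrates to 1 over [a, b].
area <- integrate(trunc_folded_density, lower = a, upper = b, mu = 2)$value

# Extrapolation: each component location implies a probability of significance.
power_at <- function(mu) prob_between(a, Inf, mu)

mu    <- c(0, 2, 4)        # hypothetical component locations
w_sig <- c(0.3, 0.5, 0.2)  # hypothetical weights among significant results
pw    <- power_at(mu)

# Components with low power are under-represented among significant results,
# so divide by power to recover their share among all results.
w_all <- (w_sig / pw) / sum(w_sig / pw)

# Expected discovery rate: average power across all results.
edr <- sum(w_all * pw)
edr
```

The reweighting is why non-significant results are not needed for estimation in this sketch: the component locations determine how many of them there should be.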

@Yefeng0920
Author

@FBartos It is a really great idea. Still, I think that summarizing the discovery rate or replication rate with only an average is not ideal in some cases, so it would be good to present the whole distribution.
