-
Any questions/thoughts on this?
-
Do we need to do anything here besides warn users these two asymptotic variance estimates are inequivalent and may lead to different estimates when overlap is poor? If we want to help users apply MBAR more robustly, I think our focus is elsewhere:
-
I agree this is a low-priority thing; I am resurrecting my edits from this summer, and wanted to get confirmation before making a decision by fiat. I can document the difference in the code for now and leave it at that.
Point 1 above is straightforward; perhaps file an issue on the behavior you'd like to see? Point 2 is also straightforward; it's more a matter of deciding what the appropriate cutoffs should be and why. Point 3 is a research topic we've spent a lot of time on: the normalization constant itself is not amenable to Bayesian inference, since it is parameterless (there is only one set of normalization constants that satisfies the equations and data). One proposal (that I am not sure has been done entirely correctly) introduces parameters in the density of states \sigma(E), but that gets a bit away from the pure probability interpretation, would have to be thought about a fair amount before deciding whether it really provides additional information, and would complicate things immensely if we had to estimate parameters for many \sigma(E) when the energy function changes at each step.
Providing a tool for automatic bootstrapping might be better, reporting errors as confidence intervals instead. Other non-point estimates of the uncertainty might be applicable. Right now, the one we are using is essentially the implicit-function version of \delta f(x) = |df/dx| \delta x. In some sense, bootstrapping would be the "right" approach, since it describes the uncertainty in the empirical distribution, which is what we are using to make the estimates.
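To make the bootstrapping suggestion concrete, here is a minimal sketch of a percentile-bootstrap confidence interval for a pairwise free energy difference. This is illustrative only: estimate_free_energy is a hypothetical placeholder for whatever estimator gets plugged in (e.g., a BAR solve), not an existing pymbar function.

```python
# Minimal sketch: percentile-bootstrap confidence interval for a pairwise free energy
# difference. `estimate_free_energy(w_F, w_R)` is a hypothetical placeholder for the
# estimator of interest; it is not an existing pymbar function.
import numpy

def bootstrap_ci(w_F, w_R, estimate_free_energy, n_boot=1000, alpha=0.05, seed=0):
    rng = numpy.random.default_rng(seed)
    estimates = numpy.empty(n_boot)
    for b in range(n_boot):
        # Resample the work values with replacement, separately within each state.
        w_F_b = rng.choice(w_F, size=len(w_F), replace=True)
        w_R_b = rng.choice(w_R, size=len(w_R), replace=True)
        estimates[b] = estimate_free_energy(w_F_b, w_R_b)
    # Report a (1 - alpha) confidence interval from the bootstrap distribution.
    lower, upper = numpy.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper
```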
-
I like the bootstrapping idea. Do you know if alchemlyb uses our implementation of MBAR? If so, it might be good to coordinate with them as well. And with the thermotools devs from Frank Noé's lab that provide MBAR and TRAM implementations. I guess it would be useful to figure out what the most important services we can provide are in planning any future changes. Maybe this means we should also instrument the code to identify which features people use and what could use optimizing by having the code phone home by default. These ideas are all out of scope for this issue, but how about we plan a brainstorming session (maybe in person?) where we can assess possibilities and effort commitments and plan the best way forward? We might be able to apply some funding toward this project.
-
Sounds good. Should be pretty easy to implement.
Yes, it does. I am working with them to port over alchemical_analysis functionality to alchemlyb, so I can manage this. We might think about what functionality should go there, but computing an uncertainty in multiple ways seems like a core functionality.
I'll let you manage that coordination.
Yes.
Ugh. Could bring up a number of issues. Perhaps put that lower on the list . . .
Brainstorming is good. I would love to finally get the 'Everything you wanted to know about MBAR but were afraid to ask' review article out (likely as a LiveCOMS paper) this semester, so an improved version of pymbar (drawing on what is in #255) would go along well with that. I can devote some time this semester.
-
Addressed by #286
-
Will leave this here for now as a reminder that we need to figure out why the difference exists. What two approaches are being used, and why are they different?
-
See #430 for how this shows up when trying to figure out what the overlap matrix is, since overlap is directly related to uncertainty.
-
Hi, all-
I've identified an interesting difference between BAR and MBAR uncertainty estimates as currently implemented in the code. If you start directly from Bennett's paper, plugging equation (8) into (7), you eventually get:
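A sketch of what I believe that expression is, assuming the standard form of Bennett's asymptotic error estimate and using the notation defined in the next sentence (the assignment of f(-x) to the state-0 average and f(x) to the state-1 average follows from the definition of x below):

\sigma_A^2 \approx (1/n_0) [ <f(-x)^2>_0 / <f(-x)>_0^2 - 1 ] + (1/n_1) [ <f(x)^2>_1 / <f(x)>_1^2 - 1 ]

where n_0 and n_1 are the numbers of samples drawn from states 0 and 1.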
Where "_i" indicates averages over state 0 or 1, f(x) is the Fermi function, and x = M+(F1-F0)-(U1-U0), with M = ln n1/n0. This formula has been rederived a couple of times in different places, and is what is generally used for BAR -- and is what is used in the code currently. Call this method A.
If, however, you take my PRL BAR paper, you get an estimator for the variance that can be reduced to:
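A sketch of what I believe that reduces to, written as the Fisher-information variance of the maximum-likelihood estimate; the sum runs over all n_0 + n_1 samples, with x evaluated at each sample using the definition above:

\sigma_B^2 \approx [ \sum_i f(x_i) f(-x_i) ]^{-1} - 1/n_0 - 1/n_1 = [ \sum_i 1/(2 + 2\cosh(x_i)) ]^{-1} - 1/n_0 - 1/n_1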
Call this method B. This approach is also exactly what MBAR gives you (as proven in the MBAR paper); it is the Fisher-information-derived uncertainty obtained from the likelihood.
A and B are not actually the same thing, though in the limit of good sampling, they are both practically indistinguishable from the true variance (obtained by repeating a bunch of times). I believe they are both "correct" in that they are both asymptotic estimators of the uncertainty, they just have somewhat different asymptotic behavior.
Which one is better?
Method A turns out to be a good estimate even if your free energy is wrong, which method B is not. So I was using method A, since I was using MBAR for other BAR-based estimators (UBAR and RBAR in the Paliwal et al. 2011 paper). Recall that Bennett showed that his approach produced an estimator of the free energy even if you didn't use the right constant, but that the variance was minimized with the right constant (which was the free energy + log ratio of number of samples). So A works regardless of whether the constant is right.
Method B appears to be slightly closer to the right answer for moderate to small overlap, but the difference is very small.
Method B is wrong if the constant is wrong (IIRC - this is something I recall from a while ago, which is why I bothered to change it from the original method B to this one; maybe there was another error that was really the problem).
Method A is somewhat of an underestimate for very poor overlap and low numbers of samples.
Method B is an overestimate (sometimes wildly) for very poor overlap and low numbers of samples.
In situations where they differ noticeably from each other, they are also pretty different from the true answer (obtained by calculating the standard deviation over lots of realizations).
Examples of the divergence:
Test case, offset harmonic oscillators:
import numpy

O_k = numpy.array([0.0, 1.0])    # offsets (equilibrium positions) for each oscillator
N_k = 1 * numpy.array([10, 10])  # number of samples from each state
beta = 1.0                       # inverse temperature for all simulations
K_k = numpy.array([1.0, 1.0])    # spring constant for each state (placeholder values; these were varied)
Repeat 1000 times with new random data to generate the true standard deviation. (Repeating the whole procedure changes the answers slightly.)
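For anyone who wants to reproduce this kind of comparison, here is a self-contained sketch (my own illustration, not the pymbar implementation, and using the method A and method B formulas as I have written them above) that samples the two oscillators, solves the BAR self-consistency equation, and evaluates both uncertainty estimates alongside the empirical standard deviation over 1000 repeats:

```python
# Sketch: compare the two asymptotic uncertainty estimates (methods A and B, as written
# above) with the empirical standard deviation over many independent realizations.
import numpy
from scipy.optimize import brentq

O_k = numpy.array([0.0, 1.0])    # offsets for each oscillator
K_k = numpy.array([1.0, 1.0])    # spring constants (vary these to change overlap)
N_k = numpy.array([10, 10])      # number of samples from each state
beta = 1.0                       # inverse temperature

def fermi(t):
    return 1.0 / (1.0 + numpy.exp(t))

def u(k, x):
    # Reduced potential of harmonic state k.
    return beta * 0.5 * K_k[k] * (x - O_k[k]) ** 2

def one_realization(rng):
    # Exact (independent) sampling from each harmonic oscillator.
    x0 = rng.normal(O_k[0], 1.0 / numpy.sqrt(beta * K_k[0]), N_k[0])
    x1 = rng.normal(O_k[1], 1.0 / numpy.sqrt(beta * K_k[1]), N_k[1])
    w_F = u(1, x0) - u(0, x0)    # forward work (samples from state 0)
    w_R = u(0, x1) - u(1, x1)    # reverse work (samples from state 1)
    n0, n1 = N_k
    M = numpy.log(n1 / n0)       # M = ln n1/n0, as defined above

    # BAR self-consistency: sum_0 f(-x) = sum_1 f(x), with x = M + DeltaF - (U1 - U0).
    def g(DeltaF):
        return numpy.sum(fermi(-(M + DeltaF - w_F))) - numpy.sum(fermi(M + DeltaF + w_R))

    DeltaF = brentq(g, -100.0, 100.0)
    x_F = M + DeltaF - w_F       # x evaluated at state-0 samples
    x_R = M + DeltaF + w_R       # x evaluated at state-1 samples

    # Method A: Bennett-style error propagation.
    fF, fR = fermi(-x_F), fermi(x_R)
    var_A = numpy.var(fF) / (n0 * numpy.mean(fF) ** 2) + numpy.var(fR) / (n1 * numpy.mean(fR) ** 2)

    # Method B: Fisher-information (MBAR-equivalent) estimate.
    x_all = numpy.concatenate([x_F, x_R])
    var_B = 1.0 / numpy.sum(fermi(x_all) * fermi(-x_all)) - 1.0 / n0 - 1.0 / n1

    return DeltaF, numpy.sqrt(var_A), numpy.sqrt(var_B)

rng = numpy.random.default_rng(0)
results = numpy.array([one_realization(rng) for _ in range(1000)])
print("empirical std of DeltaF :", results[:, 0].std())
print("mean method A estimate  :", results[:, 1].mean())
print("mean method B estimate  :", results[:, 2].mean())
```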
SO:
What to do?
In most cases, the results are pretty much indistinguishable, so operationally, not a big deal for any real application. I (or someone else? I couldn't find the issue discussing it) noticed a difference, and I investigated.
I have not yet been able to identify exactly where the differences come from in the derivations, but it is likely something subtle. One is standard error propagation and one is the Fisher information approach, and it's not yet clear to me why they are different.
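One way to frame the difference (my framing, a rough sketch rather than a derivation): method A propagates the sampling variance of the Fermi-function averages through the implicit BAR equation, essentially

\sigma_A^2 \approx var(g) / (dg/d\Delta F)^2, where g(\Delta F) is the estimating function being zeroed,

while method B inverts the curvature (the observed Fisher information) of the log-likelihood at its maximum,

\sigma_B^2 = [ -d^2 \ln L / d\Delta F^2 ]^{-1}.

Both are leading-order asymptotic approximations, so they converge to the same value as n_0 and n_1 grow, but nothing forces them to agree at finite sample size or poor overlap.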