
Improve performance of emcee #781

Closed
wants to merge 1 commit

Conversation

@peendebak (Contributor) commented Mar 15, 2022

Description

This PR improves the performance of emcee by optimizing the internal _lnprob method.

  • The bounds argument is passed in transposed form. This aligns better (assuming the default NumPy C memory layout) with the comparisons performed against the bounds in _lnprob.
  • The addition of the variable c is moved out of the sum (see the sketch below).
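
A minimal sketch of what these two changes amount to (the function and variable names below are simplified stand-ins chosen for illustration, not the exact lmfit internals; c is assumed to be a scalar):

```python
import numpy as np

# Hypothetical sketch of the two micro-optimizations, not the actual lmfit code.

# Before: bounds has shape (nvars, 2), so each check slices a column of a
# C-ordered array (non-contiguous memory access), and the scalar c is
# broadcast and added element-wise inside the sum.
def lnprob_before(theta, bounds, residual, c):
    if np.any(theta > bounds[:, 1]) or np.any(theta < bounds[:, 0]):
        return -np.inf
    return -0.5 * np.sum(residual**2 + c)

# After: bounds is passed transposed, shape (2, nvars), so the lower and
# upper limits are contiguous rows, and c is added once after the sum
# (np.sum(x + c) == np.sum(x) + x.size * c for scalar c).
def lnprob_after(theta, bounds_T, residual, c):
    if np.any(theta > bounds_T[1]) or np.any(theta < bounds_T[0]):
        return -np.inf
    return -0.5 * (np.sum(residual**2) + residual.size * c)
```

Whether the difference is measurable depends mainly on the number of varying parameters, the number of data points, and how expensive the user-supplied objective function is.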

All changes are internal; the public API is unchanged.

There are more options to improve performance, but these require more extensive changes or changes to the public API.

Type of Changes
  • Bug fix
  • New feature
  • Refactoring / maintenance
  • Documentation / examples
Tested on

Python: 3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)]

lmfit: 1.0.2+29.gcd28293.dirty, scipy: 1.7.0, numpy: 1.21.5, asteval: 0.9.25, uncertainties: 3.1.5

Verification

Have you

  • included docstrings that follow PEP 257? Public API unchanged
  • referenced existing Issue and/or provided relevant link to mailing list? None found
  • verified that existing tests pass locally?
  • verified that the documentation builds locally?
  • squashed/minimized your commits and written descriptive commit messages?
  • added or updated existing tests to cover the changes? Not required
  • updated the documentation and/or added an entry to the release notes (doc/whatsnew.rst)?
  • added an example? Only performance improvements

@newville (Member)

@peendebak Thanks, but also, really? Does that actually give a performance improvement?

Generally speaking, performance is not a primary concern for most of lmfit. And if you're in such an unfortunate situation as to be using emcee, I cannot understand why performance would be a concern. It is designed to be slow, purposefully ignoring known optimizations in order to demonstrate how hard it works.

Also, just to be clear: emcee does not do a fit. I have said this before (#601), but I would rather remove emcee completely from lmfit than pretend that it is useful or worth supporting.

@peendebak (Contributor, Author)

@newville I fully agree with you that the value of emcee does not lie in fitting. We fit our data using one of the other methods in lmfit, and then use emcee for further analysis (so the term fitting in the original description was a bit unfortunate).

When profiling emcee (and other lmfit methods) I noticed that quite some time is spent in the framework itself, and not in the objective function that the user supplies. I understand that the focus for lmfit is on functionality (e.g. the high-level Parameter class) rather than speed. But for this PR the changes are fully internal, so I don't see any reason not to use them.
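
For reference, this kind of profile can be reproduced with the standard library profiler; the toy model and numbers below are made up purely for illustration, not the actual data or objective function:

```python
import cProfile
import numpy as np
import lmfit

# Toy residual function; the real objective function is application-specific.
def residual(params, x, data):
    a = params['a'].value
    tau = params['tau'].value
    return a * np.exp(-x / tau) - data

x = np.linspace(0, 10, 201)
data = 2.5 * np.exp(-x / 1.3) + np.random.normal(scale=0.02, size=x.size)

params = lmfit.Parameters()
params.add('a', value=2.0, min=0)
params.add('tau', value=1.0, min=0.1)

# Profile an emcee run (requires the emcee package) to compare the time spent
# inside lmfit itself with the time spent in the user-supplied residual.
cProfile.runctx(
    "lmfit.minimize(residual, params, args=(x, data), method='emcee', "
    "steps=200, burn=20)",
    globals(), locals(), sort='cumtime')
```

Sorting by cumulative time makes it easy to compare the time spent in _lnprob and the surrounding machinery with the time spent in the residual function itself.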

@newville (Member)

@peendebak this PR is a proposed change to the code with the explicit intent of improving performance. But you have not given any indication that performance is improved. Why not?

Making any changes in the use of emcee, especially saying that the change "improves performance/features", suggests that we support and endorse the use of emcee within lmfit.minimize. I think this code is misleading, not very useful, and should be removed.

@reneeotten (Contributor)

@peendebak @newville I agree with Matt that some benchmarking of how much faster this is would indeed be useful. I find it hard to see why changing the order of the bounds would make a significant difference.

Having said that, whenever someone submits a PR or question/issue regarding emcee, Matt's first response is "let's just remove it", as it isn't intended for fitting and is often misused for that. I do think it's a useful method, but indeed not for an initial fit, as we all seem to agree. We talked in the past about removing emcee as a fitting method and moving it to a dedicated space for exploring the parameter space after a successful fit - I would be very much in favor of that but just haven't been able to find the time to do it and, additionally, I am not really an expert in emcee...

So how about, instead of merging this PR, we actually do that work now, as we might have an interested party, @peendebak, in contributing to that goal? It would also allow us to change the API much more, since we don't need to be backwards compatible (we could leave emcee as a method to minimize for one other release) and try to solve some other open issues as well. For example, I think you might gain more from actually fixing the issue that emcee doesn't seem to use multiprocessing - I did fiddle around with that a bit in the past and indeed verified that it isn't working, but couldn't figure out how to solve it.

So if you guys agree, I will convert #601 from an Issue to a Discussion and we can continue there to decide how to organize this change; once that is settled, writing the actual code should probably not be too hard. Thoughts?

@newville (Member)

@reneeotten I agree with all of that, including a further discussion and trying to make a plan to move use of emcee to be more clearly "post fit" analysis, similar to conf_interval().

@peendebak (Contributor, Author)

@newville @reneeotten I will close the PR, as there is no support for emcee in its current form as a "fitting" algorithm.

I would be happy to use or review emcee functionality in a different form, but will not have the resources to drive these new changes.

@peendebak closed this Mar 17, 2022