Calling `leastsq` via generic interface

The least-squares algorithms from MINPACK wrapped by scipy.optimize.leastsq are very well designed and efficient. It's quite hard to match their average performance (in terms of nfev), as they contain a lot of small clever tricks. So it's desirable to allow calling them from our general interface least_squares; let's call this method 'lm' (Levenberg-Marquardt). But this comes at the cost of some incompatibility and inconsistency.

My idea is to unify the algorithms as much as possible. I suggest exposing all common parameters in the interface and making the use of options almost unnecessary:

```python
least_squares(fun, x0, jac='2-point', bounds=(-np.inf, np.inf), method='trf',
              ftol=EPS**0.5, xtol=EPS**0.5, gtol=EPS**0.5, max_nfev=None,
              scaling=1.0, diff_step=None, args=(), kwargs={}, options={})
```

The default values for kwargs and options might be {} or None. Either way is fine.
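
For illustration, here is a minimal usage sketch of the proposed interface; the exponential model and the data below are invented for the example:

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic data for a toy exponential-decay fit.
t = np.linspace(0, 10, 50)
y = 2.0 * np.exp(-1.3 * t) + 0.05 * np.random.randn(t.size)

def residuals(x, t, y):
    return x[0] * np.exp(-x[1] * t) - y

# A scalar lower bound broadcasts to all variables; the default
# bounds=(-np.inf, np.inf) means an unbounded problem.
res = least_squares(residuals, [1.0, 1.0], jac='2-point',
                    bounds=(0, np.inf), method='trf', args=(t, y))
print(res.x, res.nfev)
```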

Here are some explanations:

  1. The signature of fun is fun(x, *args, **kwargs). Ditto for jac, when it's a callable.
  2. jac='2-point' means a 2-point numerical approximation of the Jacobian is used; another option is '3-point'. For leastsq, '3-point' and '2-point' work the same way (perhaps with a warning about that). Allowed values are '2-point', '3-point' or a callable (see the finite-difference sketch after this list).
  3. Broadcasting is allowed for bounds. You can use a single number to set all lower/upper bounds to the same value. Note that bounds=(-np.inf, np.inf) falls within the standard bounds setting; it's not a special case. When method='lm' and some of the bounds are finite, an error is raised.
  4. method='trf'; other options are 'dogbox' and 'lm'. When we extend the algorithms to work with sparse inputs, we will introduce method-selection logic and change the default to method=None (as we decided not to use the word 'auto').
  5. max_nfev=None means that each algorithm uses its own default value.
  6. scaling=1.0 determines variable scaling. MINPACK calls this parameter diag. For simplicity, no scaling is the default behavior; again, the value 1.0 avoids introducing special cases (broadcasting is allowed). If you want to use Jacobian scaling as in MINPACK, set scaling='jac'. An array-like with positive elements can be used to apply a fixed scaling. It's somewhat controversial whether to treat scaling as MINPACK's diag or as its reciprocal, but we chose the first variant to conform to the classical MINPACK algorithms.
  7. diff_step=None controls the step size used for numerical differentiation; None means the default automatic step selection. If we agree on the "Press rule" for computing the actual step, then diff_step is a relative step size. MINPACK uses epsfcn and determines the step as sqrt(epsfcn) * x, so we just set epsfcn = diff_step**2 when calling MINPACK (see the translation sketch after this list). Setting both diff_step and options['epsfcn'] raises a TypeError.
  8. args, kwargs - extra parameters passed to fun and jac. A warning is raised when kwargs is used with method='lm', since leastsq doesn't support kwargs.
  9. options has limited usage; it allows passing factor and col_deriv to leastsq. I'm considering disabling col_deriv=1, as it would only complicate the documentation without any benefit (just compute the Jacobian in the normal way, please!). I'm not sure how to handle options that are already present as explicit parameters; passing them will cause an error. Probably we can rely on this error and that's all. The entries of options are passed directly to the underlying solver; unknown options or duplicate entries raise a TypeError.
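
To make the finite-difference options in item 2 concrete, here is a rough sketch of the two schemes; the helper names and the simple loop-based implementation are illustrative, not the actual code:

```python
import numpy as np

def jac_2point(fun, x, f0, steps):
    # Forward differences: one extra residual evaluation per variable.
    J = np.empty((f0.size, x.size))
    for i, h in enumerate(steps):
        xh = x.copy()
        xh[i] += h
        J[:, i] = (fun(xh) - f0) / h
    return J

def jac_3point(fun, x, f0, steps):
    # Central differences: two extra evaluations per variable,
    # but one order of accuracy better.
    J = np.empty((f0.size, x.size))
    for i, h in enumerate(steps):
        xp, xm = x.copy(), x.copy()
        xp[i] += h
        xm[i] -= h
        J[:, i] = (fun(xp) - fun(xm)) / (2 * h)
    return J
```

And here is a sketch of the leastsq-specific translation from items 3 and 7: finite bounds are rejected and a relative diff_step is converted into MINPACK's epsfcn (call_lm is a hypothetical helper, not part of the proposal):

```python
import numpy as np
from scipy.optimize import leastsq

def call_lm(fun, x0, lb, ub, diff_step=None, max_nfev=None):
    # method='lm' cannot handle bounds, so any finite bound is an error.
    if np.any(np.isfinite(lb)) or np.any(np.isfinite(ub)):
        raise ValueError("method='lm' does not support bounds")
    # MINPACK picks the step as sqrt(epsfcn) * x, so a relative step
    # diff_step translates into epsfcn = diff_step**2.
    epsfcn = None if diff_step is None else diff_step**2
    # maxfev=0 tells leastsq to use its own default limit.
    return leastsq(fun, np.asarray(x0), full_output=True, epsfcn=epsfcn,
                   maxfev=0 if max_nfev is None else max_nfev)
```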

Another point to consider is what least_squares returns. Obviously, we want it to return the same OptimizeResult object for all methods. Let's go through all the fields we currently return and check whether we can return them when method='lm' (a rough repackaging sketch follows the list):

  1. x - returned by MINPACK.
  2. fun (residuals) - returned by MINPACK.
  3. jac - we can't compute it from what MINPACK returns (the Q matrix from the QR factorization is missing). I think it's fine to just call jac(x) one more time.
  4. obj_value - no problem to compute.
  5. optimality - easy to compute once jac and fun are available.
  6. active_mask - all zeros for an unbounded problem.
  7. nfev - returned by MINPACK. MINPACK also counts the function calls made for numerical Jacobian approximation. We decided not to take this approach in our algorithms.
  8. njev - returned only with an analytical Jacobian (from lmder, not lmdif). Set to None when an analytical jac is not provided.
  9. nit - not returned by MINPACK. Decided not to return this parameter for any of the algorithms.
  10. status - returned by MINPACK. We should translate it to one of the standard statuses returned by least_squares; it's not hard to do.
  11. success - derived from status. Decided to keep this parameter.
  12. message - again, return one of the standard unified messages based on status.
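
Here is a rough sketch of how a leastsq result could be repackaged into an OptimizeResult along these lines; the status translation shown is a placeholder, not the final mapping:

```python
import numpy as np
from scipy.optimize import leastsq, OptimizeResult

def wrap_leastsq(fun, jac, x0):
    x, cov_x, info, message, ier = leastsq(fun, x0, Dfun=jac, full_output=True)
    f = info['fvec']
    J = jac(x)                 # recompute: Q from the QR factorization is lost
    g = J.T.dot(f)             # gradient of 0.5 * ||f||**2
    status = 1 if ier in (1, 2, 3, 4) else 0  # placeholder translation of ier
    return OptimizeResult(
        x=x, fun=f, jac=J,
        obj_value=0.5 * np.dot(f, f),
        optimality=np.linalg.norm(g, ord=np.inf),
        active_mask=np.zeros_like(x, dtype=int),
        nfev=info['nfev'],
        njev=info.get('njev'),  # present only when an analytic jac was used
        status=status, success=status > 0, message=message)
```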

leastsq also returns the inverse of J.T.dot(J). It is the covariance matrix of x in the linear approximation J dx ~ dy (in the least-squares sense), assuming that y has an identity covariance matrix (multiply it by sigma**2 if all y_i have a standard deviation of sigma). It's not connected to the optimization algorithms and is not always feasible to compute (consider big sparse matrices, which will hopefully appear here soon). This parameter is returned for method='lm'.

So we can probably add this to OptimizeResult for MINPACK only (or not). The suggested name is x_covariance.
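
For reference, here is a sketch of how that covariance could be computed for a dense Jacobian; estimating sigma**2 from the residuals, as done below, is a common convention rather than part of this proposal:

```python
import numpy as np

def x_covariance(J, f):
    # Covariance of x in the linear approximation J dx ~ dy,
    # assuming the observations have unit variance.
    cov = np.linalg.inv(J.T.dot(J))
    # If the noise level is unknown, a common estimate of sigma**2 is the
    # residual sum of squares divided by the degrees of freedom.
    m, n = J.shape
    sigma2 = f.dot(f) / (m - n)
    return cov * sigma2
```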
