Calling `leastsq` via generic interface
The least-squares algorithms from MINPACK wrapped by `scipy.optimize.leastsq` are very well designed and efficient. It's quite hard to match their average performance (in terms of `nfev`), as they contain a lot of small clever tricks. So it's desirable to allow calling them from our general interface `least_squares`; let's call this method `'lm'` (Levenberg-Marquardt). But this comes at the cost of some incompatibility and inconsistency.
My idea is to unify the algorithms as much as possible. I suggest exposing all common parameters in the interface and making usage of `options` almost unnecessary:
```python
least_squares(fun, x0, jac='2-point', bounds=(-np.inf, np.inf), method='trf',
              ftol=EPS**0.5, xtol=EPS**0.5, gtol=EPS**0.5, max_nfev=None,
              scaling=1.0, diff_step=None, args=(), kwargs={}, options={})
```
Default parameters for `kwargs` and `options` might be `{}` or `None`. Either way is fine.
Here are some explanations:
- The signature of `fun` is `fun(x, *args, **kwargs)`. Ditto for `jac`, when it's a callable.
- `jac='2-point'` suggests usage of a 2-point numerical approximation of the Jacobian; another option is `'3-point'`. For `leastsq`, `'3-point'` and `'2-point'` will work the same way (perhaps with a warning about that). Allowed values are `'2-point'`, `'3-point'` or a callable.
- Broadcasting is allowed for bounds: you can use a single number to set all lower/upper bounds to the same value. Note that `bounds=(-np.inf, np.inf)` falls within the standard bounds setting; it's not a special case. When `method='lm'` and some of the bounds are finite, an error is raised.
- `method='trf'` is the default; other options are `'dogbox'` and `'lm'`. When we extend the algorithms to work with sparse inputs, we will introduce method selection logic and change the default to `method=None` (as we decided not to use the word 'auto').
- `max_nfev=None` selects the default value for each algorithm.
- `scaling=1.0` determines variable scaling. MINPACK calls this parameter `diag`. For simplicity, no scaling is chosen as the default behavior; again, the value 1.0 avoids introducing special cases (broadcasting is allowed). If you want to use Jacobian scaling as in MINPACK, set `scaling='jac'`. An array-like with positive elements can be used to apply fixed scaling. It's somewhat controversial whether to treat `scaling` as MINPACK's `diag` or as its reciprocal, but we chose the first variant to conform to the classical MINPACK algorithms.
- `diff_step=None` determines the step size selection for numerical differentiation; `None` selects the default automatic step selection. If we agree on the "Press rule" for computing the actual step, then `diff_step` is a relative step size. MINPACK uses `epsfcn` and determines the step as `sqrt(epsfcn) * x`, so we just set `epsfcn=diff_step**2` when calling MINPACK. Setting both `diff_step` and `options['epsfcn']` is a `TypeError`.
- `args`, `kwargs` - parameters passed to `fun` and `jac`. Raise a warning that `leastsq` doesn't support `kwargs`.
- `options` has limited usage, to allow passing `factor` and `col_deriv` to `leastsq`. I'm considering disabling `col_deriv=1`, as it would only complicate the documentation without any benefit (just compute the Jacobian in the normal way, please!). I'm not sure how to handle options already present as parameters: since `**options` are passed directly to the underlying solver, an unknown option or a duplicate entry raises a `TypeError`. Probably we can rely on this error and that's all (a sketch of this parameter translation follows the list).
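To make the translation concrete, here is a minimal sketch of how an `'lm'` call could map the new arguments onto `leastsq` keywords. The helper name `_prepare_lm_options` and the exact error messages are hypothetical; only the rules themselves (finite bounds rejected, `epsfcn = diff_step**2`, duplicate options as `TypeError`) come from the proposal above.

```python
import numpy as np

def _prepare_lm_options(x0, bounds=(-np.inf, np.inf), diff_step=None,
                        options=None):
    """Hypothetical helper: translate least_squares arguments for leastsq."""
    options = {} if options is None else dict(options)

    # Broadcasting is allowed for bounds: a scalar sets all lower/upper
    # bounds to the same value.
    lb = np.broadcast_to(bounds[0], np.shape(x0)).astype(float)
    ub = np.broadcast_to(bounds[1], np.shape(x0)).astype(float)

    # MINPACK's code is unbounded, so any finite bound must be rejected.
    if np.any(np.isfinite(lb)) or np.any(np.isfinite(ub)):
        raise ValueError("method='lm' does not support bounds.")

    # MINPACK determines the step as sqrt(epsfcn) * x, so a relative
    # diff_step maps to epsfcn = diff_step**2.
    if diff_step is not None:
        if 'epsfcn' in options:
            raise TypeError("Pass either diff_step or options['epsfcn'], "
                            "not both.")
        options['epsfcn'] = diff_step ** 2

    # Anything left in options (e.g. factor) is forwarded as **options;
    # an unknown or duplicated keyword then raises a TypeError in leastsq.
    return options
```

Under this scheme a call like `least_squares(fun, x0, method='lm', diff_step=1e-8)` would end up equivalent to `leastsq(fun, x0, epsfcn=1e-16)`.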
Another point to consider is what `least_squares` returns. Obviously, we want it to return the same `OptimizeResult` object for all methods. Let's go over all the fields we currently return and consider whether we can return them in the case of `method='lm'` (a sketch of assembling the result follows the list):
- `x` - returned by MINPACK.
- `fun` (residuals) - returned by MINPACK.
- `jac` - we can't compute it from what MINPACK returns (the Q matrix from the QR factorization is missing). I think it's fine to just call `jac(x)` one more time.
- `obj_value` - no problem to compute.
- `optimality` - easy to compute once we have `jac` and `fun`.
- `active_mask` - all zeros for an unbounded problem.
- `nfev` - returned by MINPACK. MINPACK counts the function calls done for the numerical Jacobian approximation; we decided not to take this approach in our algorithms.
- `njev` - returned with an analytical Jacobian only (from `lmder`, not `lmdif`). Set to `None` when an analytical `jac` is not provided.
- `nit` - not returned by MINPACK. Decided not to return this parameter for any of the algorithms.
- `status` - returned by MINPACK. We should translate it to one of the standard statuses returned by `least_squares`; that's not hard to do.
- `success` - derived from `status`. Decided to keep this parameter.
- `message` - again, return one of the standard unified messages based on `status`.
`leastsq` also returns the inverse of `J.T.dot(J)`. It is the covariance matrix of `x` in the linear approximation `J dx ~ dy` (in the least-squares sense), assuming that `y` has an identity covariance matrix (multiply it by `sigma**2` if all `y_i` have a standard deviation of `sigma`). It's not connected to the optimization algorithms and is not always feasible to compute (consider big sparse matrices, which will hopefully appear here soon).
This quantity is only available for `method='lm'`, so we can probably add it to `OptimizeResult` for MINPACK only (or not). The suggested name is `x_covariance`.
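For reference, a minimal sketch of computing this quantity from a dense Jacobian (the helper name is made up, and `pinv` is used here only so that a rank-deficient `J.T.dot(J)` doesn't raise):

```python
import numpy as np

def _x_covariance(J, sigma=1.0):
    """Covariance of x in the linearization J dx ~ dy.

    Assumes the observations y have covariance sigma**2 * I. A pseudo-
    inverse stands in for the inverse to tolerate rank deficiency.
    """
    return sigma ** 2 * np.linalg.pinv(J.T.dot(J))
```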