Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unstructured data fitting #33

Open
sixpearls opened this issue Jun 24, 2019 · 2 comments
Open

Unstructured data fitting #33

sixpearls opened this issue Jun 24, 2019 · 2 comments
Milestone

Comments

@sixpearls
Copy link
Collaborator

sixpearls commented Jun 24, 2019

The SciPy docs have a few of Dierckx's surface fit wrappers (SmoothBivariateSpline, LSQBivariateSpline, and their spherical equivalents) listed under the "unstructured data" sub-heading for 2-D Splines. It looks like Dierckx's general API for the 2D problems is to take 3 1-D arrays for the X, Y, Z sample data. The rectangular (structured data) version seems like it does something similar to #31.

For the methods that can handle unstructured data, it seems it is using a somewhat heuristic algorithm that starts with a least squares spline with knots only on the boundaries of the sample data, then iteratively:

  1. Add a knot to a dimension (location and dimension based on heuristic criteria of the residuals of current spline).
  2. Construct the least squares spline of the data for the new set of knots.
  3. If the fit improves by a certain threshold, keep the knot. If not, remove it and mark this region not to be tested again.

I say this method is heuristic because I don't believe there are any theoretical claims about the results of this algorithm, and about tuning another parameter the author suggests "the user should examine it graphically, e.g. by looking at its contour map."

When I first saw that SciPy was listing this for "unstructured data" I thought we should include similar functionality. I didn't realize it was a fairly specific algorithm. It is also fairly tightly coupled to the "smoothing" formulation used throughout his FITPACK work (i.e., the 2-D splines linked here and 1-D class-based and functional wrappers.)

At this point, I am leaning towards saying this is out of scope, at least for "version 1.0" or whatever we want to call the first submission/release. This means we aren't able to fully replace all of SciPy's spline functionality but we do get very close with a much more cohesive approach. We could, perhaps, show an example with similar functionality with user-provided knots.

It is possible we could add similar functionality to this project or spin out a new one for a future publication(s). Just extending this heuristic to N-D might be a publication. We may also want to compare this approach to other adaptive knot selection, like MARS (the scikit-learn package inexplicably renamed the algorithm to EARTH).

@sixpearls
Copy link
Collaborator Author

@ixjlyons What do you think about the scope of this for the first release/publication?

@ixjlyons
Copy link
Member

I agree it seems out of scope and/or a good candidate for future work.

the scikit-learn package inexplicably renamed the algorithm to EARTH

Curiosity got the best of me. Apparently MARS is trademarked so they chose the nearest planet to it1 as an alternative. Seems like the original was an R implementation.

1Made that part up but seems plausible

@sixpearls sixpearls added this to the 2.0 milestone Jul 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants