Pass custom kernel to SVC #103

Open
cljord opened this issue Oct 8, 2021 · 7 comments

@cljord

cljord commented Oct 8, 2021

Not sure if I missed this in the docs or if this can be done with ScikitLearnBase, but in scikit-learn you can easily define a custom kernel function and pass it to SVC when creating it, like this:

```python
import numpy as np
from sklearn import svm

def my_kernel(X, Y):
    """
    We create a custom kernel:

                 (2  0)
    k(X, Y) = X  (    ) Y.T
                 (0  1)
    """
    M = np.array([[2, 0], [0, 1.0]])
    return np.dot(np.dot(X, M), Y.T)

# Create an SVM instance that uses the custom kernel (fit it with clf.fit(X, y)).
clf = svm.SVC(kernel=my_kernel)
```

(from here)

I haven't been able to figure out how to do this with the Julia package and haven't found anything about it in the docs either. If it isn't possible, this would be a convenient feature (if it is possible, sorry for opening the issue, and I'd be very thankful if somebody could point me in the right direction).
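For reference, scikit-learn's user guide on custom kernels states that the callable receives two matrices of shapes (n_samples_1, n_features) and (n_samples_2, n_features) and must return a kernel (Gram) matrix of shape (n_samples_1, n_samples_2). Below is a NumPy-only sketch that checks this contract for the kernel above. The toy arrays are made up for illustration, and no SVC is fitted:

```python
import numpy as np

# Same weighting matrix as my_kernel above: doubles the first feature's contribution.
M = np.array([[2.0, 0.0], [0.0, 1.0]])

def my_kernel(X, Y):
    """Gram matrix G with G[i, j] = X[i] @ M @ Y[j]."""
    return X @ M @ Y.T

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # toy data: 3 samples, 2 features
Y = np.array([[1.0, 0.0], [0.0, 1.0]])              # toy data: 2 samples, 2 features

G = my_kernel(X, Y)
print(G.shape)  # (3, 2): one row per sample in X, one column per sample in Y

# Each entry is the weighted inner product of one pair of samples.
for i in range(X.shape[0]):
    for j in range(Y.shape[0]):
        assert np.isclose(G[i, j], X[i] @ M @ Y[j])
```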

@cstjean
Owner

cstjean commented Oct 9, 2021

What happens if you call SVC(kernel=some_julia_function)?

@cljord
Author

cljord commented Oct 11, 2021

The custom kernel I wanted to pass was just a dot product, so I used dot(X, Y) from the LinearAlgebra package (that's when I opened the issue). I just tried it again to reproduce the error: dot(X, Y) throws an error, but, oddly, X * Y' works fine. I'm assuming these are equivalent, but I haven't found a way to confirm that beyond some testing of my own.
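Note that the two are not equivalent. Julia's LinearAlgebra.dot applied to two matrices returns a single number, the sum of the elementwise products, whereas X * Y' returns the matrix of pairwise sample dot products. The sketch below uses NumPy analogues (np.sum(X * Y) standing in for dot(X, Y), X @ Y.T for X * Y') on made-up 2x2 inputs. The scalar turns out to be only the trace of the matrix:

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])
Y = np.array([[5.0, 6.0], [7.0, 8.0]])

# Analogue of Julia's dot(X, Y) on two matrices: one scalar, the sum of elementwise products.
frobenius = np.sum(X * Y)
print(frobenius)  # 70.0

# Analogue of Julia's X * Y': entry [i, j] is the dot product of row i of X with row j of Y.
gram = X @ Y.T
print(gram.tolist())  # [[17.0, 23.0], [39.0, 53.0]]

# The scalar is just the sum of the Gram matrix's diagonal.
assert np.isclose(np.trace(gram), frobenius)
```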

Here's a minimal program showing the error:

```julia
using DataFrames
using RDatasets: dataset
using ScikitLearn
using LinearAlgebra
using ScikitLearn.CrossValidation: train_test_split
@sk_import svm: SVC

iris = dataset("datasets", "iris")

X = Matrix(select(iris, [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]))
y = convert(Array, iris[!, :Species])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4)

# dot on two matrices returns a single scalar (the sum of elementwise products),
# not the (n_X x n_Y) kernel matrix that SVC expects from a callable kernel.
function this_does_not_work(X, Y)
    return dot(X, Y)
end

clf = SVC(kernel=this_does_not_work)

fit!(clf, X_train, y_train)  # throws the PyError shown below

# X * Y' returns the matrix of pairwise sample dot products (a linear-kernel Gram matrix).
function this_works(X, Y)
    return X * Y'
end

clf = SVC(kernel=this_works)

fit!(clf, X_train, y_train)
```

Using dot(X, Y') (or any other combination) didn't work either. The error I got was this:

```
ERROR: PyError ($(Expr(:escape, :(ccall(#= /Users/cljord/.julia/packages/PyCall/BD546/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'IndexError'>
IndexError('tuple index out of range')
  File "/Users/cljord/.julia/conda/3/lib/python3.8/site-packages/sklearn/svm/_base.py", line 226, in fit
    fit(X, y, sample_weight, solver_type, kernel, random_seed=seed)
  File "/Users/cljord/.julia/conda/3/lib/python3.8/site-packages/sklearn/svm/_base.py", line 268, in _dense_fit
    if X.shape[0] != X.shape[1]:
```
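This traceback is consistent with the kernel returning a scalar. The last traceback line suggests that scikit-learn turns the callable's return value into a NumPy array and then compares X.shape[0] with X.shape[1]. Assuming that is what happens, a 0-dimensional array has the empty shape (), so indexing it fails with exactly this message. Below is a minimal NumPy reproduction of that step. scalar_kernel is a hypothetical stand-in for the Julia dot callback:

```python
import numpy as np

def scalar_kernel(X, Y):
    # Hypothetical stand-in for the Julia dot(X, Y) callback: returns one number, not a matrix.
    return float(np.sum(X * Y))

X = np.ones((4, 2))
K = np.asarray(scalar_kernel(X, X))  # a 0-dimensional array
print(K.shape)  # ()

try:
    K.shape[0] != K.shape[1]  # the comparison from the traceback above
except IndexError as err:
    print(err)  # tuple index out of range
```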

@cstjean
Owner

cstjean commented Oct 13, 2021

Looks to me like you're getting Python objects passed to your function. I would look in the PyCall documentation, but maybe dot(Array(X), Array(Y)) could work? dot does not seem to work with Python arrays.
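Whatever array type the callback ends up receiving, a cheap safeguard is to check the kernel's output shape yourself. The sketch below uses a hypothetical checked_kernel helper, which is not part of scikit-learn, PyCall, or ScikitLearn.jl. It converts both inputs with np.asarray and raises a readable error when the result is not an (n_X, n_Y) matrix:

```python
import numpy as np

def checked_kernel(kernel):
    """Hypothetical helper: wrap a kernel callable so a wrong output shape fails
    with a readable message instead of an IndexError deep inside SVC."""
    def wrapped(X, Y):
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        K = np.asarray(kernel(X, Y), dtype=float)
        expected = (X.shape[0], Y.shape[0])
        if K.shape != expected:
            raise ValueError(f"kernel returned shape {K.shape}, expected {expected}")
        return K
    return wrapped

linear = checked_kernel(lambda X, Y: X @ Y.T)        # correct: pairwise dot products
scalar = checked_kernel(lambda X, Y: np.sum(X * Y))  # wrong: a single number

A = np.ones((3, 2))
B = np.ones((5, 2))
print(linear(A, B).shape)  # (3, 5)
try:
    scalar(A, A)
except ValueError as err:
    print(err)  # kernel returned shape (), expected (3, 3)
```

In Python, the wrapped kernel could be passed as SVC(kernel=checked_kernel(my_kernel)). The same check could be ported to a Julia callback.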

@cljord
Author

cljord commented Oct 15, 2021

I tried dot(Array(X), Array(Y)), and that didn't work; I'll look into it more in the coming weeks.

I saw on another issue that ScikitLearn.jl is currently more of a gateway into the new ecosystem, but if it fits with the current vision for ScikitLearn.jl, I'd like to contribute a passage or page to the documentation about using a custom kernel with SVC (I would have appreciated one myself).

Basically, a short example along the lines of the Python scikit-learn example I linked above, plus a note that you may have to work out how the kernel interacts with PyCall (possibly in more depth, depending on whether I figure it out myself).

If you think it's unnecessary, we can also close the issue.

@cstjean
Owner

cstjean commented Oct 19, 2021

Looking at the docs, there should probably be a short page like Relationship to PyCall, that explains how it works. That would be a good place for your example. That might be significant work, though. Hmmm. Maybe you can start it, and it can be expanded later.

Beware that ScikitLearn hasn't been super-well maintained, so making any kind of PR is a journey!

@cljord
Author

cljord commented Oct 21, 2021

Sounds good. I'm not very familiar with PyCall so I'll have to get a bit more understanding first, but since this isn't urgent, I'll just do that soon-ish.

I was thinking that two pages might be good. One would cover custom kernels, with the example above showing how easy it is (as in scikit-learn for Python: just write a function and pass it as a parameter to SVC). The other would be the page you mentioned, Relationship to PyCall, linked from the first to show that, depending on your kernel function, you may need PyCall to get it working.

@cstjean
Owner

cstjean commented Oct 21, 2021

I'd rather see a Relationship to PyCall page with a subsection on callback functions. Something like: "Callback functions should just work; however, the objects you receive might be Python arrays, so you'll need to do XYZ." Then provide the SVC example.
