sklearn-compatible interface #147
Comments
You can convert a
See tutorial. |
Thanks for your suggestion! I think this is great to add. Setting this as P2 feature, as we first want to prioritize more stype support #88. |
Is someone already working on that? |
No, as far as I know. Let us know if you are interested! |
Yes, I'm interested. Hence, you can assign this to me. How fast should this task be completed? |
@MacOS Great, thank you! It'd be good to complete this feature by the end of January. Would that be possible? |
@weihua916 As of now, yes. |
I have tried this and it seems to be very difficult.
Next, we want to pass a validation dataset as well, but if we pass them using a tuple like
|
🤔 Thank you for looking into this, @34j! I was about to start working on it.
I would have simply converted the
This seems very big and unrealistic, because we would have to make all estimators compatible with scikit-learn, which is a lot to ask for. At the moment, May I ask you, @34j, to post a self-contained example (or examples) showing what would qualify pytorch-frame as being sklearn-compatible? PS: I will submit one PR today, but maybe only as a draft. |
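To make "sklearn-compatible" a bit more concrete, here is a minimal sketch of the conventions scikit-learn expects from an estimator: hyperparameters stored untouched in `__init__`, `get_params`/`set_params`, `fit` returning `self`, and learned attributes ending in an underscore. It is plain Python and does not use pytorch-frame or scikit-learn at all; `MajorityClassifier` and its `fallback` parameter are hypothetical names made up for illustration, not anything proposed in this thread.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical toy estimator that follows scikit-learn's conventions."""

    def __init__(self, fallback=None):
        # __init__ only stores hyperparameters; no validation or training here.
        self.fallback = fallback

    def get_params(self, deep=True):
        # Needed so that tools such as grid search can clone the estimator.
        return {"fallback": self.fallback}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self.get_params():
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Learned state is stored in attributes with a trailing underscore.
        counts = Counter(y)
        self.classes_ = sorted(counts)
        self.majority_ = counts.most_common(1)[0][0] if counts else self.fallback
        return self  # fit returns self so that calls can be chained

    def predict(self, X):
        # Always predict the most frequent training label.
        return [self.majority_ for _ in X]

    def score(self, X, y):
        # Mean accuracy, as classifiers in scikit-learn report by default.
        preds = self.predict(X)
        return sum(p == t for p, t in zip(preds, y)) / len(y)


clf = MajorityClassifier().fit([[0], [1], [2]], ["a", "b", "a"])
preds = clf.predict([[5], [6]])
```

An object shaped like this is what pipelines and cross-validation helpers rely on, so a wrapper around a pytorch-frame model would presumably need to expose the same surface.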
This is an implicit request for the recently implemented
I feel like this could probably be done. I'll send a draft PR in an hour, and I want to ask @MacOS to take it over and do the documentation, testing and tutorial work. Dirty prototype code:
from __future__ import annotations

from typing import List

import torch
import torch.nn as nn
from skorch import NeuralNetClassifier
from skorch.dataset import Dataset as SkorchDataset
from torch import Tensor
from torch_frame import TensorFrame
from torch_frame.data import DataLoader
from torch_frame.data.dataset import DataFrameToTensorFrameConverter, Dataset
from torch_frame.utils import infer_df_stype

# NOTE: `dataset`, `model`, `args` and `device` are assumed to be defined by
# the surrounding script (not shown in this snippet).


def create_dataset(df, _) -> Dataset:
    # skorch calls this as dataset(X, y); y is unused because the target
    # column is already part of the DataFrame.
    dataset_ = Dataset(
        df, dataset.col_to_stype, split_col="split_col", target_col="target_col"
    )
    dataset_.materialize()
    return dataset_


def split_dataset(dataset: Dataset) -> tuple[SkorchDataset, SkorchDataset]:
    # Use the first two splits (train, validation) defined by the split column.
    datasets = dataset.split()[:2]
    return datasets[0].tensor_frame, datasets[1].tensor_frame


def get_iterator(dataset: SkorchDataset, **kwargs) -> DataLoader:
    return DataLoader2(dataset, **kwargs)


class DataLoader2(DataLoader):
    def collate_fn(
        self, index: int | List[int] | range | slice | Tensor
    ) -> tuple[TensorFrame, Tensor | None]:
        # Return (X, y) pairs, which is the batch format skorch expects.
        index = torch.tensor(index)
        res = super().collate_fn(index).to(device)
        return res, res.y


net = NeuralNetClassifier(
    module=model,
    max_epochs=args.epochs,
    lr=args.lr,
    device=device,
    batch_size=6,
    iterator_train=get_iterator,
    dataset=create_dataset,
    iterator_valid=get_iterator,
    train_split=split_dataset,
    classes=dataset.df["target_col"].unique(),
    verbose=1,
    criterion=nn.CrossEntropyLoss,
)
net.fit(dataset.df, None) |
@34j That is fine with me! So we drop the second part of your request then, correct? |
Heads up everyone, I have started working on it. I have already merged the draft PR of @34j into my fork. It would be nice if you were available in case I have questions. :) |
|
So far none. I meant just in case. Sorry for the delay, but I had personal matters to deal with. I'm confident that I can submit a PR this month. |
Hi all, a short update: unfortunately, I got sick, hence another delay. Should I still work on it? |
I think it should continue. Are you still working on this part? Otherwise I can take over. |
Yes, still working on it @qychen2001! |
That's great! This feature is really important, looking forward to your PR. |
Sorry, but I have almost completed this feature myself in #375 (as MacOS seemed to be sick) and am just waiting for @weihua916's review. However, the pre-commit styling work by MacOS that I referred to certainly helped. |
That's fantastic! But I'm still concerned about the relationship between skorch and sklearn, can your PR directly support models in sklearn such as svm? |
Excuse me but what do you mean by relationship? skorch works perfectly, trust me plz 🫠 |
sklearn models already have sklearn-compatible interface apparently |
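On the skorch/sklearn relationship: the general idea of such a wrapper is to hide an explicit training loop behind `fit`/`predict`, so that a model that is not an sklearn estimator can still be driven like one. The following is only a pure-Python sketch of that adapter pattern under made-up assumptions; `TinyThresholdModel`, `LoopWrapper`, `train_step` and `forward` are hypothetical names, and this is not how skorch itself is implemented.

```python
class TinyThresholdModel:
    """Hypothetical stand-in for a model that is trained step by step."""

    def __init__(self):
        self.threshold = 0.0

    def forward(self, x):
        # Predict class 1 when the input exceeds the learned threshold.
        return 1 if x > self.threshold else 0

    def train_step(self, x, target, lr):
        # Perceptron-style update: raise the threshold on a false positive,
        # lower it on a false negative, leave it alone when correct.
        self.threshold += lr * (self.forward(x) - target)


class LoopWrapper:
    """Hypothetical adapter exposing fit/predict around a training loop."""

    def __init__(self, model_factory, max_epochs=10, lr=0.1):
        self.model_factory = model_factory
        self.max_epochs = max_epochs
        self.lr = lr

    def fit(self, X, y):
        # A fresh model is built on every fit, as sklearn users would expect.
        self.model_ = self.model_factory()
        for _ in range(self.max_epochs):
            for x, target in zip(X, y):
                self.model_.train_step(x, target, self.lr)
        return self

    def predict(self, X):
        return [self.model_.forward(x) for x in X]


net = LoopWrapper(TinyThresholdModel, max_epochs=20, lr=0.5)
net.fit([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1])
preds = net.predict([0.0, 10.0])
```

Models that already ship with sklearn, such as SVMs, need no adapter of this kind, since they implement `fit`/`predict` themselves.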
I think it would be great to have this feature, as I think sklearn is often used for tabular data. I tried to use skorch, but skorch does not allow TensorFrames and did not work well.
(examples/tutorial.py)
I think the following changes are needed:
I am sorry, but I cannot spend much time assisting with this feature, so if it is not possible, please close this.