Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(skorch): add an inherited class from skorch.NeuralNet that is compatible with PyTorch Frame #375

Open
wants to merge 57 commits into
base: master
Choose a base branch
from

Conversation

34j
Copy link

@34j 34j commented Mar 11, 2024

Closes #147

@MacOS Please continue from here if it helps.
Sorry for being so loud, but this took me a whole day, so I would appreciate it very much if you could make me as a co-author if you used this code.

@34j 34j marked this pull request as draft March 11, 2024 13:06
Copy link

codecov bot commented Mar 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.52%. Comparing base (ee98b87) to head (aa5484d).
Report is 6 commits behind head on master.

❗ Current head aa5484d differs from pull request most recent head cb76e8d. Consider uploading reports for the commit cb76e8d to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #375   +/-   ##
=======================================
  Coverage   93.52%   93.52%           
=======================================
  Files         124      124           
  Lines        6456     6456           
=======================================
  Hits         6038     6038           
  Misses        418      418           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@MacOS
Copy link

MacOS commented Mar 11, 2024

Sure - on both!

@34j 34j force-pushed the feat/skorch-compatible branch from 1706c96 to 0b9426f Compare March 16, 2024 03:53
@34j 34j marked this pull request as ready for review March 16, 2024 05:15
@34j
Copy link
Author

34j commented Mar 16, 2024

@weihua916 Would you mind reviewing if you think this is a good way to implement it?

Also, it is strange that mypy in pre-commit does not raise errors, but mypy in CI does. I don't think there is any way to deal with this.

@34j
Copy link
Author

34j commented Apr 3, 2024

@weihua916 @zechengz @yiweny Would appreciate your review, thank you very much in advance.

Copy link
Contributor

@weihua916 weihua916 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Do you mind adding unit test similar to https://github.com/pyg-team/pytorch-frame/blob/master/test/gbdt/test_gbdt.py?

examples/revisiting.py Outdated Show resolved Hide resolved
examples/revisiting.py Outdated Show resolved Hide resolved
@weihua916
Copy link
Contributor

A kind check-in. Is there any progress here?

@34j
Copy link
Author

34j commented May 2, 2024

No progress, sorry

README.md Outdated Show resolved Hide resolved
examples/revisiting.py Outdated Show resolved Hide resolved
examples/tutorial.py Outdated Show resolved Hide resolved
test/utils/test_skorch.py Outdated Show resolved Hide resolved
@34j
Copy link
Author

34j commented Jul 11, 2024

@weihua916 Removed all tutorials and added examples/sklearn_api.py instead.
The failing tests are probably due to a pandas version update and are not related to this PR.

@34j 34j requested review from weihua916 and zechengz July 11, 2024 10:05
Copy link
Contributor

@weihua916 weihua916 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a quick question.

Comment on lines +47 to +48
# [stype.text_embedded],
# [stype.numerical, stype.numerical, stype.text_embedded],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't support these stypes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently not supported at this time due to lack of time to understand how to use these dtypes.
However, since it probably only require changes in the arguments of the NeuralNet, it should have little trouble extending it in the future.

Comment on lines +144 to +149
if pass_dataset:
net.fit(dataset)
_ = net.predict(test_dataset)
else:
net.fit(X_train, y_train)
_ = net.predict(X_test)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we take tensor frame? It's also weird to sometimes take dataset and sometimes take data frame.

Copy link
Author

@34j 34j Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The main purpose of this PR is to allow DataFrame to be DIRECTLY fitted, as shown in examples/sklearn_api.py.
  • Since it is unclear how to create a Dataset from a TensorFrame, and if there is a TensorFrame, there should be also a Dataset, which means there is little need to implement this, and even to use skorch as the user might be familiar with deep learning.
  • Instead of "sometimes take dataset and sometimes take data frame", both are tested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm if dataframe is directly fed, it is unclear why we need this feature within pytorch frame.
the whole point of pytorch frame is to materialize data frame into tensor frame, to be processed by pytorch.

Copy link
Author

@34j 34j Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may not match your purpose but my goal is to use advanced neural networks implemented in pytorch_frame in existing sklearn pipeline.
This PR allows pytorch_frame to be used on top of existing scikit-learn code without having to heavily modify the existing code. Since many people use sklearn Pipeline, especially on Kaggle, it is easy to verify performance changes by changing or assembling the estimator in other people's code to my NeuralNetPytorchFrame. I am convinced that this will be very valuable.

@34j 34j requested a review from weihua916 July 30, 2024 08:15
@34j
Copy link
Author

34j commented Sep 16, 2024

@weihua916 Would you please reconsider merging this PR?

This change makes it easy to try out torch-frame based neural networks on code that already uses scikit-learn. (which was demonstrated in #375 (comment)).

(This PR is not intended to save training of Pytorch models that do not use scikit-learn. lightning or fastai should be used for such applications.)

Thank you in advance.

@34j
Copy link
Author

34j commented Sep 30, 2024

@weihua916 @yiweny @akihironitta Any chance that this PR could be merged?

Copy link
Member

@akihironitta akihironitta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on this integration! 🚀 Thank you for your effort :)

After reviewing the PR, I'd recommend keeping this integration as an example script rather than merging it into torch_frame/. Since this integration may not see widespread or frequent use, maintaining the code (e.g., keeping up with new APIs skorch introduces and making sure all features work in PyTorch Frame) could be challenging in the future. IMO, presenting it as an example would make more sense. Alternatively, @34j (and contributors in the linked issue together) could consider creating a separate repository to showcase this integration as a community-driven effort, and we could help highlight it via our docs or social media. (We can always promote the integration into our codebase in the future if it gains traction and the community expresses a need for it.)

Looks like PyTorch Frame doesn't need many changes to support this integration, but I'm happy to provide any extension points when it's necessary.

I'm open to any comments/feedbacks/disussions!

@34j
Copy link
Author

34j commented Dec 29, 2024

Thanks for your positive reply. In that case I would consider trying to make another package, but I am very busy right now, so please leave this as it is for a while. Note that in any case one-line change I made (in dataset.py) needs to be merged for that to work and you may want to review that specific line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sklearn-compatible interface
5 participants