[ENH] Add sample_indices_ for SMOTE/ADASYN classes #772
Comments
Thinking a bit more about it and after reading about #724, I think that we should avoid reusing
I was thinking about the same issue because I need the sample indices for GroupKFold CV after oversampling with SMOTE. So I downloaded the repo and made some small local changes to
By calling
where the index of the synthetic sample is the same as that of its "mother" real sample. One can also call
For a real sample, its neighbor is 0 (itself). in mind! If you think it is implementable I can open a new branch.
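To make the GroupKFold use case above concrete, here is a minimal sketch of how such per-sample bookkeeping could be consumed. The names `mother_indices`, `neighbor_ranks` and `propagate_groups` are hypothetical and are not part of imbalanced-learn; the values are made up for illustration and only mirror the convention described in the comment (a synthetic sample shares the index of its "mother", and a real sample has neighbor 0).

```python
def propagate_groups(groups, mother_indices):
    """Give every resampled row the group label of its "mother" sample."""
    return [groups[i] for i in mother_indices]


# Group labels of the 4 original samples (illustrative values).
groups = ["a", "a", "b", "c"]

# Hypothetical output of a patched sampler: 4 real rows followed by 3 synthetic rows.
mother_indices = [0, 1, 2, 3, 0, 2, 2]   # synthetic rows reuse the mother's index
neighbor_ranks = [0, 0, 0, 0, 1, 2, 1]   # 0 means "itself", i.e. a real sample

groups_res = propagate_groups(groups, mother_indices)
is_synthetic = [rank != 0 for rank in neighbor_ranks]
```

With group labels carried over like this, `groups_res` could be passed as the `groups` argument of scikit-learn's `GroupKFold.split`, so that a synthetic sample never lands in a different fold from the real sample it was derived from.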
Hi! Thanks for creating this issue. I think this feature can be useful to understand datasets we are working with.
@glemaitre, IMO, the semantics should be given by the owners of the datasets. If we use the example of #724, oversample the data and suppose we use WDYT?
Hi,
So as it seems that no one is currently working on it, I will do it.
SMOTE/ADASYN classes currently do not provide a `sample_indices_` attribute since they are generating samples that do not belong to the original dataset. However, we could create a new semantic for these samplers that generate data. `sample_indices_` could expose a tuple of the sample used to generate the new point. For the samples that are not generated, it will only be a single integer. This would implement a feature requested in issues and gitter.
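As a rough illustration of the proposed semantic, the sketch below uses a toy SMOTE-like interpolation in pure Python. It is not imbalanced-learn's implementation: the function `oversample_with_indices` and its parameters are hypothetical, it handles a single class only, and it assumes a synthetic point is built from exactly two original samples (a base sample and one of its nearest neighbours).

```python
import random


def oversample_with_indices(X, n_new, k=2, seed=0):
    """Toy SMOTE-like interpolation that also records where each row came from."""
    rng = random.Random(seed)
    X_res = [list(row) for row in X]
    # Original samples keep a single integer: their own row index.
    sample_indices = list(range(len(X)))
    for _ in range(n_new):
        base = rng.randrange(len(X))
        # Rank the other samples by squared Euclidean distance to the base sample.
        others = sorted(
            (j for j in range(len(X)) if j != base),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(X[base], X[j])),
        )
        neighbor = rng.choice(others[:k])
        gap = rng.random()
        # The synthetic point lies on the segment between base and neighbor.
        X_res.append([a + gap * (b - a) for a, b in zip(X[base], X[neighbor])])
        # Generated samples record the pair of original rows they were built from.
        sample_indices.append((base, neighbor))
    return X_res, sample_indices


X_min = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
X_res, sample_indices = oversample_with_indices(X_min, n_new=3)
```

Under this convention, a caller could tell real rows from generated ones with `isinstance(idx, tuple)`, at the cost of `sample_indices` no longer being a homogeneous integer array.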