Unbalanced datasets #59

Open
JGarciaCondado opened this issue Oct 17, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

Contributor

JGarciaCondado commented Oct 17, 2024

In classification, there is currently no warning when the dataset is unbalanced. A warning should be raised and, if possible, the user should be given the option to balance the dataset by simply subsampling the largest group down to the size of the smallest group. A heuristic for flagging a dataset as unbalanced could be one group having 25% more data than the other, or something along those lines.
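To make the proposal concrete, here is a minimal sketch of the check and the optional subsampling, using only the Python standard library. The function names (`is_unbalanced`, `balance_by_subsampling`, `check_and_balance`), the `tolerance` and `balance` parameters, and the fixed `seed` are all hypothetical and not part of the existing code base; the 25% threshold is just the heuristic suggested above.

```python
import random
import warnings
from collections import Counter


def is_unbalanced(labels, tolerance=0.25):
    """Return True if the largest class has more than (1 + tolerance) times
    as many samples as the smallest class (assumed heuristic: 25%)."""
    counts = Counter(labels)
    smallest, largest = min(counts.values()), max(counts.values())
    return largest > (1 + tolerance) * smallest


def balance_by_subsampling(samples, labels, seed=0):
    """Randomly subsample every class down to the size of the smallest class."""
    rng = random.Random(seed)
    # Group sample indices by class label.
    indices_by_class = {}
    for idx, label in enumerate(labels):
        indices_by_class.setdefault(label, []).append(idx)
    n_min = min(len(idxs) for idxs in indices_by_class.values())
    keep = []
    for idxs in indices_by_class.values():
        keep.extend(rng.sample(idxs, n_min))
    keep.sort()  # keep the original sample order
    return [samples[i] for i in keep], [labels[i] for i in keep]


def check_and_balance(samples, labels, balance=False, tolerance=0.25, seed=0):
    """Warn if the dataset is unbalanced and, if requested, balance it."""
    if is_unbalanced(labels, tolerance):
        counts = dict(Counter(labels))
        warnings.warn(f"Unbalanced dataset, class counts: {counts}", UserWarning)
        if balance:
            return balance_by_subsampling(samples, labels, seed)
    return samples, labels
```

With 10 samples in one group and 4 in the other, `check_and_balance(samples, labels, balance=True)` would emit the warning and return 4 samples per group. Making the balancing opt-in keeps the default behaviour unchanged apart from the warning.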

This option to balance the dataset will be useful for the newly added modules that look at error prediction vs. classification accuracy.

If that is not possible, another option, instead of balancing the data, is to use metrics that take the imbalance into account or learning algorithms that can handle unbalanced datasets.
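As an illustration of a metric that takes imbalance into account, the sketch below compares plain accuracy with balanced accuracy (the mean of the per-class recalls). This is only one possible choice of metric, and the helper names are hypothetical; scikit-learn offers an equivalent `balanced_accuracy_score`, and many of its classifiers accept `class_weight="balanced"` as an algorithm-side alternative.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# A classifier that always predicts the majority class looks good on
# plain accuracy but is exposed by balanced accuracy.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
print(accuracy(y_true, y_pred))           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```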

@JGarciaCondado JGarciaCondado added the enhancement New feature or request label Oct 17, 2024